hyperframes-media

Asset preprocessing for HyperFrames compositions — text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net). Use when generating voiceover from text, transcribing speech for captions, removing the background from a video or image to use as a transparent overlay, choosing a TTS voice or whisper model, or chaining these (TTS → transcribe → captions). Each command downloads its own model on first run.

Best for Content creators · Works with GitHub · Low risk

#tts #transcription #background-removal #whisper #kokoro #u2net #hyperframes

Source

author: @heygen-com
repo: heygen-com/hyperframes
language: TypeScript

Overview

Key Features

·Text-to-speech with Kokoro-82M
·Audio/video transcription with Whisper
·Background removal with u2net
·Local execution, no API keys
·Model caching in ~/.cache/hyperframes/

Use Cases

→Add voiceover narration from script or text
→Generate captions by transcribing speech audio or video
→Remove background from video or image for transparent overlay

Best for

✓Content creators
✓Video production
✓Asset preprocessing

Not ideal for

!Real-time processing
!Cloud-dependent workflows

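FAQs

Do I need an API key for TTS or transcription?
No. All three commands run locally, so no API key is needed.

Where are models cached?
Under ~/.cache/hyperframes/. Each command downloads its model on first run and caches it there.

Can I chain TTS then transcribe the generated audio?
Yes. Generate narration with `tts`, then run `transcribe` on the resulting audio to get timestamps for captions (TTS → transcribe → captions).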

skills/hyperframes-media/SKILL.md

name: hyperframes-media

HyperFrames Media Preprocessing

Three CLI commands produce assets for compositions: `tts` (speech), `transcribe` (timestamps), and `remove-background` (transparent video). Each downloads its model on first run and caches it under ~/.cache/hyperframes/. Drop the output into the project, then reference it from the composition HTML; see the hyperframes skill for the audio/video element conventions.

Text-to-Speech (`tts`)

Generate speech audio locally with Kokoro-82M. No API key.

```bash
npx hyperframes tts "Text here" --voice af_nova --output narration.wav   # from inline text
npx hyperframes tts script.txt --voice bf_emma --output narration.wav    # from a text file
npx hyperframes tts --list                                               # list all 54 voices
```

Voice Selection

Match the voice to the content. The default voice is `af_heart`.

| Content type | Voice | Why |
| --- | --- | --- |
| Product demo | `af_heart` / `af_nova` | Warm, professional |
| Tutorial / how-to | `am_adam` / `bf_emma` | Neutral, easy to follow |
| Marketing / promo | `af_sky` / `am_michael` | Energetic or authoritative |
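If you script voiceover generation from Node, the table and the `tts` commands above can be wrapped in a small helper. This is a minimal sketch, not part of the HyperFrames API: the names `pickVoice`, `buildTtsArgs`, `runTts`, and the `ContentType` keys are hypothetical. It uses only the flags shown above (`--voice`, `--output`) and picks the first voice listed for each content type, falling back to the default `af_heart`.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical content-type keys; only the voice IDs come from the table.
type ContentType = "product-demo" | "tutorial" | "marketing";

// First voice listed for each content type in the voice-selection table.
const VOICE_BY_CONTENT: Record<ContentType, string> = {
  "product-demo": "af_heart",
  tutorial: "am_adam",
  marketing: "af_sky",
};

// Documented default voice.
const DEFAULT_VOICE = "af_heart";

// Returns the table's voice for a content type, or the default when none is given.
function pickVoice(contentType?: ContentType): string {
  return contentType ? VOICE_BY_CONTENT[contentType] : DEFAULT_VOICE;
}

// Builds the argv for `npx hyperframes tts`. `input` is either literal text
// or a path to a script file, matching the two command forms shown above.
function buildTtsArgs(input: string, voice: string, output: string): string[] {
  return ["hyperframes", "tts", input, "--voice", voice, "--output", output];
}

// Runs the command synchronously and returns its exit code (null if the
// process failed to start or was killed by a signal).
// On Windows, npx is a .cmd shim, so spawnSync may need { shell: true }.
function runTts(
  input: string,
  contentType: ContentType | undefined,
  output: string,
): number | null {
  const args = buildTtsArgs(input, pickVoice(contentType), output);
  const result = spawnSync("npx", args, { stdio: "inherit" });
  return result.status;
}
```

For example, `runTts("script.txt", "tutorial", "narration.wav")` would run `npx hyperframes tts script.txt --voice am_adam --output narration.wav`. Passing an argv array instead of a single shell string avoids quoting problems when the narration text itself contains quotes.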

...

Install

```bash
npx skills add heygen-com/hyperframes --skill hyperframes-media
```

Safety assessment

Clarity score: 4/5 (very good)
How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5. Clear and well structured, with only minor parts that might need a second read.

Actionability score: 3/5 (medium)
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5. Partially actionable with several concrete steps, but still missing important details.

Community cookbook

May 7, 2026
