TRANSCRIPTION

Drop a file or paste a URL.
Get a timestamped transcript.

Drag and drop any audio file (MP3, WAV, M4A, AIFF) or paste a podcast, YouTube, or video URL. VoxBee downloads the audio, transcribes it locally with Whisper or NVIDIA Parakeet models, attaches accurate timestamps, and generates an AI-powered summary with key topics, highlights, and takeaways.

VoxBee transcription workspace showing completed transcripts, file dropzone, and URL import

See it in action

Transcribe anything, anywhere

Drag-and-Drop or File Picker

Drop any audio file — MP3, WAV, M4A, AIFF — directly into VoxBee, or use the file picker to select files.

Paste a URL from 1,800+ Sites

Paste a YouTube, podcast, Vimeo, SoundCloud, X/Twitter, or Twitch URL. VoxBee downloads the audio and transcribes it locally.

AI-Powered Summaries

After transcription, send the text to OpenAI, Anthropic, or your local Ollama server for structured summaries with key topics, highlights, and takeaways.

Timestamped Segments

Every transcript includes accurate timestamps so you can jump to specific moments in the original audio.

Smart Chunking

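Long files are split into segments of about 30 seconds, cut at silence gaps rather than mid-word. The segments are transcribed in parallel and stitched back together with word-level deduplication, so words repeated across a segment boundary appear only once.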
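
For readers curious how chunking of this kind can work, here is a minimal Python sketch. It is an illustration under stated assumptions, not VoxBee's actual implementation: the function names (`plan_chunks`, `stitch`), the list of silence positions, the search window of half a segment, and the five-word overlap limit are all hypothetical.

```python
from typing import List, Tuple


def plan_chunks(duration: float, silences: List[float],
                target: float = 30.0) -> List[Tuple[float, float]]:
    """Cut [0, duration] into chunks of roughly `target` seconds.

    `silences` holds timestamps (in seconds) where a silence gap was detected.
    Each cut snaps to the silence closest to the ideal cut point, as long as
    that silence lies within half a target length of it (an assumed window).
    """
    cuts: List[float] = []
    start = 0.0
    while duration - start > target:
        ideal = start + target
        candidates = [s for s in silences
                      if start < s < duration and abs(s - ideal) <= target / 2]
        # Fall back to a hard cut when no silence is close enough.
        cut = min(candidates, key=lambda s: abs(s - ideal)) if candidates else ideal
        cuts.append(cut)
        start = cut
    bounds = [0.0] + cuts + [duration]
    return list(zip(bounds[:-1], bounds[1:]))


def stitch(parts: List[List[str]], max_overlap: int = 5) -> List[str]:
    """Join per-chunk word lists, dropping words repeated across a seam."""
    out: List[str] = []
    for words in parts:
        overlap = 0
        # Look for the longest run of words that ends `out` and starts `words`.
        for n in range(min(max_overlap, len(out), len(words)), 0, -1):
            if [w.lower() for w in out[-n:]] == [w.lower() for w in words[:n]]:
                overlap = n
                break
        out.extend(words[overlap:])
    return out


if __name__ == "__main__":
    print(plan_chunks(70.0, [28.5, 61.0]))
    print(stitch([["hello", "big", "world"], ["big", "world", "again"]]))
```

In this sketch a 70-second file with silences at 28.5 s and 61.0 s is cut at those two points instead of at exactly 30 s and 60 s, and the repeated words "big world" at the seam are kept only once.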

Export Formats

Export your transcripts as plain text (.txt), subtitles (.srt), or markdown (.md). Files are saved to ~/Documents/VoxBee/Transcriptions/.
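
To show what a subtitle export contains, the sketch below turns timestamped segments into SubRip (.srt) text: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line between entries. The `(start, end, text)` tuple shape and the function names are assumptions made for this example, not VoxBee's internal format.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as SRT text."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Entries are separated by a single blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Welcome to the show."),
                  (2.5, 6.0, "Today we talk about bees.")]))
```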

11 Cloud Providers (BYO Key)

Need extra accuracy or a hosted endpoint? Plug in your own API key for OpenAI (including diarized gpt-4o-transcribe), Deepgram, AssemblyAI, ElevenLabs, Groq, xAI Grok, Mistral Voxtral, Cohere, Speechmatics, Alibaba Qwen3-ASR, or Soniox. Cloud transcription stays disabled unless you turn it on.

Speaker Diarization (Beta)

On-device speaker detection powered by NVIDIA Sortformer. Get a transcript with each turn labelled by speaker — great for interviews and panel recordings. Available as a Beta action on completed transcripts.

On-device by default.

Transcription runs locally with Whisper or NVIDIA Parakeet — your files stay on your machine. If you opt into a cloud provider with your own API key, a persistent purple cloud badge shows whenever audio leaves your device, so it's always clear when something is being uploaded.

One price. Yours forever.

$49 one-time purchase. No subscriptions. No cloud fees.

See Pricing