Technical · 6 min read · 2026-04-01

Whisper Model Sizes Compared: Which One Should You Use?

OpenAI's Whisper comes in several sizes. Smaller models run faster but make more mistakes; larger models are more accurate but slower and take more disk space. The right model depends on what you're using it for.

The Models

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75 MB | Fastest | Good | Quick notes, quiet environments |
| Tiny.en | 75 MB | Fastest | Good (English) | English-only, maximum speed |
| Base | 140 MB | Very fast | Better | Everyday dictation |
| Small | 460 MB | Fast | Great | Most use cases |
| Medium | 1.5 GB | Moderate | Very good | Noisy environments, accents |
| Large v3 | 2.9 GB | Slower | Best | Recordings, meetings, accuracy-critical |
| Large v3 Turbo | 1.6 GB | Fast | Near-best | Best speed/accuracy balance |

Recommendations by Use Case

Live Dictation (Push-to-Talk)

Speed matters here — you want text to appear quickly after you stop speaking. Start with Base or Small. If you have an M2 or later, Large v3 Turbo is fast enough for live dictation with near-best accuracy.

File Transcription

Speed is less critical since you're processing a file in the background. Use Large v3 for the best accuracy. VoxBee automatically chunks long files and processes them sequentially.
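
To illustrate the chunking idea, here is a minimal Python sketch. It is not VoxBee's actual implementation: the function names, the fixed 30-second chunk length (chosen because Whisper itself processes audio in 30-second windows), and the `transcribe_chunk` callback are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def chunk_spans(total_seconds: float, chunk_seconds: float = 30.0) -> List[Tuple[float, float]]:
    """Split a recording into consecutive (start, end) spans, in seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        # The last chunk may be shorter than chunk_seconds.
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


def transcribe_sequentially(
    total_seconds: float,
    transcribe_chunk: Callable[[float, float], str],  # hypothetical per-chunk transcriber
) -> str:
    """Transcribe chunks one after another and join the text in order."""
    texts = [transcribe_chunk(start, end) for start, end in chunk_spans(total_seconds)]
    return " ".join(t for t in texts if t)
```

A real pipeline would likely cut on silence or overlap neighbouring chunks so that words are not split at a boundary. The fixed-length version above only shows the sequential flow.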

Meeting Transcription

Use Large v3 — meetings often have multiple speakers, background noise, and cross-talk where accuracy matters most. The extra processing time is worth it since you're transcribing after the meeting ends.

Non-English Languages

Larger models are significantly better for non-English languages. Use Large v3 or Large v3 Turbo for multilingual transcription. Avoid the ".en" models — they're English-only.
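
The recommendations above can be summarized as a small lookup. This is only a sketch of the article's advice, not a VoxBee API: the function name, the use-case keys, and the rule that a non-English request overrides the use case are assumptions made for illustration.

```python
from typing import List

# Article recommendations, keyed by hypothetical use-case names.
RECOMMENDATIONS = {
    "live_dictation": ["Base", "Small"],
    "file_transcription": ["Large v3"],
    "meeting_transcription": ["Large v3"],
}


def recommend_models(use_case: str, m_series_generation: int = 1, non_english: bool = False) -> List[str]:
    """Return candidate models for a use case, following the article's advice."""
    if non_english:
        # Assumption: language needs take priority over the use case.
        return ["Large v3", "Large v3 Turbo"]
    if use_case == "live_dictation" and m_series_generation >= 2:
        # On M2 or later, Large v3 Turbo is fast enough for live dictation.
        return ["Large v3 Turbo"]
    return list(RECOMMENDATIONS[use_case])
```

For example, `recommend_models("live_dictation")` returns `["Base", "Small"]`, while `recommend_models("live_dictation", m_series_generation=2)` returns `["Large v3 Turbo"]`.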

Apple Silicon Performance

Whisper runs on Apple's Neural Engine via WhisperKit. Performance scales with your chip:

  • M1 — Base and Small are real-time or faster. Large v3 is 2-3x slower than real-time.
  • M2/M3 — Small and Medium are real-time. Large v3 approaches real-time.
  • M3 Pro/Max/Ultra — Large v3 runs at or faster than real-time.
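
To make "2-3x slower than real-time" concrete, the sketch below estimates how long a recording takes to process. It assumes that phrase means processing takes two to three times the audio's duration. The function name and the 60-minute example are illustrative, not measurements.

```python
def processing_minutes(audio_minutes: float, slowdown_factor: float) -> float:
    """Estimate processing time; a slowdown_factor of 1.0 means real-time."""
    return audio_minutes * slowdown_factor


# A 60-minute recording with Large v3 on an M1 (2-3x slower than real-time):
fastest = processing_minutes(60, 2.0)
slowest = processing_minutes(60, 3.0)
print(f"{fastest:.0f} to {slowest:.0f} minutes")  # prints "120 to 180 minutes"
```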

How to Switch Models in VoxBee

VoxBee lets you download and switch between all 7 models in Settings → Speech. The app downloads models on demand — you only need disk space for the models you use. You can use a faster model for live dictation and a larger model for file transcription.
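
As a rough guide to disk usage, the sketch below adds up the download sizes from the table above for whichever models you keep. The sizes are the article's rounded figures, 1 GB is treated as 1000 MB, and the function name is made up for this example.

```python
from typing import Iterable

# Approximate sizes in MB, taken from the table above.
MODEL_SIZES_MB = {
    "Tiny": 75,
    "Tiny.en": 75,
    "Base": 140,
    "Small": 460,
    "Medium": 1500,
    "Large v3": 2900,
    "Large v3 Turbo": 1600,
}


def disk_space_mb(models: Iterable[str]) -> int:
    """Total approximate disk space for a set of downloaded models."""
    # set() ensures a model listed twice is only counted once.
    return sum(MODEL_SIZES_MB[name] for name in set(models))


# Base for live dictation plus Large v3 for file transcription:
print(disk_space_mb(["Base", "Large v3"]))  # prints 3040
```

Keeping all seven models would take roughly 6750 MB by the same arithmetic.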

Try VoxBee free for 14 days and experiment with different models.
