🌐
GitHub
github.com › m-bain › whisperX
GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) - m-bain/whisperX
Starred by 19.2K users
Forked by 2K users
Languages   Python
🌐
Aicloudautomation
aicloudautomation.net › projects › whisperx
WhisperX - AI Cloud Automation
WhisperX is an enhanced version of OpenAI’s Whisper that provides fast automatic speech recognition with accurate word-level timestamps and speaker diarization. It achieves 70x realtime transcription speed with the large-v2 model and can identify different speakers in audio recordings.
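As a concrete illustration, here is a minimal sketch of the batched transcription workflow using the Python API documented in the m-bain/whisperX README (model name, audio path, and device below are placeholders; exact signatures can vary between releases):

import whisperx

device = "cuda"            # or "cpu"
audio_file = "audio.mp3"   # placeholder path

# Load the ASR model (faster-whisper backend) and the raw waveform.
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)

# Batched transcription over VAD-derived chunks; timestamps here are segment-level.
result = model.transcribe(audio, batch_size=16)
for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])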
Discussions

After Effects Subtitles Generator Script with WhisperX
Damn! This is so useful! I can't believe there wasn't something like this before! I'm postponing my suicide at least 5 business days More on reddit.com
🌐 r/AfterEffects
31 · 22 · May 16, 2025
I compared the different open source whisper packages for long-form transcription
I tried all of them too and whisperX is by far better than the rest. And much faster too. Highly recommended More on reddit.com
🌐 r/LocalLLaMA
125 · 383 · March 30, 2024
Speech to Text - Whisper alternatives?
The Whisper model is still the best open source model I've found. But as far as multiple speakers, don't use Whisper by itself - you need to combine it with a good diarization model. I would take a look at the whisperX project which uses faster-whisper (4x speed increase over openAI/whisper) and has VAD and diarization capability included. More on reddit.com
🌐 r/LocalLLaMA
34 · 50 · June 3, 2024
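Building on the comment above, here is a minimal sketch of the diarization step as shown in the WhisperX README; the Hugging Face token is a placeholder you must supply, and in recent releases the pipeline class lives at whisperx.diarize.DiarizationPipeline rather than the package top level:

import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

# Transcribe and word-align first.
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# Pyannote-based diarization; requires a Hugging Face access token (placeholder below).
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)

# Attach speaker labels to the aligned words and segments.
result = whisperx.assign_word_speakers(diarize_segments, result)
for segment in result["segments"]:
    print(segment.get("speaker", "UNKNOWN"), segment["text"])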
Issues with torchaudio and whisperx
The torchaudio build is specific not only to the GPU vendor but also to the torch build. You need to match torch versions and GPU. You can find prebuilt packages from AMD at https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.1/ More on reddit.com
🌐 r/ROCm
5 · 5 · February 1, 2025
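To check what the comment above describes (whether torch and torchaudio come from matching builds for your GPU stack), a short diagnostic like this helps; it is generic Python, not WhisperX code:

import torch
import torchaudio

# torch and torchaudio must come from the same build line (same CUDA or ROCm toolchain).
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA build:", torch.version.cuda)                       # None on ROCm/CPU builds
print("ROCm/HIP build:", getattr(torch.version, "hip", None))  # None on CUDA/CPU builds
print("GPU visible:", torch.cuda.is_available())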
🌐
Beam
beam.cloud › blog › whisperx
Deploying WhisperX for Fast Speech Transcription • Beam
November 17, 2024 - Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works.
🌐
Dataloop
dataloop.ai › home › library › models › whisperx
WhisperX · Models · Dataloop
WhisperX is a fast automatic speech recognition model that provides accurate transcriptions with word-level timestamps and speaker diarization. It's 70 times faster than real-time and can handle large audio files with ease.
🌐
Replicate
replicate.com › victor-upmeet › whisperx
victor-upmeet/whisperx | Run with an API on Replicate
WhisperX provides fast automatic speech recognition (70x realtime with large-v3) with word-level timestamps and speaker diarization.
🌐
ISCA Archive
isca-archive.org › interspeech_2023 › bain23_interspeech.html
ISCA Archive - WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Further, their application to long audio via buffered transcription prohibits batched inference due to their sequential nature. To overcome the aforementioned challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
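To make the forced phoneme alignment step concrete, here is a minimal sketch using the whisperx Python API as documented in the project README; the wav2vec2 alignment model is chosen from the detected language, and the paths and device are placeholders:

import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

# Whisper-level transcription first (segment-level timestamps only).
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# Force-align against a phoneme (wav2vec2) model to refine timestamps to the word level.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for segment in aligned["segments"]:
    for word in segment.get("words", []):
        print(word["word"], word.get("start"), word.get("end"))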
🌐
Reddit
reddit.com › r/aftereffects › after effects subtitles generator script with whisperx
r/AfterEffects on Reddit: After Effects Subtitles Generator Script with WhisperX
May 16, 2025 -

Tired of searching for a free, powerful way to create subtitles directly in After Effects? I was too, so I built it!

Introducing the AE WhisperX Local Transcriber! This tool combines a user-friendly After Effects dockable script with a local WhisperX API (the latest in speech-to-text AI) to generate accurate, word-level subtitles right in your AE projects.

It's super simple: run the local WhisperAPI.exe on your PC, then use the intuitive script inside After Effects. For coders, all source files are included for easy modification!

Dive in and transform your subtitling workflow:

https://github.com/JavierJerezAntonetti/AE-WhisperX-Local-Transcriber

#AfterEffects #VideoEditing #Subtitles #Transcription #OpenSource #WhisperX

🌐
Valor Software
valor-software.com › articles › interview-transcription-using-whisperx-model-part-1
Interview transcription using WhisperX model, Part 1. - Valor Blog
February 19, 2025 - Subsequent research led me to WhisperX, which augments Whisper’s transcription capabilities through phoneme detection and forced alignment.
🌐
Reddit
reddit.com › r/localllama › i compared the different open source whisper packages for long-form transcription
r/LocalLLaMA on Reddit: I compared the different open source whisper packages for long-form transcription
March 30, 2024 -

Hey everyone!

I hope you're having a great day.

I recently compared all the open source whisper-based packages that support long-form transcription.

Long-form transcription is basically transcribing audio files that are longer than Whisper's input limit, which is 30 seconds. This can be useful if you want to chat with a YouTube video or podcast, etc.

I compared the following packages:

  1. OpenAI's official whisper package

  2. Huggingface Transformers

  3. Huggingface BetterTransformer (aka Insanely-fast-whisper)

  4. FasterWhisper

  5. WhisperX

  6. Whisper.cpp

I compared between them in the following areas:

  1. Accuracy - using word error rate (WER) and character error rate (CER)

  2. Efficiency - using VRAM usage and latency

I've written a detailed blog post about this. If you just want the results, here they are:

For all metrics, lower is better

If you have any comments or questions please leave them below.
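For readers who want to reproduce the accuracy side of a comparison like the one above, word and character error rates can be computed with the jiwer package; this is a generic sketch with made-up example strings, not the post author's benchmark code:

import jiwer

reference = "the quick brown fox jumps over the lazy dog"   # ground-truth transcript (example)
hypothesis = "the quick brown fox jumped over a lazy dog"   # model output (example)

# For both metrics, lower is better.
print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))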

🌐
arXiv
arxiv.org › abs › 2303.00747
[2303.00747] WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
July 11, 2023 - To overcome these challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
🌐
Modal
modal.com › blog › how-to-run-whisperx-on-modal
How to run WhisperX on Modal
January 21, 2025 - WhisperX extends OpenAI’s open-source Whisper model with enhanced speaker diarization and more accurate timestamp alignment. It uses Faster-Whisper under the hood, providing a 4x speed increase compared to the original Whisper.
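For context on the faster-whisper backend mentioned here, using it directly looks roughly like this (a sketch based on the faster-whisper README; model size and audio path are placeholders):

from faster_whisper import WhisperModel

# CTranslate2-based Whisper reimplementation that WhisperX builds on.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")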
🌐
Data Curious
datacurious.hashnode.dev › unlocking-audio-insights-speaker-diarization-with-whisperx-for-who-said-what
Speaker Diarization with WhisperX: Unlocking "Who Said What" in Audio
July 10, 2025 - WhisperX is a powerful extension to OpenAI’s Whisper, created by Max Bain, a researcher from the University of Oxford.
🌐
Metaist
metaist.com › blog › 2023 › 06 › trying-whisperx.html
Trying whisperX - Metaist
June 6, 2023 - Shalev NessAiver pointed me at whisperX which can do all kinds of transcription tasks. He had warned me that speaker diarization (figuring out who is speaking) was super slow.
🌐
University of Oxford
intelligent-earth.ox.ac.uk › publication › 1341473 › ora-hyrax
WhisperX: time-accurate speech transcription of long-form audio | Intelligent Earth
To overcome these challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
🌐
GitHub
github.com › VR-13 › WhisperX
GitHub - VR-13/WhisperX
whisperx examples/sample01.wav --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H
Starred by 4 users
Forked by 2 users
Languages   Python
🌐
GitHub
github.com › cnbeining › whisperX-silero
GitHub - cnbeining/whisperX-silero: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) with Silero VAD
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) with Silero VAD - cnbeining/whisperX-silero
Starred by 12 users
Forked by 3 users
Languages   Python
🌐
ResearchGate
researchgate.net › publication › 368923066_WhisperX_Time-Accurate_Speech_Transcription_of_Long-Form_Audio
(PDF) WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
March 1, 2023 - Bain et al. (2023) developed and released WhisperX, a time-accurate speech recognition system that builds on Whisper to provide word-level timestamps through voice activity detection and forced phoneme alignment, as illustrated in Fig.
🌐
Hugging Face
huggingface.co › spaces › ashhadahsan › whisperX
WhisperX - a Hugging Face Space by ashhadahsan
Upload an audio file to transcribe it into text. Optionally, provide an aligned JSON file to skip transcription. Choose the model, format, and language settings. Get the transcribed text, aligned w...
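If you want to prepare an aligned JSON file like the one this Space optionally accepts, something along these lines would serialize whisperx's aligned output; the Space's exact expected schema is not shown in the snippet, so treat the file layout as an assumption:

import json
import whisperx

device = "cpu"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

model = whisperx.load_model("small", device, compute_type="int8")
result = model.transcribe(audio, batch_size=8)
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

# Dump aligned segments (with word-level timestamps); default=float guards against numpy scalars.
with open("aligned.json", "w", encoding="utf-8") as f:
    json.dump(aligned, f, ensure_ascii=False, indent=2, default=float)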
🌐
Pydigger
pydigger.com › pypi › whisperx
whisperx
WhisperX · Recall.ai - Meeting Transcription API: If you're looking for a transcription API for meetings, consider checking out Recall.ai's Meeting Transcription API (https://www.recall.ai/product/meeting-transcription-api?utm_source=github&utm_medium=sponsorship&utm_campaign=mbain-whisperx), an API that works with Zoom, Google Meet, Microsoft Teams, and more.
🌐
OpenAI
openai.com › index › whisper
Introducing Whisper | OpenAI
We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.