🌐
GitHub
github.com › m-bain › whisperX
GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) - m-bain/whisperX
Starred by 19.2K users
Forked by 2K users
Languages   Python
🌐
Aicloudautomation
aicloudautomation.net › projects › whisperx
WhisperX - AI Cloud Automation
WhisperX is an enhanced version of OpenAI’s Whisper that provides fast automatic speech recognition with accurate word-level timestamps and speaker diarization. It achieves 70x realtime transcription speed with the large-v2 model and can identify different speakers in audio recordings.
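As a concrete illustration, here is a minimal sketch of the batched transcription workflow using the Python API documented in the m-bain/whisperX README (model name, audio path, and device below are placeholders; exact signatures can vary between releases):

import whisperx

device = "cuda"            # or "cpu"
audio_file = "audio.mp3"   # placeholder path

# Load the ASR model (faster-whisper backend) and the raw waveform.
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)

# Batched transcription over VAD-derived chunks; timestamps here are segment-level.
result = model.transcribe(audio, batch_size=16)
for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])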
Discussions

After Effects Subtitles Generator Script with WhisperX
Damn! This is so useful! I can't believe there wasn't something like this before! I'm postponing my suicide at least 5 business days More on reddit.com
🌐 r/AfterEffects
31 · 22 · May 16, 2025
I compared the different open source whisper packages for long-form transcription
I tried all of them too and whisperX is by far better than the rest. And much faster too. Highly recommended More on reddit.com
🌐 r/LocalLLaMA
125 · 383 · March 30, 2024
Speech to Text - Whisper alternatives?
The Whisper model is still the best open source model I've found. But as far as multiple speakers, don't use Whisper by itself - you need to combine it with a good diarization model. I would take a look at the whisperX project which uses faster-whisper (4x speed increase over openAI/whisper) and has VAD and diarization capability included. More on reddit.com
🌐 r/LocalLLaMA
34 · 50 · June 3, 2024
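Building on the comment above, here is a minimal sketch of the diarization step as shown in the WhisperX README; the Hugging Face token is a placeholder you must supply, and in recent releases the pipeline class lives at whisperx.diarize.DiarizationPipeline rather than the package top level:

import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

# Transcribe and word-align first.
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# Pyannote-based diarization; requires a Hugging Face access token (placeholder below).
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)

# Attach speaker labels to the aligned words and segments.
result = whisperx.assign_word_speakers(diarize_segments, result)
for segment in result["segments"]:
    print(segment.get("speaker", "UNKNOWN"), segment["text"])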
Issues with torchaudio and whisperx
The torchaudio build is specific not only to the GPU vendor but also to the torch build. You need to match torch versions and GPU. You can find prebuilt packages from AMD at https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.1/ More on reddit.com
🌐 r/ROCm
5 · 5 · February 1, 2025
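To check what the comment above describes (whether torch and torchaudio come from matching builds for your GPU stack), a short diagnostic like this helps; it is generic Python, not WhisperX code:

import torch
import torchaudio

# torch and torchaudio must come from the same build line (same CUDA or ROCm toolchain).
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA build:", torch.version.cuda)                       # None on ROCm/CPU builds
print("ROCm/HIP build:", getattr(torch.version, "hip", None))  # None on CUDA/CPU builds
print("GPU visible:", torch.cuda.is_available())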
🌐
Beam
beam.cloud › blog › whisperx
Deploying WhisperX for Fast Speech Transcription • Beam
November 17, 2024 - Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works.
🌐
Dataloop
dataloop.ai › home › library › models › whisperx
WhisperX · Models · Dataloop
WhisperX is a fast automatic speech recognition model that provides accurate transcriptions with word-level timestamps and speaker diarization. It's 70 times faster than real-time and can handle large audio files with ease.
🌐
Replicate
replicate.com › victor-upmeet › whisperx
victor-upmeet/whisperx | Run with an API on Replicate
WhisperX provides fast automatic speech recognition (70x realtime with large-v3) with word-level timestamps and speaker diarization.
🌐
ISCA Archive
isca-archive.org › interspeech_2023 › bain23_interspeech.html
ISCA Archive - WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Further, their application to long audio via buffered transcription prohibits batched inference due to their sequential nature. To overcome the aforementioned challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
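To make the forced phoneme alignment step concrete, here is a minimal sketch using the whisperx Python API as documented in the project README; the wav2vec2 alignment model is chosen from the detected language, and the paths and device are placeholders:

import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

# Whisper-level transcription first (segment-level timestamps only).
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# Force-align against a phoneme (wav2vec2) model to refine timestamps to the word level.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for segment in aligned["segments"]:
    for word in segment.get("words", []):
        print(word["word"], word.get("start"), word.get("end"))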
🌐
Reddit
reddit.com › r/aftereffects › after effects subtitles generator script with whisperx
r/AfterEffects on Reddit: After Effects Subtitles Generator Script with WhisperX
May 16, 2025 -

Tired of searching for a free, powerful way to create subtitles directly in After Effects? I was too, so I built it!

Introducing the AE WhisperX Local Transcriber! This tool combines a user-friendly After Effects dockable script with a local WhisperX API (the latest in speech-to-text AI) to generate accurate, word-level subtitles right in your AE projects.

It's super simple: run the local WhisperAPI.exe on your PC, then use the intuitive script inside After Effects. For coders, all source files are included for easy modification!

Dive in and transform your subtitling workflow:

https://github.com/JavierJerezAntonetti/AE-WhisperX-Local-Transcriber

#AfterEffects #VideoEditing #Subtitles #Transcription #OpenSource #WhisperX

🌐
Valor Software
valor-software.com › articles › interview-transcription-using-whisperx-model-part-1
Interview transcription using WhisperX model, Part 1. - Valor Blog
February 19, 2025 - Subsequent research led me to WhisperX, which augments Whisper’s transcription capabilities through phoneme detection and forced alignment.
🌐
Reddit
reddit.com › r/localllama › i compared the different open source whisper packages for long-form transcription
r/LocalLLaMA on Reddit: I compared the different open source whisper packages for long-form transcription
March 30, 2024 -

Hey everyone!

I hope you're having a great day.

I recently compared all the open source whisper-based packages that support long-form transcription.

Long-form transcription is basically transcribing audio files that are longer than Whisper's input limit, which is 30 seconds. This can be useful if you want to chat with a YouTube video or podcast, etc.

I compared the following packages:

  1. OpenAI's official whisper package

  2. Huggingface Transformers

  3. Huggingface BetterTransformer (aka Insanely-fast-whisper)

  4. FasterWhisper

  5. WhisperX

  6. Whisper.cpp

I compared between them in the following areas:

  1. Accuracy - using word error rate (WER) and character error rate (CER)

  2. Efficiency - using VRAM usage and latency

I've written a detailed blog post about this. If you just want the results, here they are:

For all metrics, lower is better

If you have any comments or questions please leave them below.
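For readers who want to reproduce the accuracy side of a comparison like the one above, word and character error rates can be computed with the jiwer package; this is a generic sketch with made-up example strings, not the post author's benchmark code:

import jiwer

reference = "the quick brown fox jumps over the lazy dog"   # ground-truth transcript (example)
hypothesis = "the quick brown fox jumped over a lazy dog"   # model output (example)

# For both metrics, lower is better.
print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))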

🌐
arXiv
arxiv.org › abs › 2303.00747
[2303.00747] WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
July 11, 2023 - To overcome these challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
🌐
Modal
modal.com › blog › how-to-run-whisperx-on-modal
How to run WhisperX on Modal
January 21, 2025 - WhisperX extends OpenAI’s open-source Whisper model with enhanced speaker diarization and more accurate timestamp alignment. It uses Faster-Whisper under the hood, providing a 4x speed increase compared to the original Whisper.
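For context on the faster-whisper backend mentioned here, using it directly looks roughly like this (a sketch based on the faster-whisper README; model size and audio path are placeholders):

from faster_whisper import WhisperModel

# CTranslate2-based Whisper reimplementation that WhisperX builds on.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")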
🌐
Data Curious
datacurious.hashnode.dev › unlocking-audio-insights-speaker-diarization-with-whisperx-for-who-said-what
Speaker Diarization with WhisperX: Unlocking "Who Said What" in Audio
July 10, 2025 - WhisperX is a powerful extension to OpenAI’s Whisper, created by Max Bain, a researcher from the University of Oxford.
🌐
Metaist
metaist.com › blog › 2023 › 06 › trying-whisperx.html
Trying whisperX - Metaist
June 6, 2023 - Shalev NessAiver pointed me at whisperX which can do all kinds of transcription tasks. He had warned me that speaker diarization (figuring out who is speaking) was super slow.
🌐
University of Oxford
intelligent-earth.ox.ac.uk › publication › 1341473 › ora-hyrax
WhisperX: time-accurate speech transcription of long-form audio | Intelligent Earth
To overcome these challenges, we present WhisperX, a time-accurate speech recognition system with word-level timestamps utilising voice activity detection and forced phoneme alignment.
🌐
GitHub
github.com › VR-13 › WhisperX
GitHub - VR-13/WhisperX
whisperx examples/sample01.wav --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H
Starred by 4 users
Forked by 2 users
Languages   Python
🌐
GitHub
github.com › cnbeining › whisperX-silero
GitHub - cnbeining/whisperX-silero: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) with Silero VAD
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) with Silero VAD - cnbeining/whisperX-silero
Starred by 12 users
Forked by 3 users
Languages   Python
🌐
ResearchGate
researchgate.net › publication › 368923066_WhisperX_Time-Accurate_Speech_Transcription_of_Long-Form_Audio
(PDF) WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
March 1, 2023 - Bain et al. (2023) developed and released WhisperX, a time-accurate speech recognition system that builds on Whisper to provide word-level timestamps through voice activity detection and forced phoneme alignment, as illustrated in Fig.
🌐
Hugging Face
huggingface.co › spaces › ashhadahsan › whisperX
WhisperX - a Hugging Face Space by ashhadahsan
Upload an audio file to transcribe it into text. Optionally, provide an aligned JSON file to skip transcription. Choose the model, format, and language settings. Get the transcribed text, aligned w...
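If you want to prepare an aligned JSON file like the one this Space optionally accepts, something along these lines would serialize whisperx's aligned output; the Space's exact expected schema is not shown in the snippet, so treat the file layout as an assumption:

import json
import whisperx

device = "cpu"
audio = whisperx.load_audio("audio.mp3")  # placeholder path

model = whisperx.load_model("small", device, compute_type="int8")
result = model.transcribe(audio, batch_size=8)
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

# Dump aligned segments (with word-level timestamps); default=float guards against numpy scalars.
with open("aligned.json", "w", encoding="utf-8") as f:
    json.dump(aligned, f, ensure_ascii=False, indent=2, default=float)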
🌐
Pydigger
pydigger.com › pypi › whisperx
whisperx
WhisperX · Recall.ai - Meeting Transcription API: If you're looking for a transcription API for meetings, consider checking out Recall.ai's Meeting Transcription API (https://www.recall.ai/product/meeting-transcription-api?utm_source=github&utm_medium=sponsorship&utm_campaign=mbain-whisperx), an API that works with Zoom, Google Meet, Microsoft Teams, and more.
🌐
OpenAI
openai.com › index › whisper
Introducing Whisper | OpenAI
We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.