🌐
GitHub
github.com › m-bain › whisperX › issues › 476
Streaming with whisperx · Issue #476 · m-bain/whisperX
September 19, 2023 - Is there a repo or code that allows for real-time streaming with whisperx? Thank you!
🌐
GitHub
github.com › m-bain › whisperX
GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Speaker Diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. 1st place at Ego4d transcription challenge 🏆 · WhisperX accepted at INTERSPEECH 2023 · v3 transcript segment-per-sentence: using nltk sent_tokenize for better subtitling & better diarization
Starred by 19.2K users
Forked by 2K users
Languages   Python
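The repo's README walks the pipeline in three steps: transcribe with a faster-whisper backend, align for word-level timestamps, then diarize and assign speakers. A minimal sketch along the lines of that quick start (exact signatures can shift between whisperX releases; the HF_TOKEN placeholder is required by the pyannote diarization models):

import whisperx

device = "cuda"                               # or "cpu"
audio = whisperx.load_audio("audio.wav")      # placeholder path; loads 16 kHz mono

# 1. Transcribe with the faster-whisper-backed model
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# 2. Align the segments to get word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)

# 3. Diarize and attach speaker labels to the aligned words
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)
print(result["segments"])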

python - How to create a live voice activity detection (VAD) loop for whisperX? - Stack Overflow
I have read through whisperX's VAD source code in whisperx/vad.py but cannot understand how to properly integrate this into a live loop that converts speech-->text in real time. Can someone show me some example code of what this would look like to process a live audio stream instead of a ... More on stackoverflow.com
🌐 stackoverflow.com
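The whisperx package does not ship a live VAD loop of its own (its VAD runs over whole files), so answers to questions like the one above usually bolt on a streaming VAD such as Silero. A hedged sketch, assuming Silero VAD's 512-sample 16 kHz windows and whisperX's numpy-array transcribe API; real code should hand transcription off to a worker thread rather than doing it inside the audio callback:

import numpy as np
import sounddevice as sd
import torch
import whisperx

# Silero VAD plus its streaming helper (downloaded via torch.hub on first run)
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils
vad = VADIterator(vad_model, sampling_rate=16000)

asr = whisperx.load_model("base", device="cpu", compute_type="int8")

buffer = []
in_speech = False

def callback(indata, frames, time, status):
    global in_speech
    chunk = indata[:, 0].copy()              # mono float32, 512 samples per block
    event = vad(torch.from_numpy(chunk))     # {'start': n}, {'end': n}, or None
    if event and "start" in event:
        in_speech = True
    if in_speech:
        buffer.append(chunk)
    if event and "end" in event:             # utterance finished: transcribe it
        in_speech = False
        utterance = np.concatenate(buffer)
        buffer.clear()
        result = asr.transcribe(utterance, batch_size=8)
        print(" ".join(seg["text"].strip() for seg in result["segments"]), flush=True)

with sd.InputStream(samplerate=16000, blocksize=512, channels=1,
                    dtype="float32", callback=callback):
    sd.sleep(60_000)                         # listen for one minute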
Whisper Streaming Strategy
The Whisper speech-to-text API does not yet support streaming. This would be a great feature. I’m trying to think of ways I can take advantage of Whisper with my Assistant. A moderate response can take 7-10 sec to process, which is a bit slow. I’m considering breaking up the assistant’s ... More on community.openai.com
🌐 community.openai.com
July 29, 2024
I compared the different open source whisper packages for long-form transcription
I tried all of them too and whisperX is by far better than the rest. And much faster too. Highly recommended More on reddit.com
🌐 r/LocalLLaMA
March 30, 2024
Whisper Streaming?
Hi, as far as I know, OpenAI hasn't published any streaming model for Whisper yet! However, in case you need a real-time Whisper transcription in the browser, check out my TypeScript package whisper-live. It's framework-agnostic, uses the OpenAI Whisper model for live transcription and is easy to integrate, which I made for a personal project and later published via Github. 📦 Install with: npm install whisper-live More details here: https://github.com/Alireza29675/whisper-live Happy to help if you have any questions! More on reddit.com
🌐 r/OpenAI
December 5, 2022
🌐
Modal
modal.com › blog › open-source-stt
The Top Open Source Speech-to-Text (STT) Models in 2025
August 5, 2025 - You should use WhisperX if you want to boost the usability and transcription quality of any of the underlying Whisper models, specifically when working in multi-speaker contexts or with extended audio recordings.
🌐
Reddit
reddit.com › r/learnmachinelearning › i used whisperx model to give me real time transcription of a user when they speak through the mic but i dont get how to get through few problems
r/learnmachinelearning on Reddit: I used whisperx model to give me real time transcription of a user when they speak through the mic but i dont get how to get through few problems
July 1, 2024 -

import sounddevice as sd
import numpy as np
import whisperx
import queue
import threading

# Initialize the WhisperX model (choose the model size you prefer)
model = whisperx.load_model("base")  # Change "base" to "tiny", "small", "medium", or "large" as needed

# Parameters
sample_rate = 16000   # WhisperX works best with 16kHz audio
chunk_duration = 5    # Duration of each audio chunk in seconds
buffer_size = int(sample_rate * chunk_duration)
audio_queue = queue.Queue()

def audio_callback(indata, frames, time, status):
    """Callback function to process audio chunks."""
    if status:
        print(status, flush=True)
    audio_queue.put(indata.copy())

def transcribe_audio():
    """Thread function to transcribe audio chunks."""
    while True:
        audio_chunk = audio_queue.get()
        if audio_chunk is None:
            break
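The quoted snippet cuts off inside transcribe_audio. A hedged sketch of how the rest could look; note that recent whisperX releases require a device argument to load_model, the model wants mono float32 audio at 16 kHz, and fixed 5-second chunks will split words at chunk boundaries, which is likely one of the problems the poster ran into:

def transcribe_audio():
    """Thread function to transcribe audio chunks."""
    while True:
        audio_chunk = audio_queue.get()
        if audio_chunk is None:      # sentinel: stop the worker
            break
        mono = audio_chunk[:, 0].astype(np.float32)   # sounddevice delivers (frames, channels)
        result = model.transcribe(mono, batch_size=8)
        for segment in result["segments"]:
            print(segment["text"], flush=True)

worker = threading.Thread(target=transcribe_audio, daemon=True)
worker.start()
with sd.InputStream(samplerate=sample_rate, channels=1, dtype="float32",
                    blocksize=buffer_size, callback=audio_callback):
    sd.sleep(chunk_duration * 12 * 1000)   # record for about a minute
audio_queue.put(None)                      # shut the worker down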
🌐
Metaist
metaist.com › blog › 2023 › 06 › trying-whisperx.html
Trying whisperX - Metaist
June 6, 2023 -

git clone https://github.com/m-bain/whisperX.git
cd whisperX
pip install torch torchvision torchaudio
pip install -e .
🌐
Stack Overflow
stackoverflow.com › questions › 78496897 › how-to-create-a-live-voice-activity-detection-vad-loop-for-whisperx
python - How to create a live voice activity detection (VAD) loop for whisperX? - Stack Overflow
I am using whisperX speech-to-text model to convert my voice into text input for a locally hosted LLM. Right now, I have it set up where I can record an audio file, and then load it into whisperX. ...
🌐
Medium
medium.com › @aidenkoh › how-to-implement-high-speed-voice-recognition-in-chatbot-systems-with-whisperx-silero-vad-cdd45ea30904
How to Implement High-Speed Voice Recognition in Chatbot Systems with WhisperX & Silero-VAD | by Aiden Koh | Medium
March 12, 2025 - By decoupling VAD and ASR, we achieve flexibility — swap WhisperX with any ASR model, or adjust VAD sensitivity without disrupting the pipeline. The final piece is deployment. FastAPI’s WebSocket support enables bidirectional communication for streaming audio and receiving transcriptions:
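The article ends that paragraph on a colon, with its code following. A hedged sketch of what such an endpoint could look like; the transcribe_pcm helper and the raw 16-bit PCM framing are assumptions here, not the article's actual code:

import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def transcribe_pcm(audio: np.ndarray) -> str:
    """Placeholder: run VAD + ASR (e.g. whisperX) over float32 mono audio."""
    raise NotImplementedError

@app.websocket("/ws/transcribe")
async def ws_transcribe(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            pcm = await websocket.receive_bytes()   # client streams raw 16-bit PCM frames
            audio = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
            await websocket.send_json({"text": transcribe_pcm(audio)})
    except WebSocketDisconnect:
        pass                                        # client hung up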
🌐
OpenAI Developer Community
community.openai.com › api
Whisper Streaming Strategy - API - OpenAI Developer Community
July 29, 2024 - The Whisper speech-to-text API does not yet support streaming. This would be a great feature. I’m trying to think of ways I can take advantage of Whisper with my Assistant. A moderate response can take 7-10 sec to process, which is a bit slow.
🌐
AWS
aws.amazon.com › blogs › containers › host-the-whisper-model-with-streaming-mode-on-amazon-eks-and-ray-serve
Host the Whisper Model with Streaming Mode on Amazon EKS and Ray Serve | Amazon Web Services
June 20, 2024 - For apps such as transcription of live broadcasts and online meetings, ASR streaming mode is required. Streaming involves different APIs (fully duplex, bidirectional streams) and a lower latency requirement, usually in seconds, and it needs to maintain state across the stream.
🌐
Beam
beam.cloud › blog › whisperx
Deploying WhisperX for Fast Speech Transcription • Beam
November 17, 2024 - Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works.
🌐
Hugging Face
huggingface.co › datasets › chenjoya › Live-WhisperX-526K › blob › main › README.md
README.md · chenjoya/Live-WhisperX-526K at main
August 4, 2025 - Annotation JSONL (WhisperX ASR): ...main/live_whisperx_526k_with_seeks.jsonl · It contains 527,583 real-time video commentary instances, with YouTube categories: Each line of the JSONL file is organized in a common user/assistant conversation format with a special "text_stream" ...
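A hedged sketch of streaming through such a JSON Lines file; beyond the quoted "text_stream" key, the field names are assumptions, so check the dataset README for the real schema:

import json

# Each line of the annotation file is one real-time commentary instance.
with open("live_whisperx_526k_with_seeks.jsonl", encoding="utf-8") as f:
    for line in f:
        instance = json.loads(line)
        # Described as a user/assistant conversation format with a special
        # "text_stream" field; any other keys accessed here would be guesses.
        print(instance.get("text_stream"))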
🌐
Reddit
reddit.com › r/localllama › i compared the different open source whisper packages for long-form transcription
r/LocalLLaMA on Reddit: I compared the different open source whisper packages for long-form transcription
March 30, 2024

Hey everyone!

I hope you're having a great day.

I recently compared all the open source whisper-based packages that support long-form transcription.

Long-form transcription is basically transcribing audio files that are longer than Whisper's input limit, which is 30 seconds. This can be useful if you want to chat with a YouTube video or podcast, etc.

I compared the following packages:

  1. OpenAI's official whisper package

  2. Huggingface Transformers

  3. Huggingface BetterTransformer (aka Insanely-fast-whisper)

  4. FasterWhisper

  5. WhisperX

  6. Whisper.cpp

I compared between them in the following areas:

  1. Accuracy - using word error rate (WER) and character error rate (CER); see the sketch after this list

  2. Efficiency - using VRAM usage and latency
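
For context, WER is the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length; CER is the same computed over characters. A minimal self-contained sketch:

def edit_distance(ref, hyp):
    """Levenshtein distance over two sequences (substitutions, insertions, deletions)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,         # deletion from reference
                                     dp[j - 1] + 1,     # insertion into reference
                                     prev + (r != h))   # substitution (free if equal)
    return dp[-1]

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33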

I've written a detailed blog post about this. If you just want the results, here they are:

[Results chart from the post: for all metrics, lower is better]

If you have any comments or questions please leave them below.

🌐
Replicate
replicate.com › victor-upmeet › whisperx
victor-upmeet/whisperx | Run with an API on Replicate
WhisperX provides fast automatic speech recognition (70x realtime with large-v3) with word-level timestamps and speaker diarization.
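A hedged sketch of calling it through Replicate's Python client; replicate.run is the client's real entry point, but the input field names below (audio_file, align_output, diarization) are assumptions, so check the model page for the actual schema:

import replicate

output = replicate.run(
    "victor-upmeet/whisperx",
    input={
        "audio_file": open("audio.wav", "rb"),  # assumed field name
        "align_output": True,                   # assumed: word-level timestamps
        "diarization": True,                    # assumed: speaker labels
    },
)
print(output)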
🌐
GitHub
github.com › ufal › whisper_streaming
GitHub - ufal/whisper_streaming: Whisper realtime streaming for long speech-to-text transcription and translation
March 31, 2024 - Whisper realtime streaming for long speech-to-text transcription and translation - ufal/whisper_streaming
Starred by 3.5K users
Forked by 410 users
Languages   Python
🌐
Baseten
baseten.co › blog › zero-to-real-time-transcription-the-complete-whisper-v3-websockets-tutorial
Zero to real-time transcription: The complete Whisper V3 streaming tutorial
August 5, 2025 - You can run real-time transcription with Whisper by concurrently sending audio data and receiving text data from an endpoint with our streaming Whisper implementation. With WebSockets, each user maintains a connection with the server for the ...
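A hedged sketch of the client side of that pattern with the websockets library: one coroutine streams audio bytes up while another prints transcripts as they arrive. The endpoint URL and message framing are placeholders, not Baseten's actual protocol:

import asyncio
import websockets

async def send_audio(ws, wav_path="audio.wav", chunk_bytes=3200):
    with open(wav_path, "rb") as f:              # placeholder: replay a file as if live
        while chunk := f.read(chunk_bytes):      # ~100 ms of 16 kHz 16-bit mono audio
            await ws.send(chunk)
            await asyncio.sleep(0.1)             # pace like a real-time source

async def receive_text(ws):
    async for message in ws:                     # server pushes transcripts as they're ready
        print(message)

async def main():
    async with websockets.connect("wss://example.com/v1/ws") as ws:  # placeholder URL
        await asyncio.gather(send_audio(ws), receive_text(ws))

asyncio.run(main())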
🌐
arXiv
arxiv.org › abs › 2307.14743
[2307.14743] Turning Whisper into Real-Time Transcription System
September 21, 2023 - Whisper-Streaming uses local agreement policy with self-adaptive latency to enable streaming transcription.
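The local agreement policy is easy to state: re-run Whisper over a growing audio buffer and commit only the prefix on which two consecutive hypotheses agree, since that prefix is unlikely to change again. A toy sketch of the commit rule (not the paper's implementation):

def local_agreement(prev_tokens, new_tokens, committed):
    """Return the newly stable tokens: the longest common prefix of two
    consecutive hypotheses, minus what was already committed."""
    agreed = []
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        agreed.append(a)
    return agreed[len(committed):]

committed = []
prev = "the quick brown".split()              # hypothesis over a short buffer
new = "the quick brown fox jumped".split()    # re-decode over a longer buffer
committed += local_agreement(prev, new, committed)
print(committed)  # ['the', 'quick', 'brown'] is the stable prefix that gets emitted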
🌐
arXiv
arxiv.org › abs › 2506.12154
[2506.12154] Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
June 13, 2025 - However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) ...
🌐
Fireworks AI
fireworks.ai › blog › streaming-audio-launch
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality
Fireworks offers streaming audio service with 300 ms end-to-end latency and Whisper-v3-large quality to power real-time use cases like live captioning