whisperx streaming - Brave Search

github.com › m-bain › whisperX › issues › 476

Streaming with whisperx · Issue #476 · m-bain/whisperX

September 19, 2023 - Is there a repo or code that allows for real-time streaming with whisperx? Thank you!

github.com › m-bain › whisperX

GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. 1st place at Ego4D transcription challenge 🏆 · WhisperX accepted at INTERSPEECH 2023 · v3 transcript segment-per-sentence: using nltk sent_tokenize for better subtitling & better diarization

Starred by 19.2K users

Forked by 2K users

Languages: Python
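One way to picture how diarization output is combined with an ASR transcript: each word is labeled with the speaker whose diarization turn overlaps it the most. The sketch below is a minimal illustration of that idea, not WhisperX's own code; the `Word` and `Turn` types, the `assign_speakers` function, and the `SPEAKER_00`-style labels are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most (None if no turn does)."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((w.text, best_speaker))
    return labeled


turns = [Turn("SPEAKER_00", 0.0, 2.0), Turn("SPEAKER_01", 2.0, 4.0)]
words = [Word("hello", 0.2, 0.6), Word("there", 1.8, 2.4),
         Word("hi", 2.5, 2.9), Word("bye", 5.0, 5.3)]
print(assign_speakers(words, turns))
```

"there" straddles the turn boundary but spends more time in the second turn, so it goes to `SPEAKER_01`; "bye" falls outside every turn and stays unlabeled.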

Discussions

python - How to create a live voice activity detection (VAD) loop for whisperX? - Stack Overflow

I have read through whisperX's VAD source code in whisperx/vad.py but cannot understand how to properly integrate this into a live loop that converts speech-->text in real time. Can someone show me some example code of what this would look like to process a live audio stream instead of a ...

stackoverflow.com

Videos

Best FREE Speech to Text AI - WhisperX - w/ Speaker Detection - ...

WhisperX: A Beginners Guide to Install & Run - YouTube

September 27, 2024

Creating an Auto-transcriber for YouTube Videos with Whisperx - ...

Can Whisper be used for real-time streaming ASR? - YouTube

Streaming Speech to Text Models - Kyutai vs Whisper - YouTube

August 26, 2025

WhisperLive: Real-Time Speech Transcription with Whisper - YouTube

modal.com › blog › open-source-stt

The Top Open Source Speech-to-Text (STT) Models in 2025

August 5, 2025 - You should use WhisperX if you want to boost the usability and transcription quality of any of the underlying Whisper models, especially when working in multi-speaker contexts or with extended audio recordings.

reddit.com › r/learnmachinelearning › i used whisperx model to give me real time transcription of a user when they speak through the mic but i dont get how to get through few problems

r/learnmachinelearning on Reddit: I used whisperx model to give me real time transcription of a user when they speak through the mic but i dont get how to get through few problems

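July 1, 2024 -

```python
import queue
import threading

import numpy as np
import sounddevice as sd
import whisperx

# Initialize the WhisperX model (choose the model size you prefer)
# Change "base" to "tiny", "small", "medium", or "large" as needed.
# load_model() requires a device; float16 is not supported on CPU, so use int8 there.
model = whisperx.load_model("base", device="cpu", compute_type="int8")

# Parameters
sample_rate = 16000  # Whisper models expect 16 kHz mono audio
chunk_duration = 5  # Duration of each audio chunk in seconds
buffer_size = int(sample_rate * chunk_duration)
audio_queue = queue.Queue()

def audio_callback(indata, frames, time, status):
    """Callback function to process audio chunks."""
    if status:
        print(status, flush=True)
    audio_queue.put(indata.copy())

def transcribe_audio():
    """Thread function to transcribe audio chunks."""
    while True:
        audio_chunk = audio_queue.get()
        if audio_chunk is None:
            break
        # The original snippet is truncated here; the rest is an illustrative
        # completion, not the original poster's code.
        audio = audio_chunk[:, 0].astype(np.float32)  # (frames, 1) -> 1-D float32
        result = model.transcribe(audio, batch_size=8)
        for segment in result["segments"]:
            print(segment["text"], flush=True)

worker = threading.Thread(target=transcribe_audio, daemon=True)
worker.start()
with sd.InputStream(samplerate=sample_rate, channels=1, blocksize=buffer_size,
                    callback=audio_callback):
    input("Recording; press Enter to stop.\n")
audio_queue.put(None)  # Sentinel: tell the worker thread to exit
worker.join()
```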

metaist.com › blog › 2023 › 06 › trying-whisperx.html

Trying whisperX - Metaist

June 6, 2023 -

```shell
git clone https://github.com/m-bain/whisperX.git
cd whisperX
pip install torch torchvision torchaudio
pip install -e .
```

stackoverflow.com › questions › 78496897 › how-to-create-a-live-voice-activity-detection-vad-loop-for-whisperx

python - How to create a live voice activity detection (VAD) loop for whisperX? - Stack Overflow

I am using whisperX speech-to-text model to convert my voice into text input for a locally hosted LLM. Right now, I have it set up where I can record an audio file, and then load it into whisperX. ...
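The question above asks how to gate a live microphone stream with voice activity detection before handing speech to whisperX. A minimal sketch of the endpointing logic follows. It assumes float samples in [-1, 1] arriving in 30 ms frames at 16 kHz, and it swaps in a plain RMS-energy threshold in place of whisperX's model-based VAD; `EnergyEndpointer`, `threshold`, `hangover_frames`, and `on_utterance` are hypothetical names. In a real loop, `on_utterance` is where the buffered speech would be converted to a float32 array and passed to the loaded model's `transcribe` method.

```python
import math
from typing import Callable, List

FRAME_SAMPLES = 480  # 30 ms at 16 kHz (assumed frame size)


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


class EnergyEndpointer:
    """Buffers speech frames and emits one utterance after enough trailing silence.

    A simple energy threshold stands in for a neural VAD here (an assumption).
    """

    def __init__(self, on_utterance: Callable[[List[float]], None],
                 threshold: float = 0.02, hangover_frames: int = 10):
        self.on_utterance = on_utterance  # hypothetical hook, e.g. run ASR on the audio
        self.threshold = threshold
        self.hangover_frames = hangover_frames  # consecutive silent frames that end an utterance
        self.buffer: List[float] = []
        self.silent_run = 0

    def push(self, frame: List[float]) -> None:
        if rms(frame) >= self.threshold:
            self.buffer.extend(frame)
            self.silent_run = 0
        elif self.buffer:
            # Silence after speech: keep it as a short tail and count toward the endpoint.
            self.buffer.extend(frame)
            self.silent_run += 1
            if self.silent_run >= self.hangover_frames:
                self.flush()

    def flush(self) -> None:
        """Emit whatever speech is buffered (also call this when the stream ends)."""
        if self.buffer:
            self.on_utterance(self.buffer)
        self.buffer, self.silent_run = [], 0


utterances = []  # lengths (in samples) of emitted utterances, for illustration
ep = EnergyEndpointer(on_utterance=lambda audio: utterances.append(len(audio)),
                      hangover_frames=3)
speech, silence = [0.1] * FRAME_SAMPLES, [0.0] * FRAME_SAMPLES
for frame in [silence, speech, speech, silence, silence, silence, silence, speech, silence]:
    ep.push(frame)
ep.flush()
print(utterances)
```

Leading silence is ignored, the first utterance closes after three silent frames, and the second is emitted by the final `flush()`. The hangover keeps short pauses inside one utterance, trading latency against cutting words apart.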

medium.com › @aidenkoh › how-to-implement-high-speed-voice-recognition-in-chatbot-systems-with-whisperx-silero-vad-cdd45ea30904

How to Implement High-Speed Voice Recognition in Chatbot Systems with WhisperX & Silero-VAD | by Aiden Koh | Medium

March 12, 2025 - By decoupling VAD and ASR, we achieve flexibility — swap WhisperX with any ASR model, or adjust VAD sensitivity without disrupting the pipeline. The final piece is deployment. FastAPI’s WebSocket support enables bidirectional communication for streaming audio and receiving transcriptions:

OpenAI Developer Community

community.openai.com › api

Whisper Streaming Strategy - API - OpenAI Developer Community

July 29, 2024 - The Whisper speech-to-text API does not yet support streaming. This would be a great feature. I’m trying to think of ways I can take advantage of Whisper with my Assistant. A moderate response can take 7-10 sec to process, which is a bit slow.

aws.amazon.com › blogs › containers › host-the-whisper-model-with-streaming-mode-on-amazon-eks-and-ray-serve

Host the Whisper Model with Streaming Mode on Amazon EKS and Ray Serve | Amazon Web Services

June 20, 2024 - Applications such as transcription of live broadcasts and online meetings require ASR streaming mode. Streaming uses different APIs and full-duplex bidirectional streams, has a lower latency requirement (usually a few seconds), and needs to maintain state.

beam.cloud › blog › whisperx

Deploying WhisperX for Fast Speech Transcription • Beam

November 17, 2024 - Run sandboxes, inference, and training with ultrafast boot times, instant autoscaling, and a developer experience that just works.

huggingface.co › datasets › chenjoya › Live-WhisperX-526K › blob › main › README.md

README.md · chenjoya/Live-WhisperX-526K at main

August 4, 2025 - Annotation JSONL (WhisperX ASR): ...main/live_whisperx_526k_with_seeks.jsonl · It contains 527,583 real-time video commentary instances, with YouTube categories: Each line of the JSONL file is organized in a common user/assistant conversation format with a special "text_stream" ...

reddit.com › r/localllama › i compared the different open source whisper packages for long-form transcription

r/LocalLLaMA on Reddit: I compared the different open source whisper packages for long-form transcription

March 30, 2024 -

Hey everyone!

I hope you're having a great day.

I recently compared all the open-source Whisper-based packages that support long-form transcription.

Long-form transcription means transcribing audio files that are longer than Whisper's 30-second input limit. This is useful if, for example, you want to chat with a YouTube video or podcast.
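Because of that 30-second window, long audio has to be split before decoding. The WhisperX paper describes a VAD-based "cut & merge" step for this; the function below is a simplified greedy sketch of that idea, not WhisperX's implementation. It assumes the VAD has already produced sorted, non-overlapping `(start, end)` speech segments in seconds; `merge_segments` and `MAX_CHUNK_S` are hypothetical names.

```python
from typing import List, Tuple

MAX_CHUNK_S = 30.0  # Whisper's fixed input window, in seconds

Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], max_len: float = MAX_CHUNK_S) -> List[Segment]:
    """Cut over-long speech segments, then greedily merge neighbors into chunks <= max_len."""
    # Cut: split any single segment that exceeds the window.
    pieces: List[Segment] = []
    for start, end in segments:
        while end - start > max_len:
            pieces.append((start, start + max_len))
            start += max_len
        pieces.append((start, end))
    # Merge: extend the current chunk while the merged span still fits in the window.
    chunks: List[Segment] = []
    for start, end in pieces:
        if chunks and end - chunks[-1][0] <= max_len:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


print(merge_segments([(0.0, 12.0), (13.0, 25.0), (26.0, 40.0), (41.0, 110.0)]))
```

Cutting at speech boundaries rather than at fixed 30-second offsets avoids splitting words across windows, and each resulting chunk can be decoded independently (and therefore batched).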

I compared the following packages:

OpenAI's official whisper package
Huggingface Transformers
Huggingface BetterTransformer (aka Insanely-fast-whisper)
FasterWhisper
WhisperX
Whisper.cpp

I compared between them in the following areas:

Accuracy - using word error rate (WER) and character error rate (CER)
Efficiency - using VRAM usage and latency
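Both accuracy metrics are edit distances normalized by the reference length: WER counts word-level substitutions, deletions, and insertions, and CER does the same over characters. A minimal sketch, assuming whitespace tokenization and no text normalization (benchmarks usually lowercase and strip punctuation first); the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 1 substitution + 1 deletion over 6 words
print(cer("whisper", "whisker"))  # 1 substitution over 7 characters
```

Because insertions count, WER can exceed 1.0 when the hypothesis is much longer than the reference.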

I've written a detailed blog post about this. If you just want the results, here they are:

For all metrics, lower is better

If you have any comments or questions please leave them below.

I tried all of them too and whisperX is by far better than the rest. And much faster too. Highly recommended

I love that you shared the notebook for running these benchmarks

replicate.com › victor-upmeet › whisperx

victor-upmeet/whisperx | Run with an API on Replicate

WhisperX provides fast automatic speech recognition (70x realtime with large-v3) with word-level timestamps and speaker diarization.
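A speed-up such as "70x realtime" translates directly into wall-clock time: divide the audio duration by the factor (equivalently, a real-time factor of 1/70). This is plain arithmetic on the advertised figure, which in practice depends on GPU, batch size, and audio content; `processing_seconds` is a hypothetical helper.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock processing time implied by an 'Nx realtime' speed-up."""
    return audio_seconds / speedup


# One hour of audio at the advertised 70x realtime:
print(round(processing_seconds(3600, 70), 1))  # 51.4
```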

github.com › ufal › whisper_streaming

GitHub - ufal/whisper_streaming: Whisper realtime streaming for long speech-to-text transcription and translation

March 31, 2024 - Whisper realtime streaming for long speech-to-text transcription and translation - ufal/whisper_streaming

Starred by 3.5K users

Forked by 410 users

Languages: Python

baseten.co › blog › zero-to-real-time-transcription-the-complete-whisper-v3-websockets-tutorial

Zero to real-time transcription: The complete Whisper V3 streaming tutorial

August 5, 2025 - You can run real-time transcription with Whisper by concurrently sending audio data and receiving text data from an endpoint with our streaming Whisper implementation. With WebSockets, each user maintains a connection with the server for the ...
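The pattern described here, sending audio and receiving text at the same time over one connection, amounts to two concurrent tasks sharing a duplex channel. The sketch below uses asyncio with in-memory queues standing in for the WebSocket and a fake server that emits one partial transcript per audio chunk; it is not Baseten's API, and `fake_server`, `send_audio`, and `receive_text` are hypothetical names.

```python
import asyncio
from typing import List


async def fake_server(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Stand-in for the remote endpoint: one partial transcript per audio chunk."""
    n = 0
    while (chunk := await inbound.get()) is not None:
        n += 1
        await outbound.put(f"partial {n} ({len(chunk)} bytes)")
    await outbound.put(None)  # end-of-stream marker


async def send_audio(chunks: List[bytes], inbound: asyncio.Queue) -> None:
    """Client task 1: stream audio chunks, then signal the end of input."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield control, as a real socket write would
    await inbound.put(None)


async def receive_text(outbound: asyncio.Queue) -> List[str]:
    """Client task 2: collect transcripts as they arrive."""
    received = []
    while (msg := await outbound.get()) is not None:
        received.append(msg)
    return received


async def main() -> List[str]:
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    chunks = [b"\x00" * 3200] * 3  # three 100 ms chunks of 16 kHz, 16-bit mono PCM
    _, _, texts = await asyncio.gather(
        fake_server(inbound, outbound),
        send_audio(chunks, inbound),
        receive_text(outbound),
    )
    return texts


print(asyncio.run(main()))
```

Running sender and receiver as independent tasks means text can be displayed while audio is still being captured, which is what keeps perceived latency low.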

arxiv.org › abs › 2307.14743

[2307.14743] Turning Whisper into Real-Time Transcription System

September 21, 2023 - Whisper-Streaming uses local agreement policy with self-adaptive latency to enable streaming transcription.
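The local agreement policy can be illustrated at the word level: after each new audio chunk the growing buffer is re-transcribed, and words are committed only once two consecutive hypotheses agree on them (LocalAgreement-2, the n = 2 case). This is a simplified sketch. It assumes every hypothesis covers all audio received so far, so the committed words form its prefix, and it omits the timestamp alignment, prompting, and buffer trimming of the actual Whisper-Streaming system. `LocalAgreement2` is a hypothetical class name.

```python
from typing import List


class LocalAgreement2:
    """Commit words once two consecutive hypotheses agree on them."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.previous: List[str] = []

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the full hypothesis for the current buffer; return newly committed words."""
        n = len(self.committed)
        new_tail, old_tail = hypothesis[n:], self.previous[n:]  # the still-uncommitted parts
        agreed = []
        for a, b in zip(new_tail, old_tail):
            if a != b:
                break  # first disagreement ends the common prefix
            agreed.append(a)
        self.committed.extend(agreed)
        self.previous = hypothesis
        return agreed


agreement = LocalAgreement2()
for hyp in ["the cat", "the cat sad", "the cat sat on", "the cat sat on the mat"]:
    print(agreement.update(hyp.split()))
print(agreement.committed)
```

The unstable word "sad" is never committed because the next hypothesis revises it, while "the mat" stays pending until a later hypothesis confirms it (or the stream ends and the remainder is flushed). Waiting for agreement is what gives the policy its self-adapting latency: stable audio commits quickly, ambiguous audio waits.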

reddit.com › r/openai › whisper streaming?

r/OpenAI on Reddit: Whisper Streaming?

December 5, 2022 -

Does OpenAI have plans to develop live audio streaming in Whisper?

Hi, as far as I know, OpenAI hasn't published any streaming model for Whisper yet! However, in case you need a real-time Whisper transcription in the browser, check out my TypeScript package whisper-live. It's framework-agnostic, uses the OpenAI Whisper model for live transcription and is easy to integrate, which I made for a personal project and later published via GitHub. 📦 Install with: npm install whisper-live More details here: https://github.com/Alireza29675/whisper-live Happy to help if you have any questions!

Two years on, has any other company or open library emerged to handle this? It's a huge pain! I want to create an AI scribe that fills in data in a DB as you talk, creating a wow moment for customers. Any suggestions on how to do this?

arxiv.org › abs › 2506.12154

[2506.12154] Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding

June 13, 2025 - However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) ...

fireworks.ai › blog › streaming-audio-launch

Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

Fireworks offers streaming audio service with 300 ms end-to-end latency and Whisper-v3-large quality to power real-time use cases like live captioning