Hugging Face
huggingface.co › facebook › wav2vec2-large
facebook/wav2vec2-large · Hugging Face
wav2vec2 · pretraining · speech · arXiv: 2006.11477 · License: apache-2.0 · Facebook's Wav2Vec2: the large model pretrained on 16kHz sampled speech audio.
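A hedged usage sketch (not from the model card): this checkpoint is pretrained only, with no CTC head, so it is typically loaded as a bare encoder for feature extraction. The transformers classes below are standard, but preprocessing defaults should be checked against the card.

    import numpy as np
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Pretrained-only checkpoint: no tokenizer/CTC head, so use the bare encoder.
    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large")

    # Input must be 16 kHz mono audio; a 1-second silent placeholder here.
    waveform = np.zeros(16000, dtype=np.float32)
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 1024)
    print(hidden.shape)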
Videos
Deploy Wav2Vec2.0 based Speech Recognition Service in ... (19:08)
wav2vec 2.0 | Lecture 76 (Part 3) | Applied Deep Learning - YouTube (07:31)
Episode1: This week in AI | Amazon self driving, Wav2Vec2.0 and ... (11:45)
Fine-Tuning Wav2Vec2 using HuggingFace | Audio Classification - ... (26:18)
Speech Recognition in Python | finetune wav2vec2 model for a custom ...
Wav2vec2: A Framework for Self-Supervised Learning of ...
GitHub
github.com › vistec-AI › wav2vec2-large-xlsr-53-th
GitHub - vistec-AI/wav2vec2-large-xlsr-53-th: Finetune wav2vec2-large-xlsr-53 with Thai Common Voice Corpus 7.0
We fine-tune wav2vec2-large-xlsr-53 on Thai examples from Common Voice Corpus 7.0, following the "Fine-tuning Wav2Vec2 for English ASR" tutorial. The notebooks and scripts can be found in vistec-ai/wav2vec2-large-xlsr-53-th; a usage sketch follows after this result.
Starred by 51 users
Forked by 13 users
Languages Jupyter Notebook 97.0% | Python 3.0%
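A minimal inference sketch for the resulting Thai model, assuming the fine-tuned checkpoint is published on the Hugging Face Hub as airesearch/wav2vec2-large-xlsr-53-th (an assumption; check the repo's README for the exact model id):

    import numpy as np
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    model_id = "airesearch/wav2vec2-large-xlsr-53-th"  # assumed Hub id
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Replace the placeholder with a real 16 kHz mono Thai recording.
    speech = np.zeros(16000, dtype=np.float32)
    inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)
    print(processor.batch_decode(pred_ids))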
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-large-xlsr-53-facebook
wav2vec2-large-xlsr-53 | AI Model Details
The wav2vec2-large-xlsr-53 model is Facebook's large-scale multilingual pretrained model for speech. It uses the wav2vec 2.0 objective and is pretrained on about 56,000 hours of unlabeled speech data across 53 languages, drawn from the MLS, CommonVoice, and BABEL datasets. (Its successor, wav2vec2-xls-r-300m, scales this to 436,000 hours across 128 languages, adding VoxPopuli and VoxLingua107.)
SourceForge
sourceforge.net › projects › wav2vec2-large-xlsr-53-russian
wav2vec2-large-xlsr-53-russian download | SourceForge.net
Download wav2vec2-large-xlsr-53-russian for free. wav2vec2-large-xlsr-53-russian is a fine-tuned automatic speech recognition (ASR) model based on Facebook's wav2vec2-large-xlsr-53, optimized for Russian and fine-tuned on the Common Voice and CSS10 datasets.
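A short sketch of driving this kind of Russian checkpoint through the transformers ASR pipeline; the Hub id jonatasgrosman/wav2vec2-large-xlsr-53-russian is an assumption here (the SourceForge download may package a different checkpoint):

    from transformers import pipeline

    # Assumed Hub id; swap in whichever Russian xlsr-53 checkpoint you use.
    asr = pipeline("automatic-speech-recognition",
                   model="jonatasgrosman/wav2vec2-large-xlsr-53-russian")
    # Decoding a local audio file requires ffmpeg for loading.
    print(asr("sample_ru.wav")["text"])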
Springer
link.springer.com › home › international journal of speech technology › article
Comparison of wav2vec 2.0 models on three speech processing tasks | International Journal of Speech Technology
October 10, 2024 · The parameters of each model are summarized in Table 1 ("Pre-trained wav2vec 2.0 models used in this article"). In subsequent text, we will refer to these models simply as "base", "large", "xlsr-53" and "ClTRUS". Note that although the pre-training data of the "ClTRUS" model do not match the English language of the datasets used in this article (described below), this should not be an issue: all three examined speech processing tasks are language-independent. To evaluate the effectiveness of the different wav2vec 2.0 models, we tested our system on several widely used English-language conversational speech corpora which have annotated speaker turns and for which there are recent results from other authors.
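A sketch of the kind of comparison loop this implies, pulling frame-level features from the three public checkpoints; the model ids and probing setup are assumptions, not the article's code, and "ClTRUS" is omitted since its checkpoint is not publicly named here:

    import numpy as np
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    checkpoints = {
        "base": "facebook/wav2vec2-base",
        "large": "facebook/wav2vec2-large",
        "xlsr-53": "facebook/wav2vec2-large-xlsr-53",
    }
    audio = np.zeros(16000, dtype=np.float32)  # placeholder 16 kHz clip
    for name, ckpt in checkpoints.items():
        extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
        model = Wav2Vec2Model.from_pretrained(ckpt)
        inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
        with torch.no_grad():
            feats = model(**inputs).last_hidden_state
        print(name, feats.shape)  # hidden size: 768 (base), 1024 (large, xlsr-53)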
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_large_librivox \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Metatext
metatext.io › models › facebook-wav2vec2-large
facebook/wav2vec2-large Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
GitHub
github.com › oliverguhr › wav2vec2-live
GitHub - oliverguhr/wav2vec2-live: Live speech recognition using Facebook's wav2vec 2.0 model.
# Stream microphone audio through a pretrained wav2vec 2.0 CTC model.
from live_asr import LiveWav2Vec2

english_model = "facebook/wav2vec2-large-960h-lv60-self"
german_model = "maxidl/wav2vec2-large-xlsr-german"

# Pick the German checkpoint and the default audio input device.
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        # Poll the latest transcription along with timing statistics.
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
Starred by 374 users
Forked by 58 users
Languages Python
Mohitmayank
mohitmayank.com › a_lazy_data_science_guide › audio_intelligence › wav2vec2
Wav2Vec2 Model - A Lazy Data Science Guide
There are two interesting points to note from the Wav2Vec2 results. First, the model is able to learn ASR with as little as 10 minutes of labeled data! As shown below, the LARGE model pre-trained on LV-60k and fine-tuned on Librispeech with CTC loss gives 4.6/7.9 WER.
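A hedged sketch of the low-resource recipe behind those numbers: attach a fresh CTC head to the LV-60k pretrained encoder and fine-tune on a small labeled set. Tokenizer/vocabulary setup and the training loop are elided, and the vocab size below is an assumption:

    from transformers import Wav2Vec2ForCTC

    # Pretrained-only LV-60k checkpoint; the CTC head is freshly initialized.
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-lv60",
        ctc_loss_reduction="mean",
        vocab_size=32,  # assumed: size of your character vocabulary
    )
    # Common practice: keep the convolutional feature encoder frozen.
    model.freeze_feature_encoder()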
GitHub
github.com › huggingface › transformers › blob › main › docs › source › en › model_doc › wav2vec2.md
transformers/docs/source/en/model_doc/wav2vec2.md at main · huggingface/transformers
Note: Meta (FAIR) released a new version, Wav2Vec2-BERT 2.0, pretrained on 4.5M hours of audio. We especially recommend using it for fine-tuning tasks (a loading sketch follows below).
Author huggingface
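A hedged loading sketch for that newer encoder, assuming the public checkpoint id facebook/w2v-bert-2.0 (an assumption; verify against the transformers docs):

    import numpy as np
    import torch
    from transformers import AutoFeatureExtractor, AutoModel

    ckpt = "facebook/w2v-bert-2.0"  # assumed checkpoint id
    extractor = AutoFeatureExtractor.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)

    # Placeholder 16 kHz clip; the extractor computes log-mel input features.
    audio = np.zeros(16000, dtype=np.float32)
    inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs).last_hidden_state
    print(out.shape)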