Hugging Face
huggingface.co › facebook › wav2vec2-large
facebook/wav2vec2-large · Hugging Face
wav2vec2 · pretraining · speech · arXiv: 2006.11477 · License: apache-2.0 · Facebook's Wav2Vec2: the large model pretrained on 16kHz sampled speech audio.
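A hedged usage sketch (not from the model card): this checkpoint is pretrained only, with no CTC head, so it is typically loaded as a bare encoder for feature extraction. The transformers classes below are standard, but preprocessing defaults should be checked against the card.

    import numpy as np
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Pretrained-only checkpoint: no tokenizer/CTC head, so use the bare encoder.
    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large")

    # Input must be 16 kHz mono audio; a 1-second silent placeholder here.
    waveform = np.zeros(16000, dtype=np.float32)
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 1024)
    print(hidden.shape)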
Videos
Deploy Wav2Vec2.0 based Speech Recognition Service in ... (19:08)
wav2vec 2.0 | Lecture 76 (Part 3) | Applied Deep Learning - YouTube (07:31)
Episode1: This week in AI | Amazon self driving, Wav2Vec2.0 and ... (11:45)
Fine-Tuning Wav2Vec2 using HuggingFace | Audio Classification - ... (26:18)
Speech Recognition in Python | finetune wav2vec2 model for a custom ...
Wav2vec2: A Framework for Self-Supervised Learning of ...
GitHub
github.com › vistec-AI › wav2vec2-large-xlsr-53-th
GitHub - vistec-AI/wav2vec2-large-xlsr-53-th: Finetune wav2vec2-large-xlsr-53 with Thai Common Voice Corpus 7.0
We fine-tune wav2vec2-large-xlsr-53 on Thai examples from Common Voice Corpus 7.0, following the "Fine-tuning Wav2Vec2 for English ASR" tutorial. The notebooks and scripts can be found in vistec-ai/wav2vec2-large-xlsr-53-th; a usage sketch follows after this result.
Starred by 51 users
Forked by 13 users
Languages Jupyter Notebook 97.0% | Python 3.0%
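A minimal inference sketch for the resulting Thai model, assuming the fine-tuned checkpoint is published on the Hugging Face Hub as airesearch/wav2vec2-large-xlsr-53-th (an assumption; check the repo's README for the exact model id):

    import numpy as np
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    model_id = "airesearch/wav2vec2-large-xlsr-53-th"  # assumed Hub id
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Replace the placeholder with a real 16 kHz mono Thai recording.
    speech = np.zeros(16000, dtype=np.float32)
    inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)
    print(processor.batch_decode(pred_ids))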
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-large-xlsr-53-facebook
wav2vec2-large-xlsr-53 | AI Model Details
The wav2vec2-large-xlsr-53 model is Facebook's large-scale multilingual pretrained model for speech. It uses the wav2vec 2.0 objective and is pretrained on about 56,000 hours of unlabeled speech data across 53 languages, drawn from the MLS, CommonVoice, and BABEL datasets. (Its successor, wav2vec2-xls-r-300m, scales this to 436,000 hours across 128 languages, adding VoxPopuli and VoxLingua107.)
SourceForge
sourceforge.net › projects › wav2vec2-large-xlsr-53-russian
wav2vec2-large-xlsr-53-russian download | SourceForge.net
Download wav2vec2-large-xlsr-53-russian for free. wav2vec2-large-xlsr-53-russian is a fine-tuned automatic speech recognition (ASR) model based on Facebook's wav2vec2-large-xlsr-53, optimized for Russian and fine-tuned on the Common Voice and CSS10 datasets.
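A short sketch of driving this kind of Russian checkpoint through the transformers ASR pipeline; the Hub id jonatasgrosman/wav2vec2-large-xlsr-53-russian is an assumption here (the SourceForge download may package a different checkpoint):

    from transformers import pipeline

    # Assumed Hub id; swap in whichever Russian xlsr-53 checkpoint you use.
    asr = pipeline("automatic-speech-recognition",
                   model="jonatasgrosman/wav2vec2-large-xlsr-53-russian")
    # Decoding a local audio file requires ffmpeg for loading.
    print(asr("sample_ru.wav")["text"])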
Springer
link.springer.com › home › international journal of speech technology › article
Comparison of wav2vec 2.0 models on three speech processing tasks | International Journal of Speech Technology
October 10, 2024 · The parameters of each model are summarized in Table 1 ("Pre-trained wav2vec 2.0 models used in this article"). In subsequent text, we will refer to these models simply as "base", "large", "xlsr-53" and "ClTRUS". Note that although the pre-training data of the "ClTRUS" model do not match the English language of the datasets used in this article (described below), this should not be an issue: all three examined speech processing tasks are language-independent. To evaluate the effectiveness of the different wav2vec 2.0 models, we tested our system on several widely used English-language conversational speech corpora which have annotated speaker turns and for which there are recent results from other authors.
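A sketch of the kind of comparison loop this implies, pulling frame-level features from the three public checkpoints; the model ids and probing setup are assumptions, not the article's code, and "ClTRUS" is omitted since its checkpoint is not publicly named here:

    import numpy as np
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    checkpoints = {
        "base": "facebook/wav2vec2-base",
        "large": "facebook/wav2vec2-large",
        "xlsr-53": "facebook/wav2vec2-large-xlsr-53",
    }
    audio = np.zeros(16000, dtype=np.float32)  # placeholder 16 kHz clip
    for name, ckpt in checkpoints.items():
        extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
        model = Wav2Vec2Model.from_pretrained(ckpt)
        inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
        with torch.no_grad():
            feats = model(**inputs).last_hidden_state
        print(name, feats.shape)  # hidden size: 768 (base), 1024 (large, xlsr-53)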
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_large_librivox \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Metatext
metatext.io › models › facebook-wav2vec2-large
facebook/wav2vec2-large Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
GitHub
github.com › oliverguhr › wav2vec2-live
GitHub - oliverguhr/wav2vec2-live: Live speech recognition using Facebook's wav2vec 2.0 model.
# Stream microphone audio through a pretrained wav2vec 2.0 CTC model.
from live_asr import LiveWav2Vec2

english_model = "facebook/wav2vec2-large-960h-lv60-self"
german_model = "maxidl/wav2vec2-large-xlsr-german"

# Pick the German checkpoint and the default audio input device.
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        # Poll the latest transcription along with timing statistics.
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
Starred by 374 users
Forked by 58 users
Languages Python
Mohitmayank
mohitmayank.com › a_lazy_data_science_guide › audio_intelligence › wav2vec2
Wav2Vec2 Model - A Lazy Data Science Guide
There are two interesting points to note from the Wav2Vec2 results. First, the model is able to learn ASR with as little as 10 minutes of labeled data! As shown below, the LARGE model pre-trained on LV-60k and fine-tuned on Librispeech with CTC loss gives 4.6/7.9 WER.
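A hedged sketch of the low-resource recipe behind those numbers: attach a fresh CTC head to the LV-60k pretrained encoder and fine-tune on a small labeled set. Tokenizer/vocabulary setup and the training loop are elided, and the vocab size below is an assumption:

    from transformers import Wav2Vec2ForCTC

    # Pretrained-only LV-60k checkpoint; the CTC head is freshly initialized.
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-lv60",
        ctc_loss_reduction="mean",
        vocab_size=32,  # assumed: size of your character vocabulary
    )
    # Common practice: keep the convolutional feature encoder frozen.
    model.freeze_feature_encoder()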
GitHub
github.com › huggingface › transformers › blob › main › docs › source › en › model_doc › wav2vec2.md
transformers/docs/source/en/model_doc/wav2vec2.md at main · huggingface/transformers
Note: Meta (FAIR) released a new version, Wav2Vec2-BERT 2.0, pretrained on 4.5M hours of audio. We especially recommend using it for fine-tuning tasks (a loading sketch follows below).
Author huggingface
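A hedged loading sketch for that newer encoder, assuming the public checkpoint id facebook/w2v-bert-2.0 (an assumption; verify against the transformers docs):

    import numpy as np
    import torch
    from transformers import AutoFeatureExtractor, AutoModel

    ckpt = "facebook/w2v-bert-2.0"  # assumed checkpoint id
    extractor = AutoFeatureExtractor.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)

    # Placeholder 16 kHz clip; the extractor computes log-mel input features.
    audio = np.zeros(16000, dtype=np.float32)
    inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs).last_hidden_state
    print(out.shape)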