🌐
Hugging Face
huggingface.co › facebook › wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2 · The base model pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note: This model does not have a tokenizer as it was pretrained on audio alone.
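Those two caveats — 16 kHz input and no tokenizer — determine how the checkpoint is used in practice: as a feature extractor, not an out-of-the-box transcriber. A minimal sketch, assuming the transformers and torchaudio packages and a placeholder file example.wav:

import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

waveform, sr = torchaudio.load("example.wav")  # placeholder file name
if sr != 16_000:
    # the checkpoint expects 16 kHz input, so resample anything else
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = extractor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    # (batch, frames, 768) hidden states; no transcription head here
    features = model(**inputs).last_hidden_state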
🌐
Hugging Face
huggingface.co › facebook › wav2vec2-base-960h
facebook/wav2vec2-base-960h · Hugging Face
The base model pretrained and fine-tuned on 960 hours of LibriSpeech on 16 kHz sampled speech audio.
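Since this checkpoint already has a CTC head and tokenizer, it can transcribe directly. A minimal greedy-decoding sketch, assuming a 16 kHz mono recording speech.wav (the file name is illustrative):

import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

waveform, sr = torchaudio.load("speech.wav")  # assumed 16 kHz mono
inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # (batch, frames, vocab)
pred_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
print(processor.batch_decode(pred_ids)[0])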
🌐
GitHub
github.com › openvinotoolkit › open_model_zoo › blob › master › models › public › wav2vec2-base › README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations", and fine-tuned for the speech recognition task ...
Author: openvinotoolkit
🌐
Dataloop
dataloop.ai › home › library › models › wav2vec2 base 960h
Wav2vec2 Base 960h · Models · Dataloop
The Wav2vec2 Base 960h model is a powerful tool for speech recognition. What makes it unique is its ability to learn from speech audio alone and fine-tune on transcribed speech, achieving state-of-the-art results with limited labeled data.
🌐
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
🌐
Hugging Face
huggingface.co › rinna › japanese-wav2vec2-base
rinna/japanese-wav2vec2-base · Hugging Face
wav2vec2 · pretraining · speech · arXiv: 2404.01657 · License: apache-2.0 · This is a Japanese wav2vec 2.0 Base model trained by rinna Co., Ltd.
🌐
PyTorch
docs.pytorch.org › audio › 2.8 › generated › torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under MIT License and redistributed with the same license. [License, Source] Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
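A sketch of the bundle API this page documents, assuming a placeholder file example.wav; extract_features returns one tensor per transformer layer (12 for the "base" architecture):

import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()                     # pre-trained only, no ASR head

waveform, sr = torchaudio.load("example.wav")  # placeholder file name
if sr != bundle.sample_rate:                   # bundle.sample_rate == 16000
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    # list with one tensor per transformer layer
    features, _ = model.extract_features(waveform)
print(len(features), features[-1].shape)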
🌐
Metatext
metatext.io › models › facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
Find elsewhere
🌐
Hugging Face
huggingface.co › docs › transformers › en › model_doc › wav2vec2
Wav2Vec2
Base class for models that have been trained with the Wav2Vec2 loss objective.
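The same transformers docs expose the pre-training objective through Wav2Vec2ForPreTraining, which needs explicit mask and negative-sample indices. A sketch along the lines of the documented example, using random audio as a stand-in:

import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")

audio = torch.randn(16_000).numpy()  # 1 s of random audio as a stand-in
input_values = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt").input_values

batch_size, raw_len = input_values.shape
seq_len = model._get_feat_extract_output_lengths(raw_len).item()

# choose spans to mask, then sample distractors for the contrastive task
mask_time_indices = _compute_mask_indices((batch_size, seq_len), mask_prob=0.2, mask_length=2)
negatives = _sample_negative_indices((batch_size, seq_len), model.config.num_negatives, mask_time_indices)
mask_time_indices = torch.tensor(mask_time_indices, dtype=torch.long)
negatives = torch.tensor(negatives, dtype=torch.long)

model.train()  # loss is returned in training mode
loss = model(input_values, mask_time_indices=mask_time_indices, sampled_negative_indices=negatives).loss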
🌐
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author: facebookresearch
🌐
Springer
link.springer.com › home › international journal of speech technology › article
Comparison of wav2vec 2.0 models on three speech processing tasks | International Journal of Speech Technology
October 10, 2024 - From this and the previous tables, we can safely conclude that models “ClTRUS”, “large” and “xlsr-53” all surpass the performance of the base wav2vec2 model.
🌐
Hugging Face
huggingface.co › imprt › izanami-wav2vec2-base
imprt/izanami-wav2vec2-base · Hugging Face
This is a Japanese wav2vec2.0 Base model pre-trained using 5313 hours of audio extracted from large-scale Japanese TV broadcast audio data by voice activity detection.
🌐
Hugging Face
huggingface.co › fav-kky › wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
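The "create a tokenizer, then fine-tune" recipe from this card maps onto a few lines of transformers setup. A hedged sketch with a toy vocabulary (the vocabulary and file names are placeholders, not the actual Slovak character set):

import json
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# toy character vocabulary; a real one is built from the labeled transcripts
vocab = {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, "b": 4, "c": 5}
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="<unk>", pad_token="<pad>", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# the randomly initialized CTC head must match the tokenizer's vocabulary size
model = Wav2Vec2ForCTC.from_pretrained(
    "fav-kky/wav2vec2-base-sk-17k",
    vocab_size=len(vocab),
    pad_token_id=tokenizer.pad_token_id,
)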
🌐
arXiv
arxiv.org › abs › 2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
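For reference, the contrastive task the abstract describes is the paper's masked-prediction loss: the context vector $\mathbf{c}_t$ at a masked time step must identify the true quantized latent $\mathbf{q}_t$ among distractors,

$$\mathcal{L}_m = -\log \frac{\exp(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa)}{\sum_{\tilde{\mathbf{q}} \in \mathbf{Q}_t} \exp(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa)}$$

where $\mathrm{sim}$ is cosine similarity, $\kappa$ a temperature, and $\mathbf{Q}_t$ contains $\mathbf{q}_t$ plus distractors sampled from other masked time steps.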
🌐
AWS
aws.amazon.com › blogs › machine-learning › fine-tune-and-deploy-a-wav2vec2-model-for-speech-recognition-with-hugging-face-and-amazon-sagemaker
Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker | Artificial Intelligence
May 25, 2022 - Then the model is fine-tuned on labeled data with the Connectionist Temporal Classification (CTC) algorithm for specific ASR tasks. The base model we use in this post is Wav2Vec2-Base-960h, fine-tuned on 960 hours of LibriSpeech on 16 kHz sampled speech audio.
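The CTC step described here is what transformers implements in Wav2Vec2ForCTC: passing labels to the forward call returns the CTC loss directly. An illustrative sketch with random audio and a placeholder transcript:

import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

audio = torch.randn(16_000).numpy()  # 1 s of random 16 kHz audio as a stand-in
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

out = model(**inputs, labels=labels)  # CTC loss between logits and labels
out.loss.backward()                   # gradient for one fine-tuning step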
🌐
Mohitmayank
mohitmayank.com › a_lazy_data_science_guide › audio_intelligence › wav2vec2
Wav2Vec2 Model - A Lazy Data Science Guide
First, we select a text dataset. This can be the transcript of the training data (part of the labeled data used to fine-tune the Wav2Vec2 model) or a related collection of documents from the same domain (e.g., medical, telecom).
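Once such a text dataset is selected, a common next step is to train an n-gram language model on it (e.g., with KenLM) and use it during beam-search decoding. A sketch with pyctcdecode, where domain.arpa is a hypothetical LM file trained on that corpus:

import torch
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# pyctcdecode wants the labels ordered by token id
vocab = [tok for tok, _ in sorted(processor.tokenizer.get_vocab().items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(vocab, kenlm_model_path="domain.arpa")  # hypothetical LM file

audio = torch.randn(16_000).numpy()  # placeholder 16 kHz audio
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0].numpy()
print(decoder.decode(logits))  # beam search rescored by the domain n-gram LM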