Hugging Face
huggingface.co/facebook/wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2: the base model pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note: this model does not have a tokenizer as it was pretrained on audio alone.
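A minimal sketch of using this audio-only checkpoint as a feature extractor via the transformers API (not taken from the page above; "speech.wav" is a placeholder path, and the transformers and torchaudio packages are assumed):

    # Hedged sketch: feature extraction with facebook/wav2vec2-base.
    # The checkpoint has no tokenizer/CTC head, so we only pull hidden states.
    import torch
    import torchaudio
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

    waveform, sr = torchaudio.load("speech.wav")          # placeholder file
    if sr != 16_000:                                      # model expects 16 kHz
        waveform = torchaudio.functional.resample(waveform, sr, 16_000)

    inputs = extractor(waveform.squeeze(0).numpy(), sampling_rate=16_000,
                       return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (batch, frames, 768)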
Hugging Face
huggingface.co/facebook/wav2vec2-base-960h
facebook/wav2vec2-base-960h · Hugging Face
The base model pretrained and fine-tuned on 960 hours of LibriSpeech 16 kHz sampled speech audio.
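A hedged sketch of greedy CTC transcription with this fine-tuned checkpoint (`waveform` stands in for a 1-D float array already sampled at 16 kHz):

    # Hedged sketch: greedy CTC decoding with facebook/wav2vec2-base-960h.
    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits                   # (batch, frames, vocab)
    pred_ids = torch.argmax(logits, dim=-1)               # best label per frame
    text = processor.batch_decode(pred_ids)[0]            # collapse repeats/blanks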
GitHub
github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/wav2vec2-base/README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations", and fine-tuned for the speech recognition task ...
PyTorch
docs.pytorch.org/audio/2.8/generated/torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under MIT License and redistributed with the same license. [License, Source] Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
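A short usage sketch of the bundle API this result documents ("speech.wav" is a placeholder path; since the checkpoint is not fine-tuned, it serves feature extraction only):

    # Sketch: torchaudio's bundled, non-fine-tuned wav2vec 2.0 base model.
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    model = bundle.get_model()

    waveform, sr = torchaudio.load("speech.wav")          # placeholder file
    if sr != bundle.sample_rate:                          # bundle.sample_rate == 16000
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

    features, lengths = model.extract_features(waveform)  # one tensor per layer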
Dataloop
dataloop.ai
Wav2vec2 Base 960h · Models · Dataloop
The Wav2vec2 Base 960h model is a powerful tool for speech recognition. What makes it unique is its ability to learn from speech audio alone and fine-tune on transcribed speech, achieving state-of-the-art results with limited labeled data.
AI Models
aimodels.fyi/models/huggingFace/wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
Hugging Face
huggingface.co/rinna/japanese-wav2vec2-base
rinna/japanese-wav2vec2-base · Hugging Face
wav2vec2 · pretraining · speech · arXiv:2404.01657 · License: apache-2.0 · This is a Japanese wav2vec 2.0 Base model trained by rinna Co., Ltd.
Metatext
metatext.io/models/facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
PyTorch
docs.pytorch.org/audio/0.9.0/models.html
torchaudio.models — Torchaudio 0.9.0 documentation
Indicates the valid length of each feature in the batch, computed based on the given lengths argument. Shape: (batch,). ... torchaudio.models.wav2vec2_base(num_out: int) → torchaudio.models.wav2vec2.model.Wav2Vec2Model
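A sketch of calling the factory function quoted above; note the 0.9.0 docs show `num_out`, while later torchaudio releases name the same CTC-head parameter `aux_num_out`, so match your installed version:

    # Sketch: an untrained "base" wav2vec 2.0 architecture from torchaudio.
    import torch
    import torchaudio

    model = torchaudio.models.wav2vec2_base(aux_num_out=32)  # 32 = arbitrary label count
    waveform = torch.randn(1, 16_000)       # one second of dummy 16 kHz audio
    logits, lengths = model(waveform)       # logits: (batch, frames, 32)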
Hugging Face
huggingface.co/docs/transformers/en/model_doc/wav2vec2
Wav2Vec2
Base class for models that have been trained with the Wav2Vec2 loss objective.
GitHub
github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Springer
link.springer.com
Comparison of wav2vec 2.0 models on three speech processing tasks | International Journal of Speech Technology
October 10, 2024 - From this and the previous tables, we can safely conclude that models “ClTRUS”, “large” and “xlsr-53” all surpass the performance of the base wav2vec2 model.
Hugging Face
huggingface.co/fav-kky/wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
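The tokenizer-then-fine-tune step this snippet describes can be sketched as follows (vocab.json is a hypothetical character vocabulary for the target language; the training loop itself is omitted):

    # Hedged sketch: pairing the audio-only checkpoint with a new CTC tokenizer.
    from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                              Wav2Vec2Processor, Wav2Vec2ForCTC)

    tokenizer = Wav2Vec2CTCTokenizer("vocab.json",        # hypothetical vocab file
                                     unk_token="[UNK]", pad_token="[PAD]",
                                     word_delimiter_token="|")
    extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16_000,
                                         padding_value=0.0, do_normalize=True)
    processor = Wav2Vec2Processor(feature_extractor=extractor, tokenizer=tokenizer)

    model = Wav2Vec2ForCTC.from_pretrained(
        "fav-kky/wav2vec2-base-sk-17k",
        vocab_size=len(tokenizer),          # size the new CTC head to the vocab
        pad_token_id=tokenizer.pad_token_id,
    )
    # model is now ready for CTC fine-tuning on labeled speech data.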
Hugging Face
huggingface.co/imprt/izanami-wav2vec2-base
imprt/izanami-wav2vec2-base · Hugging Face
This is a Japanese wav2vec2.0 Base model pre-trained using 5313 hours of audio extracted from large-scale Japanese TV broadcast audio data by voice activity detection.
arXiv
arxiv.org/abs/2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
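As a gloss on the contrastive task the abstract mentions, the paper's masked-step objective can be written as follows (c_t is the context network output at masked time step t, q_t the true quantized latent, Q_t the candidate set of q_t plus distractors, sim cosine similarity, κ a temperature):

    \mathcal{L}_m = -\log \frac{\exp\left(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa\right)}
                               {\sum_{\tilde{\mathbf{q}} \in \mathbf{Q}_t} \exp\left(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa\right)}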