Hugging Face
huggingface.co › facebook › wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2 · The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz. Note: this model does not have a tokenizer, as it was pretrained on audio alone.
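For context, a minimal feature-extraction sketch with the Hugging Face transformers library; the model name comes from the result above, while the silent dummy waveform is a placeholder (the pretrained base model outputs hidden states, not transcripts):

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# The input must be a 16 kHz mono waveform as a 1-D float array.
audio = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence
inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.inference_mode():
    hidden = model(**inputs).last_hidden_state  # shape: (batch, frames, 768)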
Videos
Deploy Wav2Vec2.0 based Speech Recognition Service in ... (19:08)
wav2vec 2.0 | Lecture 76 (Part 3) | Applied Deep Learning - YouTube (07:31)
Episode 1: This week in AI | Amazon self-driving, Wav2Vec2.0 and ... (45:11)
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech ...
GitHub
github.com › openvinotoolkit › open_model_zoo › blob › master › models › public › wav2vec2-base › README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, and fine-tuned for the speech recognition task ...
Author openvinotoolkit
PyTorch
docs.pytorch.org › audio › 2.8 › generated › torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from the LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under the MIT License and redistributed with the same license. Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
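A short sketch of loading this bundle through the torchaudio pipelines API; the file path is a placeholder, and since the bundle is not fine-tuned it is used here for feature extraction rather than transcription:

import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()

waveform, sr = torchaudio.load("speech.wav")  # placeholder path
if sr != bundle.sample_rate:  # the bundle expects 16 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

# Returns a list of per-layer feature tensors plus valid lengths.
features, lengths = model.extract_features(waveform)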
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
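As a quick illustration, this fine-tuned checkpoint can be driven through the transformers ASR pipeline; the audio path below is a placeholder:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("speech.wav")["text"])  # placeholder file; a 16 kHz mono WAV works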
Metatext
metatext.io › models › facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
PyTorch
docs.pytorch.org › audio › 0.9.0 › models.html
torchaudio.models — Torchaudio 0.9.0 documentation
torchaudio.models.wav2vec2_base(num_out: int) → torchaudio.models.wav2vec2.model.Wav2Vec2Model. The returned lengths tensor indicates the valid length of each feature in the batch, computed from the given lengths argument; shape: (batch,).
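A minimal sketch against that legacy 0.9.0 factory, assuming num_out sets the output (e.g. CTC vocabulary) dimension and that forward returns (features, valid lengths) as the docs fragment above suggests:

import torch
import torchaudio

# Build an untrained base model with a 32-dimensional output layer.
model = torchaudio.models.wav2vec2_base(num_out=32)

waveform = torch.randn(1, 16000)          # dummy batch: one second at 16 kHz
logits, valid_lengths = model(waveform)   # valid_lengths is None without a lengths arg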
PyTorch
docs.pytorch.org › audio › stable › tutorials › speech_recognition_pipeline_tutorial.html
Speech Recognition with Wav2Vec2 — Torchaudio 2.9.0 documentation
We will use torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H here.
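The tutorial's core steps can be sketched roughly as follows; the greedy CTC decoding here is a simplification of the tutorial's decoder, and the file path is a placeholder:

import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()  # CTC label set; "-" is the blank token

waveform, sr = torchaudio.load("speech.wav")  # placeholder path
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)

# Greedy decode: argmax per frame, collapse repeats, drop blanks.
indices = torch.unique_consecutive(emissions[0].argmax(dim=-1))
transcript = "".join(labels[i] for i in indices.tolist() if labels[i] != "-")
print(transcript.replace("|", " "))  # "|" is the word delimiter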
NVIDIA
catalog.ngc.nvidia.com › orgs › nvidia › teams › dle › models › wav2vec2_base_pyt_ckpt_finetune
GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Hugging Face
huggingface.co › fav-kky › wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
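A rough sketch of that tokenizer-plus-fine-tuning workflow with transformers, assuming a hand-made vocab.json for the target language; the file name and keyword values here are illustrative, not taken from the model card:

from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level CTC tokenizer built from a custom vocabulary file.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0,
    do_normalize=True, return_attention_mask=False,
)
processor = Wav2Vec2Processor(feature_extractor=extractor, tokenizer=tokenizer)

# The checkpoint has no CTC head, so a fresh one is initialized to match the
# new vocabulary; the model is then fine-tuned on labeled data (e.g. with
# transformers' Trainer).
model = Wav2Vec2ForCTC.from_pretrained(
    "fav-kky/wav2vec2-base-sk-17k",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)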
arXiv
arxiv.org › abs › 2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
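For reference, the contrastive objective the abstract alludes to (eq. 3 in the paper): for a masked time step t with context output c_t, true quantized latent q_t, a candidate set Q_t of q_t plus distractors, temperature κ, and cosine similarity sim,

\mathcal{L}_m = -\log \frac{\exp\!\big(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa\big)}{\sum_{\tilde{\mathbf{q}} \in \mathbf{Q}_t} \exp\!\big(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa\big)}, \qquad \mathrm{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}^{\top}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}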