Hugging Face
huggingface.co › facebook › wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2 · The base model, pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note: this model does not have a tokenizer, as it was pretrained on audio alone.
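The snippet stresses that inputs must match the model's 16 kHz sampling rate. In practice you would resample with `torchaudio.functional.resample` or `librosa.resample`; as a self-contained illustration of what resampling does, here is a minimal linear-interpolation sketch over a synthetic tone (the waveform and rates are made up):

```python
import numpy as np

def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Resample a 1-D waveform via linear interpolation (illustrative only)."""
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    # Time stamps of the original and target sample grids.
    t_orig = np.arange(len(waveform)) / orig_sr
    t_target = np.arange(n_target) / target_sr
    return np.interp(t_target, t_orig, waveform)

# One second of a 440 Hz tone at 44.1 kHz, resampled to 16 kHz for the model.
sr_in = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
resampled = resample_linear(tone, sr_in)
print(len(resampled))  # 16000
```

Real resamplers use band-limited filters rather than linear interpolation; the point here is only the change of sample grid.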
GitHub
github.com › openvinotoolkit › open_model_zoo › blob › master › models › public › wav2vec2-base › README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, and fine-tuned for the speech recognition task ...
Author openvinotoolkit
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
PyTorch
docs.pytorch.org › audio › 2.8 › generated › torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under MIT License and redistributed with the same license. [License, Source] Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
Metatext
metatext.io › models › facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
PyTorch
docs.pytorch.org › audio › stable › tutorials › speech_recognition_pipeline_tutorial.html
Speech Recognition with Wav2Vec2 — Torchaudio 2.9.0 documentation
We will use torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H here.
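The ASR bundle in that tutorial pairs the acoustic model with a CTC decoder. As a hedged sketch of the greedy decoding step (collapse repeated labels, then drop blanks), here is a toy example over a made-up label set and one-hot "logits" — the real pipeline emits per-frame logits of the same shape:

```python
import numpy as np

LABELS = ["-", "h", "e", "l", "o"]  # index 0 is the CTC blank (hypothetical label set)

def greedy_ctc_decode(logits: np.ndarray, labels=LABELS, blank: int = 0) -> str:
    """Pick the argmax label per frame, collapse repeats, then drop blanks."""
    best = logits.argmax(axis=-1)                        # (frames,)
    collapsed = [i for i, prev in zip(best, [None, *best[:-1]]) if i != prev]
    return "".join(labels[i] for i in collapsed if i != blank)

# Toy frame-level scores spelling "hello": h h e - l l - l o
frames = [1, 1, 2, 0, 3, 3, 0, 3, 4]
logits = np.eye(len(LABELS))[frames]                     # one-hot rows, one per frame
print(greedy_ctc_decode(logits))  # hello
```

Note how the blank between the two `l` runs is what keeps the double letter from collapsing into one.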
NVIDIA
catalog.ngc.nvidia.com › orgs › nvidia › teams › dle › models › wav2vec2_base_pyt_ckpt_finetune
GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Hugging Face
huggingface.co › fav-kky › wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
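Since the checkpoint ships without a tokenizer, fine-tuning for ASR starts by deriving a vocabulary from the labeled transcripts. A minimal sketch of building the character vocabulary a CTC tokenizer needs (the transcripts and special tokens are illustrative; in practice such a `vocab.json` would be passed to something like `Wav2Vec2CTCTokenizer`):

```python
import json

transcripts = ["dobrý deň", "ahoj svet"]  # toy Slovak transcripts (made up)

# Collect every character; CTC tokenizers typically replace " " with a word
# delimiter token and reserve ids for [PAD] (the CTC blank) and [UNK].
chars = sorted({c for text in transcripts for c in text if c != " "})
vocab = {c: i for i, c in enumerate(chars)}
vocab["|"] = len(vocab)        # word delimiter
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

print(json.dumps(vocab, ensure_ascii=False))
```

The model's output layer is then sized to this vocabulary before fine-tuning on the labeled audio.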
arXiv
arxiv.org › abs › 2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
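The abstract describes the pre-training objective: at each masked time step, the model must identify the true quantized latent among distractors. As a hedged NumPy sketch of that InfoNCE-style contrastive loss (random vectors stand in for the context-network output and quantized targets; `kappa` is a temperature, and all shapes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_loss(context: np.ndarray, target: np.ndarray,
                     distractors: np.ndarray, kappa: float = 0.1) -> float:
    """Negative log-softmax of cosine similarity to the true target vs. distractors."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(context, target)] + [cos(context, d) for d in distractors]) / kappa
    sims -= sims.max()                      # numerical stability
    return float(-np.log(np.exp(sims[0]) / np.exp(sims).sum()))

c = rng.normal(size=256)                    # context output at a masked step (random stand-in)
q = rng.normal(size=256)                    # true quantized latent (random stand-in)
negs = rng.normal(size=(100, 256))          # distractors sampled from other masked steps
loss = contrastive_loss(c, q, negs)
```

The loss falls as the context vector aligns with its quantized target, which is exactly what drives the joint learning of the quantization the snippet mentions.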
Mohitmayank
mohitmayank.com › a_lazy_data_science_guide › audio_intelligence › wav2vec2
Wav2Vec2 Model - A Lazy Data Science Guide
First, we will select one text dataset. This dataset can be the transcript of train data (part of labeled data we used to finetune Wav2Vec2 model) or a related (same domain like medical, telecom, etc) collection of documents.
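The text dataset described above feeds an n-gram language model (KenLM is a common choice for Wav2Vec2 decoding). As a toy sketch of the raw statistics behind such a model, here is a bigram count over made-up in-domain transcripts:

```python
from collections import Counter

transcripts = [
    "the patient reports mild pain",
    "the patient denies fever",
]

unigrams, bigrams = Counter(), Counter()
for line in transcripts:
    tokens = ["<s>", *line.split(), "</s>"]       # sentence boundary markers
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

# Maximum-likelihood bigram probability P(w2 | w1) = count(w1 w2) / count(w1).
p = bigrams[("the", "patient")] / unigrams["the"]
print(p)  # 1.0
```

A real LM adds smoothing and higher orders, but picking in-domain text (medical, telecom, etc.) matters because these counts are what bias the beam-search decoder toward plausible word sequences.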