Hugging Face
huggingface.co › facebook › wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2 · The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz. Note: this model does not have a tokenizer, as it was pretrained on audio alone.
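For context, a minimal feature-extraction sketch with the Hugging Face transformers library; the model name comes from the result above, while the silent dummy waveform is a placeholder (the pretrained base model outputs hidden states, not transcripts):

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# The input must be a 16 kHz mono waveform as a 1-D float array.
audio = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence
inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.inference_mode():
    hidden = model(**inputs).last_hidden_state  # shape: (batch, frames, 768)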
Videos
Deploy Wav2Vec2.0 based Speech Recognition Service in ... (19:08)
wav2vec 2.0 | Lecture 76 (Part 3) | Applied Deep Learning - YouTube (07:31)
Episode 1: This week in AI | Amazon self-driving, Wav2Vec2.0 and ... (45:11)
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech ...
GitHub
github.com › openvinotoolkit › open_model_zoo › blob › master › models › public › wav2vec2-base › README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, and fine-tuned for the speech recognition task ...
Author openvinotoolkit
PyTorch
docs.pytorch.org › audio › 2.8 › generated › torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from the LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under the MIT License and redistributed with the same license. Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
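A short sketch of loading this bundle through the torchaudio pipelines API; the file path is a placeholder, and since the bundle is not fine-tuned it is used here for feature extraction rather than transcription:

import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()

waveform, sr = torchaudio.load("speech.wav")  # placeholder path
if sr != bundle.sample_rate:  # the bundle expects 16 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

# Returns a list of per-layer feature tensors plus valid lengths.
features, lengths = model.extract_features(waveform)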
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
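As a quick illustration, this fine-tuned checkpoint can be driven through the transformers ASR pipeline; the audio path below is a placeholder:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("speech.wav")["text"])  # placeholder file; a 16 kHz mono WAV works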
Metatext
metatext.io › models › facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
PyTorch
docs.pytorch.org › audio › 0.9.0 › models.html
torchaudio.models — Torchaudio 0.9.0 documentation
torchaudio.models.wav2vec2_base(num_out: int) → torchaudio.models.wav2vec2.model.Wav2Vec2Model. The returned lengths tensor indicates the valid length of each feature in the batch, computed from the given lengths argument; shape: (batch,).
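A minimal sketch against that legacy 0.9.0 factory, assuming num_out sets the output (e.g. CTC vocabulary) dimension and that forward returns (features, valid lengths) as the docs fragment above suggests:

import torch
import torchaudio

# Build an untrained base model with a 32-dimensional output layer.
model = torchaudio.models.wav2vec2_base(num_out=32)

waveform = torch.randn(1, 16000)          # dummy batch: one second at 16 kHz
logits, valid_lengths = model(waveform)   # valid_lengths is None without a lengths arg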
PyTorch
docs.pytorch.org › audio › stable › tutorials › speech_recognition_pipeline_tutorial.html
Speech Recognition with Wav2Vec2 — Torchaudio 2.9.0 documentation
We will use torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H here.
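The tutorial's core steps can be sketched roughly as follows; the greedy CTC decoding here is a simplification of the tutorial's decoder, and the file path is a placeholder:

import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()  # CTC label set; "-" is the blank token

waveform, sr = torchaudio.load("speech.wav")  # placeholder path
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)

# Greedy decode: argmax per frame, collapse repeats, drop blanks.
indices = torch.unique_consecutive(emissions[0].argmax(dim=-1))
transcript = "".join(labels[i] for i in indices.tolist() if labels[i] != "-")
print(transcript.replace("|", " "))  # "|" is the word delimiter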
NVIDIA
catalog.ngc.nvidia.com › orgs › nvidia › teams › dle › models › wav2vec2_base_pyt_ckpt_finetune
GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Hugging Face
huggingface.co › fav-kky › wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
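A rough sketch of that tokenizer-plus-fine-tuning workflow with transformers, assuming a hand-made vocab.json for the target language; the file name and keyword values here are illustrative, not taken from the model card:

from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level CTC tokenizer built from a custom vocabulary file.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0,
    do_normalize=True, return_attention_mask=False,
)
processor = Wav2Vec2Processor(feature_extractor=extractor, tokenizer=tokenizer)

# The checkpoint has no CTC head, so a fresh one is initialized to match the
# new vocabulary; the model is then fine-tuned on labeled data (e.g. with
# transformers' Trainer).
model = Wav2Vec2ForCTC.from_pretrained(
    "fav-kky/wav2vec2-base-sk-17k",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)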
arXiv
arxiv.org › abs › 2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
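For reference, the contrastive objective the abstract alludes to (eq. 3 in the paper): for a masked time step t with context output c_t, true quantized latent q_t, a candidate set Q_t of q_t plus distractors, temperature κ, and cosine similarity sim,

\mathcal{L}_m = -\log \frac{\exp\!\big(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa\big)}{\sum_{\tilde{\mathbf{q}} \in \mathbf{Q}_t} \exp\!\big(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa\big)}, \qquad \mathrm{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}^{\top}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}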