Hugging Face
huggingface.co › facebook › wav2vec2-base
facebook/wav2vec2-base · Hugging Face
Facebook's Wav2Vec2 · The base model, pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note: this model does not have a tokenizer, as it was pretrained on audio alone.
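The snippet stresses that inputs must match the model's 16 kHz sampling rate. In practice you would resample with `torchaudio.functional.resample` or `librosa.resample`; as a self-contained illustration of what resampling does, here is a minimal linear-interpolation sketch over a synthetic tone (the waveform and rates are made up):

```python
import numpy as np

def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Resample a 1-D waveform via linear interpolation (illustrative only)."""
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    # Time stamps of the original and target sample grids.
    t_orig = np.arange(len(waveform)) / orig_sr
    t_target = np.arange(n_target) / target_sr
    return np.interp(t_target, t_orig, waveform)

# One second of a 440 Hz tone at 44.1 kHz, resampled to 16 kHz for the model.
sr_in = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
resampled = resample_linear(tone, sr_in)
print(len(resampled))  # 16000
```

Real resamplers use band-limited filters rather than linear interpolation; the point here is only the change of sample grid.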
GitHub
github.com › openvinotoolkit › open_model_zoo › blob › master › models › public › wav2vec2-base › README.md
open_model_zoo/models/public/wav2vec2-base/README.md at master · openvinotoolkit/open_model_zoo
Wav2Vec2.0-base is a model pre-trained to learn speech representations on unlabeled data, as described in the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, and fine-tuned for the speech recognition task ...
Author openvinotoolkit
AI Models
aimodels.fyi › models › huggingFace › wav2vec2-base-960h-facebook
wav2vec2-base-960h | AI Model Details
wav2vec2-base-960h is a pre-trained speech recognition model developed by Facebook. It is based on the Wav2Vec2 architecture and was trained on 960 hours of LibriSpeech data.
PyTorch
docs.pytorch.org › audio › 2.8 › generated › torchaudio.pipelines.WAV2VEC2_BASE.html
WAV2VEC2_BASE — Torchaudio 2.8.0 documentation
Wav2vec 2.0 model (“base” architecture), pre-trained on 960 hours of unlabeled audio from LibriSpeech dataset [Panayotov et al., 2015] (the combination of “train-clean-100”, “train-clean-360”, and “train-other-500”), not fine-tuned. Originally published by the authors of wav2vec 2.0 [Baevski et al., 2020] under MIT License and redistributed with the same license. [License, Source] Please refer to torchaudio.pipelines.Wav2Vec2Bundle ...
Metatext
metatext.io › models › facebook-wav2vec2-base
facebook/wav2vec2-base Model - NLP Hub
Metatext is a platform that allows you to build, train and deploy NLP models in minutes.
PyTorch
docs.pytorch.org › audio › stable › tutorials › speech_recognition_pipeline_tutorial.html
Speech Recognition with Wav2Vec2 — Torchaudio 2.9.0 documentation
We will use torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H here.
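The ASR bundle in that tutorial pairs the acoustic model with a CTC decoder. As a hedged sketch of the greedy decoding step (collapse repeated labels, then drop blanks), here is a toy example over a made-up label set and one-hot "logits" — the real pipeline emits per-frame logits of the same shape:

```python
import numpy as np

LABELS = ["-", "h", "e", "l", "o"]  # index 0 is the CTC blank (hypothetical label set)

def greedy_ctc_decode(logits: np.ndarray, labels=LABELS, blank: int = 0) -> str:
    """Pick the argmax label per frame, collapse repeats, then drop blanks."""
    best = logits.argmax(axis=-1)                        # (frames,)
    collapsed = [i for i, prev in zip(best, [None, *best[:-1]]) if i != prev]
    return "".join(labels[i] for i in collapsed if i != blank)

# Toy frame-level scores spelling "hello": h h e - l l - l o
frames = [1, 1, 2, 0, 3, 3, 0, 3, 4]
logits = np.eye(len(LABELS))[frames]                     # one-hot rows, one per frame
print(greedy_ctc_decode(logits))  # hello
```

Note how the blank between the two `l` runs is what keeps the double letter from collapsing into one.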
NVIDIA
catalog.ngc.nvidia.com › orgs › nvidia › teams › dle › models › wav2vec2_base_pyt_ckpt_finetune
GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC
GitHub
github.com › facebookresearch › fairseq › blob › main › examples › wav2vec › README.md
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
$ fairseq-hydra-train \
    task.data=/path/to/data \
    --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_conformer_base_librispeech \
    --attn-type espnet --pos-enc-type ${POS_ENC_TYPE}
Author facebookresearch
Hugging Face
huggingface.co › fav-kky › wav2vec2-base-sk-17k
fav-kky/wav2vec2-base-sk-17k · Hugging Face
It was introduced in the paper Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak accepted for the TSD2023 conference. This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data. The model was initialized from the Czech pre-trained model fav-kky/wav2vec2-base-cs-80k-ClTRUS.
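Since the checkpoint ships without a tokenizer, fine-tuning for ASR starts by deriving a vocabulary from the labeled transcripts. A minimal sketch of building the character vocabulary a CTC tokenizer needs (the transcripts and special tokens are illustrative; in practice such a `vocab.json` would be passed to something like `Wav2Vec2CTCTokenizer`):

```python
import json

transcripts = ["dobrý deň", "ahoj svet"]  # toy Slovak transcripts (made up)

# Collect every character; CTC tokenizers typically replace " " with a word
# delimiter token and reserve ids for [PAD] (the CTC blank) and [UNK].
chars = sorted({c for text in transcripts for c in text if c != " "})
vocab = {c: i for i, c in enumerate(chars)}
vocab["|"] = len(vocab)        # word delimiter
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

print(json.dumps(vocab, ensure_ascii=False))
```

The model's output layer is then sized to this vocabulary before fine-tuning on the labeled audio.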
arXiv
arxiv.org › abs › 2006.11477
[2006.11477] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
October 22, 2020 - We show for the first time that ... wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned....
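The abstract describes the pre-training objective: at each masked time step, the model must identify the true quantized latent among distractors. As a hedged NumPy sketch of that InfoNCE-style contrastive loss (random vectors stand in for the context-network output and quantized targets; `kappa` is a temperature, and all shapes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_loss(context: np.ndarray, target: np.ndarray,
                     distractors: np.ndarray, kappa: float = 0.1) -> float:
    """Negative log-softmax of cosine similarity to the true target vs. distractors."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(context, target)] + [cos(context, d) for d in distractors]) / kappa
    sims -= sims.max()                      # numerical stability
    return float(-np.log(np.exp(sims[0]) / np.exp(sims).sum()))

c = rng.normal(size=256)                    # context output at a masked step (random stand-in)
q = rng.normal(size=256)                    # true quantized latent (random stand-in)
negs = rng.normal(size=(100, 256))          # distractors sampled from other masked steps
loss = contrastive_loss(c, q, negs)
```

The loss falls as the context vector aligns with its quantized target, which is exactly what drives the joint learning of the quantization the snippet mentions.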
Mohitmayank
mohitmayank.com › a_lazy_data_science_guide › audio_intelligence › wav2vec2
Wav2Vec2 Model - A Lazy Data Science Guide
First, we will select one text dataset. This dataset can be the transcript of train data (part of labeled data we used to finetune Wav2Vec2 model) or a related (same domain like medical, telecom, etc) collection of documents.
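The text dataset described above feeds an n-gram language model (KenLM is a common choice for Wav2Vec2 decoding). As a toy sketch of the raw statistics behind such a model, here is a bigram count over made-up in-domain transcripts:

```python
from collections import Counter

transcripts = [
    "the patient reports mild pain",
    "the patient denies fever",
]

unigrams, bigrams = Counter(), Counter()
for line in transcripts:
    tokens = ["<s>", *line.split(), "</s>"]       # sentence boundary markers
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

# Maximum-likelihood bigram probability P(w2 | w1) = count(w1 w2) / count(w1).
p = bigrams[("the", "patient")] / unigrams["the"]
print(p)  # 1.0
```

A real LM adds smoothing and higher orders, but picking in-domain text (medical, telecom, etc.) matters because these counts are what bias the beam-search decoder toward plausible word sequences.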