🌐
Hugging Face
huggingface.co › blog › fine-tune-wav2vec2-english
Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers
Wav2Vec2 is fine-tuned using Connectionist Temporal Classification (CTC), an algorithm for training neural networks on sequence-to-sequence problems, used mainly in automatic speech recognition and handwriting recognition.
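CTC's key idea is that the network emits one label per audio frame, and a many-to-one collapse rule (merge consecutive repeats, then drop blanks) maps frame-level paths to transcripts. A minimal sketch of that decoding rule; the function name and blank index are illustrative, not from the blog post:

```python
def ctc_collapse(path, blank=0):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks.
    `path` is the per-frame argmax over the vocabulary; index 0 plays
    the role of the CTC blank (Wav2Vec2's default)."""
    out, prev = [], None
    for token in path:
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return out

# A blank between two identical labels lets CTC emit genuinely doubled
# tokens that plain repeat-merging would destroy (e.g. "ll" in "hello").
ctc_collapse([1, 1, 0, 2, 3, 0, 3, 4])  # -> [1, 2, 3, 3, 4]
```

During training, the CTC loss sums the probability of every frame-level path that collapses to the reference transcript, so no frame-level alignment is needed.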
🌐
TensorFlow
tensorflow.org › hub › fine-tuning wav2vec2 with an lm head
Fine-tuning Wav2Vec2 with an LM head | TensorFlow Hub
In this notebook, we load the pre-trained wav2vec2 model from TFHub and fine-tune it on the LibriSpeech dataset by appending a language modeling (LM) head on top of the pre-trained model.
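Architecturally, the "LM head" here is just a per-frame linear projection from the encoder's hidden size to the vocabulary, trained with the CTC loss. A rough shape-level sketch; the dimensions match wav2vec2-base, but the variable names and vocabulary size are my own, not TFHub's:

```python
import numpy as np

# Hypothetical encoder output: 50 frames of 768-dim features
# (768 is the hidden size of wav2vec2-base).
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 768))

# The LM head: one dense layer mapping each frame to vocabulary logits.
vocab_size = 32
W = rng.normal(scale=0.02, size=(768, vocab_size))
b = np.zeros(vocab_size)
logits = frames @ W + b  # (50, 32): per-frame scores fed to the CTC loss
```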
🌐
AWS
aws.amazon.com › blogs › machine-learning › fine-tune-and-deploy-a-wav2vec2-model-for-speech-recognition-with-hugging-face-and-amazon-sagemaker
Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker | Artificial Intelligence
May 25, 2022 - This post shows how to use SageMaker to easily fine-tune the latest Wav2Vec2 model from Hugging Face, and then deploy the model with a custom-defined inference process to a SageMaker managed inference endpoint.
🌐
Hugging Face
huggingface.co › blog › fine-tune-w2v2-bert
Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers
With as little as 10 minutes of labeled audio data, Wav2Vec2 could be fine-tuned to achieve 5% word-error rate performance on the LibriSpeech dataset, demonstrating for the first time low-resource transfer learning for ASR.
🌐
GitHub
github.com › khanld › ASR-Wav2vec-Finetune
GitHub - khanld/ASR-Wav2vec-Finetune: :zap: Finetune Wav2vec 2.0 For Speech Recognition
:zap: Finetune Wav2vec 2.0 For Speech Recognition. Contribute to khanld/ASR-Wav2vec-Finetune development by creating an account on GitHub.
Starred by 142 users
Forked by 32 users
Languages  Python
🌐
Hugging Face
huggingface.co › blog › fine-tune-xlsr-wav2vec2
Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers
from transformers import AutoModelForCTC, Wav2Vec2Processor

model = AutoModelForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xls-r-300m-tr-colab")
processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xls-r-300m-tr-colab")

For more examples of how XLS-R can be fine-tuned, please take a look at the official 🤗 Transformers examples.
🌐
HackerNoon
hackernoon.com › working-with-wav2vec2-part-1-finetuning-xls-r-for-automatic-speech-recognition
Working With Wav2vec2 Part 1: Finetuning XLS-R for Automatic Speech Recognition | HackerNoon
May 4, 2024 - Step-by-step guide on how to finetune wav2vec2 XLS-R for automatic speech recognition using a Spanish language training dataset.
🌐
Readthedocs
speechbrain.readthedocs.io › en › latest › tutorials › nn › using-wav2vec-2.0-hubert-wavlm-and-whisper-from-huggingface-with-speechbrain.html
Fine-tuning or using Whisper, wav2vec2, HuBERT and others with SpeechBrain and HuggingFace — SpeechBrain 0.5.0 documentation
However, in the very specific context of pretrained models, SpeechBrain enables researchers and users to connect these architectures to state-of-the-art speech and audio-related technologies. For instance, SpeechBrain allows you to easily fine-tune a pretrained wav2vec2 model with a transformer decoder coupled with a beam search algorithm and a transformer language model to build a SOTA speech recognizer.
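To make the "beam search" piece concrete, here is a minimal frame-synchronous beam search over per-frame log-probabilities. This is a bare sketch under my own simplifications: no CTC blank handling and no language-model fusion, both of which SpeechBrain's real decoder adds on top:

```python
def beam_search(frame_log_probs, beam_width=3):
    """Keep only the `beam_width` best partial hypotheses at each frame.
    `frame_log_probs`: list of dicts mapping token -> log p(token | frame)."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-prob)
    for frame in frame_log_probs:
        # Extend every surviving hypothesis with every token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in frame.items()
        ]
        # Prune back down to the top `beam_width` -- this is the whole trick.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Three frames where token "a" dominates: the search returns ("a", "a", "a").
beam_search([{"a": -0.1, "b": -2.3}] * 3)
```

With `beam_width=1` this degenerates to greedy decoding; wider beams trade compute for the chance to recover from locally suboptimal frames.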
🌐
arXiv
arxiv.org › abs › 2110.06309
[2110.06309] Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
February 21, 2023 - Abstract: While Wav2Vec 2.0 has ... using different fine-tuning strategies. Two baseline methods, vanilla fine-tuning (V-FT) and task adaptive pretraining (TAPT), are first presented. ...
🌐
arXiv
arxiv.org › abs › 2109.15053
[2109.15053] Fine-tuning wav2vec2 for speaker recognition
May 6, 2022 - This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding.
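One baseline strategy for pooling the wav2vec2 output sequence into a fixed-length embedding is simply averaging the frame-level features over time. A sketch with made-up dimensions; the paper compares several pooling methods, and this is only the simplest:

```python
import numpy as np

# Hypothetical wav2vec2 frame-level output for one utterance: (T, H).
rng = np.random.default_rng(0)
frames = rng.normal(size=(120, 768))

# Mean pooling over the time axis collapses the variable-length sequence
# into one fixed-length speaker embedding; length-normalizing it lets
# speakers be compared with plain cosine similarity.
embedding = frames.mean(axis=0)
embedding /= np.linalg.norm(embedding)
```

Because the time axis is averaged out, utterances of any duration map to the same embedding dimensionality, which is what a speaker-verification backend needs.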
🌐
Kaggle
kaggle.com › code › alibahadorani › fine-tune-wav2vec2-model
fine-tune-wav2vec2-model
🌐
GitHub
github.com › Hamtech-ai › wav2vec2-fa
GitHub - Hamtech-ai/wav2vec2-fa: fine-tune Wav2vec2, an ASR model released by Facebook
The base model was pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, like automatic speech recognition.
Starred by 38 users
Forked by 5 users
Languages  Jupyter Notebook
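Since wav2vec2 expects 16 kHz input, audio recorded at other rates has to be resampled first. A naive linear-interpolation resampler as an illustration; the function name is my own, and for real training you would use a proper anti-aliased polyphase resampler (e.g. torchaudio's or librosa's):

```python
import numpy as np

def resample_to_16k(audio, orig_sr, target_sr=16000):
    """Naive resampling by linear interpolation -- fine as a sketch,
    but not anti-aliased like a real polyphase resampler."""
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.arange(len(audio)) / orig_sr   # original sample times (s)
    t_out = np.arange(n_out) / target_sr     # target sample times (s)
    return np.interp(t_out, t_in, audio)

# One second of a 440 Hz tone at 44.1 kHz becomes 16000 samples.
wav_44k = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
wav_16k = resample_to_16k(wav_44k, 44100)
```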
🌐
Medium
devblog.pytorchlightning.ai › fine-tuning-wav2vec-for-speech-recognition-with-lightning-flash-bf4b75cad99a
Fine-tuning Wav2Vec for Speech Recognition with Lightning Flash | by Sean Narenthiran | PyTorch Lightning Developer Blog
July 26, 2021 - Below we describe how you can use Lightning Flash for your own use cases and how to fine-tune Wav2Vec models on your own labeled transcription datasets, showing how easy it is to configure the Lightning Trainer for fine-tuning and serving.
🌐
IEEE Xplore
ieeexplore.ieee.org › document › 10508947
Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech | IEEE Journals & Magazine | IEEE Xplore
The performance of the wav2vec2 ... that the fine-tuned wav2vec2 features showed better generalization in both scenarios and gave an absolute accuracy improvement of 1.46%–8.65% ...
🌐
ISCA Archive
isca-archive.org › interspeech_2021 › peng21e_interspeech.html
ISCA Archive - A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis
Cite as: Peng, L., Fu, K., Lin, B., Ke, D., Zhan, J. (2021) A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis.
🌐
CEUR-WS.org
ceur-ws.org › Vol-3175 › paper02.pdf (PDF)
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
In total, we conducted five main experiments to test our methods: • Experiment 1: Wav2vec 2.0 XLSR-53 - Base: the model was fine-tuned considering the whole train set, but received neither normalization nor noise addition; • Experiment 2: Wav2vec 2.0 XLSR-53 - Norm: The ...
🌐
GitHub
github.com › b04901014 › FT-w2v2-ser
GitHub - b04901014/FT-w2v2-ser: Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition. Submitted to ICASSP 2022. pytorch · pytorch-lightning · fairseq (for Wav2vec) · huggingface transformers (for Wav2vec2) · faiss (for running clustering). Faiss can be skipped if you are not running clustering scripts.
Starred by 152 users
Forked by 34 users
Languages  Python 96.9% | Shell 2.4% | Dockerfile 0.7%