Whisper Large V3 Fine-tuned on Nepali (OpenSLR 54)
A Nepali automatic speech recognition (ASR) model, fine-tuned from whisper-large-v3 on the OpenSLR 54 dataset (~154 hours).
The 1.55-billion-parameter Large architecture improves accuracy on complex vocabulary, numbers, and dates compared to smaller Whisper variants.
Model Details
- Model: Whisper Large V3 (1.55B Parameters)
- Dataset: OpenSLR 54, 157,000 Nepali audio utterances (~154 hours)
- Language: Nepali
- Fine-tuning Hardware: NVIDIA A100 80GB
Metrics
- Final WER: 22.31%
- Validation Loss: 0.0927
Note: While the raw WER is higher than that of the Medium model, the Large model demonstrates superior handling of numbers, dates, and English loanwords.
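For reference, the WER reported above is the word-level edit distance between the reference and hypothesis transcripts, divided by the number of reference words. A minimal stdlib-only sketch (libraries such as jiwer or evaluate compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("a b c d", "a b x d"))  # 0.25 (one substitution out of four words)
```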
Usage
from transformers import pipeline

# Load the fine-tuned model (drop the device argument for CPU inference)
transcriber = pipeline(
    "automatic-speech-recognition",
    model="Dragneel/whisper-large-v3-nepali-openslr",
    device="cuda",
)

# Transcribe a local audio file (decoding mp3 requires ffmpeg)
transcription = transcriber("path_to_nepali_audio.mp3")
print(transcription["text"])
Acknowledgements
This research was supported by the High Performance Computing (HPC) facility at Tribhuvan University, Nepal. We acknowledge the Supercomputer Centre for providing the computational resources required for this work.