Whisper Large V3 Fine-tuned on Nepali (OpenSLR 54)

This is a Nepali ASR model fine-tuned on the OpenSLR 54 dataset (~154 hours of speech). It uses the whisper-large-v3 architecture (1.55 billion parameters) and is aimed at accurate transcription of complex vocabulary, numbers, and dates.

Model Details

  • Model: Whisper Large V3 (1.55B Parameters)
  • Dataset: 157,000 Nepali Audio Utterances (154 Hours)
  • Language: Nepali
  • Fine-tuning Hardware: NVIDIA A100 80GB

Metrics

  • Final WER: 22.31%
  • Validation Loss: 0.0927

Note: While the raw WER is higher than the Medium model, the Large model demonstrates superior handling of numbers, dates, and English loanwords.
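WER (word error rate) is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. In practice a library such as jiwer is used; the sketch below is a minimal self-contained version for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(
                d[i - 1][j] + 1,      # deletion
                d[i][j - 1] + 1,      # insertion
                d[i - 1][j - 1] + cost,  # substitution / match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution in a three-word reference gives a WER of 1/3 (about 33%).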

Usage

```python
from transformers import pipeline

# Load the fine-tuned model (a GPU is strongly recommended for a 1.55B-parameter model)
transcriber = pipeline(
    "automatic-speech-recognition",
    model="Dragneel/whisper-large-v3-nepali-openslr",
    device="cuda",
)

# Transcribe a Nepali audio file
transcription = transcriber("path_to_nepali_audio.mp3")
print(transcription["text"])
```
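Whisper's feature extractor expects 16 kHz mono audio. The pipeline handles decoding and resampling when given a file path, but if you pass a raw waveform array you should resample it yourself first. Below is a minimal linear-interpolation sketch (real pipelines typically use librosa or torchaudio, which apply proper anti-aliasing filters):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Naive linear-interpolation resample of a mono waveform to 16 kHz."""
    target_sr = 16000
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    # Sample times for the original and target grids
    t_orig = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    t_target = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(t_target, t_orig, audio)
```

The resampled array can then be passed to the pipeline as `{"array": audio_16k, "sampling_rate": 16000}`.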

This research was supported by the High Performance Computing (HPC) facility at Tribhuvan University, Nepal. We acknowledge the Supercomputer Centre for providing the computational resources required for this work.
