language:
  - ne
tags:
  - whisper
  - automatic-speech-recognition
  - speech
  - generated_from_trainer
  - openslr
license: apache-2.0
datasets:
  - openslr
metrics:
  - wer
base_model:
  - openai/whisper-large-v3
model-index:
  - name: Whisper Large V3 Nepali (OpenSLR)
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: OpenSLR 54 (Nepali Speech Corpus)
          type: openslr
        metrics:
          - name: Wer
            type: wer
            value: 22.31

Whisper Large V3 Fine-tuned on Nepali (OpenSLR 54)

This is a Nepali automatic speech recognition (ASR) model, fine-tuned on the OpenSLR 54 dataset (~154 hours). It builds on the whisper-large-v3 architecture (1.55 billion parameters) and aims for high accuracy on complex vocabulary, numbers, and dates.

Model Details

  • Model: Whisper Large V3 (1.55B Parameters)
  • Dataset: OpenSLR 54, 157,000 Nepali audio utterances (~154 hours)
  • Language: Nepali
  • Fine-tuning Hardware: NVIDIA A100 80GB

Metrics

  • Final WER: 22.31%
  • Validation Loss: 0.0927

Note: Although the raw WER is higher than that of the Medium model, the Large model handles numbers, dates, and English loanwords better.
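For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation script used for this model; the function name `word_error_rate` and the romanized example sentences are illustrative assumptions, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from this repository); words are split on
    whitespace and compared exactly, with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words; starts with the empty reference.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("ma aaja ghara janchu", "ma bholi ghara"))  # 0.5
```

A reported WER of 22.31% corresponds to a return value of about 0.2231 under this definition, aggregated over the whole evaluation set.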

Usage

from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="Dragneel/whisper-large-v3-nepali-openslr", device="cuda")

# Transcribe
transcription = transcriber("path_to_nepali_audio.mp3")
print(transcription["text"])
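The snippet above assumes a CUDA GPU and a short clip. As a hedged variant, the sketch below falls back to CPU when no GPU is available, pins decoding to Nepali, and enables chunked inference for recordings longer than Whisper's 30-second window. The helper names `transcription_options` and `transcribe` are hypothetical; `chunk_length_s` and `generate_kwargs` are standard arguments of the `transformers` ASR pipeline, and the specific values are assumptions rather than settings recommended by the model authors.

```python
def transcription_options(long_audio: bool = False) -> dict:
    """Keyword arguments for the pipeline call (hypothetical helper)."""
    options = {
        # Pin the language and task so Whisper does not auto-detect them.
        "generate_kwargs": {"language": "nepali", "task": "transcribe"},
    }
    if long_audio:
        # Split long recordings into 30-second chunks (assumed value).
        options["chunk_length_s"] = 30
    return options


def transcribe(audio_path: str, long_audio: bool = False) -> str:
    """Load the model and transcribe one file (hypothetical helper)."""
    # Imported here so the options helper can be used without these packages.
    import torch
    from transformers import pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    transcriber = pipeline(
        "automatic-speech-recognition",
        model="Dragneel/whisper-large-v3-nepali-openslr",
        device=device,
    )
    result = transcriber(audio_path, **transcription_options(long_audio))
    return result["text"]
```

Calling `transcribe("path_to_nepali_audio.mp3", long_audio=True)` would then return the transcript as a string; on CPU, expect a 1.55B-parameter model to run slowly.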

This research was supported by the High Performance Computing (HPC) facility at Tribhuvan University, Nepal. We acknowledge the Supercomputer Centre for providing the computational resources required for this work.