Efik - English (NLLB-200 Distilled)

Fine-tuned NLLB-200 model for translating Efik -> English. Since Efik is not directly supported in NLLB, the Igbo language code ibo_Latn is used as a close proxy during training and inference.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-efi-en"

# Load the tokenizer with the Igbo language code as the Efik proxy.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="ibo_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "Ami nko nko."  # Efik source sentence
inputs = tokenizer(input_example, return_tensors="pt")

# Force English as the target language for generation.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
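
For batches of sentences, the same steps can be wrapped in a small helper. This is a minimal sketch reusing the tokenizer and model loaded above; the translate_efik name and its defaults are illustrative, not part of the released model.

def translate_efik(sentences, max_length=128):
    # Encode a batch of Efik sentences (under the ibo_Latn proxy code).
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    # Force English as the target language for generation.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_efik(["Ami nko nko."]))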

Training details (summary)

Item                   Value
Base model             facebook/nllb-200-distilled-600M
Dataset                Davlan/ibom-mt-en-efi
Script                 lafand-mt
Epochs                 8
Effective batch size   32 (16 × 2 gradient accumulation)
Learning rate          3e-5
Mixed precision        bf16
Early stopping         patience = 3, min_delta (BLEU) = 0.001
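
The table above maps roughly onto the following training configuration. This is a hedged reconstruction based only on the table (the actual lafand-mt script may differ); the output_dir value is a placeholder, and argument names follow the Hugging Face Seq2SeqTrainer API.

from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-efi-en",  # placeholder
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    learning_rate=3e-5,
    bf16=True,                       # mixed precision
    predict_with_generate=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bleu",    # early stopping tracks BLEU
)

# Stop if BLEU fails to improve by at least 0.001 for 3 evaluations.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)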

Evaluation

Metric   efi -> en
BLEU     38.6
chrF     54.5
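
Scores of this kind can be computed with sacrebleu. The sketch below uses placeholder hypothesis and reference strings and is not the exact evaluation pipeline used for the numbers above.

import sacrebleu

hypotheses = ["the model translation"]        # decoded model outputs (placeholder)
references = [["the reference translation"]]  # one gold English reference stream (placeholder)

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")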

Limitations

  • Using the Igbo language code (ibo_Latn) as a stand-in for Efik may introduce lexical differences and tokenization mismatches; the sketch after this list shows one way to inspect this.
  • The model has not been extensively evaluated for bias, toxicity, or gender neutrality.
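
One quick way to gauge the tokenization mismatch is to inspect how the tokenizer segments Efik text under the ibo_Latn proxy. A minimal sketch, assuming the tokenizer loaded in the Usage section; heavy subword fragmentation or unknown tokens would suggest a poor vocabulary fit.

# How does the tokenizer segment an Efik sentence?
sentence = "Ami nko nko."
tokens = tokenizer.tokenize(sentence)
print(tokens)

# A high token-to-word ratio suggests the vocabulary fits Efik poorly.
print(f"{len(tokens)} subword tokens for {len(sentence.split())} words")

# Unknown tokens indicate characters the vocabulary cannot represent at all.
print("contains <unk>:", tokenizer.unk_token in tokens)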