Fine-tuned NLLB-200 model for translating Efik -> English. Since Efik is not directly supported in NLLB, the Igbo language code ibo_Latn is used as a close proxy during training and inference.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-efi-en"

# Efik has no NLLB language code, so ibo_Latn (Igbo) is used as the source language.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="ibo_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "Ami nko nko."  # Efik source sentence
inputs = tokenizer(input_example, return_tensors="pt")

# Forcing eng_Latn as the first generated token selects English as the target language.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 grad-accum) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | Patience = 3, min_delta (BLEU) = 0.001 |
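For reference, here is a minimal sketch of how the hyperparameters above might map onto Hugging Face `Seq2SeqTrainingArguments` and `EarlyStoppingCallback`. The actual lafand-mt script may structure this differently; `output_dir` and the BLEU-reporting `compute_metrics` it presumes are placeholders.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Hypothetical mapping of the training table onto HF training arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-efi-en",  # placeholder
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 16 x 2 = 32
    learning_rate=3e-5,
    bf16=True,
    eval_strategy="epoch",           # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bleu",    # assumes compute_metrics reports a "bleu" key
    predict_with_generate=True,
)

# Early stopping: patience 3, minimum BLEU improvement 0.001.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)
```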
| Metric | efi->en |
|---|---|
| BLEU | 38.6 |
| chrF | 54.5 |
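BLEU and chrF are standard sacrebleu metrics. The exact evaluation pipeline is not specified here, but scores like these are typically computed along the following lines (the sentences are placeholders):

```python
import sacrebleu

hypotheses = ["I am also here."]       # model outputs (placeholder)
references = [["I am here as well."]]  # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```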
Using the Igbo language code (ibo_Latn) as a stand-in for Efik may introduce lexical differences and tokenization mismatches.
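One quick way to gauge the tokenization mismatch is to inspect how the tokenizer segments Efik text; a brief illustration (the segmentation shown is not guaranteed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "luel/nllb-200-distilled-600M-ft-efi-en", src_lang="ibo_Latn"
)
# Heavy fragmentation into many short subwords would suggest the vocabulary,
# trained without Efik data, segments Efik text poorly.
print(tokenizer.tokenize("Ami nko nko."))
```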