Fine-tuned NLLB-200 model for translating Efik -> English. Since Efik is not directly supported in NLLB, the Igbo language code ibo_Latn is used as a close proxy during training and inference.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "luel/nllb-200-distilled-600M-ft-efi-en"

# Efik has no NLLB language code, so ibo_Latn (Igbo) is used as the source language.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True, src_lang="ibo_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, token=True)

input_example = "Ami nko nko."  # Efik source sentence
inputs = tokenizer(input_example, return_tensors="pt")

# Forcing eng_Latn as the first generated token selects English as the target language.
generated_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=30,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
| Item | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Dataset | Davlan/ibom-mt-en-efi |
| Script | lafand-mt |
| Epochs | 8 |
| Effective batch size | 32 (16 × 2 grad-accum) |
| Learning rate | 3e-5 |
| Mixed precision | bf16 |
| Early stopping | Patience = 3, min_delta (BLEU) = 0.001 |
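For reference, here is a minimal sketch of how the hyperparameters above might map onto Hugging Face `Seq2SeqTrainingArguments` and `EarlyStoppingCallback`. The actual lafand-mt script may structure this differently; `output_dir` and the BLEU-reporting `compute_metrics` it presumes are placeholders.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Hypothetical mapping of the training table onto HF training arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-distilled-600M-ft-efi-en",  # placeholder
    num_train_epochs=8,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 16 x 2 = 32
    learning_rate=3e-5,
    bf16=True,
    eval_strategy="epoch",           # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bleu",    # assumes compute_metrics reports a "bleu" key
    predict_with_generate=True,
)

# Early stopping: patience 3, minimum BLEU improvement 0.001.
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001,
)
```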
| Metric | efi->en |
|---|---|
| BLEU | 38.6 |
| chrF | 54.5 |
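BLEU and chrF are standard sacrebleu metrics. The exact evaluation pipeline is not specified here, but scores like these are typically computed along the following lines (the sentences are placeholders):

```python
import sacrebleu

hypotheses = ["I am also here."]       # model outputs (placeholder)
references = [["I am here as well."]]  # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```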
Using the Igbo language code (ibo_Latn) as a stand-in for Efik may introduce lexical differences and tokenization mismatches.
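One quick way to gauge the tokenization mismatch is to inspect how the tokenizer segments Efik text; a brief illustration (the segmentation shown is not guaranteed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "luel/nllb-200-distilled-600M-ft-efi-en", src_lang="ibo_Latn"
)
# Heavy fragmentation into many short subwords would suggest the vocabulary,
# trained without Efik data, segments Efik text poorly.
print(tokenizer.tokenize("Ami nko nko."))
```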