turn-detector-v2 - Turkish Turn Detection Model
This model detects turn-taking patterns in Turkish conversations. It reduces voice assistant latency by distinguishing user utterances that require LLM processing from simple acknowledgments that do not.
Developed by SiriusAI Tech Brain Team
Mission
To optimize voice assistant response latency by detecting which user utterances require LLM processing and which are simple acknowledgments.
The turn-detector-v2 model analyzes conversational turn pairs (bot utterance + user response) and classifies the user's response as either requiring LLM processing (agent_response) or as a backchannel acknowledgment that can be handled without an LLM call (backchannel).
Key Benefits
| Benefit | Description |
|---|---|
| Latency Reduction | Skip LLM calls for backchannels, saving 500-2000ms per interaction |
| Cost Optimization | Reduce LLM API costs by filtering unnecessary calls |
| Natural Conversation | Return immediate filler responses ("hmm", "tamam") for acknowledgments |
| High Accuracy | 97.94% accuracy on the 13,023-sample test split |
Model Overview
| Property | Value |
|---|---|
| Architecture | BertForSequenceClassification |
| Base Model | dbmdz/bert-base-turkish-uncased |
| Task | Binary Text Classification |
| Language | Turkish (tr) |
| Labels | 2 (agent_response, backchannel) |
| Model Size | ~110M parameters |
| Inference Time | ~10-15ms (GPU) / ~40-50ms (CPU) |
Performance Metrics
Final Evaluation Results
| Metric | Score |
|---|---|
| Macro F1 | 0.9769 |
| Micro F1 | 0.9794 |
| MCC | 0.9544 |
| Accuracy | 97.94% |
Per-Class Performance
| Category | Accuracy | Samples |
|---|---|---|
| agent_response | 99.57% | 8,553 |
| backchannel | 94.83% | 4,470 |
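The headline metrics can be cross-checked against the per-class table. The sketch below is a consistency check, not the authors' evaluation script. It assumes that the per-class "Accuracy" column is per-class recall and that the number of correct predictions rounds to the nearest integer, so the reconstructed confusion matrix is an estimate.

```python
import math

# Per-class figures from the table above: (recall, number of test samples).
# Assumption: the "Accuracy" column is per-class recall.
agent_recall, agent_n = 0.9957, 8553
back_recall, back_n = 0.9483, 4470

# Reconstructed confusion matrix (agent_response treated as the positive class).
tp = round(agent_recall * agent_n)  # agent_response predicted correctly
fn = agent_n - tp                   # agent_response predicted as backchannel
tn = round(back_recall * back_n)    # backchannel predicted correctly
fp = back_n - tn                    # backchannel predicted as agent_response

# Overall accuracy (equal to micro F1 for single-label classification).
accuracy = (tp + tn) / (agent_n + back_n)

# F1 per class, then the unweighted (macro) mean.
f1_agent = 2 * tp / (2 * tp + fp + fn)
f1_back = 2 * tn / (2 * tn + fn + fp)
macro_f1 = (f1_agent + f1_back) / 2

# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} mcc={mcc:.4f}")
```

Under these assumptions the reconstructed values agree with the reported accuracy, Macro F1, and MCC to four decimal places.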
Semantic Classification Rules
When to Classify as backchannel (Skip LLM)
| Condition | Examples |
|---|---|
| Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki" |
| Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?" |
| Bot gives info + User minimal response | "hmm", "hi hi", "evet" |
When to Classify as agent_response (Send to LLM)
| Condition | Examples |
|---|---|
| Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet" |
| Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?" |
| Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum" |
| User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..." |
Golden Rule
If bot asked a question → Always agent_response
If bot gave info + User short acknowledgment → backchannel
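The two rules above can be written as a small decision function. This is only an illustrative sketch: the model learns these patterns from labeled data and does not run such a lookup. The helper name `golden_rule`, the `bot_asked_question` flag (assumed to be known by the caller), and the exact-match acknowledgment set are all hypothetical.

```python
from typing import Optional

# Short acknowledgments taken from the backchannel table above.
# Assumption: exact-match lookup on lowercased, ASCII-folded text.
ACKNOWLEDGMENTS = {"tamam", "anladim", "ok", "peki", "hmm", "hi hi", "evet"}


def golden_rule(bot_asked_question: bool, user_text: str) -> Optional[str]:
    """Return a label when one of the two golden rules applies, else None."""
    if bot_asked_question:
        return "agent_response"
    if user_text.strip().lower() in ACKNOWLEDGMENTS:
        return "backchannel"
    return None  # neither rule applies; defer to the model


print(golden_rule(True, "evet"))       # agent_response
print(golden_rule(False, "tamam"))     # backchannel
print(golden_rule(False, "ne zaman"))  # None
```

Note that "evet" is a backchannel after an informational bot turn but an agent_response after a question, which is why the bot utterance is part of the model input.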
Dataset
Dataset Statistics
| Split | Samples |
|---|---|
| Train | 52,287 |
| Test | 13,023 |
| Total | 65,310 |
Label Distribution (Train Split)
| Label | Count | Percentage |
|---|---|---|
| agent_response | 35,223 | 67.4% |
| backchannel | 17,064 | 32.6% |
Domain Coverage
- E-commerce (kargo, iade, teslimat)
- Banking (hesap, bakiye, kredi)
- Telecom (numara tasima, data, hat)
- Insurance (prim, police, teminat, kasko)
- General Support (sikayet, yonetici, eskalasyon)
- Identity Verification (TC, gorusuyorum, soyadi)
Label Definitions
| Label | ID | Description |
|---|---|---|
| agent_response | 0 | User response requires LLM processing - questions, requests, confirmations to questions, corrections |
| backchannel | 1 | Simple acknowledgment - LLM skipped, filler returned (tamam, anladim, ok) |
Input Format
[bot] <bot utterance> [sep] [user] <user response>
Example Classifications
agent_response (Send to LLM):
[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
[bot] kargonuz yolda [sep] [user] ne zaman gelir
[bot] poliçeniz aktif [sep] [user] teminat limitini ogrenebilir miyim
backchannel (Skip LLM, return filler):
[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
[bot] kaydinizi kontrol ediyorum [sep] [user] peki
[bot] policeniz yenilendi [sep] [user] tesekkurler
[bot] sifreni sms ile gonderdik [sep] [user] ok aldim
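The input format can be assembled with a small helper. The function name `format_turn` and the whitespace normalization are assumptions made for illustration. Only the `[bot] ... [sep] [user] ...` layout comes from this card.

```python
def format_turn(bot_text: str, user_text: str) -> str:
    """Build the '[bot] ... [sep] [user] ...' string expected by the model."""
    # Collapse repeated whitespace so the markers stay single-space separated.
    bot = " ".join(bot_text.split())
    user = " ".join(user_text.split())
    return f"[bot] {bot} [sep] [user] {user}"


print(format_turn("kargonuz yolda", "  ne zaman   gelir "))
# [bot] kargonuz yolda [sep] [user] ne zaman gelir
```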
Training
Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | dbmdz/bert-base-turkish-uncased |
| Max Sequence Length | 128 tokens |
| Batch Size | 16 |
| Learning Rate | 3e-5 |
| Epochs | 4 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Loss Function | CrossEntropyLoss |
| Hardware | Apple Silicon (MPS) |
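As a rough guide to the size of the training run, the number of optimizer steps follows from the train split size, batch size, and epoch count. The calculation below assumes a single device, no gradient accumulation, and that the last partial batch is kept. None of these details are stated in this card.

```python
import math

train_samples = 52_287  # Train split size from Dataset Statistics
batch_size = 16
epochs = 4

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 3268 13072
```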
Usage
Installation
pip install transformers torch
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["agent_response", "backchannel"]


def predict(text):
    """Classify one '[bot] ... [sep] [user] ...' string."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over the two logits gives one probability per label.
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}


# Bot asks question → agent_response
print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
# Example output (confidence rounded): {'label': 'agent_response', 'confidence': 0.99}

# Bot gives info + User acknowledges → backchannel
print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
# Example output (confidence rounded): {'label': 'backchannel', 'confidence': 0.98}
Production Integration
import random

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class TurnDetector:
    """Production-ready turn detection for voice assistants."""

    LABELS = ["agent_response", "backchannel"]
    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

    def __init__(self, model_path="hayatiali/turn-detector-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
        """
        Determines if user response should go to LLM.

        Returns:
            dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
        """
        text = f"[bot] {bot_text} [sep] [user] {user_text}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        # Move input tensors to the same device as the model.
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()

        label_idx = probs.argmax().item()
        label = self.LABELS[label_idx]
        confidence = probs[label_idx].item()

        result = {
            "call_llm": label == "agent_response",
            "label": label,
            "confidence": confidence,
        }
        if label == "backchannel":
            # Skip the LLM and answer with a randomly chosen filler.
            result["filler"] = random.choice(self.FILLER_RESPONSES)
        return result


# Usage
detector = TurnDetector()

# Case 1: Bot asks, user confirms → Send to LLM
result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
# e.g. {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

# Case 2: Bot informs, user acknowledges → Return filler
result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
# e.g. {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
Limitations
| Limitation | Details |
|---|---|
| Language | Turkish only; may struggle with heavy dialects |
| Context | Classifies a single bot/user turn pair; no multi-turn memory |
| Domain | Trained on customer service dialogues; may need fine-tuning for other domains |
| Edge Cases | Ambiguous short responses may receive lower confidence |
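One way to handle low-confidence edge cases is to send uncertain backchannel predictions to the LLM anyway, trading some latency for safety. The sketch below is a hypothetical post-processing step, not part of the released model. The function name `route` and the 0.90 threshold are assumptions that would need tuning on real traffic.

```python
def route(result: dict, min_confidence: float = 0.90) -> str:
    """Decide what to do with a classifier result.

    `result` is assumed to have the keys 'label' and 'confidence',
    like the dict returned by TurnDetector.should_call_llm.
    """
    if result["label"] == "backchannel" and result["confidence"] >= min_confidence:
        return "filler"
    # Either the model asked for the LLM, or the backchannel call is too uncertain.
    return "llm"


print(route({"label": "backchannel", "confidence": 0.97}))     # filler
print(route({"label": "backchannel", "confidence": 0.62}))     # llm
print(route({"label": "agent_response", "confidence": 0.99}))  # llm
```

Falling back to the LLM on uncertainty errs toward the costlier but safer path: a missed backchannel only adds latency, while a missed request leaves the user unanswered.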
Citation
@misc{turn-detector-v2-2025,
title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
author={SiriusAI Tech Brain Team},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
Contact
- Developer: SiriusAI Tech Brain Team
- Email: info@siriusaitech.com
- Repository: GitHub
Changelog
v2.0 (Current)
Semantic Rule Improvements:
- If bot asks a question → always agent_response (731 rows corrected)
- Rhetorical questions ("really?", "is that so?") → remain as backchannel
- If user asks a real question ("when?", "how?") → agent_response
Dataset Expansion (+9,082 samples):
| Category | Added Patterns |
|---|---|
| Insurance | premium, policy, coverage, comprehensive, interest, late fees |
| Telecom | number porting, data exhausted, line transfer, GB remaining |
| E-commerce | shipping cost, free shipping, returns, delivery |
| Price/Budget | expensive, budget, too much, will think about it, not suitable |
| Identity Verification | national ID, "am I speaking with...", surname, date of birth |
| Objection/Complaint | unacceptable, not satisfied, complaint, impossible |
| Escalation | manager, director, supervisor |
| Hold Requests | one moment, busy right now, not now, later |
Metrics: Macro F1: 0.9769, Accuracy: 97.94%
Note: Metrics are slightly lower than in v1.0, but v2.0 is the more reliable model. The v1.0 data contained mislabeled rows (bot asked a question + "yes" labeled as backchannel), which the model memorized. v2.0 enforces semantic consistency.
v1.0
- Initial release
- Dataset: 56,228 samples
- Macro F1: 0.9924, Accuracy: 99.3%
License: SiriusAI Tech Premium License v1.0
Commercial Use: Requires Premium License. Contact: info@siriusaitech.com