turn-detector-v2 - Turkish Turn Detection Model


This model detects turn-taking patterns in Turkish conversations. It reduces voice assistant latency by separating user utterances that require LLM processing from simple acknowledgments that do not.

Developed by SiriusAI Tech Brain Team


Mission

To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.

The turn-detector-v2 model takes a conversational turn pair (bot utterance + user response) and classifies the user's response as either agent_response, which requires LLM processing, or backchannel, an acknowledgment that can be handled without an LLM call.

Key Benefits

Benefit | Description
Latency Reduction | Skip LLM calls for backchannels, saving 500-2000ms per interaction
Cost Optimization | Reduce LLM API costs by filtering unnecessary calls
Natural Conversation | Return immediate filler responses ("hmm", "tamam") for acknowledgments
High Accuracy | 97.94% accuracy ensures reliable real-world performance
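
As a rough illustration of the latency benefit, the sketch below estimates mean per-turn latency with the classifier placed in front of the LLM. Every input is an assumption made for illustration: a backchannel share equal to the 32.6% seen in the training data, a hypothetical 1000 ms LLM round trip (inside the 500-2000 ms range quoted above), 15 ms of GPU classifier time, and a filler response that costs nothing to return. None of these are measured production figures, and misclassifications are ignored.

```python
def expected_latency_ms(p_backchannel: float, llm_ms: float, classifier_ms: float) -> float:
    """Mean per-turn latency when every turn is classified first and only
    agent_response turns go on to the LLM."""
    return classifier_ms + (1.0 - p_backchannel) * llm_ms


# Assumed inputs (illustrative, not measurements).
P_BACKCHANNEL = 0.326   # backchannel share, taken from the training label distribution
LLM_MS = 1000.0         # hypothetical LLM round trip
CLASSIFIER_MS = 15.0    # upper end of the quoted GPU inference time

with_detector = expected_latency_ms(P_BACKCHANNEL, LLM_MS, CLASSIFIER_MS)
saving = LLM_MS - with_detector
print(f"mean latency with detector: {with_detector:.0f} ms, mean saving: {saving:.0f} ms")
```

Under these assumptions the mean turn takes about 689 ms instead of 1000 ms. The saving grows with the backchannel share and with LLM latency, and shrinks if classification runs on CPU.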

Model Overview

Property | Value
Architecture | BertForSequenceClassification
Base Model | dbmdz/bert-base-turkish-uncased
Task | Binary Text Classification
Language | Turkish (tr)
Labels | 2 (agent_response, backchannel)
Model Size | ~110M parameters
Inference Time | ~10-15ms (GPU) / ~40-50ms (CPU)

Performance Metrics

Final Evaluation Results

Metric | Score
Macro F1 | 0.9769
Micro F1 | 0.9794
MCC | 0.9544
Accuracy | 97.94%

Per-Class Performance

Category | Accuracy | Samples
agent_response | 99.57% | 8,553
backchannel | 94.83% | 4,470
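
The aggregate metrics can be cross-checked against the per-class figures. The sketch below rebuilds an approximate 2x2 confusion matrix and recomputes Macro F1, accuracy, and MCC from it. It assumes that per-class "accuracy" means recall on that class's test samples and that correct counts can be recovered by rounding; the card does not publish the confusion matrix itself.

```python
import math

# Test-set sizes and per-class accuracy as reported in this card.
n_agent, acc_agent = 8553, 0.9957
n_back, acc_back = 4470, 0.9483

# Assumption: per-class accuracy is recall, so correct counts follow by rounding.
agent_ok = round(n_agent * acc_agent)  # agent_response predicted as agent_response
back_ok = round(n_back * acc_back)     # backchannel predicted as backchannel
agent_miss = n_agent - agent_ok        # agent_response predicted as backchannel
back_miss = n_back - back_ok           # backchannel predicted as agent_response


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positives, false positives, and false negatives."""
    return 2 * tp / (2 * tp + fp + fn)


f1_agent = f1(agent_ok, back_miss, agent_miss)
f1_back = f1(back_ok, agent_miss, back_miss)
macro_f1 = (f1_agent + f1_back) / 2

# For single-label classification, micro F1 equals accuracy.
accuracy = (agent_ok + back_ok) / (n_agent + n_back)

# Matthews correlation coefficient, treating agent_response as the positive class.
tp, tn, fp, fn = agent_ok, back_ok, back_miss, agent_miss
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"macro F1={macro_f1:.4f}  accuracy={accuracy:.4f}  MCC={mcc:.4f}")
```

Under this reading, the recomputed values land close to the reported ones. The errors are also asymmetric: about 37 agent_response turns would be skipped by mistake, while about 231 backchannels would be sent to the LLM unnecessarily.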

Semantic Classification Rules

When to Classify as backchannel (Skip LLM)

Condition | Examples
Bot gives info + User short acknowledgment | "tamam", "anladim", "ok", "peki"
Bot gives info + User rhetorical question | "oyle mi?", "harbi mi?", "cidden mi?"
Bot gives info + User minimal response | "hmm", "hi hi", "evet"

When to Classify as agent_response (Send to LLM)

Condition | Examples
Bot asks question + User gives any answer | "[bot] adi nedir [sep] [user] ahmet"
Bot gives info + User asks real question | "[bot] faturaniz kesildi [sep] [user] ne zaman?"
Bot gives info + User makes request | "[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum"
User provides detailed information | "[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..."

Golden Rule

If bot asked a question → Always agent_response
If bot gave info + User short acknowledgment → backchannel
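
The two golden-rule lines can be sketched as a small rule-based pre-check, for example to audit labels. This is a hypothetical heuristic, not the procedure used to build the dataset: the question-particle regex, the question-word list, and the acknowledgment list are all assumptions chosen for illustration, and input is assumed to be lowercase already, as in the examples on this card.

```python
import re
from typing import Optional

# Hypothetical patterns, chosen for this sketch only.
# Turkish question particle (mi/mı/mu/mü) as a separate word, with an optional personal suffix.
QUESTION_PARTICLE = re.compile(r"\bm[iıuü](?:y[iıuü]m|s[iıuü]n(?:[iıuü]z)?|y[iıuü]z)?\b")
QUESTION_WORDS = {"ne", "nedir", "nasil", "nasıl", "neden", "niye", "kim", "hangi", "kac", "kaç", "nerede"}
SHORT_ACKS = {"tamam", "anladim", "anladım", "ok", "peki", "hmm", "evet"}


def bot_asked_question(bot_text: str) -> bool:
    """Heuristically decide whether the bot utterance is a question."""
    text = bot_text.strip()
    if text.endswith("?"):
        return True
    if QUESTION_PARTICLE.search(text):
        return True
    return any(word in QUESTION_WORDS for word in text.split())


def golden_rule(bot_text: str, user_text: str) -> Optional[str]:
    """Apply the two golden-rule lines; return None when neither applies."""
    if bot_asked_question(bot_text):
        return "agent_response"
    if user_text.strip() in SHORT_ACKS:
        return "backchannel"
    return None  # the rule does not decide; defer to the model


print(golden_rule("ahmet bey ile mi gorusuyorum", "evet benim"))
print(golden_rule("faturaniz 150 tl gorunuyor", "tamam"))
print(golden_rule("kargonuz yolda", "ne zaman gelir"))
```

The third call returns None because the golden rule says nothing about a user question after a bot statement; that case is covered by the classification tables above and is left to the model here.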

Dataset

Dataset Statistics

Split | Samples
Train | 52,287
Test | 13,023
Total | 65,310

Label Distribution

Label | Count | Percentage
agent_response | 35,223 | 67.4%
backchannel | 17,064 | 32.6%

These counts sum to 52,287, the size of the training split.

Domain Coverage

  • E-commerce (kargo, iade, teslimat)
  • Banking (hesap, bakiye, kredi)
  • Telecom (numara tasima, data, hat)
  • Insurance (prim, police, teminat, kasko)
  • General Support (sikayet, yonetici, eskalasyon)
  • Identity Verification (TC, gorusuyorum, soyadi)

Label Definitions

Label | ID | Description
agent_response | 0 | User response requires LLM processing: questions, requests, confirmations to questions, corrections
backchannel | 1 | Simple acknowledgment: LLM skipped, filler returned (tamam, anladim, ok)

Input Format

[bot] <bot utterance> [sep] [user] <user response>
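
A minimal sketch of helpers that build and parse this input string. The names format_turn and parse_turn are hypothetical, and collapsing repeated whitespace is an assumption of the sketch rather than a documented preprocessing step.

```python
import re
from typing import Tuple

# Shape of a formatted turn pair, following the input format line above.
TURN_PATTERN = re.compile(r"^\[bot\] (?P<bot>.+) \[sep\] \[user\] (?P<user>.+)$")


def format_turn(bot_text: str, user_text: str) -> str:
    """Join a bot utterance and a user response into the model's input string."""
    bot = " ".join(bot_text.split())    # collapse whitespace (assumption)
    user = " ".join(user_text.split())
    if not bot or not user:
        raise ValueError("both bot_text and user_text must be non-empty")
    return f"[bot] {bot} [sep] [user] {user}"


def parse_turn(text: str) -> Tuple[str, str]:
    """Split a formatted input string back into (bot_text, user_text)."""
    match = TURN_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a formatted turn pair: {text!r}")
    return match.group("bot"), match.group("user")


formatted = format_turn("  kargonuz   yolda ", "ne zaman gelir")
print(formatted)
print(parse_turn(formatted))
```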

Example Classifications

agent_response (Send to LLM):

[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
[bot] kargonuz yolda [sep] [user] ne zaman gelir
[bot] poliçeniz aktif [sep] [user] teminat limitini ogrenebilir miyim

backchannel (Skip LLM, return filler):

[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
[bot] kaydinizi kontrol ediyorum [sep] [user] peki
[bot] policeniz yenilendi [sep] [user] tesekkurler
[bot] sifreni sms ile gonderdik [sep] [user] ok aldim

Training

Hyperparameters

Parameter | Value
Base Model | dbmdz/bert-base-turkish-uncased
Max Sequence Length | 128 tokens
Batch Size | 16
Learning Rate | 3e-5
Epochs | 4
Optimizer | AdamW
Weight Decay | 0.01
Loss Function | CrossEntropyLoss
Hardware | Apple Silicon (MPS)
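
To give a sense of the training budget these hyperparameters imply, the sketch below collects them in a small config object and derives the number of optimizer steps. The TrainConfig class is hypothetical, and the step count assumes a single device, no gradient accumulation, and a final partial batch that is kept; the card does not state these details.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the table above plus the training-split size."""
    train_samples: int = 52_287
    batch_size: int = 16
    epochs: int = 4
    learning_rate: float = 3e-5
    weight_decay: float = 0.01
    max_seq_length: int = 128

    @property
    def steps_per_epoch(self) -> int:
        # Assumes the last, smaller batch is not dropped.
        return math.ceil(self.train_samples / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs


cfg = TrainConfig()
print(f"{cfg.steps_per_epoch} steps per epoch, {cfg.total_steps} steps in total")
```

Under these assumptions training runs for 3,268 steps per epoch and 13,072 steps overall.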

Usage

Installation

pip install transformers torch

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["agent_response", "backchannel"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}

# Bot asks question → agent_response
print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
# Output: {'label': 'agent_response', 'confidence': 0.99}

# Bot gives info + User acknowledges → backchannel
print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
# Output: {'label': 'backchannel', 'confidence': 0.98}

Production Integration

import random

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class TurnDetector:
    """Production-ready turn detection for voice assistants."""

    LABELS = ["agent_response", "backchannel"]
    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

    def __init__(self, model_path="hayatiali/turn-detector-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
        """
        Determines if user response should go to LLM.

        Returns:
            dict with 'call_llm' (bool), 'label', 'confidence', 'filler' (if backchannel)
        """
        # Build the input in the format the model was trained on.
        text = f"[bot] {bot_text} [sep] [user] {user_text}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()

        label_idx = probs.argmax().item()
        label = self.LABELS[label_idx]
        confidence = probs[label_idx].item()

        result = {
            "call_llm": label == "agent_response",
            "label": label,
            "confidence": confidence
        }

        if label == "backchannel":
            # Pick a canned filler so the assistant can respond without an LLM call.
            result["filler"] = random.choice(self.FILLER_RESPONSES)

        return result

# Usage
detector = TurnDetector()

# Case 1: Bot asks, user confirms → Send to LLM
result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
# {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

# Case 2: Bot informs, user acknowledges → Return filler
result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
# {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}

Limitations

Limitation | Details
Language | Turkish only, may struggle with heavy dialects
Context | Single-turn analysis, no multi-turn memory
Domain | Trained on customer service, may need fine-tuning for other domains
Edge Cases | Ambiguous short responses may have lower confidence
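
One way to handle low-confidence edge cases is to skip the LLM only when the model is confident that the response is a backchannel. The sketch below wraps a result dict of the shape returned by should_call_llm. The route function and the 0.90 threshold are hypothetical; a real threshold would need tuning on held-out conversations.

```python
def route(result: dict, min_backchannel_confidence: float = 0.90) -> dict:
    """Send low-confidence backchannel predictions to the LLM anyway.

    `result` is expected to carry 'call_llm', 'label', and 'confidence' keys.
    """
    routed = dict(result)  # leave the caller's dict untouched
    if result["label"] == "backchannel" and result["confidence"] < min_backchannel_confidence:
        routed["call_llm"] = True
        routed.pop("filler", None)  # no filler when the LLM will answer
    return routed


confident = {"call_llm": False, "label": "backchannel", "confidence": 0.97, "filler": "hmm"}
uncertain = {"call_llm": False, "label": "backchannel", "confidence": 0.62, "filler": "tamam"}
print(route(confident))
print(route(uncertain))
```

With the TurnDetector class above, this could be used as route(detector.should_call_llm(bot_text, user_text)). The fallback trades some of the latency saving for a lower risk of answering a real request with a filler.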

Citation

@misc{turn-detector-v2-2025,
  title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}


Changelog

v2.0 (Current)

Semantic Rule Improvements:

  • If bot asks a question → always agent_response (731 rows corrected)
  • Rhetorical questions ("really?", "is that so?") → remain as backchannel
  • If user asks a real question ("when?", "how?") → agent_response

Dataset Expansion (+9,082 samples):

Category | Added Patterns
Insurance | premium, policy, coverage, comprehensive, interest, late fees
Telecom | number porting, data exhausted, line transfer, GB remaining
E-commerce | shipping cost, free shipping, returns, delivery
Price/Budget | expensive, budget, too much, will think about it, not suitable
Identity Verification | national ID, "am I speaking with...", surname, date of birth
Objection/Complaint | unacceptable, not satisfied, complaint, impossible
Escalation | manager, director, supervisor
Hold Requests | one moment, busy right now, not now, later

Metrics: Macro F1: 0.9769, Accuracy: 97.94%

Note: Metrics appear slightly lower than v1.0, but this is a more accurate model. v1.0 had mislabeled data (bot asked question + "yes" = backchannel), which the model memorized. v2.0 ensures semantic consistency.

v1.0

  • Initial release
  • Dataset: 56,228 samples
  • Macro F1: 0.9924, Accuracy: 99.3%

License: SiriusAI Tech Premium License v1.0

Commercial Use: Requires Premium License. Contact: info@siriusaitech.com
