turn-detector-v2 - Turkish Turn Detection Model

This model is designed for detecting turn-taking patterns in Turkish conversations, optimizing voice assistant latency by identifying when user utterances require LLM processing vs. simple acknowledgments.

Developed by SiriusAI Tech Brain Team

Mission

To optimize voice assistant response latency by detecting when user utterances require LLM processing vs. simple acknowledgments.

The turn-detector-v2 model analyzes conversational turn pairs (bot utterance + user response) and classifies whether the user's response requires LLM processing (agent_response) or is merely a backchannel acknowledgment that can be handled without an LLM call (backchannel).

Key Benefits

Benefit	Description
Latency Reduction	Skip LLM calls for backchannels, saving 500-2000ms per interaction
Cost Optimization	Reduce LLM API costs by filtering unnecessary calls
Natural Conversation	Return immediate filler responses ("hmm", "tamam") for acknowledgments
High Accuracy	97.94% accuracy on the 13,023-sample held-out test split

Model Overview

Property	Value
Architecture	BertForSequenceClassification
Base Model	`dbmdz/bert-base-turkish-uncased`
Task	Binary Text Classification
Language	Turkish (tr)
Labels	2 (agent_response, backchannel)
Model Size	~110M parameters
Inference Time	~10-15ms (GPU) / ~40-50ms (CPU)

Performance Metrics

Final Evaluation Results

Metric	Score
Macro F1	0.9769
Micro F1	0.9794
MCC	0.9544
Accuracy	97.94%

Per-Class Performance

Category	Accuracy (recall)	Test samples
agent_response	99.57%	8,553
backchannel	94.83%	4,470
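As a consistency check, the per-class figures can be turned back into a confusion matrix. The sketch below assumes the per-class "Accuracy" column is recall on the 13,023-sample test split and recovers integer counts by rounding, so the matrix is a reconstruction, not published data. Under that assumption it reproduces the reported accuracy, macro F1, and MCC (micro F1 equals accuracy for single-label classification).

```python
import math

# Test-split support and per-class recall, taken from the table above
SUPPORT = {"agent_response": 8553, "backchannel": 4470}
RECALL = {"agent_response": 0.9957, "backchannel": 0.9483}

# Reconstructed counts (assumption: rounding recall * support recovers the true integer count)
tp_agent = round(RECALL["agent_response"] * SUPPORT["agent_response"])  # agent predicted as agent
tp_back = round(RECALL["backchannel"] * SUPPORT["backchannel"])         # backchannel predicted as backchannel
fn_agent = SUPPORT["agent_response"] - tp_agent  # agent predicted as backchannel (LLM wrongly skipped)
fn_back = SUPPORT["backchannel"] - tp_back        # backchannel predicted as agent (unneeded LLM call)

total = sum(SUPPORT.values())
accuracy = (tp_agent + tp_back) / total


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# In a binary task, each class's false positives are the other class's false negatives
macro_f1 = (f1(tp_agent, fn_back, fn_agent) + f1(tp_back, fn_agent, fn_back)) / 2

# Matthews correlation coefficient, treating agent_response as the positive class
tp, tn, fp, fn = tp_agent, tp_back, fn_back, fn_agent
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"confusion matrix: [[{tp_agent}, {fn_agent}], [{fn_back}, {tp_back}]]")
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} mcc={mcc:.4f}")
```

Under this reconstruction, the more common error (231 cases) sends a backchannel to the LLM, which only costs latency; the rarer one (37 cases) skips the LLM for a turn that needed a real answer.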

Semantic Classification Rules

When to Classify as `backchannel` (Skip LLM)

Condition	Examples
Bot gives info + User short acknowledgment	"tamam", "anladim", "ok", "peki"
Bot gives info + User rhetorical question	"oyle mi?", "harbi mi?", "cidden mi?"
Bot gives info + User minimal response	"hmm", "hi hi", "evet"

When to Classify as `agent_response` (Send to LLM)

Condition	Examples
Bot asks question + User gives any answer	"[bot] adi nedir [sep] [user] ahmet"
Bot gives info + User asks real question	"[bot] faturaniz kesildi [sep] [user] ne zaman?"
Bot gives info + User makes request	"[bot] kargonuz yolda [sep] [user] adresi degistirmek istiyorum"
User provides detailed information	"[bot] bilgi verir misiniz [sep] [user] sunu sunu istiyorum cunku..."

Golden Rule

If bot asked a question → Always agent_response
If bot gave info + User short acknowledgment → backchannel
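The golden rule is simple enough to express as a rule-based baseline, which can help when auditing labels. The sketch below is hypothetical: the question-particle regex, the WH_WORDS and ACK_WORDS lists, and the three-token limit are assumptions fitted to the ASCII-fied examples in this card. The fine-tuned model does not run this code.

```python
import re

# Hypothetical heuristics for illustration; the word lists and regex are assumptions,
# not the rules actually used to label the dataset.

# Turkish yes/no particle as a separate word, bare or person-marked: mi, mu, misiniz, musunuz, miyim, ...
QUESTION_PARTICLE = re.compile(r"^m[iıuü](?:y[iıuü]m|s[iıuü]n(?:[iıuü]z)?|y[iıuü]z)?$")
WH_WORDS = {"ne", "nedir", "nasil", "neden", "niye", "kim", "hangi", "kac", "nerede"}
ACK_WORDS = {"tamam", "anladim", "ok", "peki", "hmm", "hi", "evet", "tesekkurler",
             "aldim", "oyle", "harbi", "cidden", "mi"}  # "mi" admits rhetorical "oyle mi?"


def simple_tokens(text: str) -> list[str]:
    """Lowercase, drop question marks, split on whitespace."""
    return text.lower().replace("?", " ").split()


def bot_asked_question(bot_text: str) -> bool:
    """True if the bot turn ends in '?' or contains a question particle or question word."""
    return bot_text.strip().endswith("?") or any(
        QUESTION_PARTICLE.match(tok) or tok in WH_WORDS for tok in simple_tokens(bot_text)
    )


def is_short_ack(user_text: str, max_tokens: int = 3) -> bool:
    """True if the user turn is a few tokens long and every token is an acknowledgment word."""
    toks = simple_tokens(user_text)
    return 0 < len(toks) <= max_tokens and all(tok in ACK_WORDS for tok in toks)


def golden_rule(bot_text: str, user_text: str) -> str:
    if bot_asked_question(bot_text):
        return "agent_response"  # rule 1: the bot asked a question
    if is_short_ack(user_text):
        return "backchannel"     # rule 2: the bot gave info, the user acknowledged
    return "agent_response"      # anything else (real questions, requests) goes to the LLM
```

A baseline like this is brittle (for example, it misses questions phrased without a particle or question word), which is the gap the fine-tuned classifier is meant to cover.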

Dataset

Dataset Statistics

Split	Samples
Train	52,287
Test	13,023
Total	65,310

Label Distribution (training split)

Label	Count	Percentage
agent_response	35,223	67.4%
backchannel	17,064	32.6%

Domain Coverage

E-commerce (kargo, iade, teslimat)
Banking (hesap, bakiye, kredi)
Telecom (numara tasima, data, hat)
Insurance (prim, police, teminat, kasko)
General Support (sikayet, yonetici, eskalasyon)
Identity Verification (TC, gorusuyorum, soyadi)

Label Definitions

Label	ID	Description
agent_response	0	User response requires LLM processing - questions, requests, confirmations to questions, corrections
backchannel	1	Simple acknowledgment - LLM skipped, filler returned (tamam, anladim, ok)

Input Format

[bot] <bot utterance> [sep] [user] <user response>
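A small helper keeps callers from drifting away from this format. This is a sketch: build_input and tr_lower are hypothetical names, and lowercasing mirrors the all-lowercase examples in this card rather than a documented preprocessing step. The Turkish-specific replacement matters because Python's str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot, whereas Turkish expects "ı" and "i".

```python
def tr_lower(text: str) -> str:
    """Lowercase using Turkish dotted/dotless I rules before the generic lower()."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def build_input(bot_text: str, user_text: str, lowercase: bool = True) -> str:
    """Assemble one bot/user turn pair in the '[bot] ... [sep] [user] ...' format."""
    # Collapse runs of whitespace so ASR output with stray spaces matches the training format
    bot, user = " ".join(bot_text.split()), " ".join(user_text.split())
    if lowercase:
        bot, user = tr_lower(bot), tr_lower(user)
    return f"[bot] {bot} [sep] [user] {user}"


print(build_input("Kargonuz  yolda", "Ne zaman gelir"))
```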

Example Classifications

agent_response (Send to LLM):

[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum
[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim
[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum
[bot] kargonuz yolda [sep] [user] ne zaman gelir
[bot] poliçeniz aktif [sep] [user] teminat limitini ogrenebilir miyim

backchannel (Skip LLM, return filler):

[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam
[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim
[bot] kaydinizi kontrol ediyorum [sep] [user] peki
[bot] policeniz yenilendi [sep] [user] tesekkurler
[bot] sifreni sms ile gonderdik [sep] [user] ok aldim
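The example pairs above double as a quick smoke test after upgrading the model or tokenizer. This sketch assumes a predict function that returns a dict with a "label" key, such as the one in Quick Start; find_mismatches is a hypothetical helper name.

```python
from typing import Callable

# The example pairs above, with their documented labels
EXAMPLES = [
    ("[bot] size nasil yardimci olabilirim [sep] [user] fatura sorgulamak istiyorum", "agent_response"),
    ("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim", "agent_response"),
    ("[bot] islemi onayliyor musunuz [sep] [user] evet onayliyorum", "agent_response"),
    ("[bot] kargonuz yolda [sep] [user] ne zaman gelir", "agent_response"),
    ("[bot] poliçeniz aktif [sep] [user] teminat limitini ogrenebilir miyim", "agent_response"),
    ("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam", "backchannel"),
    ("[bot] siparisiniz 3 gun icinde teslim edilecek [sep] [user] anladim", "backchannel"),
    ("[bot] kaydinizi kontrol ediyorum [sep] [user] peki", "backchannel"),
    ("[bot] policeniz yenilendi [sep] [user] tesekkurler", "backchannel"),
    ("[bot] sifreni sms ile gonderdik [sep] [user] ok aldim", "backchannel"),
]


def find_mismatches(
    predict_fn: Callable[[str], dict], examples: list[tuple[str, str]] = EXAMPLES
) -> list[tuple[str, str, str]]:
    """Return (text, expected, predicted) for every example the predictor gets wrong."""
    mismatches = []
    for text, expected in examples:
        predicted = predict_fn(text)["label"]
        if predicted != expected:
            mismatches.append((text, expected, predicted))
    return mismatches
```

Calling find_mismatches(predict) with the Quick Start function returns an empty list when all ten examples are classified as documented.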

Training

Hyperparameters

Parameter	Value
Base Model	`dbmdz/bert-base-turkish-uncased`
Max Sequence Length	128 tokens
Batch Size	16
Learning Rate	3e-5
Epochs	4
Optimizer	AdamW
Weight Decay	0.01
Loss Function	CrossEntropyLoss
Hardware	Apple Silicon (MPS)
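The card does not say which training loop was used. If you want to reproduce the setup with the Hugging Face Trainer, the table maps roughly onto TrainingArguments as below; that mapping, the output directory name, and optim="adamw_torch" are assumptions. The 128-token limit belongs in the tokenizer call (truncation=True, max_length=128), and BertForSequenceClassification applies cross-entropy loss itself when given integer labels with num_labels=2. The arithmetic at the top shows the resulting schedule length.

```python
import math

# Values from the hyperparameter and dataset tables
TRAIN_SAMPLES = 52_287
BATCH_SIZE = 16
EPOCHS = 4

# Assumes a single device, no gradient accumulation, and that the last partial batch is kept
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)  # 3268 13072


def make_training_args(output_dir: str = "turn-detector-v2"):
    """Map the table onto Hugging Face TrainingArguments (hypothetical reproduction setup)."""
    from transformers import TrainingArguments  # imported lazily; only needed for training

    return TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=BATCH_SIZE,
        learning_rate=3e-5,
        num_train_epochs=EPOCHS,
        weight_decay=0.01,
        optim="adamw_torch",  # AdamW, as listed in the table
    )
```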

Usage

Installation

pip install transformers torch

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/turn-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["agent_response", "backchannel"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    return {"label": max(scores, key=scores.get), "confidence": max(scores.values())}

# Bot asks question → agent_response
print(predict("[bot] ahmet bey ile mi gorusuyorum [sep] [user] evet benim"))
# Output (approximate): {'label': 'agent_response', 'confidence': 0.99}

# Bot gives info + User acknowledges → backchannel
print(predict("[bot] faturaniz 150 tl gorunuyor [sep] [user] tamam"))
# Output (approximate): {'label': 'backchannel', 'confidence': 0.98}

Production Integration

import random

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class TurnDetector:
    """Production-ready turn detection for voice assistants."""

    LABELS = ["agent_response", "backchannel"]
    FILLER_RESPONSES = ["hmm", "evet", "tamam", "anlıyorum"]

    def __init__(self, model_path="hayatiali/turn-detector-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def should_call_llm(self, bot_text: str, user_text: str) -> dict:
        """
        Determines whether the user response should be sent to the LLM.

        Returns:
            dict with 'call_llm' (bool), 'label', 'confidence', and 'filler' (backchannel only)
        """
        # Build the input in the format the model was trained on
        text = f"[bot] {bot_text} [sep] [user] {user_text}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(self.model(**inputs).logits, dim=-1)[0].cpu()

        label_idx = probs.argmax().item()
        label = self.LABELS[label_idx]
        confidence = probs[label_idx].item()

        result = {
            "call_llm": label == "agent_response",
            "label": label,
            "confidence": confidence,
        }

        # Backchannels skip the LLM; reply with a short filler instead
        if label == "backchannel":
            result["filler"] = random.choice(self.FILLER_RESPONSES)

        return result

# Usage
detector = TurnDetector()

# Case 1: Bot asks, user confirms → Send to LLM
result = detector.should_call_llm("siparis iptal etmek ister misiniz", "evet iptal et")
# Approximate output: {'call_llm': True, 'label': 'agent_response', 'confidence': 0.99}

# Case 2: Bot informs, user acknowledges → Return filler
result = detector.should_call_llm("siparisiniz yola cikti", "tamam")
# Approximate output: {'call_llm': False, 'label': 'backchannel', 'confidence': 0.97, 'filler': 'hmm'}
# (the filler is chosen at random from FILLER_RESPONSES)

Limitations

Limitation	Details
Language	Turkish only, may struggle with heavy dialects
Context	Analyzes one bot/user turn pair at a time; no multi-turn memory
Domain	Trained on customer service, may need fine-tuning for other domains
Edge Cases	Ambiguous short responses may have lower confidence
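Because a wrong backchannel decision drops a turn that needed a real answer, one option is to skip the LLM only when the backchannel prediction is confident. A minimal sketch, assuming the result dict returned by TurnDetector.should_call_llm; route and the 0.90 threshold are hypothetical and should be tuned on held-out traffic.

```python
# Hypothetical threshold; tune it on your own traffic
BACKCHANNEL_THRESHOLD = 0.90


def route(result: dict, threshold: float = BACKCHANNEL_THRESHOLD) -> dict:
    """Skip the LLM only for confident backchannels; otherwise fall back to the LLM."""
    if result["label"] == "backchannel" and result["confidence"] >= threshold:
        # Use the detector's filler if present, else a safe default acknowledgment
        return {"call_llm": False, "filler": result.get("filler", "tamam")}
    return {"call_llm": True}


print(route({"label": "backchannel", "confidence": 0.97, "filler": "hmm"}))
print(route({"label": "backchannel", "confidence": 0.60}))
```

Raising the threshold trades some latency savings for fewer user requests being answered with only a filler.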

Citation

@misc{turn-detector-v2-2025,
  title={turn-detector-v2: Turkish Turn Detection for Voice Assistants},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/turn-detector-v2}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}

Contact

Developer: SiriusAI Tech Brain Team
Email: info@siriusaitech.com
Repository: GitHub

Changelog

v2.0 (Current)

Semantic Rule Improvements:

If bot asks a question → always agent_response (731 rows corrected)
Rhetorical questions ("really?", "is that so?") → remain as backchannel
If user asks a real question ("when?", "how?") → agent_response

Dataset Expansion (+9,082 samples):

Category	Added Patterns
Insurance	premium, policy, coverage, comprehensive, interest, late fees
Telecom	number porting, data exhausted, line transfer, GB remaining
E-commerce	shipping cost, free shipping, returns, delivery
Price/Budget	expensive, budget, too much, will think about it, not suitable
Identity Verification	national ID, "am I speaking with...", surname, date of birth
Objection/Complaint	unacceptable, not satisfied, complaint, impossible
Escalation	manager, director, supervisor
Hold Requests	one moment, busy right now, not now, later

Metrics: Macro F1: 0.9769, Accuracy: 97.94%

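Note: v2.0's metrics are slightly lower than v1.0's, but v2.0 is the more reliable model. v1.0 had mislabeled data (bot asked a question + user said "yes" = backchannel), which the model memorized; v2.0 enforces semantic consistency. The two versions were also evaluated on different datasets (56,228 vs. 65,310 samples), so their scores are not directly comparable.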

v1.0

Initial release
Dataset: 56,228 samples
Macro F1: 0.9924, Accuracy: 99.3%

License: SiriusAI Tech Premium License v1.0

Commercial Use: Requires Premium License. Contact: info@siriusaitech.com


Weights: Safetensors, F32 tensors, ~0.1B params
