DKSplit
DKSplit is a word segmentation model for concatenated text. It splits domain names, brand names, and phrases into words.
Current Version: 0.2.3
Model Description
- Architecture: BiLSTM-CRF (384 embedding, 768 hidden, 3 layers)
- Format: ONNX with INT8 quantization
- Size: ~9MB
- Input: Lowercase a-z, 0-9 (max 64 characters)
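To illustrate what the CRF layer in a BiLSTM-CRF contributes, here is a minimal sketch of Viterbi decoding followed by converting tags to words. It is a generic illustration, not DKSplit's actual inference code: the two-tag scheme (0 = begins a word, 1 = continues a word), the function names `viterbi` and `tags_to_words`, and the score layout are all assumptions. Plain Python lists are used so the example runs without extra dependencies.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions[t][k]:   score of tag k at character position t
    transitions[j][k]: score of moving from tag j to tag k
    (Both layouts are assumptions made for this sketch.)
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(n_tags):
            # Best previous tag for landing on tag k at this position.
            best_prev = max(range(n_tags), key=lambda j: scores[j] + transitions[j][k])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + row[k])
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    tag = max(range(n_tags), key=lambda k: scores[k])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


def tags_to_words(text, tags, begin_tag=0):
    """Group characters into words; begin_tag starts a new word (assumed scheme)."""
    words = []
    for ch, tag in zip(text, tags):
        if tag == begin_tag or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words
```

In a model of this kind, the BiLSTM would supply the per-character emission scores, and the transition scores let the decoder prefer tag sequences that are plausible as a whole rather than picking the best tag for each character independently.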
Usage
Install
pip install dksplit
Python
import dksplit
dksplit.split("chatgptlogin")
# ['chatgpt', 'login']
dksplit.split_batch(["openaikey", "microsoftoffice"])
# [['openai', 'key'], ['microsoft', 'office']]
Direct ONNX
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("dksplit-int8.onnx")
# See GitHub for full inference code
Files
- dksplit-int8.onnx: ONNX model (INT8 quantized)
- dksplit.npz: CRF parameters
Limitations
- Input: a-z, 0-9 only
- Max length: 64 characters
- Non-Latin scripts: use Romanized form
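Because of these limits, it can help to clean raw strings before passing them to the model. The sketch below is one possible approach, not part of the dksplit package: `normalize` and `prepare` are hypothetical helper names, and raising an error for over-long input is a design choice made here, since how the library itself treats invalid input is not stated. Romanizing non-Latin scripts is out of scope; such characters are simply dropped.

```python
import re

MAX_LEN = 64  # maximum input length listed in the limitations


def normalize(text):
    """Lowercase the text and drop every character outside a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def prepare(text, max_len=MAX_LEN):
    """Return cleaned text, or raise ValueError if it exceeds max_len."""
    cleaned = normalize(text)
    if len(cleaned) > max_len:
        raise ValueError(f"input has {len(cleaned)} characters; the limit is {max_len}")
    return cleaned
```

The cleaned string could then be passed to `dksplit.split`. Rejecting over-long input, rather than truncating it silently, avoids segmenting only part of a string without the caller noticing.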
Links
- Website: domainkits.com, ABTdomain.com
- GitHub: github.com/ABTdomain/dksplit
- PyPI: pypi.org/project/dksplit
License
Apache License 2.0 · Copyright 2026 ABTdomain
Please attribute as: DKsplit by ABTdomain