SimpleTool: Parallel Decoding for Real-Time LLM Function Calling
Hugging Face | ModelScope | GitHub
This repository contains the weights for RT-Qwen (RealtimeTool), a series of models optimized for low-latency, parallel LLM function calling.
Model Directory Structure
The models are organized by scale, quantization format, and inference framework.
1. SFT & AWQ Models (vLLM / Transformers)
Use these folders directly for inference with vLLM or Transformers; a minimal loading sketch follows the list below.
- RT-Qwen2.5-0.5B / -0.5B-AWQ
- RT-Qwen2.5-1.5B / -1.5B-AWQ
- RT-Qwen2.5-3B / -3B-AWQ
- RT-Qwen2.5-7B / -7B-AWQ
- RT-Qwen2.5-14B / -14B-AWQ
- RT-Qwen3-4B / -4B-AWQ
- RT-Qwen3-30B / -30B-AWQ
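
A minimal vLLM sketch for loading one of the SFT or AWQ folders. The model path `RT-Qwen2.5-7B` is a placeholder for the actual Hub ID or local directory of the checkpoint you download, and the prompt is illustrative only, not the model's real function-calling template:

```python
from vllm import LLM, SamplingParams

# Placeholder path: substitute the actual repo ID or local folder
# for the RT-Qwen checkpoint you want to run (e.g. RT-Qwen2.5-7B).
llm = LLM(model="RT-Qwen2.5-7B", dtype="auto")

# For the AWQ variants, point at the -AWQ folder and enable AWQ quantization:
# llm = LLM(model="RT-Qwen2.5-7B-AWQ", quantization="awq")

sampling = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(
    ["Call the weather tool for Berlin and the time tool for Tokyo."],
    sampling,
)
print(outputs[0].outputs[0].text)
```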
2. GGUF Models (llama.cpp)
- `gguf_models/`: Full-precision (F16) GGUF files for all versions.
- `gguf_quantized/`: Quantized GGUF versions including Q4_K_M, Q5_K_M, and Q8_0.

These files are intended for llama.cpp; a loading sketch follows below.
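
A minimal sketch using the `llama-cpp-python` bindings for llama.cpp. The GGUF file name below is hypothetical; use the actual file from `gguf_models/` (F16) or `gguf_quantized/` (e.g. Q4_K_M) that matches the model size you want:

```python
from llama_cpp import Llama

# Hypothetical file name: replace with the GGUF file you downloaded
# from gguf_models/ or gguf_quantized/.
llm = Llama(
    model_path="gguf_quantized/RT-Qwen2.5-7B-Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

# Illustrative prompt only; not the model's actual function-calling template.
out = llm("Call the weather tool for Berlin.", max_tokens=128, temperature=0.0)
print(out["choices"][0]["text"])
```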
TODO
- Release arXiv paper
- Complete GitHub Documentation
- Add Performance Benchmarks
- Provide Citation Info
License: Apache-2.0
Status: Models Uploading / Placeholder README