
SimpleTool: Parallel Decoding for Real-Time LLM Function Calling

Hugging Face | ModelScope | GitHub

This repository contains the weights for RT-Qwen (RealtimeTool), a series of models optimized for low-latency, parallel LLM function calling.

πŸ“ Model Directory Structure

The models are organized by scale, quantization format, and inference framework.

1. SFT & AWQ Models (vLLM / Transformers)

Use these folders directly for inference with vLLM or Transformers.

  • RT-Qwen2.5-0.5B / -0.5B-AWQ
  • RT-Qwen2.5-1.5B / -1.5B-AWQ
  • RT-Qwen2.5-3B / -3B-AWQ
  • RT-Qwen2.5-7B / -7B-AWQ
  • RT-Qwen2.5-14B / -14B-AWQ
  • RT-Qwen3-4B / -4B-AWQ
  • RT-Qwen3-30B / -30B-AWQ

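As a minimal sketch, loading one of the SFT checkpoints with Transformers might look like the following. The model id below is a hypothetical placeholder (the exact Hub path is not yet documented here), and the generic chat-template call stands in for the model's function-calling prompt format, which will be covered in the GitHub documentation:

```python
# Sketch: loading an RT-Qwen SFT checkpoint with Transformers.
# MODEL_ID is a hypothetical placeholder -- substitute the actual
# Hugging Face Hub path or local folder for the checkpoint you want.
MODEL_ID = "RT-Qwen2.5-0.5B"


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model; imports kept local so the sketch
    can be read without transformers installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load()
    # Generic chat usage; the dedicated function-calling format is
    # not yet documented for these checkpoints.
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The `-AWQ` folders can be loaded the same way (Transformers picks up the AWQ quantization config automatically when `autoawq` is installed), or served with vLLM for higher throughput.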
2. GGUF Models (llama.cpp)

  • gguf_models/: Full-precision (F16) GGUF files for all versions.
  • gguf_quantized/: Quantized GGUF versions including Q4_K_M, Q5_K_M, and Q8_0.
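A hedged sketch of running one of the quantized GGUF files via llama-cpp-python (the Python bindings for llama.cpp). The file path below is a hypothetical placeholder for a file in `gguf_quantized/`; adjust it to the checkpoint and quantization level you downloaded:

```python
# Sketch: chat completion with a quantized GGUF checkpoint via
# llama-cpp-python. GGUF_PATH is a hypothetical placeholder filename.
GGUF_PATH = "gguf_quantized/RT-Qwen2.5-0.5B-Q4_K_M.gguf"


def run(prompt: str, model_path: str = GGUF_PATH) -> str:
    """Run a single chat turn; import kept local so the sketch can be
    read without llama-cpp-python installed."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(run("List three uses of function calling."))
```

The same files also work with the stock llama.cpp CLI and server binaries; Q4_K_M trades the most size for a modest quality drop, while Q8_0 stays closest to the F16 originals.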

πŸ“ TODO

  • Release arXiv paper
  • Complete GitHub Documentation
  • Add Performance Benchmarks
  • Provide Citation Info

License: Apache-2.0
Status: Models Uploading / Placeholder README
