Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025
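The article above covers KV caching; as a minimal sketch of the idea (not the article's own code), the keys and values of already-generated tokens can be stored so each decoding step only computes attention inputs for the newest token. The projection logic here is a stand-in assumption, not a real model:

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8
# The KV cache grows by one row per generated token, so past tokens'
# keys/values are never recomputed during autoregressive decoding.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(4):
    x = rng.standard_normal(d)   # hidden state of the newest token
    k_new, v_new, q = x, x, x    # stand-ins for learned K/V/Q projections
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])
    out = attention(q, K_cache, V_cache)

print(K_cache.shape)  # one cached key row per generated token
```

Without the cache, each step would recompute keys and values for the entire prefix, making decoding quadratic in sequence length rather than linear per step.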
Post: Mini-QwQ, an edge-device-friendly reasoning model distilled from QwQ-32B. Model: kz919/QwQ-0.5B-Distilled-SFT • GGUF: kz919/QwQ-0.5B-Distilled-SFT-gguf • Collection: kz919/Mini-QwQ
Collection: Moshi v0.1 Release. MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via https://github.com/kyutai-labs/moshi • 16 items • Updated Dec 24, 2025