Sean Li (Hellohal2064) · PRO
6 followers · 1 following
AI & ML interests
AI Infrastructure Engineer | Dual DGX Sparks (230GB VRAM) | 5-node Docker Swarm | Building AI Coworker systems
Recent Activity
posted an update · 2 days ago

I have updated vLLM to the latest 0.16rc1 at https://hub.docker.com/repository/docker/hellohal2064/vllm-dgx-spark-gb10/general. It runs all of the Qwen3 models well, with thinking, at 41 tok/s. It is currently set up to run on a single Spark only. I believe the documentation on Docker Hub is up to date.
posted an update · 3 days ago

🚀 vLLM Docker Image for NVIDIA DGX Spark (GB10/SM121)

Just released a pre-built vLLM Docker image optimized for the DGX Spark's ARM64 + Blackwell SM121 GPU.

**Why this exists:** Standard vLLM images don't support SM121; you get "SM121 not supported" errors. This image includes patches for full GB10 compatibility.

**What's included:**
- vLLM 0.15.0 + SM121 patches
- PyTorch 2.11 + CUDA 13.0
- ARM64 (aarch64) native
- Pre-configured for FlashInfer attention

**Verified models:**
- Qwen3-Next-80B-A3B-FP8 (1M context!)
- Qwen3-Embedding-8B (4096-dim embeddings)
- Qwen3-VL-30B (vision)

Docker Hub: https://hub.docker.com/r/hellohal2064/vllm-dgx-spark-gb10 (`docker pull hellohal2064/vllm-dgx-spark-gb10`)
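A minimal sketch of pulling the image and starting an OpenAI-compatible server with one of the verified models. The entrypoint and flags below follow upstream vLLM container conventions and are assumptions; this image's actual defaults may differ, so check its Docker Hub documentation.

```shell
# Pull the prebuilt ARM64 / SM121 image
docker pull hellohal2064/vllm-dgx-spark-gb10:latest

# Launch an OpenAI-compatible server on port 8000 (vLLM's default).
# Mounting the Hugging Face cache avoids re-downloading model weights.
docker run --rm --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  hellohal2064/vllm-dgx-spark-gb10:latest \
  --model Qwen/Qwen3-Embedding-8B
```

Once the server is up, any OpenAI-compatible client can talk to `http://localhost:8000/v1`.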
reacted to their post with 🔥 · about 1 month ago

🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!

I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? 2.5x faster inference compared to llama.cpp!

📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)

🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround

📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• Hugging Face: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10

The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!
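Once the container is serving a model, it can be queried like any OpenAI-compatible endpoint. A stdlib-only sketch, assuming the server's default port (8000) and a hypothetical model identifier; substitute whatever model the container was launched with.

```python
import json
from urllib import request

# Base URL of the vLLM OpenAI-compatible server; 8000 is vLLM's default port.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the server and return the generated text."""
    payload = build_chat_request(model, prompt)
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server; the model name here is illustrative):
# print(chat("Qwen/Qwen3-Coder-30B-A3B-Instruct", "Write a haiku about GPUs."))
```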
Hellohal2064's models (2)
Hellohal2064/vllm-dgx-spark-gb10 • Text Generation • Updated Jan 6
Hellohal2064/Hellohal2064 • Updated Jan 5