Sean Li (Hellohal2064) · PRO
6 followers · 1 following
AI & ML interests
AI Infrastructure Engineer | Dual DGX Sparks (230GB VRAM) | 5-node Docker Swarm | Building AI Coworker systems
Recent Activity
posted an update · 2 days ago

I have updated vLLM to the latest 0.16rc1 at https://hub.docker.com/repository/docker/hellohal2064/vllm-dgx-spark-gb10/general. It runs all of the Qwen3 models well, with thinking, at 41 tok/s. It is currently set up to run on a single Spark only. I believe the documentation on Docker Hub is up to date.
posted an update · 3 days ago

🚀 vLLM Docker Image for NVIDIA DGX Spark (GB10/SM121)

Just released a pre-built vLLM Docker image optimized for the DGX Spark's ARM64 + Blackwell SM121 GPU.

**Why this exists:** Standard vLLM images don't support SM121; you get "SM121 not supported" errors. This image includes patches for full GB10 compatibility.

**What's included:**
- vLLM 0.15.0 + SM121 patches
- PyTorch 2.11 + CUDA 13.0
- ARM64 (aarch64) native
- Pre-configured for FlashInfer attention

**Verified models:**
- Qwen3-Next-80B-A3B-FP8 (1M context!)
- Qwen3-Embedding-8B (4096-dim embeddings)
- Qwen3-VL-30B (vision)

Docker Hub: https://hub.docker.com/r/hellohal2064/vllm-dgx-spark-gb10 (`docker pull hellohal2064/vllm-dgx-spark-gb10`)
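A minimal sketch of pulling the image and starting an OpenAI-compatible server with one of the verified models. The entrypoint and flags below follow upstream vLLM container conventions and are assumptions; this image's actual defaults may differ, so check its Docker Hub documentation.

```shell
# Pull the prebuilt ARM64 / SM121 image
docker pull hellohal2064/vllm-dgx-spark-gb10:latest

# Launch an OpenAI-compatible server on port 8000 (vLLM's default).
# Mounting the Hugging Face cache avoids re-downloading model weights.
docker run --rm --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  hellohal2064/vllm-dgx-spark-gb10:latest \
  --model Qwen/Qwen3-Embedding-8B
```

Once the server is up, any OpenAI-compatible client can talk to `http://localhost:8000/v1`.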
reacted to their post with 🔥 · about 1 month ago

🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!

I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? 2.5x faster inference compared to llama.cpp!

📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)

🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround

📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• Hugging Face: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10

The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!
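Once the container is serving a model, it can be queried like any OpenAI-compatible endpoint. A stdlib-only sketch, assuming the server's default port (8000) and a hypothetical model identifier; substitute whatever model the container was launched with.

```python
import json
from urllib import request

# Base URL of the vLLM OpenAI-compatible server; 8000 is vLLM's default port.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the server and return the generated text."""
    payload = build_chat_request(model, prompt)
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server; the model name here is illustrative):
# print(chat("Qwen/Qwen3-Coder-30B-A3B-Instruct", "Write a haiku about GPUs."))
```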
Hellohal2064's models (2)
Hellohal2064/vllm-dgx-spark-gb10 • Text Generation • Updated Jan 6
Hellohal2064/Hellohal2064 • Updated Jan 5