AI & ML interests

None defined yet.

Recent Activity

csabakecskemeti
posted an update 12 days ago
Just sharing a result of a homelab infrastructure experiment:

I've managed to set up a distributed inference infrastructure at home using a DGX Spark (128GB unified memory) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com



Screen recording:
https://lnkd.in/gKM9H5GJ
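The memory reasoning behind the setup can be sketched as follows (a minimal illustration; the 0.9 usable-memory factor and the proportional split are my own assumptions, not details from the post):

```python
# Illustrative sketch: why a ~140 GB model must be sharded across both
# machines. Device sizes are from the post; the overhead factor and
# proportional split are assumptions for illustration only.

def fits(model_gb: float, device_gb: float, overhead: float = 0.9) -> bool:
    """True if the weights fit within a device's usable memory budget."""
    return model_gb <= device_gb * overhead

model_gb = 140.0
spark_gb = 128.0      # DGX Spark unified memory
rtx6000_gb = 96.0     # RTX 6000 Pro VRAM

# Neither device can hold the full model on its own...
assert not fits(model_gb, spark_gb)
assert not fits(model_gb, rtx6000_gb)

# ...but splitting the weights proportionally across both devices works.
total = spark_gb + rtx6000_gb
for device_gb in (spark_gb, rtx6000_gb):
    shard_gb = model_gb * device_gb / total
    assert fits(shard_gb, device_gb)
print("sharded across both devices: OK")
```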
csabakecskemeti
posted an update about 1 month ago
Looking for some help to test an INT8 DeepSeek 3.2:
SGLang supports channel-wise INT8 quants on CPUs with AMX instructions (5th-gen Xeon and above, AFAIK)
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of DeepSeek 3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this myself since I'm on AMD:
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it could fall back to some non-optimized kernel, but it seems not.)

If anyone with the required resources (5th/6th-gen Intel Xeon + ~768GB-1TB RAM) can help test this, that would be awesome.

If you have hints on how to make this work on an AMD Threadripper 7000 Pro series, please guide me.

Thanks all!
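For context, the channel-wise INT8 scheme the post refers to can be sketched generically in NumPy (an illustration of per-output-channel weight quantization, not SGLang's actual kernel):

```python
import numpy as np

# Generic sketch of channel-wise (per-output-channel) INT8 weight
# quantization: one scale per row of a [out_features, in_features]
# weight matrix. Illustration only, not SGLang's implementation.

def quantize_channelwise_int8(w: np.ndarray):
    """Quantize each output channel (row) with its own scale."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scales = quantize_channelwise_int8(w)
err = np.abs(dequantize(q, scales) - w).max()
print(f"max abs reconstruction error: {err:.4f}")
```

The rounding error per element is bounded by half of that row's scale, which is why per-channel scales beat a single per-tensor scale when channel magnitudes differ.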
csabakecskemeti
posted an update about 2 months ago
Recently there has been a lot of activity around token-efficient formats, so I've also built a package (inspired by TOON).

Deep-TOON

My goal was to handle JSON structures with complex embeddings in a token-efficient way.

So this is what I built over the weekend. Feel free to try it:

https://pypi.org/project/deep-toon/0.1.0/
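To illustrate the general idea behind TOON-style formats (this is a generic sketch of the technique, not deep-toon's actual API): a uniform list of JSON objects repeats every key for every record, while a tabular encoding states the keys once, which tends to cost fewer tokens.

```python
import json

# Hypothetical records for illustration.
records = [
    {"id": 1, "name": "ada", "score": 0.91},
    {"id": 2, "name": "alan", "score": 0.87},
]

def tabular_encode(rows):
    """Encode uniform dicts as one header line plus one CSV-like row each."""
    keys = list(rows[0])
    header = ",".join(keys)
    body = "\n".join(",".join(str(r[k]) for k in keys) for r in rows)
    return f"{header}\n{body}"

compact_json = json.dumps(records, separators=(",", ":"))
tabular = tabular_encode(records)
print(tabular)
# The tabular form is shorter in characters, and typically in tokens too.
print(len(tabular), "<", len(compact_json))
```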

JingzeShi
posted an update about 2 months ago
csabakecskemeti
posted an update 3 months ago
Christmas came early this year
lysandre
posted an update 4 months ago
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez !

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!
jeffboudier
posted an update 4 months ago
Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Python and CLI support!

GG @alvarobartt @kramp @pagezyhf
Xenova
posted an update 5 months ago
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳
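The four steps above can be sketched in NumPy with stand-in features (real patch embeddings would come from DINOv3; the shapes, selected index, and threshold here are illustrative assumptions):

```python
import numpy as np

# Stand-in patch features for one frame: 196 patches x 384 dims.
# In the real demo these come from DINOv3 per frame (step 1).
rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((196, 384)).astype(np.float32)

# Step 2: embeddings of the user-selected patch(es) -- index is arbitrary.
selected = frame_feats[[42]]

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Step 3: similarity of every patch to the selected patch(es),
# keeping the best match per patch.
scores = cosine_sim(frame_feats, selected).max(axis=1)

# Step 4: highlight patches above a threshold (0.5 is a guess).
mask = scores > 0.5
print("highlighted patches:", int(mask.sum()))
```

The selected patch always matches itself with similarity ~1.0, so it is guaranteed to be highlighted; adding selections from other frames just appends more columns to `selected`.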

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
Xenova
posted an update 5 months ago
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
JingzeShi
posted an update 5 months ago
Xenova
posted an update 6 months ago
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔒 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU