csabakecskemeti posted an update 9 days ago
Just sharing a result of a homelab infrastructure experiment:

I've managed to set up a distributed inference infrastructure at home using a DGX Spark (128GB unified LPDDR5X) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com.



Screen recording:
https://lnkd.in/gKM9H5GJ
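
Until the tutorial is up, here's a rough sketch of the general idea (illustrative only, not necessarily the exact stack I used): pipeline-parallel serving across the two boxes with vLLM on a Ray cluster, one GPU per node. The model path and arguments below are placeholders and may differ by version.

```python
# Rough sketch only -- the actual setup will be documented on devquasar.com.
# Assumes a Ray cluster spanning the two machines (`ray start --head` on one,
# `ray start --address=<head-ip>:6379` on the other) and a recent vLLM build.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-140GB-model",    # hypothetical placeholder
    tensor_parallel_size=1,               # one GPU per node
    pipeline_parallel_size=2,             # split the layers across the two nodes
    distributed_executor_backend="ray",   # workers talk over the 100Gbps RoCE link
)

out = llm.generate(["Hello from the homelab!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```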
Aurelien-Morgan posted an update 26 days ago
csabakecskemeti posted an update about 1 month ago
Looking for some help testing an INT8 DeepSeek V3.2:
SGLang supports channel-wise INT8 quants on CPUs with AMX instructions (Xeon 5th gen and above, AFAIK):
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of DeepSeek V3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this myself since I'm on AMD:
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it could fall back to some non-optimized kernel, but apparently not.)

If anyone with the required resources (Intel Xeon 5th/6th gen + roughly 768GB-1TB of RAM) can help test this, that would be awesome.
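
For anyone willing to try, something along these lines should do it (untested on my end because of the AMX assertion; the Engine kwargs mirror the server args and may vary by SGLang version):

```python
# Rough sketch, untested on my side (fails on AMD with the AMX assertion above).
# Assumes the SGLang CPU backend described in the Intel Xeon optimization blog post.
import sglang as sgl

llm = sgl.Engine(
    model_path="DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8",
    device="cpu",               # CPU backend, needs AMX-capable Xeons
    quantization="w8a8_int8",   # channel-wise INT8 weights + INT8 activations
)

print(llm.generate("Hello!", {"max_new_tokens": 32}))
llm.shutdown()
```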

If you have hints on how to make this work on an AMD Threadripper 7000 Pro series, please guide me.

Thanks all!
csabakecskemeti posted an update about 2 months ago
Recently there has been a lot of activity around token-efficient formats, so I've also built a package (inspired by TOON).

Deep-TOON

My goal was to handle JSON structures with complex embeddings in a token-efficient way.

So this is what I built over the weekend. Feel free to try it:

https://pypi.org/project/deep-toon/0.1.0/
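
To illustrate the idea (plain Python below, not the deep-toon API itself): TOON-style compaction folds a list of uniform JSON objects into one header plus value rows, so the keys aren't repeated and re-tokenized for every record.

```python
# Concept illustration only (not the deep-toon API): fold uniform JSON objects
# into a single "name[count]{keys}:" header followed by comma-separated rows.
import json

records = [
    {"id": 1, "name": "anna", "score": 0.91},
    {"id": 2, "name": "bela", "score": 0.87},
]

def to_toonish(rows):
    keys = list(rows[0])
    header = f"items[{len(rows)}]{{{','.join(keys)}}}:"
    lines = [",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header, *lines])

print(len(json.dumps(records)), "chars as JSON")
print(len(to_toonish(records)), "chars as TOON-style text")
```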

JingzeShi posted an update about 2 months ago
csabakecskemeti posted an update 3 months ago
Christmas came early this year
lysandre posted an update 4 months ago
We're kick-starting the process of Transformers v5 with @ArthurZ and @cyrilvallez!

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!
jeffboudier posted an update 4 months ago
Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Python and CLI!

GG @alvarobartt @kramp @pagezyhf
clem posted an update 5 months ago
JingzeShi posted an update 5 months ago
jeffboudier posted an update 6 months ago
AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20

Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.

Register for the Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz