SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 253
Moonlight-A3B Collection Moonshot's compute-efficient MoE LLM, the first scaling-up of the Muon optimizer • 3 items • Updated Nov 2, 2025 • 9
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22, 2025 • 126
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator • 24 days ago • 44
Nemotron v3 Pre-Training Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 18 days ago • 7
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6, 2025 • 39
Article Nemotron 3 Nano - A New Standard for Efficient, Open, and Intelligent Agentic Models • 26 days ago • 104
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published 26 days ago • 88
Nemotron-Post-Training-v3 Collection Datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 18 days ago • 56
NVIDIA Nemotron v3 Collection Open, production-ready enterprise models • 6 items • Updated 11 days ago • 117
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 18 days ago • 91
NeMo Gym Collection Verifiable RL data for NeMo Gym • 13 items • Updated 18 days ago • 34
Devstral 2 Collection Agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE agents. • 3 items • Updated Dec 9, 2025 • 38
Article Transformers v5: Simple model definitions powering the AI ecosystem • Dec 1, 2025 • 270
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated Dec 2, 2025 • 82