State over Tokens: Characterizing the Role of Reasoning Tokens Paper • 2512.12777 • Published 24 days ago • 3 • 6
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025 • 661 • 56
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15, 2025 • 8 • 6
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20, 2025 • 39 • 3
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2, 2025 • 238 • 13
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22, 2025 • 122 • 11
Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation Paper • 2507.09074 • Published Jul 11, 2025 • 6 • 5
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20, 2025 • 24 • 6