Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 8 days ago • 93
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published 7 days ago • 52
Improving Recursive Transformers with Mixture of LoRAs • Paper • 2512.12880 • Published 23 days ago • 5