Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks • Paper • 2508.18672 • Published Aug 26, 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization • Paper • 2502.19261 • Published Feb 26, 2025