FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos Paper • 2512.10927 • Published 28 days ago • 5
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning Paper • 2509.24372 • Published Sep 29, 2025 • 9
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22, 2025 • 63
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published Jan 27, 2025 • 19
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 12
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19