SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder Paper • 2512.11749 • Published Dec 12, 2025 • 39
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 38
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17, 2025 • 49
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention Paper • 2510.13940 • Published Oct 15, 2025 • 7
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Paper • 2510.10670 • Published Oct 12, 2025 • 19
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published Oct 9, 2025 • 63
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution Paper • 2510.08143 • Published Oct 9, 2025 • 20
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 82
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published Apr 17, 2025 • 20
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation Paper • 2308.05095 • Published Aug 9, 2023
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30, 2025 • 57
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301