VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 4 days ago • 28
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 5 days ago • 46
E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models Paper • 2601.00423 • Published 8 days ago • 8
Klear: Unified Multi-Task Audio-Video Joint Generation Paper • 2601.04151 • Published 2 days ago • 12
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published about 22 hours ago • 83
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 7 days ago • 48
Guiding a Diffusion Transformer with the Internal Dynamics of Itself Paper • 2512.24176 • Published 10 days ago • 7
ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published 14 days ago • 17
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Paper • 2512.23273 • Published 11 days ago • 13
SpotEdit: Selective Region Editing in Diffusion Transformers Paper • 2512.22323 • Published 14 days ago • 37
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 11 days ago • 44
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 11 days ago • 64
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 23 days ago • 30