Yufan Zhang
zyf515730395
AI & ML interests
None yet
Recent Activity
updated a collection 10 days ago
Video Generation
upvoted a paper 17 days ago
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Organizations
None yet
Video Understanding
- Token-Efficient Long Video Understanding for Multimodal LLMs
  Paper • 2503.04130 • Published • 96
- Video-R1: Reinforcing Video Reasoning in MLLMs
  Paper • 2503.21776 • Published • 79
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 105
- Kwai Keye-VL 1.5 Technical Report
  Paper • 2509.01563 • Published • 37
3D Gen&Recon
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
  Paper • 2506.05573 • Published • 82
- Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data
  Paper • 2506.04120 • Published • 7
- RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
  Paper • 2506.02751 • Published • 4
- Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
  Paper • 2505.07747 • Published • 61
Image Generation
- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 41
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 19
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 26
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 27
M-RAG
MLLM
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 77
- Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
  Paper • 2506.04207 • Published • 48
- MiMo-VL Technical Report
  Paper • 2506.03569 • Published • 80
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
  Paper • 2506.03147 • Published • 58
LLM
- SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
  Paper • 2506.08889 • Published • 23
- MiniCPM4: Ultra-Efficient LLMs on End Devices
  Paper • 2506.07900 • Published • 93
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 263
- OpenThoughts: Data Recipes for Reasoning Models
  Paper • 2506.04178 • Published • 50
Video Generation
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 105
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
  Paper • 2506.08009 • Published • 30
- Seeing Voices: Generating A-Roll Video from Audio with Mirage
  Paper • 2506.08279 • Published • 27
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
  Paper • 2506.07848 • Published • 4
AR Generation