Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests
Video LLM
Recent Activity
Upvoted a paper 7 days ago
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Upvoted a paper 9 days ago
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Upvoted a paper about 1 month ago
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models