Submitted by Saeed Ranjbar Alvar
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
Huawei's Vancouver VBDAI Lab