Addressing corner cases in autonomous driving: A world model-based approach with mixture of experts and LLMs
Haicheng Liao, Bonan Wang, Junxian Yang, Chengyue Wang, Zhengbing He, Guohui Zhang, Chengzhong Xu, Zhenning Li
Abstract:
Accurate and reliable motion forecasting is essential for the safe deployment of autonomous vehicles (AVs), particularly in rare but safety-critical scenarios known as corner cases. Existing models often underperform in these situations because common scenes are over-represented in training data and the models generalize poorly. To address this limitation, we present WM-MoE, the first world model-based motion forecasting framework that unifies perception, temporal memory, and decision making for high-risk corner-case scenarios. The model constructs a compact scene representation that explains current observations, anticipates future dynamics, and evaluates the outcomes of potential actions. To enhance long-horizon reasoning, we leverage large language models (LLMs) and introduce a lightweight temporal tokenizer that maps agent trajectories and contextual cues into the LLM's feature space without additional training, enriching temporal context and commonsense priors. Furthermore, a mixture-of-experts (MoE) module decomposes complex corner cases into subproblems and allocates capacity across scenario types: a router assigns each scene to specialized experts that infer agent intent and perform counterfactual rollouts. In addition, we introduce nuScenes-corner, a new benchmark comprising four real-world corner-case scenarios for rigorous evaluation. Extensive experiments on four benchmark datasets (nuScenes, NGSIM, HighD, and MoCAD) show that WM-MoE consistently outperforms state-of-the-art (SOTA) baselines and remains robust under corner-case and data-missing conditions, indicating the promise of world model-based architectures for robust and generalizable motion forecasting in fully autonomous vehicles.
Keywords:
Autonomous driving, World models, Motion forecasting, Large language models, Mixture of experts models
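The routing idea described in the abstract, where a router assigns each scene to specialized experts, can be illustrated with a minimal sketch. This is not the WM-MoE implementation: the abstract does not specify the gating rule, so top-k softmax gating is assumed here, and every name and size (`moe_forward`, `make_layer`, `SCENE_DIM`, `NUM_EXPERTS`, the random weights) is hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_layer(rng, n_out, n_in):
    """Random weights stand in for trained parameters (hypothetical)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def moe_forward(scene, router, experts, top_k=1):
    """Route one scene embedding to its top_k experts and blend their outputs."""
    # Assumed gating rule: softmax over router logits, keep the top_k experts.
    gate = softmax(linear(*router, scene))
    chosen = sorted(range(len(gate)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)  # renormalize over the chosen experts
    out_dim = len(experts[0][1])
    output = [0.0] * out_dim
    for i in chosen:
        expert_out = linear(*experts[i], scene)
        for d in range(out_dim):
            output[d] += (gate[i] / norm) * expert_out[d]
    return output, chosen, gate


rng = random.Random(0)
SCENE_DIM, OUT_DIM, NUM_EXPERTS = 8, 2, 4  # illustrative sizes, not from the paper
router = make_layer(rng, NUM_EXPERTS, SCENE_DIM)
experts = [make_layer(rng, OUT_DIM, SCENE_DIM) for _ in range(NUM_EXPERTS)]
scene = [rng.uniform(-1.0, 1.0) for _ in range(SCENE_DIM)]
prediction, chosen, gate = moe_forward(scene, router, experts, top_k=1)
print("chosen expert:", chosen, "prediction:", prediction)
```

With `top_k=1` only one expert runs per scene, so compute stays roughly constant as experts are added; a larger `top_k` blends several experts weighted by their renormalized gate scores.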