Simulation of the Waymo Driver evading a vehicle going in the wrong direction. The simulation initially follows a real event, and seamlessly transitions to using camera and lidar images automatically generated by an efficient real-time Waymo World Model.
Simulation is a critical component of Waymo’s AI ecosystem and one of the three key pillars of our approach to demonstrably safe AI. The Waymo World Model, detailed below, is the component responsible for generating hyper-realistic simulated environments.
The Waymo World Model is built upon Genie 3—Google DeepMind's most advanced general-purpose world model that generates photorealistic and interactive 3D environments—and is adapted for the rigors of the driving domain. By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.
This combination of broad world knowledge, fine-grained controllability, and multi-modal realism enhances Waymo’s ability to safely scale our service across more places and new driving environments. In the following sections we showcase the Waymo World Model in action, featuring simulations of the Waymo Driver navigating diverse rare edge-case scenarios.
Emergent Multimodal World Knowledge
Most simulation models in the autonomous driving industry are trained from scratch based on only the on-road data they collect. That approach means the system only learns from limited experience. Genie 3’s strong world knowledge, gained from its pre-training on an extremely large and diverse set of videos, allows us to explore situations that were never directly observed by our fleet.
Through our specialized post-training, we are transferring that vast world knowledge from 2D video into 3D lidar outputs unique to Waymo’s hardware suite. While cameras excel at depicting visual details, lidar sensors provide valuable complementary signals like precise depth. The Waymo World Model can generate virtually any scene—from regular, day-to-day driving to rare, long-tail scenarios—across multiple sensor modalities.
Extreme weather conditions and natural disasters
Driving on the Golden Gate Bridge, covered in light snow. Waymo’s shadow is visible in the front camera footage.
Driving out of a raging fire.
A suburban cul-de-sac completely submerged in stagnant flood water with floating furniture.
The Waymo World Model offers strong simulation controllability through three main mechanisms: driving action control, scene layout control, and language control.
Driving action control allows us to have a responsive simulator that adheres to specific driving inputs. This enables us to simulate “what if” counterfactual events such as whether the Waymo Driver could have safely driven more confidently instead of yielding in a particular situation.
Counterfactual driving. We demonstrate simulations under both the original route of a past recorded drive and a completely new route. While purely reconstructive simulation methods (e.g., 3D Gaussian Splats, or 3DGS) suffer from visual breakdowns due to missing observations when the simulated route differs too much from the original drive, the fully learned Waymo World Model maintains good realism and consistency thanks to its strong generative capabilities.
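To make the idea of a counterfactual driving input concrete, here is a minimal sketch. It is not Waymo's implementation: it assumes a simple one-dimensional kinematic model along a fixed path, and the names `EgoState` and `rollout` as well as the two acceleration sequences are hypothetical. It only shows how a recorded "yield" input and a "what if" input lead to different ego trajectories, which is the kind of signal a simulator with driving action control would have to follow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EgoState:
    position_m: float  # distance travelled along the path, in metres
    speed_mps: float   # longitudinal speed, in metres per second


def rollout(start: EgoState, accels_mps2: List[float], dt_s: float = 0.5) -> List[EgoState]:
    """Integrate a sequence of acceleration commands (hypothetical driving inputs)."""
    states = [start]
    for accel in accels_mps2:
        prev = states[-1]
        # Clamp at zero so the vehicle stops instead of reversing.
        speed = max(0.0, prev.speed_mps + accel * dt_s)
        # Advance position using the average speed over the step.
        position = prev.position_m + 0.5 * (prev.speed_mps + speed) * dt_s
        states.append(EgoState(position, speed))
    return states


start = EgoState(position_m=0.0, speed_mps=5.0)
recorded_yield = rollout(start, [-2.5] * 4)         # brakes to a stop, as if yielding
counterfactual_proceed = rollout(start, [0.0] * 4)  # "what if" the car had held its speed

print(recorded_yield[-1])
print(counterfactual_proceed[-1])
```

Under these assumed inputs the two rollouts start from the same state and end in different places, so the same recorded scene can be replayed against either trajectory.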
Scene layout control allows for customization of the road layouts, traffic signal states, and the behavior of other road users. This way, we can create custom scenarios by selectively placing other road users or by applying custom mutations to road layouts.
Language control is our most flexible tool, allowing us to adjust time-of-day, weather conditions, or even generate an entirely synthetic scene (such as the long-tail scenarios shown previously).
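One way to picture the three control mechanisms is as three optional conditioning inputs on a single simulation request. The sketch below is purely illustrative: `SceneLayout`, `SimulationRequest`, their fields, and `describe_controls` are hypothetical names chosen for this example, not part of any published Waymo or Genie interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class SceneLayout:
    # Hypothetical layout description: road geometry, signals, other road users.
    lane_polylines: List[List[Point]] = field(default_factory=list)
    signal_states: Dict[str, str] = field(default_factory=dict)      # e.g. {"signal_1": "red"}
    agent_tracks: Dict[str, List[Point]] = field(default_factory=dict)  # agent id -> positions


@dataclass
class SimulationRequest:
    # Each field maps to one control mechanism; None means "not constrained".
    ego_actions: Optional[List[Tuple[float, float]]] = None  # (acceleration, steering) per step
    layout: Optional[SceneLayout] = None
    prompt: Optional[str] = None


def describe_controls(request: SimulationRequest) -> List[str]:
    """Return which of the three control mechanisms a request makes use of."""
    used = []
    if request.ego_actions is not None:
        used.append("driving action")
    if request.layout is not None:
        used.append("scene layout")
    if request.prompt is not None:
        used.append("language")
    return used


request = SimulationRequest(
    ego_actions=[(0.0, 0.0)] * 10,   # hold speed and heading for ten steps
    prompt="light snow at dusk",     # language control: weather and time of day
)
print(describe_controls(request))
```

Keeping the three inputs independent and optional mirrors how the text describes them: a scenario can fix only the weather, only the ego route, or any combination.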
During a scenic drive, it is common to record videos of the journey on mobile devices or dashcams, perhaps capturing piled up snow banks or a highway at sunset. The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation—showing how the Waymo Driver would see that exact scene. This process enables the highest degree of realism and factuality, since simulations are derived from actual footage.
Some scenes we want to simulate take longer to play out, for example, negotiating passage in a narrow lane. That is harder to do because the longer the simulation, the harder it is to compute and to maintain stable quality. However, through a more efficient variant of the Waymo World Model, we can simulate longer scenes with a dramatic reduction in compute while maintaining high realism and fidelity, enabling large-scale simulations.
Long rollout (4x speed playback) on an efficient variant of the Waymo World Model.
By simulating the “impossible”, we proactively prepare the Waymo Driver for some of the rarest and most complex scenarios. This creates a more rigorous safety benchmark, ensuring the Waymo Driver can navigate long-tail challenges long before it encounters them in the real world.