====== Scenario Design ======

Scenario design is the point where planning and control validation becomes concrete. A control or planning function cannot be evaluated only through abstract claims such as “the planner is safe” or “the controller is robust.” It must be tested in situations that represent the intended operational design domain, with parameters that can be varied systematically and outcomes that can be measured consistently. For this reason, scenario design is the bridge between system requirements and executable validation cases.

The purpose of this subsection is to show how a validation idea becomes a testable scenario. The basic progression is simple: a use case is described in natural language, that description is turned into a structured logical scenario, and then the logical scenario is instantiated into concrete test cases. This structure allows the same planning or control function to be tested across many variations of speed, distance, road geometry, actor behavior, visibility, and vehicle state.

===== Scenario abstraction levels =====

A useful way to organize validation is to distinguish three levels of scenario abstraction.

^ Scenario level ^ What it represents ^ Example ^
| Functional scenario | A human-readable description of the situation | Ego vehicle overtakes a slower lead vehicle |
| Logical scenario | Parameter ranges and constraints | Ego speed 20–40 km/h, lead speed 5–20 km/h, initial gap 10–40 m |
| Concrete scenario | One executable test case with fixed values | Ego speed 30 km/h, lead speed 10 km/h, gap 20 m, dry daylight |

The functional scenario is the most accessible starting point. It tells the reader what type of driving situation is being examined, but it does not yet define a test. The logical scenario turns that situation into a parameterized family of cases.
This is where validation becomes systematic, because the same scenario can be repeated with different values for speed, distance, weather, road shape, and other variables. The concrete scenario is the final executable instance. It is the single run that appears in simulation, on a test track, or in a monitored field trial.

This three-step structure is especially useful for planning and control because these functions are highly sensitive to context. A lane change that is safe at low speed and with a large gap may become unsafe when the lead vehicle is faster, when the adjacent lane contains a moving actor, or when the controller reacts too late. Scenario abstraction therefore helps the engineer separate the behavior that should remain stable from the factors that are deliberately varied.

===== What a scenario must describe =====

A good scenario is more than a scene description. It should define the elements that matter to the validation question.

^ Scenario element ^ What should be specified ^
| Ego vehicle state | Position, speed, heading, acceleration, planned maneuver |
| Other actors | Number, type, speed, intent, motion constraints |
| Road and infrastructure | Lane geometry, signs, signals, curbs, merges, intersections |
| Environment | Weather, light, visibility, road friction, occlusions |
| Timing | Initial time, trigger moment, reaction window, duration |
| Vehicle limits | Braking capability, turning radius, steering limits, comfort bounds |
| System assumptions | Localization quality, perception delay, prediction horizon, fallback behavior |

These elements define the test context and prevent ambiguity. If a scenario does not explicitly specify the initial state, the actor behavior, or the environmental constraints, then it is difficult to reproduce the test or interpret the result. The goal is not to overspecify every detail, but to identify the variables that change the planning and control response.

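As a rough illustration of the step from a logical scenario to concrete scenarios, the sketch below encodes the overtaking example from the abstraction-level table as parameter ranges and expands it into a small grid of fixed-value cases. The class names (''ParameterRange'', ''LogicalScenario'', ''ConcreteScenario''), the three-point grid, and the "ego faster than lead" constraint are assumptions made for this example, not part of any particular scenario toolchain.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ParameterRange:
    """Closed interval [low, high] for one logical-scenario parameter."""
    low: float
    high: float

    def grid(self, steps: int) -> list[float]:
        """Return `steps` evenly spaced values, including both endpoints."""
        if steps < 2:
            return [float(self.low)]
        width = (self.high - self.low) / (steps - 1)
        return [self.low + i * width for i in range(steps)]


@dataclass(frozen=True)
class ConcreteScenario:
    """One executable test case: every parameter has a fixed value."""
    ego_speed_kmh: float
    lead_speed_kmh: float
    initial_gap_m: float
    weather: str = "dry"
    light: str = "daylight"


@dataclass(frozen=True)
class LogicalScenario:
    """A parameterized family of cases for one functional scenario."""
    description: str
    ego_speed_kmh: ParameterRange
    lead_speed_kmh: ParameterRange
    initial_gap_m: ParameterRange

    def instantiate(self, steps: int = 3) -> list[ConcreteScenario]:
        """Expand the ranges into a full grid of concrete scenarios."""
        cases = []
        for ego, lead, gap in product(
            self.ego_speed_kmh.grid(steps),
            self.lead_speed_kmh.grid(steps),
            self.initial_gap_m.grid(steps),
        ):
            # Assumed constraint for this sketch: an overtaking case only
            # makes sense when the ego vehicle is faster than the lead.
            if ego > lead:
                cases.append(ConcreteScenario(ego, lead, gap))
        return cases


# Ranges taken from the "Logical scenario" row of the table.
overtake = LogicalScenario(
    description="Ego vehicle overtakes a slower lead vehicle",
    ego_speed_kmh=ParameterRange(20, 40),
    lead_speed_kmh=ParameterRange(5, 20),
    initial_gap_m=ParameterRange(10, 40),
)
cases = overtake.instantiate(steps=3)
print(len(cases), "concrete scenarios generated")
```

A full grid is only one possible sampling strategy. The same data structures could feed random or boundary-focused sampling, while environment, timing, and system-assumption fields from the element table could be added to ''ConcreteScenario'' in the same way as ''weather'' and ''light''.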
For planning and control, the most important scenario variables often include vehicle speed, initial separation, relative speed, road curvature, lane availability, traffic density, and timing delay. Those are the factors that tend to change whether a maneuver is safe, whether a controller can track the trajectory, and whether the system can recover when conditions become difficult.

===== From scenarios to validation cases =====

Once the scenario family is defined, the next step is to select concrete cases that stress the planning or control logic in meaningful ways. This is where design of experiments becomes useful. Rather than testing one nominal case repeatedly, the engineer deliberately varies the scenario parameters so that the same maneuver is evaluated under different conditions. This reveals which factors have the strongest effect on safety and performance.

For example, a lane-change scenario can be parameterized by:

  - ego speed,
  - lead vehicle speed,
  - gap to the target lane,
  - presence of a faster vehicle in the adjacent lane,
  - lateral offset,
  - road curvature,
  - localization error,
  - controller delay.

By varying these parameters systematically, the validation team can identify boundary conditions. Some cases may be clearly safe, some clearly unsafe, and some close to the operational limit. The purpose of scenario design is to expose those limits early, before the system is accepted as safe for deployment.

This is also where scenario-based validation becomes more useful than mileage alone. Distance driven tells us how much the vehicle has moved. Scenario coverage tells us what kinds of situations the vehicle has actually faced. For planning and control, the second is much more informative than the first.

===== Scenario families for planning and control =====

Different validation questions require different scenario families. The table below gives a practical organization.
^ Scenario family ^ Typical question ^
| Lane change and overtaking | Can the vehicle choose and execute a safe passing maneuver? |
| Cut-in and cut-out | Can the vehicle handle a nearby actor entering or leaving the lane? |
| Obstacle avoidance | Can the planner redirect the vehicle around a static or dynamic obstacle? |
| Stop and yield | Does the vehicle slow down or stop correctly at crosswalks, intersections, or merges? |
| Following behavior | Can the vehicle maintain safe headway and stable tracking behind another vehicle? |
| Emergency behavior | Does the system transition safely to a fallback state when the plan becomes unsafe? |

Each family should be translated into a logical scenario description and then into a set of concrete test cases. The same maneuver can be repeated across different speeds, actor behaviors, road geometries, and environmental conditions, which makes it possible to compare outcomes and identify trends rather than isolated events.

===== Scenario outputs and labels =====

A scenario is only useful if its output can be interpreted consistently. For planning and control, the result should be expressed through clear labels and quantitative metrics.

^ Outcome label ^ Meaning ^
| Success | The maneuver was completed safely and within the expected constraints |
| Collision | The scenario resulted in an impact |
| Separation violation | The vehicle came too close to another actor or obstacle |
| Excessive deceleration | The vehicle braked more harshly than the safety or comfort bounds allow |
| Long pass without return | The vehicle completed the maneuver but failed to return to the nominal path or lane behavior |
| Timeout | The system failed to complete the maneuver in the allotted time |

These labels are useful because they tie the scenario directly to system behavior. They help distinguish between a planner that is merely inefficient, a controller that is merely slow, and a system that is actually unsafe.
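One way to keep such labels consistent across runs is to derive them from logged metrics with a single, fixed rule. The sketch below is a minimal example of that idea: the metric fields, the threshold values (2.0 m separation, 3.5 m/s² deceleration), and the severity ordering of the checks are all placeholder assumptions chosen for illustration and would need to come from the actual requirements.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Outcome labels, named after the rows of the table above."""
    SUCCESS = "Success"
    COLLISION = "Collision"
    SEPARATION_VIOLATION = "Separation violation"
    EXCESSIVE_DECELERATION = "Excessive deceleration"
    LONG_PASS_WITHOUT_RETURN = "Long pass without return"
    TIMEOUT = "Timeout"


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical per-run measurements extracted from a test log."""
    collided: bool
    min_separation_m: float
    peak_deceleration_mps2: float  # positive magnitude
    completed_in_time: bool
    returned_to_lane: bool


@dataclass(frozen=True)
class Thresholds:
    """Placeholder limits; real values come from the system requirements."""
    min_separation_m: float = 2.0
    max_deceleration_mps2: float = 3.5


def label_run(metrics: RunMetrics, limits: Thresholds = Thresholds()) -> Outcome:
    """Assign exactly one label, checking the most severe conditions first."""
    if metrics.collided:
        return Outcome.COLLISION
    if metrics.min_separation_m < limits.min_separation_m:
        return Outcome.SEPARATION_VIOLATION
    if not metrics.completed_in_time:
        return Outcome.TIMEOUT
    if metrics.peak_deceleration_mps2 > limits.max_deceleration_mps2:
        return Outcome.EXCESSIVE_DECELERATION
    if not metrics.returned_to_lane:
        return Outcome.LONG_PASS_WITHOUT_RETURN
    return Outcome.SUCCESS


# Example: a run that stayed clear of other actors but braked too hard.
harsh = RunMetrics(
    collided=False,
    min_separation_m=4.2,
    peak_deceleration_mps2=5.0,
    completed_in_time=True,
    returned_to_lane=True,
)
print(label_run(harsh).value)
```

Returning a single label per run keeps result tables simple. If several conditions can hold at once and all of them matter, the same checks could instead collect a set of labels.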
The purpose of scenario design is not only to generate runs, but to make the runs interpretable.

===== Why this matters for the rest of the chapter =====

Scenario design is the foundation for the later validation methods. Simulation, software-in-the-loop, hardware-in-the-loop, formal falsification, test-track execution, and field trials all depend on having scenarios that are well-defined and reproducible. If the scenario itself is vague, then the test evidence becomes weak, even if the simulator or test track is highly realistic.

For that reason, this subsection should be read as the preparatory stage for the test-method section that follows. It defines what should be tested, how the test family should be structured, and which parameters should be varied. The next subsection can then explain how those scenarios are executed through simulation, formal methods, and physical testing.

/* -------- Commented -----

====== Simulation & Formal Methods ======

===== Why Simulation Needs Formalism =====

Simulation is indispensable in autonomous-vehicle validation because it lets us probe safety-critical behavior without exposing the public to risk, but simulation alone is only as persuasive as its predictive value. A simulator that cannot anticipate how the real system behaves (because of poor modeling, missing variability, or unmeasured assumptions) does not provide credible evidence for a safety case. This is why we pair simulation with formal methods: a discipline for specifying scenarios and safety properties with mathematical precision, generating test cases systematically, and measuring how closely simulated outcomes match track or road trials. In our program, the digital twin of the vehicle and its operating environment acts as the concrete “world model,” while formal specifications direct the exploration of that world to the places where safety margins are most likely to fail.
Treating the digital twin as a live feedback loop is central to maintaining predictive value over time. The twin ingests logs and environmental data from the physical shuttle, updates maps and vehicle parameters, and feeds those data back into the simulator so that new tests reflect actual wear, calibration drift, and environmental change. This continuous synchronization turns simulation into an ongoing assurance activity rather than a one-off milestone. Building such twins is non-trivial. Our workflow constructs environment twins from aerial photogrammetry with RTK-supported georeferencing, then processes point clouds into assets capable of driving a modern simulator. The resulting model can be used across many AVs and studies, amortizing the cost of data collection and asset creation while preserving the fidelity needed for planning, perception, and control validation. Digital twin and simulation ecosystems differ not only in fidelity and purpose across domains, but also in the **toolchains and platforms** that have emerged to support them. In **ground systems** (automotive, robotics), simulation is dominated by scalable, scenario-rich environments tightly coupled to AI/ML stacks. Widely used platforms include CARLA (open-source, Unreal Engine–based), NVIDIA DRIVE Sim (GPU-accelerated, synthetic data generation), PreScan and Simcenter (sensor-to-system validation), and MATLAB/Simulink for model-based design, SIL/HIL, and control validation. These platforms emphasize large-scale scenario generation, perception stack validation, and real-time or accelerated simulation with closed-loop autonomy. In **airborne systems**, simulation platforms are more tightly aligned with certification workflows and high-fidelity physics. Common tools include X-Plane (used in research and some FAA-approved training contexts), Prepar3D, and engineering-grade environments such as ANSYS Fluent and MSC Adams for aerodynamics and flight dynamics. 
MATLAB/Simulink again plays a central role for flight control laws, avionics integration, and DO-178C/DO-331–aligned model-based development. These ecosystems support pilot-in-the-loop, avionics-in-the-loop, and increasingly autonomy-in-the-loop simulations with strong traceability.

For **marine systems**, simulation platforms reflect the importance of hydrodynamics, environmental disturbances, and long-duration operations. Representative tools include OrcaFlex (widely used for offshore structures and subsea systems), MOOS-IvP (common in autonomous underwater and surface vehicles), and Delft3D for simulating currents, sediment, and coastal processes. These are often coupled with control and navigation development in MATLAB/Simulink or ROS-based stacks. Compared to ground/air, marine simulations tend to trade interaction density for environmental realism and long-horizon mission modeling.

In **space systems**, simulation platforms are deeply rooted in astrodynamics, mission design, and high-fidelity subsystem modeling. Key tools include Systems Tool Kit (STK) for orbital analysis and mission planning, GMAT for trajectory optimization, and FreeFlyer. For system-level digital twins and MBSE integration, platforms such as Cameo Systems Modeler (SysML-based) and Simulink are widely used. These environments support mission rehearsal, fault analysis, and increasingly onboard autonomy validation, where simulation substitutes for otherwise impossible real-world testing.

Across all four domains, a clear pattern emerges: **ground systems favor scale and data-driven simulation**, while **space systems prioritize first-principles fidelity**, with airborne and marine occupying structured intermediate points shaped by certification and environmental complexity.

===== From Scenarios to Properties: Making Requirements Executable =====

Formal methods begin by making requirements executable.
We express test intent as a distribution over concrete scenes using the SCENIC language, which provides geometric and probabilistic constructs to describe traffic, occlusions, placements, and behaviors. A SCENIC program defines a scenario whose parameters are sampled to generate test cases; each case yields a simulation trace against which temporal properties (our safety requirements) are monitored. This tight loop, implemented with the VERIFAI toolkit, supports falsification (actively searching for violations), guided sampling, and clustering of outcomes for test selection.

In practice, the pipeline unfolds as follows. We first assemble the photorealistic simulated world and dynamics models from HD maps and 3D meshes. We then formalize scenarios in SCENIC and define safety properties as monitorable metrics, often using robust semantics of Metric Temporal Logic (MTL), which provide not just a pass/fail verdict but a quantitative margin to violation. VERIFAI searches the parameter space, records safe and error tables, and quantifies “how strongly” a property held or failed; these scores guide which cases deserve promotion to track tests. This process transforms vague test ideas (“test passing pedestrians”) into a concrete population of parameterized scenes with measurable, comparable outcomes.

Our project also leverages scenario distribution over maps: using OpenDRIVE networks of the TalTech campus, SCENIC instantiates the same behavioral narrative (say, overtaking a slow or stopped vehicle) at diverse locations, ensuring that lane geometry, curbside clutter, and occlusions vary meaningfully while the safety property remains constant. The result is a family of tests that stress the same planning and perception obligations under different geometric and environmental embeddings.

===== Selection, Execution, and Measuring the Sim-to-Real Gap =====

A formal pipeline is only convincing if simulated insights transfer to the track.
After falsification, we select representative safe/unsafe cases through visualization or clustering of the safe/error tables and implement them on a closed course with controllable agents. Notably, the same SCENIC parameters (starting pose, start time, velocities) drive hardware actors on the track as drove agents in simulation, subject to physical limitations of the test equipment. This parity enables apples-to-apples comparisons between simulated and real traces.

We then quantify the sim-to-real gap using time-series metrics such as dynamic time warping and the Skorokhod distance to compare trajectories, first-detection times, and minimum-distance profiles. In published results, trajectories for the same test were qualitatively similar but showed measurable differences in separation minima and timing; moreover, even identical simulations can diverge when the autonomy stack is non-deterministic, a reality that the methodology surfaces rather than hides. Understanding this variance is a virtue: tests with lower variance are more reproducible on track, while highly variable tests reveal sensitivity in planning, perception, or prediction that merits redesign or tighter ODD limits.

This formal sim-to-track pipeline does more than label outcomes; it helps diagnose causes. By replaying logged runs through the autonomy stack’s visualization tools, we can attribute unsafe behavior to perception misses, unstable planning decisions, or mispredictions, and then target those subsystems in subsequent formal campaigns. In one case set, the dominant failure mode was oscillatory planning around a pedestrian, discovered and characterized through this exact loop of scenario specification, falsification, track execution, and trace analysis.

===== Multi-Fidelity Workflows and Continuous Assurance =====

Exhaustive testing is infeasible, so we combine multiple fidelity levels to balance breadth with realism.
Low-fidelity (LF) platforms sweep large scenario grids quickly to map where safety margins begin to tighten; high-fidelity (HF) platforms (e.g., LGSVL/Unity integrated with Autoware) replay the most informative LF cases with photorealistic sensors and closed-loop control. Logging is harmonized so that KPIs and traces are comparable across levels, and optimization or tuning derived from LF sweeps is verified under HF realism before any track time is spent. In extensive experiments, thousands of LF runs revealed broad patterns, but only HF replays uncovered subtle interactions that flipped outcomes—evidence that fidelity matters exactly where the safety case will later be challenged. This workflow sits within a DOE-driven V&V suite that treats the digital twin and scenario engine as programmable assets. Scenario definitions, vehicle models, and evaluation logic are versioned; control-loop delays, TTC profiles, and collision metrics are computed consistently per run; and the same infrastructure can be extended downward into hardware-in-the-loop experiments of low-level control paths to test actuator-path integrity under identical scene conditions. In our project platform, the simulator co-runs with Autoware, accepts parameterized scenarios through a public interface, and emits validation reports that roll up from frame-level signal checks to mission-level success, closing the traceability chain from formal property to system outcome. Just as important as capability is honesty about limits. Our own survey and case study argue for explicit attention to abstraction choices, modeling assumptions, and convergence questions for AI-based components. The literature and our results stress that simulation’s value depends on calibrated models, careful measurement of non-determinism, and disciplined mapping to the real world; formal methods help precisely because they make these assumptions visible, testable, and comparable over time. 
The digital-twin perspective then turns those measurements into an engine for continuous improvement, updating the twin as the physical system and environment evolve. */