====== Simulation & Formal Methods ======

===== Why Simulation Needs Formalism =====

Simulation is indispensable in autonomous-vehicle validation because it lets us probe safety-critical behavior without exposing the public to risk, but simulation alone is only as persuasive as its predictive value. A simulator that cannot anticipate how the real system behaves—because of poor modeling, missing variability, or unmeasured assumptions—does not provide credible evidence for a safety case. This is why we pair simulation with formal methods: a discipline for specifying scenarios and safety properties with mathematical precision, generating test cases systematically, and measuring how closely simulated outcomes match track or road trials. In our program, the digital twin of the vehicle and its operating environment acts as the concrete “world model,” while formal specifications direct the exploration of that world to the places where safety margins are most likely to fail.

Building such twins is non-trivial. Our workflow constructs environment twins from aerial photogrammetry with RTK-supported georeferencing, then processes point clouds into assets capable of driving a modern simulator. The resulting model can be used across many AVs and studies, amortizing the cost of data collection and asset creation while preserving the fidelity needed for planning, perception, and control validation.
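
As a concrete illustration of the asset-processing step, the following minimal sketch turns a georeferenced point cloud into a mesh using the open-source Open3D library. The file names and reconstruction parameters are illustrative assumptions, not our production pipeline.

<code python>
# Minimal sketch: turning an RTK-georeferenced photogrammetry point cloud
# into a simulator-ready mesh with Open3D. File names and parameters are
# illustrative, not the project's production settings.
import numpy as np
import open3d as o3d

# Load the point cloud produced by the photogrammetry pipeline
pcd = o3d.io.read_point_cloud("campus_scan.ply")  # hypothetical file

# Downsample to a manageable density and estimate normals for meshing
pcd = pcd.voxel_down_sample(voxel_size=0.05)  # 5 cm voxels
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))

# Poisson reconstruction yields a closed surface; trim low-support vertices
# that the solver extrapolates beyond the scanned area
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

# Export for conversion into simulator assets
o3d.io.write_triangle_mesh("campus_mesh.obj", mesh)
</code>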

Digital twin and simulation ecosystems differ not only in fidelity and purpose across domains, but also in the **toolchains and platforms** that have emerged to support them. In **ground systems** (automotive, robotics), simulation is dominated by scalable, scenario-rich environments tightly coupled to AI/ML stacks. Widely used platforms include CARLA (open-source, Unreal Engine–based), NVIDIA DRIVE Sim (GPU-accelerated, synthetic data generation), PreScan and Simcenter (sensor-to-system validation), and MATLAB/Simulink for model-based design, SIL/HIL, and control validation. These platforms emphasize large-scale scenario generation, perception stack validation, and real-time or accelerated simulation with closed-loop autonomy.
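
To make the ground-systems workflow concrete, here is a minimal sketch of driving CARLA through its Python API, assuming a simulator already running locally on the default port; the blueprint and spawn-point choices are arbitrary.

<code python>
# Minimal sketch: connecting to a locally running CARLA server and spawning
# an autopilot vehicle. Host, port, and blueprint choice are illustrative.
import carla

client = carla.Client("localhost", 2000)  # default CARLA RPC port
client.set_timeout(10.0)
world = client.get_world()

# Pick a vehicle blueprint and a predefined spawn point from the map
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]

vehicle = world.spawn_actor(blueprint, spawn_point)
vehicle.set_autopilot(True)  # hand control to the built-in traffic manager
</code>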

In **airborne systems**, simulation platforms are more tightly aligned with certification workflows and high-fidelity physics. Common tools include X-Plane (used in research and some FAA-approved training contexts), Prepar3D, and engineering-grade environments such as ANSYS Fluent and MSC Adams for aerodynamics and flight dynamics. MATLAB/Simulink again plays a central role for flight control laws, avionics integration, and DO-178C/DO-331–aligned model-based development. These ecosystems support pilot-in-the-loop, avionics-in-the-loop, and increasingly autonomy-in-the-loop simulations with strong traceability.

For **marine systems**, simulation platforms reflect the importance of hydrodynamics, environmental disturbances, and long-duration operations. Representative tools include OrcaFlex (widely used for offshore structures and subsea systems), MOOS-IvP (common in autonomous underwater and surface vehicles), and Delft3D for simulating currents, sediment, and coastal processes. These are often coupled with control and navigation development in MATLAB/Simulink or ROS-based stacks. Compared to ground/air, marine simulations tend to trade interaction density for environmental realism and long-horizon mission modeling.

In **space systems**, simulation platforms are deeply rooted in astrodynamics, mission design, and high-fidelity subsystem modeling. Key tools include Systems Tool Kit (STK) for orbital analysis and mission planning, GMAT for trajectory optimization, and FreeFlyer. For system-level digital twins and MBSE integration, platforms such as Cameo Systems Modeler (SysML-based) and Simulink are widely used. These environments support mission rehearsal, fault analysis, and increasingly onboard autonomy validation, where simulation substitutes for otherwise impossible real-world testing.

Across all four domains, a clear pattern emerges: **ground systems favor scale and data-driven simulation**, while **space systems prioritize first-principles fidelity**, with airborne and marine occupying structured intermediate points shaped by certification and environmental complexity.

===== From Scenarios to Properties: Making Requirements Executable =====

Formal methods begin by making requirements executable. We express test intent as a distribution over concrete scenes using the SCENIC language, which provides geometric and probabilistic constructs to describe traffic, occlusions, placements, and behaviors. A SCENIC program defines a scenario whose parameters are sampled to generate test cases; each case yields a simulation trace against which temporal properties—our safety requirements—are monitored. This tight loop, implemented with the VERIFAI toolkit, supports falsification (actively searching for violations), guided sampling, and clustering of outcomes for test selection.
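
The following schematic sketch shows the shape of such a falsification loop. It is not the SCENIC/VERIFAI API itself: the scenario sampler, the ''simulate'' stand-in, and the distance-based safety monitor are simplified placeholders for illustration.

<code python>
# Schematic falsification loop in the spirit of the SCENIC/VERIFAI workflow
# (not the toolkit's actual API). All parameters and models are illustrative.
import random

def sample_scenario():
    """Sample concrete scene parameters, as a SCENIC program would."""
    return {
        "lead_gap_m": random.uniform(5.0, 40.0),     # gap to the slow vehicle
        "ego_speed_mps": random.uniform(3.0, 12.0),  # ego initial speed
    }

def simulate(params):
    """Placeholder for the real simulator-in-the-loop call: returns a
    synthetic trace whose separation shrinks faster at higher speed."""
    closing = max(0.0, params["ego_speed_mps"] * 2.0 - params["lead_gap_m"])
    return [{"distance_m": params["lead_gap_m"] - closing * t / 10.0}
            for t in range(10)]

def violates_safety(trace, d_min=2.0):
    """Monitor a simple temporal property: separation never drops below d_min."""
    return any(step["distance_m"] < d_min for step in trace)

safe_table, error_table = [], []
for _ in range(1000):
    params = sample_scenario()
    trace = simulate(params)
    (error_table if violates_safety(trace) else safe_table).append(params)

print(f"{len(error_table)} violating cases found out of 1000")
</code>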

Our project also leverages scenario distribution over maps: using OpenDRIVE networks of the TalTech campus, SCENIC instantiates the same behavioral narrative—say, overtaking a slow or stopped vehicle—at diverse locations, ensuring that lane geometry, curbside clutter, and occlusions vary meaningfully while the safety property remains constant. The result is a family of tests that stress the same planning and perception obligations under different geometric and environmental embeddings.
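
A minimal sketch of this idea, with hypothetical map files and anchor coordinates, is simply the cross-product of one behavioral narrative with many geometric embeddings:

<code python>
# Sketch: embedding one behavioral narrative at many map locations.
# Map file and anchor coordinates are illustrative placeholders.
overtake_narrative = {
    "behavior": "overtake_stopped_vehicle",
    "lead_speed_mps": 0.0,
    "ego_speed_mps": 8.0,
}

campus_anchors = [
    ("taltech_campus.xodr", {"road_id": 12, "s": 40.0}),  # straight segment
    ("taltech_campus.xodr", {"road_id": 7,  "s": 15.0}),  # curved, occluded
    ("taltech_campus.xodr", {"road_id": 21, "s": 60.0}),  # near a junction
]

# Same safety property, different geometric embeddings
test_cases = [dict(overtake_narrative, map=m, anchor=a)
              for m, a in campus_anchors]
</code>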

===== Selection, Execution, and Measuring the Sim-to-Real Gap =====

A formal pipeline is only convincing if simulated insights transfer to the track. After falsification, we select representative safe/unsafe cases through visualization or clustering of the safe/error tables and implement them on a closed course with controllable agents. Notably, the same SCENIC parameters (starting pose, start time, velocities) drive hardware actors on the track as drove agents in simulation, subject to physical limitations of the test equipment. This parity enables apples-to-apples comparisons between simulated and real traces.
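
A hedged sketch of the selection step, using k-means from scikit-learn over a toy error table; the features, cluster count, and data are illustrative assumptions:

<code python>
# Sketch: picking representative error cases for track execution by
# clustering the error table in parameter space. Data are illustrative.
import numpy as np
from sklearn.cluster import KMeans

# Each row: (lead_gap_m, ego_speed_mps) from a violating simulation run
error_table = np.array([
    [6.2, 10.1], [5.8, 9.7], [22.5, 11.9], [24.0, 12.3], [7.1, 10.4],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(error_table)

# For each cluster, take the case nearest its centroid as a track candidate
for centroid in kmeans.cluster_centers_:
    distances = np.linalg.norm(error_table - centroid, axis=1)
    print("track candidate:", error_table[np.argmin(distances)])
</code>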

This formal sim-to-track pipeline does more than label outcomes; it helps diagnose causes. By replaying logged runs through the autonomy stack’s visualization tools, we can attribute unsafe behavior to perception misses, unstable planning decisions, or mispredictions, and then target those subsystems in subsequent formal campaigns. In one case set, the dominant failure mode was oscillatory planning around a pedestrian, discovered and characterized through this exact loop of scenario specification, falsification, track execution, and trace analysis.
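
As an illustration of trace-level diagnosis, the sketch below flags oscillatory planning with a simple heuristic on the logged steering signal; the threshold and the synthetic trace are assumptions for demonstration, not our stack's actual detector.

<code python>
# Sketch: flagging oscillatory planning in a logged trace by counting sign
# changes of the steering command. Threshold and data are illustrative.
import numpy as np

def is_oscillatory(steering, max_flips_per_s=2.0, rate_hz=10.0):
    """Heuristic: too many steering sign reversals per second suggests the
    planner is dithering rather than committing to an avoidance path."""
    signs = np.sign(steering[np.abs(steering) > 1e-3])  # ignore near-zero
    flips = np.count_nonzero(np.diff(signs) != 0)
    return flips / (len(steering) / rate_hz) > max_flips_per_s

# Synthetic steering log around a pedestrian: alternating corrections
trace = np.array([0.2, -0.25, 0.3, -0.2, 0.22, -0.3, 0.18, -0.2, 0.25, -0.15])
print(is_oscillatory(trace))  # True for this synthetic example
</code>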

===== Multi-Fidelity Workflows and Continuous Assurance =====

Exhaustive testing is infeasible, so we combine multiple fidelity levels to balance breadth with realism. Low-fidelity (LF) platforms sweep large scenario grids quickly to map where safety margins begin to tighten; high-fidelity (HF) platforms (e.g., LGSVL/Unity integrated with Autoware) replay the most informative LF cases with photorealistic sensors and closed-loop control. Logging is harmonized so that KPIs and traces are comparable across levels, and optimization or tuning derived from LF sweeps is verified under HF realism before any track time is spent. In extensive experiments, thousands of LF runs revealed broad patterns, but only HF replays uncovered subtle interactions that flipped outcomes—evidence that fidelity matters exactly where the safety case will later be challenged.
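
The sketch below captures the LF-to-HF hand-off in miniature: a coarse grid sweep with a placeholder margin model nominates tight-margin cases for high-fidelity replay. The grid, thresholds, and surrogate margin are illustrative assumptions.

<code python>
# Sketch: a low-fidelity parameter sweep that nominates its tightest-margin
# cases for high-fidelity replay. The margin model is a stand-in for a real
# LF simulator; grids and thresholds are illustrative.
import itertools

def lf_margin(gap_m, speed_mps):
    """Placeholder LF estimate of the closest-approach margin (metres)."""
    return gap_m - 1.2 * speed_mps  # crude kinematic surrogate

grid = itertools.product(range(5, 41, 5), range(3, 13))  # gaps x speeds
results = [(gap, v, lf_margin(gap, v)) for gap, v in grid]

# Replay only the cases whose LF margin is tight enough to be informative
hf_queue = sorted((r for r in results if 0.0 < r[2] < 3.0),
                  key=lambda r: r[2])
print(f"{len(hf_queue)} of {len(results)} LF cases promoted to HF replay")
</code>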