====== V&V of Planning, Decision-Making, and Control ======
Planning and control must be validated as a system function, not only as isolated algorithms. A planner may produce a technically correct trajectory, and a controller may track a reference path accurately, but the combined behavior can still be unsafe if perception is delayed, localization drifts, prediction is wrong, or the actuation path introduces latency. The validation view therefore focuses on the complete decision–execution loop and asks whether the autonomous vehicle behaves safely, predictably, and consistently inside its intended operating conditions.
The purpose of this subsection is to define how planning and control should be evaluated across different validation levels. The key point is that component tests are useful, but they are only one part of the evidence chain. The reader should be able to trace a planning or control function from its local behavior to its interaction with the full vehicle system, and then to the operating domain where the vehicle is actually expected to function.
===== Validation Levels =====
The validation process can be understood as a progression from local correctness to system-level safety. This progression is especially important in the planning and control layer, because the behavior of the system depends on tightly coupled interactions between the decision layer, the planner, the controller, the vehicle model, and the environment.
^ Validation level ^ What is checked ^ Example for planning and control ^ Typical evidence ^
| Unit level | The component in isolation | Planner logic, controller law, fallback trigger, trajectory tracking rule | Component test results, assertion checks, interface tests |
| Integration level | Interaction between modules | Planner with localization, prediction, and control interface | Closed-loop integration logs, timing traces, message traces |
| System level | Full vehicle behavior in a closed loop | Lane change, overtaking, stopping, obstacle avoidance | Scenario results, system KPIs, safety metrics |
| Operational level | Behavior in the intended use context | Operation inside the operational design domain (ODD), on a test track, or in a field pilot | Validation report, field data, pass/fail evidence |
This distinction matters because a good unit test does not guarantee safe system behavior. A motion planner can be correct as a software module and still create unsafe motion if it receives stale state information or if the controller cannot follow the generated path within the vehicle’s physical limits. Likewise, a controller can be stable in isolation but still produce an unsafe outcome if the trajectory itself is too aggressive or if the vehicle state changes faster than expected.
In the V-model perspective used throughout the handbook, planning and control occupy the portion where implementation evidence must be mapped back to the system requirements. The general V-model is not repeated here; the relevant point is that planning and control are evaluated at several points along that validation chain: first as modules, then as integrated functions, and finally as part of the complete autonomous vehicle.
===== What Must Be Validated =====
The validation question is not simply “does the algorithm work?” It is “does the vehicle behave safely and correctly when the algorithm is embedded in the full autonomy stack?” That means the evaluation must cover the following aspects together:
^ Validation aspect ^ What it means in practice ^
| Functional correctness | The maneuver or trajectory matches the intended behavior |
| Physical feasibility | The motion can be executed by the vehicle without violating dynamics |
| Safety | The vehicle avoids collisions and unsafe close approaches |
| Rule compliance | The motion respects traffic rules, road geometry, and operational constraints |
| Robustness | The behavior remains acceptable under uncertainty, delays, and disturbances |
| Comfort | The motion does not introduce excessive jerk, sharp braking, or unstable steering |
| Timeliness | The planner and controller act within the response time allowed by the scenario |
Planning and control are sensitive to the assumptions behind the system. A small change in localization accuracy, actuation delay, road friction, or prediction uncertainty may produce a very different trajectory outcome. For this reason, validation should not be framed as a single pass/fail test on one nominal case. It should be framed as a collection of evidence showing that the system remains acceptable across the planned range of operating conditions.
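As a minimal sketch of why a single nominal case is not enough, the following Python example sweeps two of the assumptions named above, actuation delay and road friction, and checks a simple stopping maneuver against a fixed available gap. The point-mass braking model, the parameter values, and the names ''stopping_distance'' and ''sweep'' are illustrative assumptions, not part of the handbook's toolchain.

```python
import itertools

G = 9.81  # gravitational acceleration [m/s^2]


def stopping_distance(speed_mps, delay_s, friction):
    """Distance covered during the delay plus friction-limited braking distance."""
    reaction = speed_mps * delay_s                    # vehicle keeps moving while nothing happens
    braking = speed_mps ** 2 / (2.0 * friction * G)   # point-mass model (assumption)
    return reaction + braking


def sweep(speed_mps, gap_m, delays_s, frictions):
    """Return one pass/fail record per (delay, friction) combination."""
    results = []
    for delay, mu in itertools.product(delays_s, frictions):
        distance = stopping_distance(speed_mps, delay, mu)
        results.append({
            "delay_s": delay,
            "friction": mu,
            "stop_m": round(distance, 1),
            "ok": distance <= gap_m,  # acceptable only if the vehicle stops inside the gap
        })
    return results


if __name__ == "__main__":
    # Hypothetical values: 15 m/s approach speed, 30 m available gap.
    for record in sweep(15.0, 30.0, delays_s=[0.1, 0.5, 1.0], frictions=[0.8, 0.4]):
        print(record)
```

With these illustrative numbers the maneuver is acceptable on the high-friction surface for every delay tested, but not on the low-friction surface, which is the kind of sensitivity a single nominal test would hide.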
===== Validation Logic =====
The planning and control layer is best validated through a chain of evidence. First, the team defines a maneuver or mission objective. Then the system assumptions and operating constraints are specified. Next, scenarios are generated to exercise the maneuver under controlled variation. After that, simulation, closed-loop execution, and physical confirmation are used to check the system response. Finally, the results are expressed in measurable metrics and tied back to the safety argument.
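The step from a maneuver definition to executable cases can be sketched as follows. This is a minimal Python example, assuming a full-factorial grid over two hypothetical parameters (initial gap to a lead vehicle and lead vehicle speed). The names ''LogicalScenario'' and ''concrete_cases'' are invented for illustration, and a real test program might use a different design-of-experiments strategy.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LogicalScenario:
    """A maneuver plus the parameter ranges it should be exercised over."""
    maneuver: str
    ranges: dict  # parameter name -> list of values to test


def concrete_cases(scenario):
    """Expand the logical ranges into a full-factorial list of concrete cases."""
    names = sorted(scenario.ranges)  # fixed order keeps case IDs reproducible
    cases = []
    combinations = product(*(scenario.ranges[name] for name in names))
    for index, values in enumerate(combinations):
        case = {"case_id": f"{scenario.maneuver}-{index:03d}"}
        case.update(zip(names, values))
        cases.append(case)
    return cases


if __name__ == "__main__":
    # Hypothetical ranges for an overtaking maneuver.
    overtake = LogicalScenario(
        maneuver="overtake",
        ranges={"initial_gap_m": [10, 20, 30], "lead_speed_mps": [5, 10]},
    )
    for case in concrete_cases(overtake):
        print(case)
```

Each concrete case carries a stable identifier, so a result in the validation report can be traced back to the exact parameter values that produced it.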
This logic is already reflected in the approach taken in this chapter, which treats digital twins as the basis for meaningful simulation, uses design-of-experiments to stress the decision and control logic, and combines local properties such as trajectory tracking with system-level effects such as minimum distance to collision. It also emphasizes that the simulator must remain predictive as the product evolves, so that post-deployment logs, updated vehicle parameters, and map changes can be folded back into continuous validation.
The important design principle is that planning and control validation should support both:
  - **local evidence**, where the behavior of a single planner or controller can be checked;
  - **system evidence**, where the combined behavior of the autonomy stack is evaluated in closed loop.
This is why scenario execution, digital twin fidelity, and timing realism matter. If the virtual environment is too abstract, the test may not reveal the same failure modes that appear in the real vehicle. If the virtual environment is too expensive or too detailed, the test program may not scale to a useful number of scenarios. Validation therefore needs a balance between breadth and realism.
===== Evidence Types =====
For this chapter, the most useful evidence types are trajectory, timing, and safety evidence, complemented by robustness and operational evidence.
^ Evidence type ^ Example output ^ Why it matters ^
| Trajectory evidence | Path error, tracking error, lane deviation, path smoothness | Shows whether the plan can be executed as intended |
| Timing evidence | Planner latency, controller latency, response delay | Shows whether the system reacts quickly enough |
| Safety evidence | Collision result, time-to-collision (TTC), distance-to-collision (DTC), minimum distance | Shows whether the behavior remains safe |
| Robustness evidence | Performance under sensor delay, localization drift, or actuation variation | Shows whether the result survives uncertainty |
| Operational evidence | Performance inside the intended ODD | Shows whether the system is ready for realistic use |
These evidence types should be collected together, not separately. A vehicle that tracks a path accurately but violates safety margins is not acceptable. A vehicle that avoids collision but behaves erratically or unpredictably is also not acceptable. The validation view therefore requires a combined reading of the metrics rather than a single score.
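A combined reading of the metrics can be sketched in Python as below. The thresholds, the field names, and the constant-velocity time-to-collision estimate (gap divided by closing speed) are assumptions chosen for illustration; a real safety argument would derive its limits from the system requirements.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """Constant-velocity TTC estimate; infinite when the gap is not closing."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def evaluate_run(samples, limits):
    """Combine trajectory, timing, and safety metrics into one verdict.

    samples: list of dicts, one per time step, with the hypothetical keys
             gap_m, closing_speed_mps, tracking_error_m, planner_latency_s.
    limits:  dict of hypothetical thresholds, one per metric.
    """
    metrics = {
        "min_gap_m": min(s["gap_m"] for s in samples),
        "min_ttc_s": min(
            time_to_collision(s["gap_m"], s["closing_speed_mps"]) for s in samples
        ),
        "max_tracking_error_m": max(s["tracking_error_m"] for s in samples),
        "max_planner_latency_s": max(s["planner_latency_s"] for s in samples),
    }
    violations = []
    # Safety metrics must stay above their limits.
    for name in ("min_gap_m", "min_ttc_s"):
        if metrics[name] < limits[name]:
            violations.append(name)
    # Trajectory and timing metrics must stay below their limits.
    for name in ("max_tracking_error_m", "max_planner_latency_s"):
        if metrics[name] > limits[name]:
            violations.append(name)
    # The run passes only if no metric is violated; there is no averaged score.
    return {"metrics": metrics, "violations": violations, "passed": not violations}


if __name__ == "__main__":
    limits = {"min_gap_m": 5.0, "min_ttc_s": 2.0,
              "max_tracking_error_m": 0.3, "max_planner_latency_s": 0.1}
    samples = [
        {"gap_m": 20.0, "closing_speed_mps": 2.0, "tracking_error_m": 0.05, "planner_latency_s": 0.04},
        {"gap_m": 6.0, "closing_speed_mps": 5.0, "tracking_error_m": 0.08, "planner_latency_s": 0.05},
    ]
    print(evaluate_run(samples, limits))
```

In this illustrative run the tracking error and latency are well within their limits, yet the run fails because the TTC drops below its threshold, which mirrors the case of a vehicle that tracks its path accurately but violates a safety margin.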
===== Why This View Matters =====
The planning and control layer is the point where autonomous behavior becomes visible in the physical world. A mistake here is not only a software error; it is an action error. That is why this subsection must be treated as a bridge between the planning algorithms described earlier and the scenario-based test methodology that follows. It prepares the reader to ask the right questions: what should be tested, at what level, under which conditions, and with what evidence.
The next subsection moves from this validation view to the concrete generation of test scenarios, logical ranges, and executable cases.
/* ------Commented ------
====== Validation of Control & Planning ======
===== Principles and Scope =====
Planning and control are where intent becomes motion. A planning stack selects a feasible, safety-aware trajectory under evolving constraints; the control stack turns that trajectory into actuation while respecting vehicle dynamics and delays. Validating these layers is therefore about much more than unit tests: it is about demonstrating, with evidence, that the combined decision–execution loop behaves safely and predictably across the intended operational design domain (ODD). In practice, this requires two complementary ideas. First, a digital twin of the vehicle and environment that is accurate enough to make simulation a meaningful predictor of real behavior. Second, a design-of-experiments (DOE)–driven scenario program that stresses the decision and control logic where it matters most, and converts outcomes into monitorable, quantitative metrics. Your V&V suite frames both: scenario descriptions feed a co-running simulator with the under-test algorithms, the digital twin (vehicle and environment) is loaded as an external asset, and the outcome is a structured validation report rather than anecdotal test logs.
Planning/control V&V must also navigate the mix of deterministic dynamics and stochastic perception/prediction. At the component level, your framework treats detection, control, localization, mission planning, and low-level control as distinct abstractions, yet evaluates them in the context of Newtonian physics—explicitly trading fidelity for performance depending on the test intent. This modularity enables validating local properties (e.g., trajectory tracking) while still measuring system-level safety effects (e.g., minimum distance to collision).
A final principle is lifecycle realism. A digital twin is not just a CAD model; it is a live feedback loop receiving data from the physical system and its environment, so the simulator remains predictive as the product evolves. The same infrastructure that generates scenarios can replay field logs, inject updated vehicle parameters, and reflect map changes, enabling continuous V&V of planning and control post-deployment.
===== Scenario-Based Validation with Digital Twins =====
The V&V workflow begins with a formal scenario description: functional narratives are encoded in a human-readable DSL (e.g., M-SDL/Scenic), then reduced to logical parameter ranges and finally to concrete instantiations selected by DOE. This ensures tests are reproducible, shareable, and traceable from high-level goals down to the numeric seeds that define a specific run. The simulator co-executes these scenarios with the algorithms under test inside the digital twin, and the V&V interface collects vehicle control signals, virtual sensor streams, and per-run metrics to generate the verdicts required by the safety case.
To maintain broad coverage without sacrificing realism, validations can be done using a two-layer approach shown in Figure 1. A low-fidelity (LF) layer (e.g., SUMO) sweeps wide parameter grids quickly to reveal where planning/control begins to stress safety constraints; a high-fidelity (HF) layer (e.g., a game engine simulator like CARLA with the control software in the loop) then replays the most informative cases with photorealistic sensors and closed-loop actuation. Both layers log the same KPIs, so results are comparable and can be promoted to track tests when warranted. This division of labor is central to scaling scenario space while maintaining end-to-end realism for planning and control behaviors like cut-in/out, overtaking, and lane changes.
{{ :en:safeav:ctrl:low-high_fildelity_simulator.png?400 |Low and High Fidelity Simulators}}
Fidelity of AV simulation: a) Low-Fidelity SUMO simulator((Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wießner. Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE, 2018.)) b) High-Fidelity AWSIM simulator((Autoware Foundation. TIER IV AWSIM. https://github.com/tier4/AWSIM, 2022.))
Formal methods strengthen this flow. In the simulation-to-track pipeline, scenarios and safety properties are specified formally (e.g., via Scenic and Metric Temporal Logic), falsification synthesizes challenging test cases, and a mapping executes those cases on a closed track((Fremont, Daniel J., et al. "Formal scenario-based testing of autonomous vehicles: From simulation to the real world." 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020.)). In published evidence, a majority of unsafe simulated cases reproduced as unsafe on track, and safe cases mostly remained safe—while time-series comparisons (e.g., DTW, Skorokhod metrics) quantified the sim-to-real differences relevant to planning and control. This is exactly the kind of transferability and measurement discipline a planning/control safety argument needs.
Finally, environment twins are built from aerial photogrammetry and point-cloud processing (with RTK-supported georeferencing), yielding maps and 3D assets that match the real campus, so trajectory-level decisions (overtake, yield, return-to-lane) are evaluated against faithful road geometries and occlusion patterns((Pikner, Heiko, et al. "Autonomous Driving Validation and Verification Using Digital Twins." VEHITS (2024): 204-211.)).
===== Methods and Metrics for Planning & Control =====
Mission-level planning validation starts from a start–goal pair and asks whether the vehicle reaches the destination via a safe, policy-compliant trajectory. Your platform publishes three families of evidence: (i) trajectory-following error relative to the global path; (ii) safety outcomes such as collisions or violations of separation; and (iii) mission success (goal reached without violations). This couples path selection quality to execution fidelity.
At the local planning level, your case study focuses on the planner inside the autonomous software. The planner synthesizes a global and a local path, then evaluates them based on predictions from surrounding actors to select a safe local trajectory for maneuvers such as passing and lane changes. By parameterizing scenarios with variables such as the initial separation to the lead vehicle and the lead vehicle’s speed, you create a grid of concrete cases that stress the evaluator’s thresholds. The outcomes are categorized by meaningful labels—Success, Collision, Distance-to-Collision (DTC) violation, excessive deceleration, long pass without return, and timeout—so that planner tuning correlates directly with safety and comfort.
{{ :en:safeav:ctrl:trajectory_validation.png?300 |Trajectory Validation}}
Trajectory validation example
Control validation links perception-induced delays to braking and steering outcomes. Your framework computes Time-to-Collision (Formula) along with the simulator and AV-stack response times to detected obstacles. Sufficient response time allows a safe return to nominal headway; excessive delay predicts collision, sharp braking, or planner oscillations. By logging ground truth, perception outputs, CAN bus commands, and the resulting dynamics, the analysis separates sensing delays from controller latency, revealing where mitigation belongs (planner margins vs. control gains).
A necessary dependency is localization health. Your tests inject controlled GPS/IMU degradations and dropouts through simulator APIs, then compare expected vs. actual pose per frame to quantify drift. Because planning and control are sensitive to absolute and relative pose, this produces actionable thresholds for safe operation (e.g., maximum tolerated RMS deviation before reducing speed or restricting maneuvers).
Finally, your program extends to low-level control via HIL-style twins. A Simulink-based network of virtual ECUs and data buses sits between Autoware’s navigation outputs and simulator actuation. This lets you simulate bus traffic, counters, and checksums; disable subsystems (e.g., steering module) to provoke graceful degradation; and compare physical ECUs against their twin under identical inputs to detect divergence. It is an efficient route to validating actuator-path integrity without building a full physical rig.
*/