====== Test Methods ======

Scenario design tells us what should be tested. Test methods determine how those scenarios are executed, what kind of evidence is collected, and how the results are translated into a validation argument. For planning and control, this distinction matters because the same concrete scenario can be exercised in several different ways: first in simulation, then in software-in-the-loop (SIL) or hardware-in-the-loop (HIL) settings, then on a controlled test track, and finally in a monitored real-world environment. This follows the book's existing treatment of physical testing, real-world seeding, and virtual testing as three complementary ways of generating and executing tests, each with different strengths and limitations.

The central idea is that the scenario from the previous subsection must be brought into a test environment that is suitable for the question being asked. A lane-change scenario, for example, may first be explored in CARLA or another simulator, then repeated with SIL or HIL components, then confirmed at a controlled proving ground such as ZalaZONE, and finally monitored in limited real-world operation. In each case, the test method changes, but the underlying scenario remains the same. That is what makes the validation evidence comparable.

{{ :en:safeav:ctrl:testing_methods.png?600 |Three main testing methods}}

===== Simulation-Based Testing =====

Simulation is the first and most flexible execution method. It allows the team to test a large number of scenario variants quickly, safely, and repeatably. This is especially important for planning and control, because the behavior of these modules depends strongly on speed, spacing, road geometry, actor behavior, and timing. A simulator can sweep those parameters systematically and expose boundary cases that would be too risky or too expensive to reproduce physically.
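
A systematic parameter sweep of this kind can be sketched in a few lines. The example below is a minimal illustration, not a simulator API: the class ''CutInScenario'', its three fields, and the helper ''sweep'' are hypothetical names chosen for this sketch, and the parameter values are arbitrary.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CutInScenario:
    """One concrete scenario: a fixed point in the parameter space (hypothetical fields)."""
    ego_speed_mps: float    # speed of the ego vehicle
    cut_in_gap_m: float     # gap to the other vehicle at the moment it cuts in
    lead_decel_mps2: float  # braking of the other vehicle after the cut-in


def sweep(ego_speeds, gaps, decels):
    """Enumerate every combination of the given parameter values."""
    return [CutInScenario(v, g, d) for v, g, d in product(ego_speeds, gaps, decels)]


scenarios = sweep(
    ego_speeds=[10.0, 20.0, 30.0],
    gaps=[5.0, 10.0, 20.0, 40.0],
    decels=[0.0, 3.0, 6.0],
)
print(len(scenarios))  # 3 * 4 * 3 = 36 concrete scenarios
```

Each resulting object is one concrete scenario that could, in principle, be handed unchanged to a simulator, a SIL or HIL rig, or a test-track procedure, which is what keeps the evidence comparable across methods.
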
The simulation toolbox described elsewhere in the book applies directly here. For ground systems, CARLA is a natural open-source choice for academic and research work because it supports realistic urban scenes and sensor stacks. NVIDIA DRIVE Sim is useful when the goal is GPU-accelerated synthetic data and digital-twin style validation. IPG CarMaker, dSPACE ASM, VIRES VTD, Applied Intuition, Cognata, and MathWorks tools can be used when the focus shifts toward closed-loop vehicle dynamics, scenario coverage, or industrial validation workflows. These platforms are not identical, and that is part of the point: some are better for scenario breadth, some for sensor realism, some for controller validation, and some for integration with SIL and HIL.

For planning and control, simulation is especially useful when the test objective is one of the following:

  - checking whether the planner generates a safe trajectory across many parameter combinations;
  - checking whether the controller remains stable under delay, friction change, or uncertain motion;
  - identifying which scenario parameters drive unsafe or uncomfortable behavior;
  - replaying rare or dangerous edge cases without exposing people or equipment to risk.

{{ :en:safeav:ctrl:low-high_fildelity_simulator.png?400 |}}

A practical way to use simulation is to split it into two layers. Low-fidelity simulation is used first to sweep large scenario spaces quickly and identify where safety margins begin to tighten. High-fidelity simulation is then used for the most informative cases, where sensor realism, closed-loop dynamics, and timing behavior matter more, and its results connect to later track testing.
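
The two-layer idea can be illustrated with a deliberately crude low-fidelity model. The sketch below assumes a point-mass ego vehicle that brakes at a constant rate after a fixed reaction delay, behind a slower vehicle moving at constant speed. The function name, the 3 m margin threshold, and all numeric values are assumptions made for this illustration only.

```python
def min_gap_low_fidelity(ego_speed, lead_speed, initial_gap,
                         reaction_s=0.5, ego_decel=6.0, dt=0.01, horizon_s=15.0):
    """Smallest gap (m) reached in a point-mass model; a value <= 0 means a collision.

    The lead vehicle keeps a constant speed. The ego vehicle keeps its speed
    during the reaction delay, then brakes until it matches the lead speed.
    """
    gap, v, t = initial_gap, ego_speed, 0.0
    min_gap = gap
    while t < horizon_s and gap > 0.0:
        if t >= reaction_s and v > lead_speed:
            v = max(lead_speed, v - ego_decel * dt)
        gap += (lead_speed - v) * dt
        min_gap = min(min_gap, gap)
        t += dt
    return min_gap


LEAD_SPEED = 10.0          # m/s, assumed constant
MARGIN_THRESHOLD_M = 3.0   # cases below this margin are escalated (assumed value)

# Layer 1: cheap sweep over ego speed and initial gap.
results = [(v, g, min_gap_low_fidelity(v, LEAD_SPEED, g))
           for v in (15.0, 25.0) for g in (10.0, 20.0, 40.0)]

# Layer 2 candidates: only the tight-margin cases go to high-fidelity replay.
replay_in_high_fidelity = [(v, g) for v, g, margin in results
                           if margin < MARGIN_THRESHOLD_M]
print(replay_in_high_fidelity)
```

With these values, only the two short-gap cases at the higher ego speed fall below the threshold. In practice the cheap model would be swept over far more combinations, and the selected cases would be replayed in a high-fidelity simulator rather than trusted as they stand.
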

===== Software-in-the-Loop and Hardware-in-the-Loop =====

Simulation becomes more valuable when the actual autonomy stack is connected to it. In software-in-the-loop (SIL) testing, the real planning or control software runs inside the virtual environment. This is useful because it tests the actual code while keeping the physical risk low. If the software produces the wrong maneuver, the wrong timing, or the wrong fallback action, the error can be observed in a safe and repeatable setting.

Hardware-in-the-loop (HIL) testing adds another layer of realism. It places real or representative hardware into the loop, such as ECUs, data buses, actuator interfaces, or timing elements. This is particularly important for planning and control, because the question is often not only whether the algorithm is correct, but whether the command reaches the vehicle correctly and on time. A planner that works in software may still fail once the actuation path, timing jitter, or bus communication is introduced.

The book's discussion of virtual ECUs and data buses gives a good example: the test rig can simulate bus traffic, counters, checksums, subsystem failures, and graceful degradation. That material fits naturally here because it shows how HIL-style twins help validate actuator-path integrity without requiring a full physical rig.

===== Test Tracks and Physical Testing =====

Test tracks are the bridge between simulation and real-world operation. They provide physical realism while preserving a controlled environment in which scenarios can be repeated, instrumented, and compared. This makes them ideal for confirming whether a scenario that worked in simulation also behaves correctly on a vehicle with real dynamics, real sensing, and real timing.

One ground-systems test track example is ZalaZONE in Hungary.
ZalaZONE includes a Smart City Zone, highway and rural sections, a high-speed oval, a dynamic platform, wet and dry handling courses, off-road areas, and V2X/5G infrastructure. It also supports simulation and digital-twin integration through tools such as IPG CarMaker and AVL tools, making it especially useful for SIL and HIL validation alongside physical track tests.

{{ :en:safeav:ctrl:figure6.12a.jpg?400 |}}

Test-track validation is particularly suitable for:

  - lane changes and overtaking;
  - cut-in and cut-out events;
  - obstacle avoidance;
  - stop and yield behavior;
  - emergency braking;
  - localization disturbance checks;
  - controller timing and actuation tests.

The strength of a test track is controllability. The same maneuver can be repeated under carefully defined conditions, and the result can be compared against the corresponding simulation case. This makes it possible to isolate whether an unsafe outcome came from the scenario itself, the planner, the controller, the localization path, or the actuation behavior.

Sensor and EMC test infrastructure also belongs to physical validation. Anechoic chambers (fully anechoic and semi-anechoic), RF-shielded rooms, and reverberation chambers are important when sensor behavior, electromagnetic interference, and communication robustness need to be measured under controlled conditions. Planning and control depend on the quality and timing of the sensing stack, so sensing validation is part of what makes the test result credible.

===== Real-World Testing and Real-World Seeding =====

Real-world testing is the most demanding method because it captures the actual operational environment. It should therefore be used after the system has already shown acceptable behavior in simulation and on the track.
The goal is not to replace simulation or track testing, but to confirm that the validated behavior survives contact with the real world.

Two lines of validation should be distinguished: one uses real-world experience as the starting point for further virtual testing, while the other uses the fleet or field itself as a large distributed testbed. Pegasus and the Warwick-related scenario database are good examples of the first, where observed situations are turned into reusable validation material. The Tesla-style fleet approach is a good example of the second, where data from the field is fed into a large-scale validation pipeline. OpenSCENARIO 2.0 also belongs here because it supports symbolic, reproducible scenario generation based on structured descriptions rather than ad hoc test notes.

Test generation can also be seeded by observed events. Real-world seeding is valuable because it gives the team real situations instead of purely synthetic ones. However, completeness is still an open issue, and there is always a risk that the collected database overrepresents familiar or already-seen conditions. That is why seeding should be treated as a source of test diversity, not as a complete validation solution.

Real-world testing is most useful when the question is:

  - does the system remain safe in its intended ODD?
  - do the planner and controller remain stable under actual traffic and infrastructure conditions?
  - do the simulation and track assumptions still hold once the system is deployed?

===== Choosing the Right Method =====

The test method should follow the validation question.

^ Validation question ^ Best method to start with ^
| Can the planner produce a safe trajectory across many parameter combinations? | Simulation |
| Does the real software behave correctly in the virtual world? | Software-in-the-loop |
| Do timing, communication, and actuator integration work correctly? | Hardware-in-the-loop |
| Does the system behave correctly under controlled physical conditions? | Test track |
| Does the system remain safe in the intended operating environment? | Real-world testing |

This is not a rigid ladder. In practice, validation moves back and forth between methods. A track failure may lead to changes in the scenario model or the simulator. A simulation failure may lead to a revised controller or a narrower ODD. A real-world failure may lead to a new safety margin, a changed fallback rule, or a better test-track reproduction of the same case.

The important point is that each method contributes a different kind of evidence. Simulation gives scale, SIL and HIL give integration realism, test tracks give controlled physical confirmation, and real-world testing gives operational credibility. For planning and control, a credible validation strategy needs all of them, with the scenario from the previous subsection serving as the common reference across the different execution environments.

===== Evidence Produced by Testing =====

The results of planning and control testing should be recorded in a form that can be compared across methods and reused in the safety argument. The most useful evidence is:

  - trajectory error;
  - tracking error;
  - Time-to-Collision and Distance-to-Collision;
  - collision or near-miss events;
  - maneuver completion time;
  - planner latency;
  - controller response delay;
  - jerk and acceleration comfort measures;
  - safe fallback activation;
  - repeatability across similar runs.

These outputs should be interpreted together. A maneuver that is accurate but unsafe is not acceptable. A maneuver that is safe but erratic may also be unacceptable if it creates instability or poor comfort. The validation report should therefore link each result back to the scenario definition, the test method, and the original system requirement.
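
As a minimal sketch of how two of these measures might be computed and tied back to their context, the example below derives a minimum Time-to-Collision and a peak jerk from a short log and stores them in a record. The record layout, the identifiers ''SC-017'' and ''REQ-PLAN-004'', and the sample values are all invented for illustration. Time-to-Collision is taken here as gap divided by closing speed, and jerk as the finite difference of sampled acceleration.

```python
import math
from dataclasses import dataclass


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Gap divided by closing speed; infinite when the ego vehicle is not closing in."""
    closing = ego_speed_mps - lead_speed_mps
    return gap_m / closing if closing > 0.0 else math.inf


def peak_jerk(accel_mps2, dt_s):
    """Largest absolute change of acceleration per second between samples."""
    return max(abs(b - a) / dt_s for a, b in zip(accel_mps2, accel_mps2[1:]))


@dataclass
class EvidenceRecord:
    """Links measured values to scenario, method, and requirement (hypothetical layout)."""
    scenario_id: str
    test_method: str
    requirement_id: str
    min_ttc_s: float
    peak_jerk_mps3: float


# Hand-made sample log at 1 s intervals; the values are illustrative only.
gaps = [30.0, 22.0, 16.0, 13.0]       # m
ego_speeds = [20.0, 18.0, 15.0, 12.0]  # m/s
lead_speeds = [12.0, 12.0, 12.0, 12.0]  # m/s
accels = [0.0, -2.0, -3.0, -3.0]       # m/s^2

record = EvidenceRecord(
    scenario_id="SC-017",
    test_method="simulation",
    requirement_id="REQ-PLAN-004",
    min_ttc_s=min(time_to_collision(g, e, l)
                  for g, e, l in zip(gaps, ego_speeds, lead_speeds)),
    peak_jerk_mps3=peak_jerk(accels, dt_s=1.0),
)
print(record)
```

Keeping the scenario, method, and requirement identifiers next to the measured values is what allows the same scenario to be compared across simulation, track, and field runs.
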

The role of this subsection is to turn scenarios into evidence. Simulation, track testing, and real-world testing are not competing methods; they are complementary layers of the same validation strategy. Simulation gives breadth, physical test tracks give controlled confirmation, and real-world operation gives the strongest form of deployment evidence. The next subsection focuses on how these results are packaged into a validation argument and how they support the chapter summary.

/* ---- commented -----

====== Physical Testing ======

Physical testing infrastructures across ground, airborne, marine, and space systems reflect a progression from **high-access, repeatable environments** to **extremely constrained, high-cost, and often non-replicable conditions**. Each domain builds specialized facilities to bridge the gap between simulation and real-world deployment, with increasing emphasis on safety, controllability, and observability of complex system interactions.

===== Ground Systems (Automotive & Robotics) =====
{{:en:safeav:ctrl:figure6.12a.jpg?600|}} AV test tracks
Ground systems benefit from the most accessible and diverse physical testing environments. **Proving grounds and AV test tracks**, such as Mcity and the American Center for Mobility, replicate urban, suburban, and highway conditions with controllable variables (traffic signals, pedestrian dummies, weather systems). OEMs also use large private facilities (e.g., General Motors Milford Proving Ground) for durability, ADAS, and edge-case testing. These environments enable **repeatable scenario testing**, fault injection, and safe validation of perception and decision-making systems. Increasingly, they are instrumented with high-precision localization, V2X infrastructure, and synchronized data capture to support validation at scale.

===== Airborne Systems (Aviation & UAVs) =====
{{:en:safeav:ctrl:figure6.12b.jpg?600|}} Airborne Systems (Aviation & UAVs)
Airborne testing combines **ground-based facilities and open-air test ranges**. Wind tunnels (e.g., NASA Ames Research Center Wind Tunnel) provide controlled aerodynamic testing across regimes, while **iron-bird rigs** and avionics labs enable hardware/software integration before flight. Actual flight testing occurs at restricted ranges such as Edwards Air Force Base or FAA-designated UAV corridors, where telemetry, radar tracking, and chase aircraft ensure safety. Compared to ground systems, **repeatability is lower**, and environmental factors (weather, airspace constraints) play a larger role, but the combination of lab + flight test provides a structured certification pathway.
{{:en:safeav:ctrl:figure6.12c.jpg?600|}} Marine Systems (Surface & Underwater)
Marine testing relies on a mix of **controlled hydrodynamic facilities and open-water trials**. Towing tanks and wave basins—such as those at Naval Surface Warfare Center—allow precise study of hull performance, propulsion, and wave interaction. For autonomy, sheltered environments (harbors, test lakes) are used for early-stage validation, followed by coastal and deep-sea trials. Facilities often include instrumented buoys, GPS-denied navigation testing zones, and long-duration endurance setups. Compared to ground and air, marine systems emphasize **disturbance realism (waves, currents)** and **long-horizon reliability**, with less focus on dense, repeatable interaction scenarios.
{{:en:safeav:ctrl:figure6.12d.jpg?600|}} Space Systems (Launch, Orbital, Deep Space)
Space systems have the most specialized and constrained physical testing infrastructure. Because full end-to-end testing in the operational environment is impossible, engineers rely on **high-fidelity ground facilities** that replicate aspects of space conditions. These include thermal vacuum chambers (e.g., NASA Johnson Space Center Chamber A), vibration and acoustic test facilities for launch loads, and propulsion test stands (e.g., Stennis Space Center). RF anechoic chambers validate communication and sensing systems. While these facilities achieve extreme fidelity for specific physics, **system-level validation is fragmented**, requiring heavy reliance on simulation and incremental subsystem testing. The cost and irreversibility of failure drive a test philosophy centered on qualification, redundancy, and conservative margins. ===== Cross-Domain Insight ===== Across all four domains, physical testing evolves from **highly repeatable, scenario-rich environments (ground)** to **physics-constrained, partial-reality validation (space)**. Airborne and marine systems sit in between, blending controlled facilities with real-world trials. A consistent trend is the integration of **instrumented test environments with digital twins**, enabling bidirectional feedback between physical experiments and simulation models—an increasingly critical capability for validating autonomous and safety-critical systems. ===== Testing Infrastructure ===== From Ch. 8.1 Integrate to the chapter As discussed earlier, generic V&V process consists of testing the product under test within the ODD. This is generally done with a number of techniques. The central paradigm is to generate a test, execute the test, and have a clear criteria for correctness. Three major styles of intelligent test generation are currently active: physical testing, real-world seeding, and virtual testing. - Physical Testing :Typically, physical scaling is the most expensive method to verify functionality. 
However, Tesla has built a flow where their existing fleet is a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving uses a sophisticated data pipeline and deep learning system designed to process vast amounts of sensor data efficiently. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the global verification flow can be managed by large databases and supercomputers (DoJo) . By employing this methodology, Tesla knows that its scenarios are always valid. However, there are challenges with this approach. First, the real world moves very slowly in terms of new unique situations. Second, by definition the scenarios seen are very much tied to the market presence of Tesla, so not predictive of new situations. Finally, the process of capturing data, discerning an error, and building corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes. - Real-World Seeding: Another line of test generation is to use physical situations as a seed for further virtual testing. Pegasus, the seminal project initiated in Germany, took such an approach. The project emphasized a scenario-based testing methodology which used observed data from real-world conditions as a base. Another similar effort comes from Warwick University with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of the contributions from Warwick is Safety Pool Scenario Database. Databases and seeding methods, especially of interesting situations, offer some value, but of course, their completeness is not clear. Further, databases of tests are very susceptible to be over optimized by AI algorithms. 
- Virtual Testing: Another important contribution was ASAM OpenSCENARIO 2.0 which is a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows for a symbolic higher level description of the scenario with an ability to grow in complexity by rules of composition. Underneath the symbolic apparatus are pseudo-random test generation which can scale the scenario generation process. The randomness also offers a chance to expose “unknown-unknown” errors. Beyond component validation, there have been proposed solutions specifically for autonomous systems such as UL 4600, "Standard for Safety for the Evaluation of Autonomous Products." Similar to ISO 26262/SOTIF, UL 4600 has a focus on safety risks across the full lifecycle of the product and introduces a structured “safety case” approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events. There is also a focus on including human-machine interactions. What kind of testing infrastructure is required to execute on these various methodologies ? The baseline for automotive physical testing are facilities for crash testing, road variations, and weather effects. These are generally in private and shared test tracks around the world. For autonomy, several levels of test infrastructure have emerged around the topics of sensors, test tracks, and virtual simulation. 
{{:en:safeav:avt:3m-anechoic-chamber.jpg?600|}} Figure: Anechoic Chamber For sensors, important equipment includes: - Anechoic Chambers: These chambers are characterized by their anechoic (echo-free) interior, meaning they are designed to completely absorb sound or electromagnetic waves to eliminate reflections from the walls, ceiling, and sometimes the floor. - Fully Anechoic Chambers (FAC): These chambers have all interior surfaces (walls, ceiling, and floor) covered with RF absorbing materials, creating an environment free from reflections. They are ideal for high-precision measurements like antenna testing or situations where a free-space environment is needed. - Semi-Anechoic Chambers (SAC): In this type, the walls and ceiling are covered with absorbing materials, while the floor remains reflective (often a metal ground plane). This reflective floor helps simulate real-world environments, such as devices operating on the ground. Semi-anechoic chambers are commonly used for general EMC (Electromagnetic Compatibility) testing. - RF Shielded Rooms (Faraday Cages): These are enclosed rooms designed to block the entry or exit of electromagnetic radiation. They are constructed with a conductive shield (typically copper or other metals) around the walls, ceiling, and floor, minimizing the entry or exit of electromagnetic interference (EMI). They are a fundamental component of many EMI testing facilities. - Reverberation Chambers: These chambers intentionally use resonances and reflections within the chamber to create a statistically uniform electromagnetic field. They can accommodate larger and more complex test setups and are particularly useful for immunity testing where the device is exposed to interference from all directions. However, their performance can be limited at lower frequencies. 
{{:en:safeav:avt:zalazone_drone_0.jpg?600|}} Figure: Zalazone Autonomous Test Track In terms of test tracks, traditional test tracks which were used for purposes for mechanical testing have been extended for testing autonomy functions. A recent example shown in the figure above is ZalaZONE, a large test track located in Hungary. ZalaZONE integrates both conventional vehicle testing infrastructure and next-generation smart mobility features. One of its standout components is the Smart City Zone, which simulates real-world urban environments with intersections, roundabouts, pedestrian crossings, and public transport scenarios. This allows for comprehensive testing of urban-level autonomy, V2X communication, and AI-driven mobility solutions in a controlled yet realistic environment. The facility includes a dedicated highway and rural road section to support the evaluation of higher-speed autonomous functions such as adaptive cruise control, lane-keeping, and safe overtaking. A high-speed oval enables long-duration endurance testing and consistent-speed trials for autonomous or connected vehicles. The dynamic platform provides a flat, open space for vehicle dynamics testing, such as automated emergency braking, evasive maneuvers, and trajectory planning, while both wet and dry handling courses allow for testing on varied friction surfaces under critical scenarios. ZalaZONE is also equipped with advanced V2X and 5G infrastructure, including roadside units (RSUs) and edge computing systems, to enable real-time communication and data exchange between vehicles and infrastructure—critical for cooperative driving and sensor validation. Additionally, an off-road section supports testing for SUVs, military vehicles, and trucks in rough terrain conditions. The facility is complemented by EMC testing capabilities and plans for climate-controlled testing chambers, enhancing its support for environmental and regulatory testing. 
ZalaZONE also provides integration with simulation and digital twin environments. Through platforms such as IPG CarMaker and AVL tools, developers can carry out software-in-the-loop (SIL) and hardware-in-the-loop (HIL) testing in parallel with on-track validation. {{:en:safeav:avt:carla.jpg?600|}} Figure: Carla Simulator Finally, a great deal of simulation is done virtually. Simulation plays a critical role in the development and validation of autonomous vehicles (AVs), allowing developers to test perception, planning, and control systems in a wide range of scenarios without physical risk. Among the most prominent tools is CARLA, an open-source simulator built for academic and research use, known for its realistic urban environments, support for various sensors (LiDAR, radar, cameras), and integration with ROS. It’s widely adopted for prototyping and reinforcement learning in AVs. In the commercial space, "rFpro" is a leading choice for OEMs and Tier-1 suppliers, offering photorealistic environments and precise sensor modeling with sub-millimeter accuracy—essential for validating sensor fusion algorithms. Similarly, "IPG CarMaker" and "dSPACE ASM" provide powerful closed-loop environments ideal for testing vehicle dynamics and ADAS features, especially in hardware-in-the-loop (HIL) and software-in-the-loop (SIL) setups. These tools are tightly integrated with MATLAB/Simulink and real-time hardware for embedded control testing. For large-scale and safety-critical simulations, platforms like "VIRES VTD" and "Applied Intuition" are favored due to their compliance with industry standards like ASAM OpenX and ISO 26262, and their ability to model thousands of edge-case scenarios. "NVIDIA DRIVE Sim", built on the Omniverse platform, is used to generate synthetic data for training and validating neural networks and digital twins, offering GPU-accelerated realism that aids perception system testing. 
Finally, simulators like "Cognata" and "MathWorks' Automated Driving Toolbox" serve niche but critical roles—Cognata provides city-scale environments for scenario testing and safety validation, while MathWorks' tools are widely used for algorithm development and control prototyping, especially in academia and early-stage design. Each simulator has a specific focus—some prioritize sensor realism, others full-system integration or large-scale scenario generation—so selection depends on whether the goal is research, real-time control testing, or safety validation for deployment. */