{{:en:iot-open:czapka_m.png?50| Masters (2nd level) classification icon }}
  
<todo @bertlluk>CTU</todo>

Having designed the sensor, object-recognition, and location-services components, how does one test them? The fundamentals are consistent with the discussion in chapter 2: one defines an ODD, builds tests under that ODD, applies the tests, and determines correctness. The tests can be applied virtually (simulation), physically (test track), or as a mix per component (hardware-in-the-loop or software-in-the-loop). The population of tests must be complete enough to demonstrate sufficient coverage. The introduction of sensors and AI adds significant complexity to this process.

Testing sensors in safety-critical systems is particularly challenging when viewed through the lens of verification, validation (V&V), and certification, because sensors are both hardware devices and context-dependent measurement systems. Verification (ensuring the sensor meets its design specifications) can be addressed through laboratory calibration, environmental stress testing, and compliance with standards such as ISO 16750 (environmental conditions), DO-160 (avionics), and MIL-STD-810 (defense systems). However, validation (ensuring the sensor performs adequately in real operational contexts) is far more complex. Sensor performance depends heavily on the operational design domain (ODD), including weather, lighting, clutter, and interference conditions, which are difficult to fully replicate or bound. This gap between controlled verification and real-world validation is especially acute for perception sensors (e.g., cameras, radar, lidar), whose performance is probabilistic rather than deterministic and strongly influenced by environmental variability. Today, there is a great deal of innovation in mechanical test apparatus that mimics physical movement inside anechoic chambers to recreate difficult test scenarios. In the outdoor environment, hives of drones acting as EM sensors and noise sources provide a similar function for test tracks.

^ Conventional Algorithm ^ ML Algorithm ^ Comment ^
| Logical Theory | No Theory | In conventional algorithms, one needs a theory of operation to implement the solution. ML algorithms can often work without a clear understanding of exactly why they work. |
| Analytical | Not Analytical | Conventional algorithms are accurate in a way we can understand; however, ML algorithms are not easily understood and often behave like a "black box." |
| Causal | Correlation | Conventional algorithms focus on causality, while ML algorithms discover correlations. The difference is important if one wants to reason at higher levels. |
| Deterministic | Non-Deterministic | Conventional algorithms are deterministic in nature, while ML algorithms are fundamentally probabilistic. |
| Known Computational Complexity | Unknown Computational Complexity | Given the analyzable nature of conventional algorithms, one can build a model of computational complexity. This is not always possible for ML techniques, which may require testing to evaluate computational complexity. |

//**Table 1: Contrast of Conventional and Machine Learning Algorithms**//

The introduction of AI as a replacement for traditional software introduces significant validation issues (Table 1). Significantly, many of the techniques developed for testing software, such as code reviews, code coverage, and static analysis tools, do not carry over directly. Further, to test an AI component, it appears likely that one must test the method by which it was trained and have access to the training data.

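The deterministic/probabilistic contrast in Table 1 can be made concrete with a toy sketch. All names here are illustrative, not from any real stack: a conventional braking-distance formula returns the same value for the same input, while a stand-in for a learned detector returns a confidence that varies from call to call.

```python
import random

def braking_distance_m(speed_mps, decel_mps2=6.0):
    # Conventional algorithm: closed-form, analyzable, deterministic.
    return speed_mps ** 2 / (2.0 * decel_mps2)

class ToyLearnedDetector:
    # Illustrative stand-in for an ML component: its output is probabilistic,
    # so validation must reason about distributions, not single values.
    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def confidence(self, feature_sum):
        base = min(0.95, 0.5 + 0.05 * feature_sum)
        return min(1.0, max(0.0, self.rng.gauss(base, 0.05)))

# The conventional algorithm is repeatable; the learned stand-in is not,
# unless its random state is pinned (itself a common validation tactic).
assert braking_distance_m(20.0) == braking_distance_m(20.0)
```

Testing the formula amounts to checking a few points against the closed form; testing the detector requires sampling many runs and bounding the resulting distribution.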
Safety standards across automotive, marine, airborne, and space domains are now evolving to address the introduction of AI/ML-driven functionality, shifting from purely deterministic assurance models toward data-driven and probabilistic validation frameworks. In automotive, traditional functional safety under ISO 26262 has been extended by ISO/PAS 8800 and ISO 21448 to explicitly address perception uncertainty, training data coverage, and performance limitations of AI-based systems. In aviation, guidance such as DO-178C is being supplemented by emerging frameworks like DO-387 (in development) to tackle non-deterministic behavior, explainability, and learning assurance. Similarly, space systems governed by ECSS standards and marine systems guided by International Maritime Organization frameworks are beginning to incorporate autonomy and AI considerations, particularly for unmanned and remotely operated platforms. Across all domains, a common trend is emerging: safety assurance is moving from static compliance toward lifecycle-based assurance, including dataset governance, simulation-based validation, runtime monitoring, and continuous certification concepts. This reflects a fundamental shift in safety engineering, from proving correctness of fixed logic to bounding the behavior of adaptive, data-driven systems operating under uncertainty.

The remainder of this section presents a practical, simulation-driven illustration of validating the perception, mapping (HD maps/digital twins), and localization layers of an autonomous driving stack. The core idea is to anchor tests in the operational design domain (ODD), express them as reproducible scenarios, and report metrics that connect module-level behavior to system-level safety.

====== Scope, ODD, and Assurance Frame ======

We decompose the stack into Perception (object detection/tracking), Mapping (HD map/digital twin creation and consistency), and Localization (GNSS/IMU and vision/LiDAR aiding) and validate each with targeted KPIs and fault injections. The evidence is organized into a safety case that explains how module results compose at system level. Tests are derived from the ODD and instantiated as logical/concrete scenarios (e.g., with a scenario language like Scenic) over the target environment. This gives you systematic coverage and reproducible edge-case generation while keeping hooks for standards-aligned arguments (e.g., ISO 26262/SOTIF) and formal analyses where appropriate.
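Instantiating logical scenarios as concrete ones can be sketched in plain Python as a stand-in for a scenario language such as Scenic; the parameter names and ranges below are hypothetical, not taken from any real ODD specification.

```python
import random

def sample_concrete_scenarios(logical, n, seed=42):
    """Draw n concrete scenarios from a logical scenario whose parameters
    are (low, high) ranges. Fixing the seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in logical.items()}
        for _ in range(n)
    ]

# Hypothetical cut-in logical scenario derived from the ODD.
cut_in = {
    "ego_speed_mps": (8.0, 20.0),
    "cut_in_gap_m": (5.0, 30.0),
    "occlusion_level": (0.0, 1.0),
}
concrete = sample_concrete_scenarios(cut_in, n=100)
```

Pinning the seed is what makes the generated edge cases reproducible across teams, which is the property the safety case relies on.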
  
====== Perception Validation Illustration ======

  
Figure 1 illustrates object comparison. Green boxes mark objects captured in the ground truth, while red boxes mark objects detected by the AV stack. Threshold-based rules compare the two sets of objects. The comparison is expected to provide specific indicators of detectable vehicles at different ranges within the safety and danger areas.
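The threshold-based box comparison can be sketched with intersection-over-union (IoU) matching. This is a minimal sketch: the greedy one-to-one matcher and the 0.5 threshold are illustrative assumptions, not the stack's actual rules.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_detections(ground_truth, detections, iou_threshold=0.5):
    """Greedily match each ground-truth (green) box to its best detection
    (red) box; returns (true_positives, missed, spurious)."""
    unmatched = list(detections)
    tp, missed = 0, 0
    for gt in ground_truth:
        best = max(unmatched, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            tp += 1
            unmatched.remove(best)
        else:
            missed += 1
    return tp, missed, len(unmatched)
```

Binning the resulting counts by range to the ego vehicle then yields the per-range detectability indicators for the safety and danger areas.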
====== Mapping / Digital-Twin Validation Illustration ======

Key checks include lane topology fidelity versus survey, geo-consistency in centimeters, and semantic consistency (e.g., correct placement of occluders, signs, crosswalks). The scenarios used for perception and localization are bound to this twin so that results can be reproduced and shared across teams or vehicles. Over time, you add change-management: detect and quantify drifts when the real world changes (construction, foliage, signage) and re-validate affected scenarios.
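The centimeter-level geo-consistency check can be sketched as a per-station comparison between surveyed and map centerlines. The sketch assumes both polylines are resampled at corresponding stations; the function names and the 10 cm tolerance are illustrative.

```python
import math

def centerline_deviation_cm(survey_pts, map_pts):
    """Per-station deviation (cm) between surveyed and map centerline
    samples, given as (x, y) pairs in meters."""
    return [
        100.0 * math.hypot(sx - mx, sy - my)
        for (sx, sy), (mx, my) in zip(survey_pts, map_pts)
    ]

def geo_consistency_report(survey_pts, map_pts, tolerance_cm=10.0):
    """Summarize deviations and count stations exceeding the tolerance."""
    devs = centerline_deviation_cm(survey_pts, map_pts)
    return {
        "max_cm": max(devs),
        "mean_cm": sum(devs) / len(devs),
        "violations": sum(d > tolerance_cm for d in devs),
    }
```

A nonzero violation count marks a drifted map segment, triggering the change-management step of re-validating the scenarios bound to it.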
  
====== Localization Validation Illustration ======

  
  
<figure Localization Validation>
{{ :en:safeav:maps:localization_val.png?400 | localization validation}}
<caption>Localization validation: in some cases, the difference between the expected location and the actual location may lead to accidents.</caption>
</figure>

The current validation methods perform a one-to-one mapping between the expected and actual locations. As shown in Fig. 2, the vehicle's position deviation is computed for each frame and recorded in the validation report. Parameters such as the min/max/mean deviations are then calculated from the same report. In the validation procedure, it is also possible to modify the simulator to embed a mechanism that adds noise to the localization process, in order to check robustness and validate performance.
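The per-frame deviation computation and the noise-injection hook can be sketched as follows; function names and the noise magnitude are illustrative assumptions, not the simulator's actual API.

```python
import math
import random

def localization_deviations(expected, actual):
    """Per-frame Euclidean deviation (m) between expected and reported
    (x, y) positions, one pair per frame."""
    return [
        math.hypot(ex - ax, ey - ay)
        for (ex, ey), (ax, ay) in zip(expected, actual)
    ]

def deviation_report(devs):
    """Summary statistics reported after the run, as in the validation report."""
    return {"min": min(devs), "max": max(devs), "mean": sum(devs) / len(devs)}

def inject_gaussian_noise(positions, sigma_m=0.2, seed=0):
    """Perturb reported positions to probe robustness of downstream checks."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))
            for x, y in positions]
```

Re-running the deviation report on noise-injected positions shows how gracefully the min/max/mean figures degrade as localization quality worsens.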
====== Multi-Fidelity Workflow and Scenario-to-Track Bridge ======
  
  
A two-stage workflow balances coverage and realism. First, use LF tools (e.g., planner-in-the-loop with simplified sensors and traffic) to sweep large grids of logical scenarios and identify risky regions in parameter space (relative speed, initial gap, occlusion level). Then, promote the most informative concrete scenarios to HF simulation with photorealistic sensors for end-to-end validation of perception and localization interactions. Where appropriate, a small, curated set of scenarios is carried to closed-track trials. Success criteria are consistent across all stages, and post-run analyses attribute failures to perception, localization, prediction, or planning so fixes are targeted rather than generic.
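The low-fidelity sweep can be sketched as a grid over scenario parameters with a simple surrogate risk criterion. Here a toy time-to-collision stands in for the planner-in-the-loop result; the threshold and parameter grids are illustrative assumptions.

```python
def sweep_logical_scenarios(rel_speeds, gaps, ttc_threshold_s=2.0):
    """Sweep a (relative speed, initial gap) grid and flag risky cells
    where a toy time-to-collision falls below the threshold. The risky
    cells are the candidates promoted to high-fidelity simulation."""
    risky = []
    for v in rel_speeds:          # closing speed, m/s
        for gap in gaps:          # initial gap, m
            ttc = gap / v if v > 0 else float("inf")
            if ttc < ttc_threshold_s:
                risky.append({"rel_speed": v, "gap": gap, "ttc": ttc})
    return risky
```

In practice the surrogate is a full LF run per cell, but the promotion logic (keep only cells below the risk threshold) is the same.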

====== Summary ======

The chapter develops a comprehensive view of perception, mapping, and localization as the foundation of autonomous systems, emphasizing how modern autonomy builds on both historical automation (e.g., autopilots across domains) and recent advances in AI. It explains how perception converts raw sensor data—across cameras, LiDAR, radar, and acoustic systems—into structured understanding through object detection, sensor fusion, and scene interpretation. A key theme is that no single sensor is sufficient; instead, robust autonomy depends on multi-modal sensor fusion, probabilistic estimation, and careful calibration to manage uncertainty. The chapter also highlights the transformative role of AI, particularly deep learning, in enabling scalable perception and scene understanding, while noting that these methods introduce new challenges related to data dependence, generalization, and interpretability.

A second major focus is on sources of instability and validation, where the chapter connects environmental effects (weather, electromagnetic interference), infrastructure constraints, and semiconductor economics to system-level performance. It underscores that validation must be grounded in the operational design domain (ODD) and cannot rely solely on physical testing, requiring a combination of simulation, hardware-in-the-loop, and scenario-based methods. The introduction of AI further complicates verification and validation because of its probabilistic, non-deterministic nature, challenging traditional assurance techniques. As a result, safety approaches across domains are evolving toward lifecycle-based assurance, incorporating data governance, simulation-driven testing, and continuous monitoring. The chapter concludes with a structured validation framework that links perception, mapping, and localization performance to system-level safety metrics, emphasizing reproducibility, coverage, and traceability in building a credible safety case.

====== Assessment ======

^ # ^ Project Title ^ Description ^ Learning Objectives ^
| 1 | Multi-Sensor Perception Benchmarking | Build a perception pipeline using at least two sensor modalities (e.g., camera + LiDAR or radar). Evaluate object detection performance under varying conditions (lighting, weather, occlusion) using real or simulated datasets. | Understand strengths/limitations of different sensors. Apply sensor fusion concepts. Evaluate detection metrics (precision/recall, distance sensitivity). Analyze environmental impacts on perception. |
| 2 | ODD-Driven Scenario Generation & Validation Study | Define an Operational Design Domain (ODD) for an autonomous system (e.g., urban driving, coastal navigation). Generate a set of test scenarios (including edge cases) and validate system performance using simulation tools. | Define and scope an ODD. Develop scenario-based testing strategies. Understand coverage and edge-case generation. Link scenarios to safety outcomes. |
| 3 | Sensor Failure and Degradation Analysis | Simulate sensor failures (e.g., camera blackout, GNSS loss, radar noise) and analyze system-level impact on perception, localization, and safety metrics (e.g., time-to-collision). | Understand failure modes across sensor types. Evaluate system robustness and redundancy. Apply fault injection techniques. Connect sensor degradation to safety risks. |
| 4 | AI vs Conventional Algorithm Validation Study | Compare a traditional perception algorithm (e.g., rule-based or classical ML) with a deep learning model on the same dataset. Analyze differences in performance, interpretability, and validation challenges. | Distinguish deterministic vs probabilistic systems. Understand validation challenges of AI/ML. Evaluate explainability and traceability. Assess implications for safety certification. |
| 5 | End-to-End V&V Framework Design (Digital Twin) | Design a validation framework for perception, mapping, and localization using simulation (digital twin). Include KPIs, test conditions (e.g., ISO 26262, SOTIF), simulations, and linkage to safety standards. | Design system-level V&V strategies. Define measurable KPIs for autonomy. Understand simulation and digital twin roles. Connect numerical validation to safety standards. |