--- en:safeav:as:vvintro [2025/06/16 04:42] – ToDo checked: rahulrazdan
+++ en:safeav:as:vvintro [2026/04/08 11:06] (current) – raivo.sell
@@ Line 3: / Line 3: @@
 <todo @rahulrazdan #rahulrazdan:2025-06-16></todo>
+As discussed in the governance module, whatever value products provide to their consumers is weighed against the potential harm caused by the product, which leads to the concept of legal product liability. From a product development perspective, the combination of laws, regulations, and legal precedent forms the overriding governance framework around which the system specification must be constructed [3]. The process of validation ensures that a product design meets the user's needs and requirements, and verification ensures that the product is built correctly according to design specifications.
+{{:en:safeav:as:picture1.png?400|}}
+Fig. 1. V&V and Governance Framework.
+The Master V&V (MaVV) process needs to demonstrate that the product has been reasonably tested given the reasonable expectation of causing harm. It does so using three important concepts [4]:
+  - Operational Design Domain (ODD): This defines the environmental conditions and operational model under which the product is designed to work.
+  - Coverage: This defines the completeness over the ODD to which the product has been validated.
+  - Field Response: This defines the procedures used, when failures do occur, to correct product design shortcomings and prevent future harm.
+As Figure 1 shows, the Verification & Validation (V&V) process is the key input into the governance structure which attaches liability, and per the governance structure, each of the elements must show “reasonable due diligence.” An example of an unreasonable ODD would be one in which an autonomous vehicle gives up control a millisecond before an accident.
+{{:en:safeav:as:picture2.png?400|}}
+Fig. 2. Execution is space.
+Mechanically, MaVV is implemented with a Minor V&V (MiVV) process consisting of:
+  - Test Generation: From the allowed ODD, test scenarios are generated.
+  - Execution: The test is “executed” on the product under development. Mathematically, execution is a functional transformation that produces results.
+  - Criteria for Correctness: The results of the execution are evaluated for success or failure against crisp criteria for correctness.
+In practice, each of these steps can carry considerable complexity and cost. Since the ODD can be a very wide state space, generating the stimulus intelligently and efficiently is critical. Typically, stimulus generation starts out manual, but this quickly fails to scale. In virtual execution environments, pseudo-random directed methods are used to accelerate the process. In limited situations, symbolic or formal methods can be used to carry large state spaces mathematically through the whole design execution phase. Symbolic methods have the advantage of completeness but face computational explosion, as many of the underlying problems are NP-complete.
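+The pseudo-random directed generation described above can be sketched as follows. This is a minimal illustration only: the ODD dimensions, their ranges, the `Scenario` fields, and the `edge_bias` parameter are all hypothetical and are not taken from any particular tool or standard.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    speed_kmh: float    # ego vehicle speed (hypothetical ODD dimension)
    friction: float     # road friction coefficient (hypothetical)
    obstacle_m: float   # distance to a stationary obstacle (hypothetical)


# Hypothetical ODD: each dimension is a closed interval (low, high).
ODD = {
    "speed_kmh": (0.0, 130.0),
    "friction": (0.3, 1.0),
    "obstacle_m": (5.0, 200.0),
}


def generate_scenarios(odd, count, seed, edge_bias=0.3):
    """Sample scenarios inside the ODD, steering some draws to the boundaries."""
    rng = random.Random(seed)  # a fixed seed makes a failing run reproducible
    scenarios = []
    for _ in range(count):
        values = {}
        for name, (low, high) in odd.items():
            if rng.random() < edge_bias:
                values[name] = rng.choice((low, high))  # directed: boundary value
            else:
                values[name] = rng.uniform(low, high)   # random: interior value
        scenarios.append(Scenario(**values))
    return scenarios


if __name__ == "__main__":
    for scenario in generate_scenarios(ODD, count=5, seed=42):
        print(scenario)
```

+Seeding the generator is a deliberate choice in this sketch: any scenario that exposes a failure can be regenerated exactly, which manual stimulus generation does not guarantee.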
+{{:en:safeav:as:zalazone_drone_0.jpg?400|}}
+The execution stage can be done physically (such as on the test track shown above), but this process is expensive, slow, has limited controllability and observability, and, in safety-critical situations, is potentially dangerous. In contrast, virtual methods have the advantage of cost, speed, ultimate controllability and observability, and no safety issues. Virtual methods also have the great advantage of performing the V&V task well before the physical product is constructed. This leads to the classic V chart shown in Figure 1. However, since virtual methods are a model of reality, they introduce inaccuracy into the testing domain, while physical methods are accurate by definition. Finally, one can intermix virtual and physical methods with concepts such as software-in-the-loop or hardware-in-the-loop.
+The observable results of the execution are captured to determine correctness. Correctness is typically defined by either a golden model or an anti-model. The golden model, typically virtual, offers an independently verified model whose results can be compared to those of the product under test. Even in this situation, there is typically a divergence between the abstraction level of the golden model and that of the product, which must be managed. Golden model methods are often used in computer architectures (e.g., ARM, RISC-V). The anti-model consists of error states which the product must not enter; the correct behavior is thus the state space outside of the error states. An example in the autonomous vehicle space is an error state defined as an accident or a violation of any number of other constraints.
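+The two styles of correctness criteria can be contrasted in a short sketch. Everything here is an assumption made for illustration: the golden model is the textbook idealized stopping distance v²/(2μg), the 5% tolerance stands in for the abstraction gap between golden model and product, and the error states in the anti-model are hypothetical.

```python
import math


def golden_stopping_distance(speed_ms, friction, g=9.81):
    """Golden model (assumed): idealized stopping distance v^2 / (2 * mu * g)."""
    return speed_ms ** 2 / (2.0 * friction * g)


def check_against_golden(measured_m, speed_ms, friction, rel_tol=0.05):
    """Pass if the measured result agrees with the golden model within rel_tol.

    The tolerance absorbs the divergence in abstraction level between the
    golden model and the product under test.
    """
    expected = golden_stopping_distance(speed_ms, friction)
    return math.isclose(measured_m, expected, rel_tol=rel_tol)


def violates_anti_model(gap_m, speed_ms, speed_limit_ms):
    """Anti-model: return the (hypothetical) error states that were entered."""
    errors = []
    if gap_m <= 0.0:
        errors.append("collision")
    if speed_ms > speed_limit_ms:
        errors.append("speed_limit_exceeded")
    return errors


if __name__ == "__main__":
    print(check_against_golden(26.0, speed_ms=20.0, friction=0.8))
    print(violates_anti_model(gap_m=0.0, speed_ms=15.0, speed_limit_ms=13.9))
```

+The golden-model check answers "is the result what it should be?", while the anti-model check answers only "did the product enter a forbidden state?" and accepts everything else.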
+The MaVV consists of building a database of the various explorations of the ODD state space and, from that, building an argument for completeness. The argument typically takes the form of a probabilistic analysis. After the product is in the field, field returns are diagnosed, and one must always ask the question: Why did my original process not catch this issue? Once the cause is found, the test methodology is updated to prevent such issues going forward. The V&V process is critical in building a product which meets customer expectations and in documenting the "reasonable" due diligence needed for the purposes of product liability in the governance framework.
+In most cases, the generic V&V process must grapple with massive ODD spaces, limited execution capacity, and a high cost of evaluation. Further, all of this must be done in a timely manner to make the product available to the marketplace. Traditionally, V&V regimes have been bifurcated into two broad categories: Physics-Based and Decision-Based. We now discuss the key characteristics of each.
+**Physics-Based Operating Domains**
+For MaVV, the critical factors are the efficiency of the MiVV “engine” and the argument for the completeness of the validation. Historically, mechanical/non-digital products (such as cars or airplanes) required sophisticated V&V. These systems were examples of a broader class of products which had a Physics-Based Execution (PBE) paradigm. In this paradigm, the underlying model execution (including real life) has the characteristics of continuity and monotonicity because the model operates in the world of physics. This key insight has enormous implications for V&V because it greatly constrains the potential state-space to be explored. Examples of this reduction of state-space include:
+  - Scenario Generation: One needs to worry only about the state space constrained by the laws of physics. Thus, objects which do not obey physics cannot exist, and every actor is explicitly constrained by the laws of physics.
+  - Monotonicity: In many interesting dimensions, there are strong properties of monotonicity. As an example, if one is considering stopping distance for braking, there is a critical speed above which there will be an accident. Critically, all the speed bins below this critical speed are safe and do not have to be explored.
+Mechanically, in traditional PBE fields, the philosophy of safety regulation (ISO 26262 [5], AS9100 [6], etc.) builds the safety framework as a process, where
+  - failure mechanisms are identified;
+  - a test and safety argument is built to address the failure mechanism;
+  - there is a safety process by a regulator (or documentation for self-regulation) which evaluates these two and acts as a judge to approve/decline.
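+Returning to the braking example, the value of monotonicity for state-space reduction can be shown with a short sketch. Because stopping distance only grows with speed, a bisection search finds the critical speed in a handful of evaluations instead of testing every speed bin. The braking model, friction coefficient, and reaction time below are assumptions chosen for illustration.

```python
def stops_in_time(speed_ms, gap_m, friction=0.7, reaction_s=1.0, g=9.81):
    """Assumed model: reaction distance plus idealized braking distance."""
    stopping_m = speed_ms * reaction_s + speed_ms ** 2 / (2.0 * friction * g)
    return stopping_m <= gap_m


def critical_speed(gap_m, low=0.0, high=70.0, tol=0.01):
    """Largest safe speed (within tol), found by bisection.

    Valid only because safety is monotonic in speed: every speed below a
    safe speed is safe, and every speed above an unsafe speed is unsafe.
    """
    if not stops_in_time(low, gap_m):
        raise ValueError("even the lowest speed is unsafe")
    if stops_in_time(high, gap_m):
        return high
    while high - low > tol:
        mid = (low + high) / 2.0
        if stops_in_time(mid, gap_m):
            low = mid    # mid is safe, so everything below it is safe too
        else:
            high = mid   # mid is unsafe, so everything above it is unsafe too
    return low


if __name__ == "__main__":
    print(round(critical_speed(50.0), 2))
```

+With the default bounds, the search needs about 13 evaluations of the model to locate the critical speed to within 0.01 m/s.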
+Traditionally, the faults considered are primarily mechanical failures. As an example, the flow for validating the braking system in an automobile through ISO 26262 would have the following steps:
+  - Define Safety Goals and Requirements (Concept Phase): Hazard Analysis and Risk Assessment (HARA): Identify potential hazards related to the braking system (e.g., failure to stop the vehicle, unintended braking). Assess risk levels using parameters like severity, exposure, and controllability. Define Automotive Safety Integrity Levels (ASIL) for each hazard (ranging from ASIL A to ASIL D, where D is the most stringent). Define safety goals to mitigate hazards (e.g., ensure sufficient braking under all conditions).
+  - Develop Functional Safety Concept: Translate safety goals into high-level safety requirements for the braking system. Ensure redundancy, diagnostics, and fail-safe mechanisms are incorporated (e.g., dual-circuit braking or electronic monitoring).
+  - System Design and Technical Safety Concept: Break down functional safety requirements into technical requirements. Design the braking system with safety mechanisms in hardware (e.g., sensors, actuators) and software (e.g., anti-lock braking algorithms). Implement failure detection and mitigation strategies (e.g., failover to mechanical or electronic control paths).
+  - Hardware and Software Development: Hardware Safety Analysis (HSA): Validate that components meet safety standards (e.g., reliable braking sensors). Software Development and Verification: Use ISO 26262-compliant processes for coding, verification, and validation. Test braking algorithms under various conditions.
+  - Integration and Testing: Perform verification of individual components and subsystems to ensure they meet technical requirements. Conduct integration testing of the complete braking system, focusing on functional tests (e.g., stopping distance), safety tests (e.g., behavior under fault conditions), and stress/environmental tests (e.g., heat, vibration).
+  - Validation (Vehicle Level): Validate the braking system against safety goals defined in the concept phase. Perform real-world driving scenarios, edge cases, and fault injection tests to confirm safe operation. Verify compliance with ASIL-specific requirements.
+  - Production, Operation, and Maintenance: Ensure production aligns with validated designs. Implement operational safety measures (e.g., periodic diagnostics, maintenance). Monitor and address safety issues during the product lifecycle (e.g., software updates).
+  - Confirmation and Audit: Use independent confirmation measures (e.g., safety audits, assessment reviews) to ensure the braking system complies with ISO 26262.
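+The HARA step in the flow above assigns an ASIL from severity, exposure, and controllability. The sketch below uses a commonly cited shortcut (summing the three class indices) that is generally understood to reproduce the ASIL determination table; it is not a substitute for the table in ISO 26262-3, and the classes S0, E0, and C0 are omitted. The hazard ratings in the example are hypothetical.

```python
def determine_asil(severity, exposure, controllability):
    """Map S1-S3, E1-E4, C1-C3 (given as integers) to an ASIL or QM.

    Shortcut rule (assumed, verify against the standard): the sum of the
    three class indices selects the level, with 10 -> D down to 7 -> A.
    """
    if severity not in (1, 2, 3):
        raise ValueError("severity must be 1..3 (S1-S3)")
    if exposure not in (1, 2, 3, 4):
        raise ValueError("exposure must be 1..4 (E1-E4)")
    if controllability not in (1, 2, 3):
        raise ValueError("controllability must be 1..3 (C1-C3)")
    total = severity + exposure + controllability
    levels = {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}
    return levels.get(total, "QM")  # QM: quality management only


if __name__ == "__main__":
    # Hypothetical ratings for two braking hazards.
    hazards = {
        "failure to stop the vehicle": (3, 4, 3),
        "unintended braking at low speed": (1, 3, 2),
    }
    for name, (s, e, c) in hazards.items():
        print(f"{name}: {determine_asil(s, e, c)}")
```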
+<note important>2x Finally, seems like different chapter</note>
+Finally, the regulations have a strong notion of safety levels, expressed as Automotive Safety Integrity Levels (ASIL). Airborne systems follow a similar trajectory (pun intended) with the concept of Design Assurance Levels (DALs). A key part of the V&V task is to meet the standards required at each ASIL. Historically, a sophisticated set of V&V techniques has been developed to verify traditional automotive systems. These techniques include well-structured physical tests, often validated by regulators or sanctioned independent companies (e.g., TÜV SÜD [7]). Over the years, the use of virtual physics-based models has increased to support design tasks such as body design [8] or tire performance [9]. The general structure of these models is to build a simulation which is predictive of the underlying physics to enable broader ODD exploration. This creates a very important flow of characterization, model generation, predictive execution, and correction. Moreover, because the execution is highly constrained by physics, virtual simulators can have limited performance and often require extensive hardware support for simulation acceleration. In summary, the key underpinnings of the PBE paradigm from a V&V point of view are:
+  - Constrained and well-behaved space for scenario test generation.
+  - Expensive physics-based simulations.
+  - Regulations focused on mechanical failure.
+  - In safety situations, regulations focused on a process to demonstrate safety with a key idea of design assurance levels.
+**Traditional Decision-Based Execution**
+As cyber-physical systems evolved, information technology (IT) rapidly transformed the world.
+{{:en:safeav:as:figure2.jpg?700|}}
+Fig. 4. Progression of System Specification (HW, SW, AI).
+As shown in Figure 4, within electronics, there has been a progression of system function construction where the first stage was hardware or pseudo-hardware (FPGA, microcode). The next stage involved the invention of a processor architecture upon which software could imprint system function. Software was a design artifact written by humans in standard languages (C, Python, etc.). The revolutionary aspect of the processor abstraction allowed a shift in function without the need to shift physical assets. However, one needed legions of programmers to build the software. Today, the big breakthrough with Artificial Intelligence (AI) is the ability to build software with the combination of underlying models, data, and metrics.
+In their basic form, IT systems were not safety critical, and similar levels of legal liability have not attached to IT products. However, the size and growth of IT are such that problems in large-volume consumer products can have catastrophic economic consequences [10]. Thus, the V&V function is very important. IT systems follow the same generic processes for V&V as outlined above, but with two significant differences: the execution paradigm and the source of errors. First, unlike the PBE paradigm, IT follows a Decision-Based Execution (DBE) paradigm. That is, there are no natural constraints on the functional behavior of the underlying model and no inherent properties of monotonicity. Thus, the whole massive ODD space must be explored, which makes generating tests and demonstrating coverage extremely difficult. To counter this difficulty, a series of processes has been developed to build a more robust V&V structure. These include:
+  - Code Coverage: The structural specification of the virtual model is used as a constraint to help drive the test generation process. This applies to both software and hardware (RTL code).
+  - Structured Testing: A process of component, subsection, and integration testing has been developed to minimize propagation of errors.
+  - Design Reviews: Structured design reviews of specs and code are considered best practice.
+A good example of this process flow is the CMU Capability Maturity Model Integration (CMMI) [11], which defines a set of processes to deliver quality software. Large parts of the CMMI architecture can be used for AI when AI is replacing existing SW components. Finally, testing in the DBE domain decomposes into the following philosophical categories:
+  - “Known knowns”: Bugs or issues that are identified and understood.
+  - “Known unknowns”: Potential risks or issues that are anticipated but whose exact nature or cause is unclear.
+  - “Unknown unknowns”: Completely unanticipated issues that emerge without warning, often highlighting gaps in design, understanding, or testing.
+The last category is the most problematic and the most significant for DBE V&V. Pseudo-random test generation has been a key technique for exposing this category [12]. In summary, the key underpinnings of the DBE paradigm from a V&V point of view are:
+  - Unconstrained and not well-behaved execution space for scenario test generation.
+  - Generally less expensive simulation execution (no physical laws to simulate).
+  - V&V focused on logical errors, not mechanical failure.
+  - Generally, no defined regulatory process for safety-critical applications; most software is “best efforts.”
+  - “Unknown unknowns” as a key focus of validation.
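+The lack of monotonicity in the DBE world, and the role of pseudo-random testing in exposing unanticipated defects, can be illustrated with a small sketch. The function under test, its deliberately seeded defect, and the input range are all invented for this example; the reference implementation plays the role of a golden model.

```python
import random


def clamp_reference(value, low, high):
    """Reference behavior: limit value to the range [low, high]."""
    return max(low, min(value, high))


def clamp_under_test(value, low, high):
    """Implementation with a deliberately seeded, isolated defect."""
    if value == 37:  # a single bad decision branch; neighbors are unaffected
        return low
    if value < low:
        return low
    if value > high:
        return high
    return value


def random_differential_test(trials, seed, low=0, high=100):
    """Compare the implementation with the reference on pseudo-random inputs."""
    rng = random.Random(seed)
    failing_inputs = []
    for _ in range(trials):
        value = rng.randint(-100, 200)
        if clamp_under_test(value, low, high) != clamp_reference(value, low, high):
            failing_inputs.append(value)
    return failing_inputs


if __name__ == "__main__":
    print(sorted(set(random_differential_test(trials=5000, seed=7))))
```

+Unlike the braking case, passing at inputs 36 and 38 says nothing about input 37, so no range of inputs can be declared safe by testing its neighbors. Random sampling finds the defect only if the sample happens to hit it, which is why coverage arguments in DBE are so much harder to make.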
+**A key implication of the DBE space is that the idea from the PBE world of building a list of faults and building a safety argument for them is antithetical to the focus of DBE validation.**
+<note important>2x Finally, seems like different chapter</note>
+Finally, the product development process is typically focused on defining an ODD and validating against it. However, in modern times, an additional concern is that of adversarial attacks (cybersecurity), in which an adversary wants to hijack the system for nefarious purposes. In this situation, the product owner must not only validate against the ODD but also detect when the system is operating outside the ODD. After detection, the best-case scenario is to safely redirect the system to the ODD space. The risks associated with cybersecurity issues typically split into three levels for cyber-physical systems:
+  - OTA Security: If an adversary can manipulate the Over-the-Air (OTA) software updates, they can take over a massive number of devices quickly. An example worst-case situation would be a Tesla OTA update which turns Teslas into collision engines.
+  - Remote Control Security: If the adversary can take over a car remotely, they can cause harm to the occupants as well as third parties.
+  - Sensor Spoofing: In this situation, the adversary uses local physical assets to fool the sensors of the target. GPS jamming or spoofing are active examples.
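+Detecting operation outside the ODD can be sketched as a simple runtime monitor. The signal names, limits, and action labels below are hypothetical; in particular, the gap between GPS and odometry position is used here only as one plausible indicator of sensor spoofing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OddLimit:
    low: float
    high: float


# Hypothetical ODD limits for a few monitored signals.
ODD_LIMITS = {
    "speed_kmh": OddLimit(0.0, 130.0),
    "visibility_m": OddLimit(50.0, 10000.0),
    "gps_odometry_gap_m": OddLimit(0.0, 15.0),  # large gap may indicate spoofing
}


def odd_violations(sample):
    """Return the names of signals that are missing or outside their ODD limits."""
    violations = []
    for name, limit in ODD_LIMITS.items():
        value = sample.get(name)
        if value is None or not (limit.low <= value <= limit.high):
            violations.append(name)
    return violations


def next_action(sample):
    """Continue inside the ODD; otherwise request a safe return to the ODD."""
    return "return_to_odd" if odd_violations(sample) else "continue"


if __name__ == "__main__":
    nominal = {"speed_kmh": 80.0, "visibility_m": 400.0, "gps_odometry_gap_m": 2.0}
    spoofed = dict(nominal, gps_odometry_gap_m=40.0)
    print(next_action(nominal), next_action(spoofed), odd_violations(spoofed))
```

+Treating a missing signal as a violation is a conservative design choice in this sketch: a jammed sensor then triggers the same safe response as an out-of-range reading.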
+In terms of governance, the product developer is expected to exercise reasonable due diligence in order to minimize these issues. The level of validation required is dynamic in nature and tied to the prevailing norm in the industry.

en/safeav/as/vvintro.1750038121.txt.gz · Last modified: 2025/06/16 04:42 by rahulrazdan