

Open issues of validating AI components


[raivo.sell][✓ raivo.sell, 2025-09-18]

A. AI COMPONENT VALIDATION

Both the automotive and airborne domains have reacted to AI by viewing it as "specialized software" in standards such as ISO 8800 [14] and [13]. This approach has the great utility of leveraging all the past work in generic mechanical safety and in software validation. However, one must now handle the fact that we have data-generated "code" rather than conventionally programmed code. In the world of V&V, this difference manifests in three significant aspects: coverage analysis, code reviews, and version control.

TABLE III
| V&V Technique | Software | AI/ML |
| Coverage analysis | Code structure provides basis of coverage | No structure |
| Code reviews | Crowd-source expert knowledge | No code to review |
| Version control | Careful construction/release | Very difficult with data |
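The coverage-analysis gap can be made concrete with a toy sketch. Since a neural network has no branch structure to cover, research surrogates such as neuron activation coverage are used instead; the example below is a hypothetical illustration with a hand-rolled one-layer ReLU network, not a method prescribed by the standards cited above.

```python
import random

# Toy sketch of a structural-coverage surrogate for an ML component:
# instead of asking "which branches did the tests execute?", we ask
# "which neurons did the tests ever activate?".

def relu_layer(W, b, x):
    """One ReLU layer computed with plain Python lists."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def neuron_coverage(W, b, inputs, threshold=0.0):
    """Fraction of neurons driven above `threshold` by at least one
    test input -- a simple coverage surrogate for a neural network."""
    activated = [False] * len(b)
    for x in inputs:
        for i, h in enumerate(relu_layer(W, b, x)):
            if h > threshold:
                activated[i] = True
    return sum(activated) / len(activated)

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
b = [random.gauss(0, 1) for _ in range(8)]
tests = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
print(f"neuron coverage: {neuron_coverage(W, b, tests):.2f}")
```

As with code coverage, a low score suggests the test set never exercised parts of the model; unlike code coverage, a high score gives a much weaker completeness argument, which is exactly the open issue the table highlights.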

These differences create an enormous issue for intelligent test generation and for any argument of completeness. This is an area of active research, and two threads have emerged:

1) Training Set Validation: Since the final trained component is very hard to analyze, one approach is to examine the training set and the ODD to find interesting tests which may expose the cracks between them [16].

2) Robustness to Noise: Either through simulation or using formal methods [17], the approach is to assert various higher-level properties and use these to test the component. An example in object recognition might be to assert that an object should be recognized independent of its orientation.

Overall, developing robust methods for AI component validation remains an active and unsolved research topic even for "fixed"-function AI components, that is, AI components whose function changes only under active version control. Of course, many AI applications prefer a model where the AI component is constantly morphing. Validating the morphing situation is a topic of future research.
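The robustness-to-noise thread can be sketched as a metamorphic property test. Here `classify` is a hypothetical stand-in for an ML recognizer (it labels a point by its distance from the origin), and the asserted higher-level property is the orientation-independence example from the text: the label must not change when the input is rotated.

```python
import math
import random

def classify(point):
    """Hypothetical stand-in for an ML component: labels a 2-D point
    by its distance from the origin (rotation-invariant by design)."""
    return "near" if math.hypot(*point) < 1.0 else "far"

def rotate(point, theta):
    """Rotate a 2-D point by angle theta (radians)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def check_rotation_invariance(samples, angles):
    """Return every (point, angle) pair that violates the asserted
    property: classification must be independent of orientation."""
    failures = []
    for p in samples:
        base = classify(p)
        for a in angles:
            if classify(rotate(p, a)) != base:
                failures.append((p, a))
    return failures

random.seed(1)
pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(50)]
angles = [k * math.pi / 6 for k in range(12)]
print("violations:", len(check_rotation_invariance(pts, angles)))
```

Against a real perception component the rotation would be applied to the input image and any non-empty failure list is a generated test case; the property, not the (unavailable) code structure, drives the test generation.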

B. AI SPECIFICATION

For well-defined systems with available system-level abstractions, AI/ML components significantly increase the difficulty of intelligent test generation. With a golden spec, one can follow a structured process to make significant progress in validation and even gate the AI results with conventional safeguards. Unfortunately, one of the most compelling uses of AI is to employ it in situations where the specification of the system is not well defined or not viable using conventional programming. In these Specification-Less ML (SLML) situations, not only is building interesting tests difficult, but evaluating the correctness of the results creates further difficulty. Further, most of the major systems (perception, location services, path planning, etc.) in autonomous vehicles fall into this category of system function and AI usage. To date, there have been two approaches to attack the lack-of-specification problem: Anti-Spec and AI-Driver.

1) Anti-Spec: In these situations, the only approach left is to specify correctness through an anti-spec. The simplest anti-spec is to avoid accidents. Based on some initial work by Intel, there is a standard, IEEE 2846, "Assumptions for Models in Safety-Related Automated Vehicle Behavior" [18], which establishes a framework for defining a minimum set of assumptions regarding the reasonably foreseeable behaviors of other road users. For each scenario, it specifies assumptions about the kinematic properties of other road users, including their speed, acceleration, and possible maneuvers. Challenges include an argument for completeness, a specification of the machinery for checking against the standard, and the connection to a liability governance framework.

2) AI-Driver: While IEEE 2846 comes from a bottom-up technology perspective, Koopman and Widen [19] have proposed the concept of defining an AI driver which must replicate all the competencies of a human driver in a complex, real-world environment.
Key points of Koopman’s AI driver concept include:

a) Full Driving Capability: The AI driver must handle the entire driving task, including perception (sensing the environment), decision-making (planning and responding to scenarios), and control (executing physical movements like steering and braking). It must also account for nuances like social driving norms and unexpected events.

b) Safety Assurance: Koopman stresses that AVs need rigorous safety standards, similar to those in industries like aviation. This includes identifying potential failures, managing risks, and ensuring safe operation even in the face of unforeseen events.

c) Human Equivalence: The AI driver must meet or exceed the performance of a competent human driver. This involves adhering to traffic laws, responding to edge cases (rare or unusual driving scenarios), and maintaining situational awareness at all times.

d) Ethical and Legal Responsibility: An AI driver must operate within ethical and legal frameworks, including handling situations that involve moral decisions or liability concerns.

e) Testing and Validation: Koopman emphasizes the importance of robust testing, simulation, and on-road trials to validate AI driver systems. This includes covering edge cases and long-tail risks, and ensuring that systems generalize across diverse driving conditions.

Overall, it is a very ambitious endeavor, and there are significant challenges to building this specification of a reasonable driver. First, the idea of a "reasonable" driver is not well encoded even on the human side. Rather, this definition of "reasonableness" has been built over a long history of legal distillation, and, of course, the human standard is built on the understanding of humans by other humans. Second, the complexity of such a standard would be very high, and it is not clear whether it is achievable.
Finally, it may take quite a while of legal distillation to reach some level of closure on a human-like "AI-Driver." Currently, the state of the art in specification is relatively poor for both ADAS and AVs. ADAS systems, which are widely proliferated, have massive divergences in behavior and completeness. When a customer buys ADAS, it is not entirely clear what they are getting. Tests by industry groups such as AAA, Consumer Reports, and IIHS have shown the significant shortcomings of existing solutions [20]. In 2024, IIHS introduced a ratings program to evaluate the safeguards of partial driving automation systems. Out of 14 systems tested, only one received an acceptable rating, highlighting the need for improved measures to prevent misuse and ensure driver engagement [21]. Today, there is only one non-process-oriented regulation in the marketplace: the NHTSA regulation around AEB [22].
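The anti-spec idea of checking behavior against assumed kinematic bounds, as IEEE 2846 formalizes for other road users, can be sketched as a worst-case following-distance rule. The formula and every parameter value below are simplifying assumptions for illustration (in the spirit of RSS-style longitudinal safety rules), not values taken from the standard.

```python
# Illustrative anti-spec check: given assumed worst-case kinematics
# (reaction time, accelerations, braking rates), is the ego vehicle's
# gap to the lead vehicle large enough to avoid a collision?
# All numeric defaults are assumptions for this sketch.

def min_safe_gap(v_ego, v_lead, rho=1.0, a_accel=2.0,
                 b_ego_min=4.0, b_lead_max=8.0):
    """Worst case: during reaction time rho (s) the ego still
    accelerates at a_accel (m/s^2), then brakes at only b_ego_min,
    while the lead vehicle brakes as hard as b_lead_max."""
    v_ego_peak = v_ego + rho * a_accel
    ego_stop = (v_ego * rho + 0.5 * a_accel * rho ** 2
                + v_ego_peak ** 2 / (2 * b_ego_min))
    lead_stop = v_lead ** 2 / (2 * b_lead_max)
    return max(0.0, ego_stop - lead_stop)

def gap_is_safe(actual_gap, v_ego, v_lead):
    """The checkable anti-spec property for one scenario."""
    return actual_gap >= min_safe_gap(v_ego, v_lead)

print(f"minimum safe gap: {min_safe_gap(25.0, 20.0):.1f} m")
```

A monitor built on such bounds does not say what the planner should do; it only flags scenarios where the vehicle's state is incompatible with the assumed behaviors of other road users, which is exactly the "machinery for checking against the standard" the text lists as an open challenge.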

Summary:

This chapter traces the evolution of software from programmable hardware foundations to a dominant force in modern computing systems. Early advances in hardware programmability—through configuration, programmable logic (e.g., FPGAs), and stored-program processors—enabled a separation between physical implementation and functional behavior. The introduction of stable computer architectures (notably IBM System/360) and operating systems created enduring abstractions that allowed software portability, scalability, and rapid innovation. Over time, networking and open-source ecosystems further accelerated the growth of information technology, establishing software as the central driver of capability across computing platforms.

As software methods entered cyber-physical systems (CPS)—including ground, airborne, marine, and space domains—they followed a distinct trajectory shaped by real-time constraints, safety requirements, and physical interaction. Initially introduced to enhance control and diagnostics, software evolved into the core coordinating layer for sensing, decision-making, and actuation, enabling autonomy. This transition was supported by the emergence of real-time operating systems (RTOSes), middleware, and layered software architectures that ensured deterministic behavior and modularity. Across all domains, systems evolved from isolated, hardware-centric designs to distributed, software-intensive platforms, with increasing reliance on standardized frameworks and communication protocols.

The chapter further highlights how software has transformed product development, supply chains, and validation practices. Cyber-physical systems are increasingly influenced by the faster-moving IT ecosystem, adopting open-source components, layered stacks, and continuous update models (e.g., software-defined vehicles). At the same time, safety standards (e.g., ISO 26262, DO-178C) and rigorous verification methods—such as hardware/software co-simulation (MIL, SIL, HIL)—have evolved to address the risks of software-driven behavior. Modern software supply chains are complex, incorporating third-party and open-source dependencies, requiring strong configuration management, traceability, and cybersecurity practices. Overall, the chapter emphasizes a fundamental shift: engineered systems are no longer hardware products with embedded software, but increasingly software platforms embodied in hardware.

Assessment:

| # | Assessment Title | Description (Project / Report) | Learning Objectives |
| 1 | Evolution of Programmable Systems | Write a report tracing the evolution from fixed-function hardware to programmable systems (configuration, FPGA, microprocessors) and the emergence of software as an abstraction. Include historical milestones and examples. | Understand the transition from hardware-centric to software-defined systems. Explain key programming paradigms (configuration, assembly, high-level programming). Analyze the role of abstraction layers (e.g., the system stack). |
| 2 | Cyber-Physical Software Stack Analysis | Develop a structured report analyzing a real-world CPS (e.g., automotive ADAS, UAV, or spacecraft). Map its software stack (HAL, RTOS, middleware, applications) and explain how each layer contributes to overall system functionality. | Identify layers in CPS software architectures. Explain the role of RTOS, middleware, and HAL. Analyze real-time and safety constraints in system design. |
| 3 | IT vs CPS Supply Chain Comparison Study | Produce a comparative analysis of hardware and software supply chains in IT vs CPS, with focus on lifecycle management, dependencies, and update strategies. Include risks and trade-offs. | Compare IT and CPS development ecosystems. Evaluate the impact of "innovation cycles" in CPS (cost, obsolescence, certification). Assess risks (safety, cybersecurity) and benefits (flexibility, innovation). |
| 4 | Safety Verification and Validation Framework | Write a report comparing software validation approaches in IT and CPS, focusing on simulation/emulation (MIL, SIL, HIL) and safety standards (e.g., ISO 26262, DO-178C). Include a case study. | Understand verification vs validation in different domains. Explain simulation/emulation methods in CPS. Analyze how safety standards shape software development. |
| 5 | Software-Defined System Proposal | Develop a conceptual design for a "software-defined" system (e.g., vehicle, drone, or marine system). Describe architecture, update model (OTA), software stack, and lifecycle management approach. | Apply concepts of software-defined systems. Design layered, modular architectures. Integrate lifecycle, update, and maintainability considerations. |
| Stack | Framework Type | Core Covered Layers | Key Technologies | Domain Focus | Notes / Differentiation |
| ROS 2 | Open-source middleware stack | Middleware, application | DDS, nodes, topics, Gazebo, RViz | Robotics, AV | De facto R&D standard; highly modular |
| AUTOSAR Adaptive | Automotive software platform | OS, middleware, apps | POSIX OS, SOME/IP, service-oriented | Automotive (ADAS/AV) | Designed for ISO 26262 + OTA updates |
| AUTOSAR Classic Platform | Embedded real-time stack | HAL, RTOS, basic software | OSEK or RTOS, CAN, ECU abstraction | Automotive ECUs | Deterministic, safety-certified |
| Apollo | Full autonomy stack | Full stack (perception → control) | Cyber RT, AI models, HD maps | Autonomous driving (L2–L4) | One of the most complete open AV stacks |
| Autoware | Open AV stack | Full autonomy pipeline | ROS 2, perception, planning modules | Automotive, robotics | Strong academic + industry ecosystem |
| NVIDIA DRIVE OS | Integrated platform | OS, middleware, AI runtime | CUDA, TensorRT, DriveWorks | Automotive autonomy | Tight HW/SW co-design with GPUs |
| QNX Neutrino | RTOS middleware | OS, safety layer | POSIX RTOS, microkernel | Automotive, industrial | Strong certification (ASIL-D) |
| VxWorks | RTOS | OS, middleware | Deterministic RTOS, ARINC 653 | Aerospace, defense | Widely used in safety-critical systems |
| PX4 Autopilot | UAV autonomy stack | Control, middleware, perception | MAVLink, EKF, control loops | UAV / drones | Industry standard for drones |
| ArduPilot | UAV autonomy stack | Control + navigation | Mission planning, sensor fusion | UAV, marine robotics | Broad vehicle support (air/land/sea) |
| MOOS-IvP | Marine autonomy stack | Middleware | Behavior-based robotics | Marine robotics | Optimized for low-bandwidth environments |
| DDS (Data Distribution Service) | Middleware standard | Communication layer | QoS messaging, pub-sub | Cross-domain CPS | Backbone of ROS 2 and many systems |
| AWS RoboMaker | Cloud robotics stack | Cloud, simulation | DevOps, ROS integration | Robotics, AV | Enables CI/CD + simulation workflows |
| Microsoft AirSim | Simulation stack | Simulation layer | Unreal Engine, physics models | UAV, AV | High-fidelity perception simulation |
| CARLA | Simulation stack | Simulation layer | OpenDRIVE, sensors, physics | Automotive | Widely used for AV validation |
| Gazebo | Simulation stack | Simulation integration | Physics engine, ROS integration | Robotics | Standard for ROS-based systems |
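Several stacks in the table (ROS 2, DDS itself) are built on topic-based publish-subscribe communication. The toy broker below is a self-contained sketch of that pattern only; it is not DDS, and the topic name and callback wiring are invented for illustration.

```python
from collections import defaultdict

# Minimal topic-based publish-subscribe sketch, illustrating the
# communication pattern that DDS (and hence ROS 2 middleware) provides.
# Real DDS adds discovery, QoS policies, and wire interoperability.

class Bus:
    def __init__(self):
        self._subs = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on `topic`."""
        self._subs[topic].append(callback)

    def publish(self, topic, msg):
        """Deliver `msg` to all current subscribers of `topic`."""
        for cb in self._subs[topic]:
            cb(msg)

bus = Bus()
received = []
bus.subscribe("/vehicle/speed", received.append)  # hypothetical topic
bus.publish("/vehicle/speed", 12.5)
print(received)  # [12.5]
```

The decoupling shown here (publishers never name their subscribers) is what lets these stacks swap perception, planning, and simulation modules behind stable topic interfaces.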
en/safeav/softsys/vaicomp.1775469885.txt.gz · Last modified: by airi
CC Attribution-Share Alike 4.0 International