====== V&V of Software Systems and Middleware ======

===== V&V objectives and evidence chain =====

Verification asks whether the software artefact was built correctly against its specification. Validation asks whether the right behaviour has been specified and achieved in the intended operational context. In autonomous systems both questions must be answered at multiple levels. Unit tests may show that a planner function satisfies a local requirement, but they do not validate vehicle behaviour in mixed traffic. Scenario simulation may show acceptable behaviour in many cases, but it does not prove that the deployed binary, middleware configuration and sensor timing match the tested configuration.

A useful evidence chain begins with system hazards and operational assumptions. These are translated into software safety requirements, timing budgets, interface contracts, data-quality requirements and degraded-mode expectations. Architecture reviews and design analyses check whether the software structure can support the requirements. Implementation verification examines source code, generated code, models and configuration files. Integration verification checks interfaces, scheduling, message semantics and fault propagation. System validation exercises scenarios and missions. Release assurance confirms that the tested artefacts are the artefacts deployed. Operational monitoring checks that assumptions remain valid after deployment.

===== Requirements and architecture V&V =====

Requirements verification starts before code exists. Requirements must be unambiguous, testable, traceable and allocated to architectural elements. For middleware, requirements should include message rates, maximum latency, data freshness, synchronization tolerance, queueing policy, reliability, persistence, security, startup order and fault behaviour.
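
The properties named above (testable, traceable, allocated) can be partly checked mechanically before any code exists. The following is a minimal sketch, assuming requirements are kept as structured records. The ''Requirement'' fields, the identifiers such as ''SWR-12'' and the ''review_findings'' helper are all hypothetical and are not taken from any particular requirements tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Requirement:
    """Hypothetical requirement record; field names are illustrative only."""
    req_id: str
    text: str
    allocated_to: Optional[str] = None   # architectural element
    hazard_ids: Tuple[str, ...] = ()     # trace links to system hazards
    verification: Optional[str] = None   # "test", "analysis" or "review"
    threshold: Optional[float] = None    # measurable limit, if any
    unit: Optional[str] = None           # unit of the threshold


def review_findings(req: Requirement) -> List[str]:
    """Return review findings for one requirement (empty list = none found)."""
    findings = []
    if not req.allocated_to:
        findings.append("not allocated to an architectural element")
    if not req.hazard_ids:
        findings.append("no trace to a hazard")
    if req.verification is None:
        findings.append("no verification method")
    # A requirement verified by test needs a measurable pass/fail limit.
    if req.verification == "test" and (req.threshold is None or req.unit is None):
        findings.append("test requirement lacks a threshold with unit")
    return findings


latency_req = Requirement(
    req_id="SWR-12",
    text="Lidar messages shall reach the planner within 50 ms.",
    allocated_to="perception_bridge",
    hazard_ids=("H-03",),
    verification="test",
    threshold=50.0,
    unit="ms",
)
vague_req = Requirement(
    req_id="SWR-13",
    text="The middleware shall be fast.",
    verification="test",
)

for req in (latency_req, vague_req):
    print(req.req_id, review_findings(req) or "no findings")
```

A check of this kind only finds missing fields. Whether a requirement is unambiguous or complete still needs human review.
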
For autonomy applications, requirements should describe the operational design domain, assumptions about sensors and maps, acceptable degraded modes, fallback behaviour and human-supervision responsibilities.

Architectural verification uses reviews, interface analysis, failure-mode analysis, threat modeling, timing analysis and safety analysis to determine whether the structure can satisfy the requirements. It should examine partitioning between safety-critical and non-safety functions, freedom from interference, redundancy management, data-flow consistency, resource budgets and the consequences of node or network failure. The main limitation is that early verification depends on the quality of assumptions; therefore, requirements and architecture reviews must be connected to scenario analysis, hazard analysis and simulation evidence.

===== Implementation, integration and timing V&V =====

Implementation-level verification examines source code, generated code, models and configuration files. Common methods include peer review, static analysis, coding-standard compliance, unit testing, structural coverage, model checking where feasible, and toolchain qualification when tools can introduce or fail to detect errors. Unit verification is necessary but not sufficient because autonomous behaviour arises from interactions among many components.

Integration V&V tests whether independently verified components work together correctly. For middleware, this includes message compatibility, serialization, topic naming, service discovery, data-rate handling, startup and shutdown order, degraded communication, resource exhaustion, failover and security boundaries. Timing verification must measure and analyse end-to-end latency, jitter, deadline misses, CPU and memory margins, queue occupancy and network loading under nominal and stress conditions.
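
The timing quantities listed above can be summarised from recorded end-to-end latency samples. The sketch below is illustrative only: ''timing_summary'' is a hypothetical helper, percentiles use the nearest-rank method, and jitter is taken as peak-to-peak variation (maximum minus minimum), which is one of several definitions in use. The sample values are invented.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def timing_summary(latencies_ms: Sequence[float], deadline_ms: float) -> Dict[str, float]:
    """Summarise end-to-end latency samples against a deadline.

    Assumption: jitter is reported as peak-to-peak variation.
    """
    if not latencies_ms:
        raise ValueError("no latency samples recorded")
    ordered = sorted(latencies_ms)
    misses = sum(1 for value in ordered if value > deadline_ms)
    return {
        "p50_ms": percentile(ordered, 50),
        "p99_ms": percentile(ordered, 99),
        "max_ms": ordered[-1],
        "jitter_ms": ordered[-1] - ordered[0],
        "deadline_miss_rate": misses / len(ordered),
    }


# Invented samples: nine nominal messages and one delayed by contention.
samples = [10, 12, 11, 30, 9, 10, 11, 10, 12, 10]
summary = timing_summary(samples, deadline_ms=20)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because timing measurements are workload-dependent, a summary like this is only meaningful together with a record of the load, fault and configuration conditions under which the samples were taken.
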
The main risk is that integration tests are often performed in clean laboratory conditions that do not represent peak load, degraded sensors or network faults.

===== Simulation, HIL and scenario validation =====

Autonomous-system validation depends on progressive movement from models to real systems. Model-in-the-loop tests verify algorithms against mathematical models. Software-in-the-loop tests execute production or near-production software in simulated environments. Processor-in-the-loop tests add target instruction-set or processor effects. Hardware-in-the-loop tests run real controllers or compute platforms against simulated sensors, actuators and plant dynamics. Field trials and operational pilots then validate behaviour in the physical environment.

This progression reduces risk but does not remove uncertainty. Simulation fidelity is limited by environment models, sensor models, traffic or mission models and assumptions about rare events. HIL may represent timing accurately but not the full physical world. Field testing is realistic but cannot cover all combinations of weather, traffic, faults, human behaviour and cyber conditions. Scenario validation should therefore be risk-based and traceable to hazards, operational design domain boundaries and known limitations.

{{:en:safeav:softsys:chatgpt_image_jun_17_2026_03_11_53_pm.png?400|}}

===== Configuration, release and operational V&V =====

A software release is not only code. It is a configuration baseline containing requirements, source, generated artefacts, build tools, libraries, containers, middleware settings, calibration, AI models, datasets, test reports and release approvals. Configuration V&V confirms that the baseline is complete, internally consistent, reproducible and matched to the target hardware and operational context. Release audits should verify SBOM completeness, vulnerability status, change-request closure, test-result integrity, tool versions, signed artefacts and rollback readiness.
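
One narrow part of such an audit, confirming that the artefacts on the target are the artefacts that were tested, can be sketched as a hash comparison against a release manifest. The manifest format (relative path mapped to SHA-256 digest), the file names and the ''audit_release'' helper are assumptions made for illustration. Signature verification, SBOM checks and vulnerability status are out of scope here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_release(manifest: Dict[str, str], root: Path) -> List[str]:
    """Compare files under root with a manifest of relative path -> SHA-256.

    Hypothetical manifest format; returns findings (empty list = match).
    """
    findings = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = root / rel_path
        if not candidate.is_file():
            findings.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            findings.append(f"hash mismatch: {rel_path}")
    # Files that were deployed but never part of the tested baseline.
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for extra in sorted(present - set(manifest)):
        findings.append(f"not in manifest: {extra}")
    return findings


# Usage with throwaway files standing in for a deployed baseline.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "planner.bin").write_bytes(b"build-1")
    (root / "qos.yaml").write_text("depth: 10\n")
    manifest = {name: sha256_of(root / name) for name in ("planner.bin", "qos.yaml")}
    print(audit_release(manifest, root))       # baseline matches
    (root / "qos.yaml").write_text("depth: 1\n")  # untested configuration change
    print(audit_release(manifest, root))
```

Middleware settings and calibration files are included in the comparison on purpose: a changed queue depth can alter timing behaviour just as much as a changed binary.
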
Operational V&V extends assurance after deployment. Monitoring should detect deadline misses, degraded sensors, software restarts, communication failures, unusual scenario distributions, safety-monitor activations, update failures and cyber indicators. However, operational data can be biased toward conditions already encountered by the fleet, and privacy or connectivity limits may restrict what can be collected. The safety case should state which assumptions are monitored and what action is taken when monitoring shows that assumptions are no longer valid.

===== Standards, assurance cases, limitations and risks =====

Standards such as IEC 61508, ISO 26262, DO-178C, ISO/IEC/IEEE 12207 and ISO/IEC/IEEE 828 define important expectations for lifecycle rigor, verification independence, traceability, configuration control and evidence. They should be used as scaffolding for a safety case rather than as a substitute for one. The assurance case links claims about safe software behaviour to evidence and assumptions. For middleware, claims may concern deterministic communication, freedom from interference, secure data exchange and recoverability. For AI components, claims may concern performance within the operational design domain, containment by monitors and safe fallback when confidence is insufficient.

Software V&V has inherent limitations. Exhaustive testing is infeasible for complex distributed autonomy. Timing measurements are workload-dependent. Simulation is limited by model fidelity. Formal methods are limited by assumptions and scalability. Field testing is expensive and cannot cover all rare events. Standards reduce process risk but do not guarantee that requirements are complete or that operational assumptions remain valid. Residual risk must be managed through architectural containment, redundancy, runtime monitoring, degraded modes, secure updates and operational review.
===== Metrics, reviews and acceptance criteria =====

Software V&V needs measurable acceptance criteria, but metrics must be interpreted in relation to the safety argument. Defect counts, code coverage, test pass rates and static-analysis warnings are useful management indicators; they are not direct measures of safety. A release with high statement coverage may still lack tests for hazardous scenarios, while a release with many low-severity warnings may be safer than one with fewer but unresolved timing or configuration risks.

For middleware and runtime platforms, useful metrics include end-to-end latency distributions, deadline-miss rates, jitter, queue occupancy, dropped-message rates, clock-synchronisation error, restart time, memory growth, CPU margin and network utilization under stress. For autonomy applications, useful metrics include scenario pass rates, performance by operational-design-domain class, safety-monitor activations, fallback success, uncertainty calibration and regression against known hazardous cases. For configuration and supply chain, useful metrics include build reproducibility, known vulnerabilities, unresolved change requests, audit findings and traceability completeness.

Acceptance criteria should be risk-based. A non-safety display defect may be accepted with a workaround, while an intermittent stale-data condition in a control path should block release until understood and mitigated. The release decision should record residual risks, known limitations, monitoring requirements and the rationale for acceptance. This evidence closes the loop between V&V results and accountable safety governance.

===== Practical V&V planning checklist =====

A practical V&V plan begins by listing software functions that can contribute to hazards. For each function, the plan identifies the responsible architectural element, relevant requirements, operating conditions, assumptions about timing and data, and evidence required before release.
This creates a bridge between system safety analysis and day-to-day software engineering.

The plan should define integration milestones. Early milestones can focus on models, unit tests and interface contracts. Middle milestones should add middleware stress tests, timing measurement, hardware-software integration and fault injection. Later milestones should add HIL, scenario validation, field trials, release audit and operational monitoring. Each milestone should have entry and exit criteria, including which configuration baseline is being tested and what evidence will be archived.

Finally, the plan should define how evidence changes after deployment. Connected autonomous systems can receive new code, calibration, AI models, maps and configurations. Each change should trigger impact analysis: which requirements, hazards, interfaces, tests and approvals are affected? Some changes may be handled by automated regression and staged rollout; others require renewed safety assessment or certification engagement.
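
The impact-analysis question above can be approached as a walk over traceability links. The sketch below assumes the links are available as a simple mapping from each item to the items that depend on it. The ''TRACE_LINKS'' data, the ''kind:name'' identifier scheme and both helper functions are hypothetical and do not reflect the export format of any particular tool.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical trace links: each item maps to the items that depend on it.
TRACE_LINKS: Dict[str, List[str]] = {
    "cfg:lidar_qos": ["if:lidar_topic"],
    "if:lidar_topic": ["req:SWR-12", "test:stress_lidar"],
    "req:SWR-12": ["haz:H-03", "test:stale_data"],
    "model:detector_v7": ["req:SWR-20"],
    "req:SWR-20": ["haz:H-05", "test:odd_regression"],
}


def impacted_items(changed: str, links: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk collecting everything downstream of a changed item."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        item = queue.popleft()
        for dependent in links.get(item, []):
            if dependent not in seen:  # also guards against cyclic links
                seen.add(dependent)
                queue.append(dependent)
    return seen


def group_by_kind(items: Set[str]) -> Dict[str, List[str]]:
    """Group identifiers by the prefix before the colon (assumed naming scheme)."""
    grouped: Dict[str, List[str]] = {}
    for item in sorted(items):
        kind = item.split(":", 1)[0]
        grouped.setdefault(kind, []).append(item)
    return grouped


impact = group_by_kind(impacted_items("cfg:lidar_qos", TRACE_LINKS))
for kind, names in impact.items():
    print(kind, names)
```

In this invented example a middleware configuration change reaches a hazard, so it would be a candidate for renewed safety assessment rather than automated regression alone. The result is only as good as the recorded links: an item missing from the trace data will not appear as impacted.
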