====== V&V of Software Systems and Middleware ======

===== V&V objectives and evidence chain =====

Verification asks whether the software artefact was built correctly against its specification. Validation asks whether the right behaviour has been specified and achieved in the intended operational context. In autonomous systems both questions must be answered at multiple levels. Unit tests may show that a planner function satisfies a local requirement, but they do not validate vehicle behaviour in mixed traffic. Scenario simulation may show acceptable behaviour in many cases, but it does not prove that the deployed binary, middleware configuration and sensor timing match the tested configuration.

A useful evidence chain begins with system hazards and operational assumptions. These are translated into software safety requirements, timing budgets, interface contracts, data-quality requirements and degraded-mode expectations. Architecture reviews and design analyses check whether the software structure can support the requirements. Implementation verification examines source code, generated code, models and configuration files. Integration verification checks interfaces, scheduling, message semantics and fault propagation. System validation exercises scenarios and missions. Release assurance confirms that the tested artefacts are the artefacts deployed. Operational monitoring checks that assumptions remain valid after deployment.

===== Requirements and architecture V&V =====

Requirements verification starts before code exists. Requirements must be unambiguous, testable, traceable and allocated to architectural elements. For middleware, requirements should include message rates, maximum latency, data freshness, synchronisation tolerance, queueing policy, reliability, persistence, security, startup order and fault behaviour. For autonomy applications, requirements should describe the operational design domain, assumptions about sensors and maps, acceptable degraded modes, fallback behaviour and human-supervision responsibilities.
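
A middleware requirement of this kind becomes testable once it is written as a machine-checkable interface contract. The following is a minimal sketch, assuming a hypothetical ''TopicContract'' record and recorded ''(publish_time_s, receive_time_s)'' pairs; the field names, units and thresholds are illustrative and do not come from any particular middleware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicContract:
    """Hypothetical interface contract for one middleware topic."""
    topic: str
    min_rate_hz: float    # lowest acceptable publication rate
    max_latency_s: float  # publish-to-receive latency budget
    max_age_s: float      # data freshness limit at time of use


def check_contract(contract, samples, now_s):
    """Return a list of violations for (publish_time_s, receive_time_s) samples."""
    if not samples:
        return [f"{contract.topic}: no samples received"]
    violations = []

    # Latency: every sample must arrive within the budget.
    worst_latency = max(rx - tx for tx, rx in samples)
    if worst_latency > contract.max_latency_s:
        violations.append(f"{contract.topic}: latency {worst_latency:.3f}s exceeds budget")

    # Rate: the largest gap between consecutive publications bounds the rate.
    pub_times = sorted(tx for tx, _ in samples)
    gaps = [b - a for a, b in zip(pub_times, pub_times[1:])]
    if gaps and max(gaps) > 1.0 / contract.min_rate_hz:
        violations.append(f"{contract.topic}: publication gap {max(gaps):.3f}s too long")

    # Freshness: the newest sample must not be older than the limit.
    age = now_s - pub_times[-1]
    if age > contract.max_age_s:
        violations.append(f"{contract.topic}: newest data is {age:.3f}s old")
    return violations
```

An empty result means the recorded trace satisfied the contract; it says nothing about workloads that were not recorded, which is why the same check is worth repeating under stress conditions.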

Architectural verification uses reviews, interface analysis, failure-mode analysis, threat modelling, timing analysis and safety analysis to determine whether the structure can satisfy the requirements. It should examine partitioning between safety-critical and non-safety functions, freedom from interference, redundancy management, data-flow consistency, resource budgets and the consequences of node or network failure. The main limitation is that early verification depends on the quality of assumptions; therefore, requirements and architecture reviews must be connected to scenario analysis, hazard analysis and simulation evidence.

===== Implementation, integration and timing V&V =====

Implementation-level verification examines source code, generated code, models and configuration files. Common methods include peer review, static analysis, coding-standard compliance, unit testing, structural coverage, model checking where feasible, and toolchain qualification when tools can introduce or fail to detect errors. Unit verification is necessary but not sufficient because autonomous behaviour arises from interactions among many components.

Integration V&V tests whether independently verified components work together correctly. For middleware, this includes message compatibility, serialization, topic naming, service discovery, data-rate handling, startup and shutdown order, degraded communication, resource exhaustion, failover and security boundaries. Timing verification must measure and analyse end-to-end latency, jitter, deadline misses, CPU and memory margins, queue occupancy and network loading under nominal and stress conditions. The main risk is that integration tests are often performed in clean laboratory conditions that do not represent peak load, degraded sensors or network faults.
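
The timing quantities named above can be computed from logged end-to-end latencies. The sketch below is illustrative only: the function name ''timing_summary'' is hypothetical, jitter is taken as the population standard deviation of latency (other definitions exist), and the 99th percentile uses the nearest-rank method.

```python
import statistics


def timing_summary(latencies_s, deadline_s):
    """Summarise end-to-end latencies (seconds) against one deadline."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two latency samples")
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), in integer arithmetic.
    rank = (99 * n + 99) // 100
    misses = sum(1 for x in ordered if x > deadline_s)
    return {
        "mean_s": statistics.fmean(ordered),
        "p99_s": ordered[rank - 1],
        "max_s": ordered[-1],
        "jitter_s": statistics.pstdev(ordered),  # assumption: jitter as std deviation
        "deadline_miss_rate": misses / n,
    }
```

Running the same summary on traces captured under nominal load and under stress makes the workload dependence of the result visible, rather than reporting a single figure from a clean laboratory run.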

===== Simulation, HIL and scenario validation =====

Autonomous-system validation depends on progressive movement from models to real systems. Model-in-the-loop tests verify algorithms against mathematical models. Software-in-the-loop tests execute production or near-production software in simulated environments. Processor-in-the-loop tests run the compiled code on the target processor or an instruction-set simulator, exposing compiler, numeric and execution-time effects. Hardware-in-the-loop tests run real controllers or compute platforms against simulated sensors, actuators and plant dynamics. Field trials and operational pilots then validate behaviour in the physical environment.

This progression reduces risk but does not remove uncertainty. Simulation fidelity is limited by environment models, sensor models, traffic or mission models and assumptions about rare events. HIL may represent timing accurately but not the full physical world. Field testing is realistic but cannot cover all combinations of weather, traffic, faults, human behaviour and cyber conditions. Scenario validation should therefore be risk-based and traceable to hazards, operational design domain boundaries and known limitations.

{{:en:safeav:softsys:chatgpt_image_jun_17_2026_03_11_53_pm.png?400|}}

===== Configuration, release and operational V&V =====

A software release is not only code. It is a configuration baseline containing requirements, source, generated artefacts, build tools, libraries, containers, middleware settings, calibration, AI models, datasets, test reports and release approvals. Configuration V&V confirms that the baseline is complete, internally consistent, reproducible and matched to the target hardware and operational context. Release audits should verify SBOM completeness, vulnerability status, change-request closure, test-result integrity, tool versions, signed artefacts and rollback readiness.
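
One way to check that the deployed artefacts are the tested artefacts is to compare cryptographic digests against the baseline manifest. The following is a minimal sketch using SHA-256 from the Python standard library; the function names and the manifest layout (relative path mapped to digest) are assumptions made for illustration, and a real release process would also verify signatures over the manifest itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so that large artefacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def compare_to_baseline(baseline, deployed):
    """Return files that are missing, unexpected or changed relative to the baseline."""
    return {
        "missing": sorted(set(baseline) - set(deployed)),
        "unexpected": sorted(set(deployed) - set(baseline)),
        "changed": sorted(k for k in baseline.keys() & deployed.keys()
                          if baseline[k] != deployed[k]),
    }
```

A release audit would expect all three lists to be empty when the manifest built on the target is compared with the manifest archived alongside the test reports.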

Operational V&V extends assurance after deployment. Monitoring should detect deadline misses, degraded sensors, software restarts, communication failures, unusual scenario distributions, safety-monitor activations, update failures and cyber indicators. However, operational data can be biased toward conditions already encountered by the fleet, and privacy or connectivity limits may restrict what can be collected. The safety case should state which assumptions are monitored and what action is taken when monitoring shows that assumptions are no longer valid.

===== Standards, assurance cases, limitations and risks =====

Standards such as IEC 61508, ISO 26262, DO-178C, ISO/IEC/IEEE 12207 and IEEE 828 define important expectations for lifecycle rigour, verification independence, traceability, configuration control and evidence. They should be used as scaffolding for a safety case rather than as a substitute for one. The assurance case links claims about safe software behaviour to evidence and assumptions. For middleware, claims may concern deterministic communication, freedom from interference, secure data exchange and recoverability. For AI components, claims may concern performance within the operational design domain, containment by monitors and safe fallback when confidence is insufficient.

Software V&V has inherent limitations. Exhaustive testing is infeasible for complex distributed autonomy. Timing measurements are workload-dependent. Simulation is limited by model fidelity. Formal methods are limited by assumptions and scalability. Field testing is expensive and cannot cover all rare events. Standards reduce process risk but do not guarantee that requirements are complete or that operational assumptions remain valid. Residual risk must be managed through architectural containment, redundancy, runtime monitoring, degraded modes, secure updates and operational review.

===== Metrics, reviews and acceptance criteria =====

Software V&V needs measurable acceptance criteria, but metrics must be interpreted in relation to the safety argument. Defect counts, code coverage, test pass rates and static-analysis warnings are useful management indicators; they are not direct measures of safety. A release with high statement coverage may still lack tests for hazardous scenarios, while a release with many low-severity warnings may be safer than one with fewer but unresolved timing or configuration risks.

For middleware and runtime platforms, useful metrics include end-to-end latency distributions, deadline-miss rates, jitter, queue occupancy, dropped-message rates, clock-synchronisation error, restart time, memory growth, CPU margin and network utilisation under stress. For autonomy applications, useful metrics include scenario pass rates, performance by operational-design-domain class, safety-monitor activations, fallback success, uncertainty calibration and regression against known hazardous cases. For configuration and supply chain, useful metrics include build reproducibility, known vulnerabilities, unresolved change requests, audit findings and traceability completeness.
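
Reporting scenario results per operational-design-domain class, rather than as one overall figure, keeps weak classes visible. The sketch below assumes each result is a hypothetical ''(odd_class, passed)'' pair; the class labels used in the note that follows are invented for illustration.

```python
from collections import defaultdict


def pass_rate_by_odd_class(results):
    """Aggregate (odd_class, passed) scenario results into per-class pass rates."""
    totals = defaultdict(lambda: [0, 0])  # odd_class -> [passed, run]
    for odd_class, passed in results:
        totals[odd_class][1] += 1
        if passed:
            totals[odd_class][0] += 1
    return {cls: ok / run for cls, (ok, run) in totals.items()}
```

For example, two passes in an "urban_day" class and one pass out of two in an "urban_night_rain" class give an overall pass rate of 0.75, while the per-class view shows 1.0 and 0.5 and points directly at the class that needs attention.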

Acceptance criteria should be risk-based. A non-safety display defect may be accepted with a workaround, while an intermittent stale-data condition in a control path should block release until understood and mitigated. The release decision should record residual risks, known limitations, monitoring requirements and the rationale for acceptance. This evidence closes the loop between V&V results and accountable safety governance.
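
A risk-based gate of this kind can be expressed as a small decision function. This is a sketch under stated assumptions, not a prescribed policy: the ''Finding'' record, its fields and the blocking rule (any unmitigated safety-related finding blocks release, and every accepted finding needs a recorded rationale) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """An open defect or risk at release time (fields are illustrative)."""
    summary: str
    safety_related: bool
    mitigated: bool
    rationale: str = ""  # why the residual risk is acceptable, if accepted


def release_decision(findings):
    """Block on unmitigated safety-related findings; record everything else."""
    blockers = [f for f in findings if f.safety_related and not f.mitigated]
    undocumented = [f for f in findings if f not in blockers and not f.rationale]
    return {
        "approved": not blockers and not undocumented,
        "blockers": [f.summary for f in blockers],
        "needs_rationale": [f.summary for f in undocumented],
        "residual_risks": [f.summary for f in findings if f not in blockers],
    }
```

The returned record keeps the accepted residual risks next to the decision, so the rationale for acceptance is archived with the release rather than reconstructed later.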

===== Practical V&V planning checklist =====

A practical V&V plan begins by listing software functions that can contribute to hazards. For each function, the plan identifies the responsible architectural element, relevant requirements, operating conditions, assumptions about timing and data, and evidence required before release. This creates a bridge between system safety analysis and day-to-day software engineering.

The plan should define integration milestones. Early milestones can focus on models, unit tests and interface contracts. Middle milestones should add middleware stress tests, timing measurement, hardware-software integration and fault injection. Later milestones should add HIL, scenario validation, field trials, release audit and operational monitoring. Each milestone should have entry and exit criteria, including which configuration baseline is being tested and what evidence will be archived.

Finally, the plan should define how evidence changes after deployment. Connected autonomous systems can receive new code, calibration, AI models, maps and configurations. Each change should trigger impact analysis: which requirements, hazards, interfaces, tests and approvals are affected? Some changes may be handled by automated regression and staged rollout; others require renewed safety assessment or certification engagement.
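
Impact analysis can be supported by following traceability links from the changed item to everything that depends on it. The sketch below assumes a hypothetical ''trace_links'' dictionary that maps each item to the items directly depending on it; the identifiers in the usage note are invented examples.

```python
def impact_of_change(changed_items, trace_links):
    """Follow trace links transitively from changed items to everything affected."""
    affected = set()
    frontier = list(changed_items)
    while frontier:
        item = frontier.pop()
        for dependant in trace_links.get(item, ()):
            if dependant not in affected:  # also guards against cyclic links
                affected.add(dependant)
                frontier.append(dependant)
    return affected
```

With links such as ''{"model:perception_v7": ["req:SR-12", "test:scenario_night"], "req:SR-12": ["hazard:H-3"]}'', a change to the model reaches the requirement, the test and the hazard. The quality of the answer depends entirely on how complete the recorded links are, which is one reason traceability completeness is worth measuring.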

<table Ref.Tab.5.5>
<caption>Practical V&V planning checklist for software and middleware</caption>

^ Planning question ^ Evidence to request ^
| What hazards can this software influence? | Hazard-to-requirement traceability and safety mechanism allocation. |
| What assumptions does the middleware make? | Latency budgets, freshness limits, QoS settings, clock synchronisation requirements. |
| How is the release reconstructed? | Configuration baseline, SBOM, signed artefacts, toolchain and build records. |
| How are AI components controlled? | Dataset and model versioning, scenario results, robustness tests, runtime monitor evidence. |
| What happens after deployment? | Monitoring plan, incident review process, update impact analysis and rollback procedure. |
</table>

===== Method limitations and residual-risk control =====

No single V&V method is sufficient for software systems and middleware. Reviews are effective for finding requirement ambiguity and architectural gaps, but they depend on reviewer expertise. Static analysis can find many implementation defects, but it does not validate operational behaviour. Unit tests provide local confidence, but they do not expose distributed timing faults. Simulation explores many scenarios safely, but its results are only as credible as its models. Field trials reveal real-world behaviour, but they cannot cover all rare combinations of faults, weather, users and environments.

For this reason, residual risk must be managed by combining evidence and by designing systems that remain safe when evidence is incomplete. Runtime monitors can detect stale data, confidence loss, missed deadlines and inconsistent sensor streams. Degraded modes can reduce speed, request human takeover, hold position, return to base or enter a safe stop. Redundant channels can cross-check critical functions. Secure update and rollback mechanisms can correct faults without introducing uncontrolled changes.
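
The link between a runtime monitor and a degraded mode can be illustrated with a stale-data check. This is a minimal sketch, assuming hypothetical stream names, per-stream age limits and a three-level ''Mode'' enumeration; a real system would derive the modes and limits from its hazard analysis.

```python
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"
    REDUCED_SPEED = "reduced_speed"
    SAFE_STOP = "safe_stop"


def select_mode(now_s, last_update_s, max_age_s, critical):
    """Pick an operating mode from the age of each monitored data stream."""
    stale = {name for name, t in last_update_s.items() if now_s - t > max_age_s[name]}
    if stale & critical:
        return Mode.SAFE_STOP, stale      # a critical stream is stale
    if stale:
        return Mode.REDUCED_SPEED, stale  # only non-critical streams are stale
    return Mode.NOMINAL, stale
```

Returning the set of stale streams together with the mode gives operational monitoring the evidence it needs to record why a degraded mode was entered.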

The safety case should make these limits explicit. It should state what has been verified, what has been validated, what assumptions remain, which assumptions are monitored during operation and what authority is responsible for responding when evidence changes. This closes the loop between pre-release V&V and operational safety management.