Differences

This shows you the differences between two versions of the page.

en:safeav:maps:ai [2025/10/21 16:15] – [Deep Learning Architectures] kosnark
en:safeav:maps:ai [2026/04/24 09:43] (current) raivo.sell
Line 1: Line 1:
 ====== AI-based Perception and Scene Understanding ======
-{{:en:iot-open:czapka_b.png?50| Bachelors (1st level) classification icon }} 
- 
-<todo @bertlluk #bertlluk:2025-06-25></todo> 
  
 Advances in AI, especially convolutional neural networks, make it possible to process raw sensor data, recognize objects, and categorize them into classes at higher levels of abstraction (pedestrians, cars, trees, etc.). These categories allow an autonomous vehicle to understand the scene, reason about its own future actions and those of other road users, and predict their possible interactions. This section compares commonly used methods along with their advantages and weaknesses.
Line 39: Line 36:
 Alternatively, point-based networks like ''PointNet'' and ''PointNet++'' operate directly on raw point sets without voxelization, preserving fine geometric detail.
 These models are critical for estimating the shape and distance of objects in 3D space, especially under challenging lighting or weather conditions.
 +
 +{{ :en:safeav:maps:cnn.webp?400 |}}
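The idea behind the point-based networks mentioned above can be illustrated with a minimal sketch: a shared per-point MLP followed by max pooling, which makes the result independent of point order. This is not the reference ''PointNet'' implementation. The function name, layer sizes, and random weights are assumptions chosen for illustration, and the input transforms and training of the real model are omitted.

```python
import numpy as np

def pointnet_global_feature(points, w1, b1, w2, b2):
    """PointNet-style sketch: shared per-point MLP, then max pooling.

    points: (N, 3) array of raw x, y, z coordinates (no voxelization).
    Returns a single (F,) feature vector describing the whole point set.
    """
    hidden = np.maximum(points @ w1 + b1, 0.0)    # same weights for every point
    features = np.maximum(hidden @ w2 + b2, 0.0)  # (N, F) per-point features
    return features.max(axis=0)                   # symmetric pooling over points

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

feature = pointnet_global_feature(points, w1, b1, w2, b2)

# Reordering the points should leave the global feature unchanged.
shuffled = points[rng.permutation(len(points))]
feature_shuffled = pointnet_global_feature(shuffled, w1, b1, w2, b2)
```

Max pooling is used here because it is a symmetric function: a point cloud has no inherent ordering, so the pooled feature should not depend on how the points happen to be stored.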
  
 === Transformer Architectures ===
Line 46: Line 45:
 Notable examples include ''DETR'' (Detection Transformer), ''BEVFormer'', and ''TransFusion'', which unify information from cameras and LiDARs into a consistent spatial representation.
  
-{{ :en:safeav:maps:cnn.webp |}}
+
  
 === Recurrent and Temporal Models ===
Line 54: Line 53:
 More recent architectures use temporal convolutional networks or transformers to achieve similar results with greater parallelism and stability.
  
-{{:en:safeav:maps:lstm.png?400|}}
+{{ :en:safeav:maps:lstm.png?400 |}}
  
 === Graph Neural Networks (GNNs) ===
Line 72: Line 71:
 Robust perception requires exposure to the full range of operating conditions that a vehicle may encounter.
 Datasets must include variations in:
-* **Sensor modalities** – data from cameras, LiDAR, radar, GNSS, and IMU, reflecting the multimodal nature of perception.
-* **Environmental conditions** – daytime and nighttime scenes, different seasons, weather effects such as rain, fog, or snow.
-* **Geographical and cultural contexts** – urban, suburban, and rural areas; diverse traffic rules and road signage conventions.
-* **Behavioral diversity** – normal driving, aggressive maneuvers, and rare events such as jaywalking or emergency stops.
-* **Edge cases** – rare but safety-critical situations, including near-collisions or sensor occlusions.
+
+  * **Sensor modalities** – data from cameras, LiDAR, radar, GNSS, and IMU, reflecting the multimodal nature of perception.
+  * **Environmental conditions** – daytime and nighttime scenes, different seasons, weather effects such as rain, fog, or snow.
+  * **Geographical and cultural contexts** – urban, suburban, and rural areas; diverse traffic rules and road signage conventions.
+  * **Behavioral diversity** – normal driving, aggressive maneuvers, and rare events such as jaywalking or emergency stops.
+  * **Edge cases** – rare but safety-critical situations, including near-collisions or sensor occlusions.
  
 A balanced dataset should capture both common and unusual situations to ensure that perception models generalize safely beyond the training distribution.
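One way to check the balance described above is to count how many samples fall into each combination of conditions and list the combinations that are missing. The sketch below assumes each sample carries simple metadata tags. The tag names (`weather`, `time_of_day`), their values, and the function `coverage_gaps` are hypothetical and do not come from any particular dataset or library.

```python
from collections import Counter
from itertools import product

def coverage_gaps(samples, axes, min_count=1):
    """Return condition combinations with fewer than min_count samples.

    samples: list of dicts holding one metadata tag per axis.
    axes: dict mapping an axis name to the values that should be covered.
    """
    counts = Counter(tuple(sample[axis] for axis in axes) for sample in samples)
    expected = product(*(axes[axis] for axis in axes))
    return [combo for combo in expected if counts[combo] < min_count]

# Hypothetical metadata, for illustration only.
samples = [
    {"weather": "clear", "time_of_day": "day"},
    {"weather": "rain", "time_of_day": "day"},
    {"weather": "clear", "time_of_day": "night"},
]
axes = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["day", "night"],
}

gaps = coverage_gaps(samples, axes)
# gaps == [("rain", "night"), ("fog", "day"), ("fog", "night")]
```

Raising `min_count` turns the same check into a rough test for under-represented conditions rather than only missing ones.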
en/safeav/maps/ai.1761052552.txt.gz · Last modified: by kosnark
CC Attribution-Share Alike 4.0 International