ETD-HUB

Catalink Case Study: IRIS

Created: 3 months, 1 week ago. by: Catalink

Categories: AI Models, Machine Vision, Transport

Short Description

Driver State Monitoring (DSM) technologies play a vital role in improving road safety, especially as fully autonomous vehicles remain years away. These systems monitor drivers for signs of impairment, such as drowsiness, to help prevent accidents caused by human error. However, a major limitation of current DSM systems lies in facial landmark recognition models, which often exhibit biases that reduce accuracy for individuals with diverse racial and facial features, leading to unreliable detection and potentially dangerous false negatives.

To address this, we aim to develop a suite of advanced drowsiness detection models using ALFIE’s AutoML platform. By leveraging Vision Transformer architectures and integrating multimodal inputs, such as physiological signals (e.g., heart rate) and steering behavior, our models will be better equipped to deliver accurate, unbiased detection. These models will undergo rigorous testing across a broad range of scenarios and demographics to demonstrate improved performance over existing systems such as MDNet (Multimodal Drowsiness Network) and standard ViT (Vision Transformer) models.

Effective model training requires access to diverse, high-quality data. While existing public datasets (e.g., UTA-RLDD, DAD, DriverMVT, and manD1.0) offer valuable resources, many lack sufficient diversity. To overcome this, we plan to generate synthetic data and, if needed, collect our own diverse dataset to ensure fairness and inclusivity. By tackling these challenges head-on, this project aims to significantly enhance the reliability and equity of DSM systems, ultimately contributing to safer roads for everyone.

Full Description

Driver State Monitoring (DSM) technologies are crucial for enhancing road safety, especially given that fully autonomous driving is still a long way off. These systems are designed to monitor a driver's condition and intervene to mitigate risks associated with human error.

However, current DSM systems face a significant challenge: limitations in their facial landmark recognition models. These models, essential for drowsiness detection, often exhibit biases that reduce their accuracy when applied to individuals from diverse racial backgrounds or with varying facial characteristics. This can lead to a decreased ability to correctly identify key facial features or expressions across different demographics.

This issue can have serious consequences for drowsiness detection. If a DSM system fails to accurately detect drowsiness due to model biases, it can produce "false negatives": instances where drowsiness is present but not identified. These false negatives undermine the system's reliability and compromise its ability to prevent accidents.

The primary goal of this use case is to address these critical shortcomings by creating, with the help of ALFIE’s AutoML platform, diverse model variations specifically for drowsiness detection. Our company aims to develop state-of-the-art visual models based on the Vision Transformer architecture, which has shown excellent results in image-based classification tasks. Furthermore, we will build multimodal models that combine two or more modalities. By incorporating physiological signals such as heart rate, as well as behavioral signals such as steering wheel patterns, we aim to give our models additional information for more accurate and unbiased drowsiness classification.
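The description does not say how the modalities will be combined. As one possible illustration only, the sketch below uses simple late fusion: each modality produces its own drowsiness probability, and the scores are merged by a weighted average. The function name `late_fusion`, the modality names, and the weights are hypothetical and not part of the project or of ALFIE’s AutoML platform.

```python
def late_fusion(probabilities, weights=None):
    """Combine per-modality drowsiness probabilities by weighted average.

    probabilities: dict mapping a modality name (hypothetical, e.g. "vision")
                   to that modality's drowsiness probability in [0, 1].
    weights:       optional dict mapping modality name to a non-negative
                   weight; equal weights are assumed when omitted.
    """
    if not probabilities:
        raise ValueError("at least one modality is required")
    if weights is None:
        weights = {name: 1.0 for name in probabilities}

    # Only modalities that actually reported a score contribute, so a
    # missing sensor simply drops out of the average.
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = sum(probabilities[name] * weights[name] for name in probabilities)
    return weighted_sum / total_weight


# Illustrative values only, not measured results.
fused = late_fusion(
    {"vision": 0.8, "heart_rate": 0.4},
    weights={"vision": 3.0, "heart_rate": 1.0},
)
```

Late fusion keeps each modality's model independent, which makes it easy to compare a vision-only model against a multimodal one. Other strategies, such as fusing features inside a single network, are equally plausible here.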

These varied models will undergo rigorous testing across a comprehensive range of scenarios. The ultimate goal of this extensive validation process is to demonstrate a tangible increase in performance compared to current state-of-the-art drowsiness detection models (MDNet - Multimodal Drowsiness Network, ViT - Vision Transformers). This improved performance will be measured by enhanced accuracy in identifying drowsiness across a broad spectrum of individuals, specifically including those from different racial backgrounds and with diverse facial characteristics.
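One way to make "enhanced accuracy across a broad spectrum of individuals" measurable is to report accuracy and the false negative rate separately for each demographic group and then look at the gap between groups. The following is a minimal sketch under that assumption. The function names, the group labels, and the choice of the false negative rate gap as a summary are illustrative and are not the project's stated evaluation protocol.

```python
from collections import defaultdict


def per_group_metrics(y_true, y_pred, groups):
    """Return accuracy and false negative rate for each group.

    y_true, y_pred: sequences of labels, 1 = drowsy, 0 = alert.
    groups:         sequence of (hypothetical) group labels, one per sample.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(truth == pred)
        if truth == 1:
            c["pos"] += 1
            # A false negative: the driver was drowsy but the model said alert.
            c["fn"] += int(pred == 0)

    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            # Undefined when the group has no drowsy samples.
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


def fnr_gap(metrics):
    """Largest difference in false negative rate between any two groups."""
    rates = [m["fnr"] for m in metrics.values() if m["fnr"] is not None]
    return max(rates) - min(rates) if rates else None


# Toy labels for illustration only, not real evaluation data.
metrics = per_group_metrics(
    y_true=[1, 1, 0, 1, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 1],
    groups=["A", "A", "A", "B", "B", "B"],
)
gap = fnr_gap(metrics)
```

In this toy example both groups have the same accuracy, yet group A has a higher false negative rate, which is why a single aggregate accuracy figure can hide the kind of bias described above.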

To effectively train our models, a comprehensive set of datasets is required. Our preliminary research has identified several public datasets (UTA-RLDD - Real-Life Drowsiness Dataset, DAD - Driver Anomaly Detection) primarily focused on image data for driver distraction detection. Additionally, two multimodal datasets (DriverMVT, manD1.0) incorporating physiological data have been identified as potentially beneficial for training multimodal models. A significant challenge in selecting the most suitable dataset lies in ensuring sufficient diversity regarding the racial backgrounds and, for image-based datasets, facial characteristics of the subjects. Recognizing that these limitations may hinder the acquisition of adequate data for unbiased and fair model training, our subsequent objective is to generate synthetic data for this purpose. Should the synthetic data prove insufficient, we intend to collect our own data and train our models on a newly created dataset that is sufficiently unbiased and diverse for our specific application.
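A simple first check on dataset diversity is to count how many subjects fall into each group and flag the groups whose share is below a chosen threshold. The sketch below assumes that per-subject group annotations are available, which may not be the case for the datasets named above. The function name and the 10% threshold are hypothetical.

```python
from collections import Counter


def underrepresented_groups(subject_groups, min_share=0.1):
    """Return {group: share} for groups below min_share of all subjects.

    subject_groups: one (hypothetical) group label per subject.
    min_share:      assumed threshold; the right value depends on the application.
    """
    counts = Counter(subject_groups)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        group: n / total
        for group, n in counts.items()
        if n / total < min_share
    }


# Toy annotation list for illustration only.
flagged = underrepresented_groups(["A"] * 8 + ["B"] * 11 + ["C"] * 1)
```

Groups flagged this way would be natural candidates for the synthetic data generation or additional data collection described above.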

In conclusion, while Driver State Monitoring technologies are vital for road safety, current systems are hampered by biases in facial landmark recognition models. These biases compromise accuracy and can produce dangerous "false negatives" in drowsiness detection, particularly for diverse populations. By leveraging ALFIE’s AutoML platform, this use case aims to overcome these shortcomings by developing and rigorously testing a diverse array of advanced, multimodal drowsiness detection models. The intended result is significantly improved, unbiased, and inclusive drowsiness detection across all individuals, enhancing road safety while upholding principles of fairness and effectiveness.