ETD-HUB

1: What are Acceptable Algorithm Performance Standards?

Asked: 4 months, 4 weeks ago By: Catalink Views: 85 Catalink Case Study: IRIS

Considering the high-risk implications of False Positives (i.e. detecting fatigue when none exists) and False Negatives (i.e. failing to detect actual fatigue), what regulatory or industry-mandated level of performance, accuracy, and reliability is required for commercial deployment?

21 Answers

Answered: 3 months, 1 week ago By: Chiamakaokorie
-
Answered: 3 months, 1 week ago By: Tundefasina
DSM systems like IRIS should meet very high safety thresholds, typically >95% overall accuracy, extremely low false-negative rates, and tightly controlled false positives. Compliance with ISO 26262 (functional safety) and ISO 21448 (SOTIF) should be mandatory, along with proven reliability across diverse demographics and real-world conditions.
Charlie replied: For sure. Also, for an EU vehicle deployment, the main hard requirements would come from the General Safety Regulation framework and the specific DDAW rules. EU Regulation 2019/2144 requires Driver Drowsiness and Attention Warning systems for M and N category vehicles, with the obligation applying to new vehicle types from 6 July 2022 and to all new vehicles from 7 July 2024. DDAW systems are defined as systems that assess the driver’s alertness through vehicle-system analysis and warn the driver if needed.
The detailed DDAW regulation does not prescribe one model architecture or one fixed accuracy percentage. Instead, it sets functional and validation requirements. The system must operate under defined conditions, including automatic activation above 70 km/h and operation in daytime and night-time conditions. It must warn the driver at a drowsiness level equivalent to or above level 8 on the Karolinska Sleepiness Scale, although it may warn from level 7 onwards.
The validation regime is also important. Manufacturers must compile a dossier explaining the DDAW system, its operation, the test procedures used, the rationale for those procedures, and the full validation results from human-participant testing. The technical service then assesses whether the design, operation, and performance evidence adequately demonstrate compliance, and it may run a manufacturer-defined verification test.
So, legally, the answer is: IRIS must meet the DDAW type-approval performance requirements, but those requirements are not expressed as a simple public accuracy percentage. The regulation requires validation against drowsiness states, using the Karolinska Sleepiness Scale or an equivalent method, with human participants and a statistical approach to minimum performance thresholds. At least 10 human participants must be included in validation testing, and each participant must generate at least one true positive or false negative event.
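The two dataset conditions quoted at the end of this reply (at least 10 participants, each contributing at least one true positive or false negative event) can be checked mechanically before any performance figure is computed. The Python below is only a rough sketch of such a check: the function name, the participant ids, and the event format are made up for illustration, and it does not reproduce the regulation's actual statistical method.

```python
MIN_PARTICIPANTS = 10  # figure quoted in the reply above, not read from the regulation text


def summarise_validation(participants, events):
    """Summarise drowsiness validation events per participant.

    participants: iterable of participant ids enrolled in the test.
    events: iterable of (participant_id, outcome), outcome being "TP" or "FN".
    """
    counts = {pid: {"TP": 0, "FN": 0} for pid in participants}
    for pid, outcome in events:
        if pid not in counts:
            raise ValueError(f"unknown participant: {pid!r}")
        if outcome not in ("TP", "FN"):
            raise ValueError(f"unexpected outcome: {outcome!r}")
        counts[pid][outcome] += 1

    tp = sum(c["TP"] for c in counts.values())
    fn = sum(c["FN"] for c in counts.values())
    return {
        "enough_participants": len(counts) >= MIN_PARTICIPANTS,
        # Participants who produced neither a TP nor an FN event.
        "participants_without_events": sorted(
            pid for pid, c in counts.items() if c["TP"] + c["FN"] == 0
        ),
        # Sensitivity pooled over all participants (hypothetical summary statistic).
        "pooled_sensitivity": tp / (tp + fn) if tp + fn else None,
    }


# Hypothetical data: ten participants, nine detected events and one missed event.
roster = [f"P{i:02d}" for i in range(1, 11)]
events = [(pid, "TP") for pid in roster[:9]] + [("P10", "FN")]
summary = summarise_validation(roster, events)
```

Pooling events across participants is a simplification chosen here for brevity; a real dossier would follow whatever statistical approach the manufacturer agrees with the technical service.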
Answered: 3 months, 1 week ago By: Zainabodogwu2
For commercial deployment in safety-critical contexts, fatigue detection systems should meet ≥95% sensitivity and specificity, ≤5% false-negative and false-positive rates, validated across diverse real-world conditions, with continuous post-market performance monitoring.
Deleuze replied: Definitely the case. As some other commenters have mentioned, standards such as ISO 26262 and ISO 21448/SOTIF are also highly relevant. ISO 26262 applies to safety-related electrical and electronic systems in production vehicles, while ISO 21448 is especially relevant where safety depends on complex sensors and processing algorithms. These standards require a safety case showing that risks have been identified, reduced, verified, and validated, but they do not impose a single drowsiness-detection accuracy percentage either. Euro NCAP is another important commercial benchmark. It is not a legal approval regime, but it strongly influences market expectations. From 2026, Euro NCAP evaluates driver monitoring technologies that maintain attention and engagement, with points awarded for systems that monitor driver behaviour in real time and link driver-state information to assistance-system behaviour.
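For readers who want to test a result set against the ≥95% sensitivity and specificity figures proposed in the answer above, here is a minimal Python sketch. It assumes the usual definitions (false-negative rate = FN / (TP + FN), false-positive rate = FP / (FP + TN)); under those definitions the "≤5%" limits are the same two constraints restated, not additional ones. The function names and the counts are hypothetical, and the 95% target is the answerer's proposal, not a regulatory value.

```python
def rates(tp, fn, fp, tn):
    """Return the four rates discussed in the thread from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


def meets_targets(r, min_rate=0.95):
    """Check the proposed (not mandated) 95% sensitivity and specificity targets."""
    return r["sensitivity"] >= min_rate and r["specificity"] >= min_rate


# Hypothetical counts of driving windows labelled drowsy / not drowsy.
good = rates(tp=96, fn=4, fp=30, tn=970)
poor = rates(tp=90, fn=10, fp=30, tn=970)
```

With these made-up counts, `meets_targets(good)` holds and `meets_targets(poor)` does not, because the second system misses 10% of drowsy windows.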
Answered: 3 months, 1 week ago By: Oliverharrow
High level of performance with accuracy and reliability as high as the defense line of my football club
Answered: 3 months, 1 week ago By: Ngozioshoba
Because IRIS is a safety system used while driving, it must achieve very high accuracy and reliability before commercial release. It should detect drowsiness correctly in most situations while avoiding unnecessary false alerts. Just as important, it must work consistently for different drivers and environments. Strong real-world testing is essential because even small errors could affect safety.
Answered: 3 months, 1 week ago By: Efeadelaja
Commercial fatigue-detection systems should meet safety-critical standards with ~95%+ accuracy, false negatives below 1–2%, controlled false positives, and compliance with ISO functional safety and real-world validation requirements.
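A short worked example shows why an overall accuracy figure and a false-negative limit have to be stated separately, as this answer does. The numbers are invented: 1,000 driving windows of which 50 are truly drowsy, and a degenerate system that never raises an alert.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all windows classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def false_negative_rate(tp, fn):
    """Share of truly drowsy windows that were missed."""
    return fn / (tp + fn)


# Hypothetical "never alerts" system on 1,000 windows with 50 drowsy ones.
acc = accuracy(tp=0, fn=50, fp=0, tn=950)
fnr = false_negative_rate(tp=0, fn=50)
print(f"accuracy={acc:.2f}, false-negative rate={fnr:.2f}")
```

Under these assumed numbers the system reaches 95% accuracy while missing every drowsy window, so accuracy alone says little when drowsiness is rare.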
Answered: 3 months, 1 week ago By: Meilincai
Better system performance across all races and skin complexions
Answered: 3 months, 1 week ago By: Kelechinwosu
The Bottom Line: Success in any project relies on clear communication and consistent action rather than waiting for the "perfect" moment. By breaking down complex goals into manageable pillars—such as prioritizing high-impact tasks, maintaining a feedback loop with your team, and focusing on incremental progress—you create a sustainable workflow that prevents burnout. Ultimately, the goal is to balance efficiency with quality; as long as the core objective remains the "North Star," small daily adjustments will naturally lead to the desired outcome.
Charlie replied: Good for projects but not useful advice for specific standards for an AI algo imo
Answered: 3 months, 1 week ago By: Beatricelorne
Good test results on sample groups of people from all backgrounds, with a negligible rate of false positives and false negatives.
Answered: 3 months, 1 week ago By: Zainabodogwu32
Given the safety-critical nature of Driver State Monitoring (DSM) systems like IRIS, regulatory and industry expectations should be set significantly higher than typical consumer AI applications. Both false positives and false negatives carry risks: false positives may lead to unnecessary driver distraction or system disengagement, while false negatives may directly contribute to road accidents. Although the EU AI Act does not prescribe explicit numerical thresholds for accuracy, it requires high-risk AI systems to achieve a level of performance that is appropriate to their intended purpose and foreseeable risks. For IRIS, this implies:
- High sensitivity (recall) for detecting genuine drowsiness, to minimise false negatives.
- Acceptable specificity, to avoid excessive false alerts that may cause alert fatigue.
- Robust performance across environments, including low light, occlusions (glasses, hats), and varied camera angles.
- Consistent performance across demographic groups, with minimal disparity between protected characteristics.
In practice, commercial deployment should align with automotive safety standards (e.g. ISO 26262, ISO 21448 – Safety of the Intended Functionality) and internal thresholds defined through rigorous validation testing. Regulators are likely to expect documented trade-offs between false positives and false negatives, rather than perfect accuracy.
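One way to document the false-positive / false-negative trade-off mentioned in this answer is to tabulate both rates at several alert thresholds. The Python below is a minimal sketch under stated assumptions: the system outputs a drowsiness score between 0 and 1, an alert fires when the score is at or above the threshold, and the scores, labels, and thresholds are invented for illustration.

```python
def sweep(scores, labels, thresholds):
    """Return (threshold, false_negative_rate, false_positive_rate) per threshold.

    labels: 1 for a truly drowsy window, 0 otherwise.
    """
    rows = []
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if y == 1 and s >= t)
        fn = sum(1 for s, y in pairs if y == 1 and s < t)
        fp = sum(1 for s, y in pairs if y == 0 and s >= t)
        tn = sum(1 for s, y in pairs if y == 0 and s < t)
        rows.append((t, fn / (tp + fn), fp / (fp + tn)))
    return rows


# Hypothetical scores for eight windows, four of them truly drowsy.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
table = sweep(scores, labels, thresholds=[0.35, 0.65, 0.75])
for threshold, fnr, fpr in table:
    print(f"threshold={threshold:.2f}  FNR={fnr:.2f}  FPR={fpr:.2f}")
```

On this toy data, raising the threshold removes the false alert but increases missed detections, which is the kind of trade-off a validation report would need to justify.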
Answered: 3 months, 1 week ago By: Miles_Hatcher
False negative
Answered: 3 months, 1 week ago By: Aminaolorun
False negative
Answered: 3 months, 1 week ago By: Clarawhitby
Only systems that demonstrate high accuracy, minimal missed detections, controlled false alarms, and compliance with safety standards should be approved for commercial use.
Answered: 3 months, 1 week ago By: Ifeanyiakare
For a system like IRIS, “high accuracy” is not sufficient. Commercial deployment should only be permitted
Answered: 3 months, 1 week ago By: Kunleekwueme
More data needs to be collected to correct the issues of false positives. An excellent level of regulation should be deployed in areas where artificial intelligence could make or mar people's lives.
Answered: 3 months, 1 week ago By: Sadeogunlana
Quality of materials/programs used, Quality Control
Answered: 3 months, 1 week ago By: Tomashbrook
For commercial deployment, the system should be required to be as accurate as possible. Deploying software that has a high probability of making errors would be catastrophic to road safety and the protection of customers/users of the software.
Deleuze replied: It should also be documented! The system should document, at minimum: drowsiness detection recall/sensitivity at KSS level 8 and above; false negative rate, because missed fatigue is safety-critical; false positive rate or false alerts per driving hour, because excessive false alarms reduce trust and may cause users to disable or ignore the system; time-to-warning once drowsiness is detected; performance across demographic groups, lighting conditions, occlusions, glasses, hats, skin tones, facial hair, and camera angles; worst-performing subgroup results, not only average results; confidence intervals and statistical significance; system availability, failure detection, and fallback behaviour; post-market monitoring results and drift over time.
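Three of the items in this list (false alerts per driving hour, worst-performing subgroup results, and confidence intervals) are simple to compute once the counts exist. The Python below is a minimal sketch, not a prescribed reporting method: the Wilson score interval is one common choice for a proportion, and the function names, subgroup names, and counts are all hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


def false_alerts_per_hour(false_alerts, driving_seconds):
    """False alerts normalised by total driving time in hours."""
    return false_alerts / (driving_seconds / 3600)


def worst_subgroup(recall_counts):
    """recall_counts maps subgroup name to (tp, fn); returns (name, recall) of the lowest."""
    recalls = ((name, tp / (tp + fn)) for name, (tp, fn) in recall_counts.items())
    return min(recalls, key=lambda item: item[1])


# Hypothetical figures for illustration only.
low, high = wilson_interval(successes=95, trials=100)
alert_rate = false_alerts_per_hour(false_alerts=6, driving_seconds=7200)
name, recall = worst_subgroup(
    {"daylight": (48, 2), "night": (40, 10), "sunglasses": (30, 20)}
)
```

Reporting the interval alongside the point estimate makes it visible when a headline figure such as 95% recall rests on too few events to support it.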
