Unlocking Better AI with Cognitive-Inspired Data Engineering

Gunnar Epping, Research Scientist
February 20, 2025

At Centaur, we’re always seeking ways to improve data quality in medical AI. In a groundbreaking new study, “Improving Human and Machine Classification Through Cognitive-Inspired Data Engineering,” researchers have taken a significant step forward by addressing one of AI’s biggest hurdles: human bias in crowdsourced data.

The Problem: Human Bias in Machine Learning

Crowdsourcing platforms like Amazon Mechanical Turk (MTurk) and DiagnosUs are essential for rapidly labeling large datasets. However, these datasets can carry the biases of the annotators who provide the labels: overconfidence, distorted probability estimates, or deeper systemic issues. Machine learning models trained on data that reflects these biases inherit them, which undermines how well the models function in real-world applications. This is particularly concerning in critical domains like medical diagnosis, where getting it right can be a matter of life and death.

The question is: How can we reduce this bias to create more accurate and reliable data?

The Solution: Cognitive-Inspired Data Engineering

The study leverages a technique called recalibration, which adjusts subjective probability judgments made by human annotators. The researchers use a model called the Linear Log Odds (LLO) function to transform biased judgments into more objective data. This process is part of cognitive-inspired data engineering, which applies cognitive science principles to improve data quality and, by extension, ML model performance.

The Core Idea Behind Recalibration

Cognitive science tells us human judgment is often flawed, particularly when assigning probabilities to uncertain events. For example, people tend to be overconfident in their classifications or to systematically underweight rare events. The LLO function addresses this by applying a linear adjustment in log-odds space, pulling distorted probability judgments back toward better-calibrated values. The result is more reliable labels for training machine-learning models, and better overall results.
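To make the idea concrete, here is a minimal sketch of the LLO transform in Python. The standard linear-in-log-odds form is logit(q) = γ · logit(p) + ln δ, where γ controls how strongly extreme judgments are flattened and δ captures overall response bias. The parameter values below are illustrative only, not the study's fitted values.

```python
import math

def llo_recalibrate(p, gamma, delta, eps=1e-6):
    """Map a reported probability p to a recalibrated probability.

    The LLO transform is linear in log-odds space:
        logit(q) = gamma * logit(p) + ln(delta)
    gamma < 1 flattens overconfident judgments toward 0.5;
    delta shifts the overall response bias.
    """
    p = min(max(p, eps), 1 - eps)  # avoid taking the logit of 0 or 1
    log_odds = gamma * math.log(p / (1 - p)) + math.log(delta)
    return 1 / (1 + math.exp(-log_odds))

# An overconfident 95% judgment is pulled toward 0.5 when gamma < 1:
print(llo_recalibrate(0.95, gamma=0.6, delta=1.0))  # ≈ 0.854
```

Note that with gamma = 1 and delta = 1 the transform is the identity, so fitting these parameters to data tells you directly how distorted the annotators' judgments were.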

What the Study Found

To test this approach, the research team conducted two experiments to evaluate how recalibration affects data quality and machine learning model performance. The two experiments used the exact same set of images.

Experiment 1: Novice Annotators on MTurk

  1. Task: Novice participants labeled medical images, such as peripheral blood cells.
  2. Process: Participants provided probability-based answers that were recalibrated using the LLO function.
  3. Results:
    • Recalibrated crowd labels improved overall accuracy from 81.6% to 85.1%.
    • The recalibration process successfully reduced overconfidence and systematic biases.
    • However, recalibration had little effect on individual classification accuracy.

Experiment 2: Skilled Annotators on DiagnosUs

  1. Task: Skilled medical annotators labeled the same type of images.
  2. Process: Their responses were also subjected to recalibration using the LLO function.
  3. Results:
    • The accuracy of crowd labels improved from 88.3% to 96.7%.
    • The impact of recalibration was much more pronounced for skilled annotators than for novices.
    • This suggests that recalibration is particularly effective in high-expertise domains like medical AI.

Key Insights from the Study

  1. Recalibration Works: By adjusting probability judgments, researchers significantly improved the accuracy of crowdsourced labels.
  2. Efficiency Gains: More judgments typically lead to higher accuracy, but recalibrated labels reached peak accuracy with fewer annotations than raw labels required.
  3. Better ML Training Data: Models trained on recalibrated datasets outperformed models trained on non-recalibrated data, especially when the number of judgments was low. This is particularly relevant in real-world applications where annotations are costly and time-consuming.
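The study's exact fitting procedure isn't reproduced here, but a common way to estimate the LLO parameters is to fit γ and δ on a small calibration set of judgments with known ground truth by minimizing cross-entropy, then apply the fitted transform to all remaining judgments before averaging them into a crowd label. The sketch below (function names and the gradient-descent fit are our own illustration, not the paper's code) shows that pipeline end to end:

```python
import math

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def llo(p, gamma, delta):
    """LLO transform: linear adjustment in log-odds space."""
    return 1 / (1 + math.exp(-(gamma * logit(p) + math.log(delta))))

def fit_llo(probs, truths, steps=2000, lr=0.05):
    """Fit gamma and delta by gradient descent on cross-entropy
    against a small calibration set with ground-truth labels (0/1)."""
    gamma, log_delta = 1.0, 0.0
    for _ in range(steps):
        g_grad = d_grad = 0.0
        for p, y in zip(probs, truths):
            q = 1 / (1 + math.exp(-(gamma * logit(p) + log_delta)))
            err = q - y  # gradient of cross-entropy w.r.t. the log-odds
            g_grad += err * logit(p)
            d_grad += err
        gamma -= lr * g_grad / len(probs)
        log_delta -= lr * d_grad / len(probs)
    return gamma, math.exp(log_delta)

def crowd_label(judgments, gamma, delta):
    """Recalibrate each judgment, then average and threshold at 0.5."""
    recal = [llo(p, gamma, delta) for p in judgments]
    return sum(recal) / len(recal) >= 0.5

# Overconfident annotators: some 90% judgments are simply wrong.
probs  = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1]
truths = [1,   0,   1,   1,   0,   0]
gamma, delta = fit_llo(probs, truths)
# The fitted gamma < 1 flattens these overconfident judgments toward 0.5.
```

Once γ and δ are fitted, every subsequent judgment can be recalibrated before aggregation, which is how recalibrated crowd labels can reach peak accuracy with fewer annotations than raw ones.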

Why This Matters for AI and Medical Applications

The findings of this study have far-reaching implications for the future of machine learning, particularly in domains where data accuracy is critical. Medical AI systems that learn from crowdsourced data stand to benefit significantly from recalibration techniques, which can lead to more accurate diagnostic tools, better decision-support systems for healthcare professionals, and, ultimately, better patient outcomes.

1. More Reliable Medical Diagnoses

AI models used for detecting diseases, interpreting radiology scans, and classifying pathology images rely heavily on labeled training data. If those labels contain systematic biases, the model will inherit and amplify those errors. Recalibration helps mitigate this issue by refining the labels before training even begins.

2. More Efficient Use of Annotators

Medical professionals’ time is valuable. If we can achieve higher-quality labeled data with fewer annotations, we can reduce annotation costs while maintaining or improving model quality. This efficiency is crucial for startups and research teams operating under resource constraints.

3. Reducing Bias in Other High-Skill Domains

While this study focuses on medical AI, the principles of cognitive-inspired data engineering apply broadly to other fields, including:

  • Financial risk modeling (reducing cognitive biases in credit assessments)
  • Legal AI applications (improving document classification and case law research)
  • Autonomous vehicles (refining human-annotated driving behavior datasets)
  • Defense and security (enhancing intelligence analysis through bias reduction)

The Future of Cognitive-Inspired Data Engineering

This study represents a major step forward in tackling bias in AI training data, but it also raises further questions for future research:

  1. Can we develop even more sophisticated recalibration models beyond LLO?
  2. How do different cognitive biases affect labeling accuracy across various fields?
  3. Can active learning techniques be combined with recalibration to optimize data collection further?
  4. What ethical considerations arise when modifying human-provided labels?

Conclusion

Cognitive-inspired data engineering is an innovative method that boosts the reliability of labels gathered from crowdsourcing. This significantly enhances the performance of machine learning models that rely on these labels. We can systematically reduce biases by leveraging techniques like recalibration, leading to more efficient data collection, better model performance, and more accurate AI applications.

At Centaur, we believe in the transformative power of high-quality data. As AI evolves, integrating cognitive science into data engineering will be essential for unlocking its full potential, especially in critical fields like medical diagnostics.

For AI to truly benefit humanity, it must be trained on data that objectively reflects reality. Cognitive-inspired data engineering helps make this possible.

