In the race to deploy AI in healthcare, the bottleneck isn’t always the model architecture or computing power—it’s the quality of data and the feedback loop behind it. In a recent Centaur Labs webinar, leaders from Google Health, PathAI, and Centaur Labs came together to discuss why expert feedback is essential to building effective, safe, and trustworthy healthcare AI systems.
Erik Duhaime, CEO of Centaur Labs, opened by framing the conversation: “You can’t have safe AI if you don’t have the data to measure whether it’s safe.” That simple truth—so often overlooked in the excitement around model performance—underscores the importance of feedback throughout the AI development lifecycle.
As AI models continue to influence decisions in radiology, pathology, and diagnostics, the need for high-quality training and validation data has never been greater. Google Health and PathAI shared how they incorporate clinicians and domain experts not just at the labeling stage but throughout model validation and post-deployment.
Duhaime emphasized the pitfalls of relying solely on “ground truth” as a static concept. “There’s a lot of subjectivity; there’s a lot of disagreement,” he said. “If you’re going to claim that your model is better than a radiologist or pathologist, you need a better benchmark.”
Centaur Labs, founded on the premise that medical AI requires collective intelligence, has built a platform that allows developers to harness the judgment of thousands of medical professionals. “What’s powerful about collecting a wide variety of opinions,” Duhaime explained, “is that you can measure disagreement and use that as a proxy for uncertainty.”
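The idea of treating disagreement as an uncertainty proxy can be sketched in a few lines. This is an illustrative example, not Centaur Labs' actual method: it scores each case by the Shannon entropy of the labels its annotators assigned, so unanimous cases score 0 and evenly split cases score highest.

```python
from collections import Counter
from math import log2

def disagreement_score(labels):
    """Shannon entropy of the label distribution across annotators:
    0.0 when everyone agrees, higher when opinions are split."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Unanimous case: no uncertainty signal.
print(disagreement_score(["malignant"] * 5))                    # 0.0
# Evenly split case: maximum disagreement for two classes.
print(disagreement_score(["malignant", "benign"] * 2))          # 1.0
```

Ranking cases by a score like this is one simple way to surface ambiguous examples for further expert review.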
This has broad implications—not just for training data, but for understanding model blind spots and ensuring safe deployment. “You can identify edge cases,” he continued, “and figure out where the model needs more examples or where experts don’t agree.”
The process isn’t just about annotation, either. It’s about iteration. “It’s not just labels that matter—it’s feedback,” said Duhaime. That feedback loop helps teams continuously refine models, improving generalizability and reducing risk.
A particularly important moment in the discussion came when Duhaime challenged the industry’s traditional view of annotation accuracy. “We tend to treat disagreement as noise, but a lot of times, disagreement is a signal,” he said. “It tells you that the case is hard, that the data is ambiguous, or that there’s no clinical consensus.”
This recognition has led Centaur Labs to invest heavily in infrastructure that allows clients to collect and analyze diverse expert input at scale. Rather than relying on one person’s opinion, they can see how 20 or more professionals weigh in—helping them make more informed decisions.
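A minimal sketch of how many expert opinions might be combined, with hypothetical labels and no claim to match Centaur Labs' platform: take the majority-vote label and report the fraction of annotators who agree with it as a simple confidence measure.

```python
from collections import Counter

def aggregate_opinions(labels):
    """Return the majority-vote label and the share of annotators
    who chose it, as a rough confidence estimate."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# 20 hypothetical expert reads of the same case.
votes = ["pneumonia"] * 14 + ["normal"] * 6
label, confidence = aggregate_opinions(votes)
print(label, confidence)  # pneumonia 0.7
```

A low confidence value here is exactly the "disagreement as signal" case: it flags the example as hard or ambiguous rather than simply averaging the noise away.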
As the discussion wrapped, Duhaime reiterated a core belief at the heart of Centaur Labs: “AI should not be trained and evaluated in isolation.” Human expertise isn’t just a helpful addition—it’s a requirement for safe, effective, and ethical medical AI.
For healthcare organizations looking to accelerate AI initiatives, the takeaway was clear: model performance is only part of the story. Continuous, scalable expert feedback is what turns promising algorithms into reliable clinical tools.
Meet Centaur.ai at HIMSS 2026 in Las Vegas at booth #11222. See live demos of our collective intelligence platform that produces superhuman data for healthcare AI training, evaluation, and validation. Learn how higher-quality data improves model performance, reduces risk, and accelerates clinical AI deployment with measurable confidence.