In the race to deploy AI in healthcare, the bottleneck isn’t always the model architecture or computing power—it’s the quality of data and the feedback loop behind it. In a recent Centaur Labs webinar, leaders from Google Health, PathAI, and Centaur Labs came together to discuss why expert feedback is essential to building effective, safe, and trustworthy healthcare AI systems.
Erik Duhaime, CEO of Centaur Labs, opened by framing the conversation: “You can’t have safe AI if you don’t have the data to measure whether it’s safe.” That simple truth—so often overlooked in the excitement around model performance—underscores the importance of feedback throughout the AI development lifecycle.
As AI models continue to influence decisions in radiology, pathology, and diagnostics, the need for high-quality training and validation data has never been greater. Google Health and PathAI shared how they incorporate clinicians and domain experts not just at the labeling stage but throughout model validation and post-deployment.
Duhaime emphasized the pitfalls of treating “ground truth” as a single, static benchmark. “There’s a lot of subjectivity; there’s a lot of disagreement,” he said. “If you’re going to claim that your model is better than a radiologist or pathologist, you need a better benchmark.”
Centaur Labs, founded on the premise that medical AI requires collective intelligence, has built a platform that allows developers to harness the judgment of thousands of medical professionals. “What’s powerful about collecting a wide variety of opinions,” Duhaime explained, “is that you can measure disagreement and use that as a proxy for uncertainty.”
This has broad implications—not just for training data, but for understanding model blind spots and ensuring safe deployment. “You can identify edge cases,” he continued, “and figure out where the model needs more examples or where experts don’t agree.”
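To make the idea concrete, here is a minimal sketch of how disagreement across several expert reads of the same case might be quantified as an uncertainty score. This is not Centaur Labs’ actual pipeline; the case IDs, labels, threshold, and normalized-entropy formulation are all illustrative.

```python
from collections import Counter
from math import log

def disagreement_score(labels):
    """Normalized Shannon entropy of one case's label distribution:
    0.0 means unanimous agreement, 1.0 means maximal disagreement
    across the labels that were actually observed."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    total = len(labels)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(len(counts))  # normalize by observed classes

# Toy example: three cases, each read by five experts.
cases = {
    "case_01": ["malignant"] * 5,                      # unanimous
    "case_02": ["malignant"] * 4 + ["benign"],         # mostly agree
    "case_03": ["malignant", "benign", "atypical",
                "benign", "malignant"],                # ambiguous
}

EDGE_CASE_THRESHOLD = 0.8  # illustrative cutoff, not a Centaur Labs value
for case_id, reads in cases.items():
    score = disagreement_score(reads)
    flag = "  <- flag for expert review" if score > EDGE_CASE_THRESHOLD else ""
    print(f"{case_id}: disagreement = {score:.2f}{flag}")
```

Even a simple score like this lets a team sort a dataset by ambiguity and route the hardest cases back to specialists, which is the kind of triage Duhaime describes above.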
The process isn’t just about annotation, either. It’s about iteration. “It’s not just labels that matter—it’s feedback,” said Duhaime. That feedback loop helps teams continuously refine models, improving generalizability and reducing risk.
A particularly important moment in the discussion came when Duhaime challenged the industry’s traditional view of annotation accuracy. “We tend to treat disagreement as noise, but a lot of times, disagreement is a signal,” he said. “It tells you that the case is hard, that the data is ambiguous, or that there’s no clinical consensus.”
This recognition has led Centaur Labs to invest heavily in infrastructure that allows clients to collect and analyze diverse expert input at scale. Rather than relying on one person’s opinion, they can see how 20 or more professionals weigh in—helping them make more informed decisions.
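The statistical intuition behind pooling many readers can be shown with a quick simulation. The sketch below assumes independent readers of equal accuracy, a deliberate simplification of any real reader pool, and estimates how often a simple majority vote lands on the correct label as the panel grows; the 85% per-reader accuracy is an assumed figure for illustration.

```python
import random

def majority_vote_accuracy(n_readers, reader_accuracy, trials=10_000):
    """Monte Carlo estimate of how often a simple majority of
    independent readers agrees with the correct label."""
    correct = 0
    for _ in range(trials):
        right_votes = sum(random.random() < reader_accuracy
                          for _ in range(n_readers))
        if right_votes > n_readers / 2:
            correct += 1
    return correct / trials

# Each individual reader is right 85% of the time (assumed).
for n in (1, 5, 11, 21):
    print(f"{n:>2} readers: majority correct "
          f"~{majority_vote_accuracy(n, 0.85):.3f} of the time")
```

Real readers are neither independent nor equally skilled, so production aggregation is more sophisticated than a raw vote, but the basic effect is the same: twenty informed opinions are far more reliable than one.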
As the discussion wrapped, Duhaime reiterated a core belief at the heart of Centaur Labs: “AI should not be trained and evaluated in isolation.” Human expertise isn’t just a helpful addition—it’s a requirement for safe, effective, and ethical medical AI.
For healthcare organizations looking to accelerate AI initiatives, the takeaway was clear: model performance is only part of the story. Continuous, scalable expert feedback is what turns promising algorithms into reliable clinical tools.