Your annotation is good enough for R&D, but not for the FDA.

Dyllon Johnson
March 16, 2026

Most teams building medical AI have robust annotation pipelines, credentialed reviewers, and real scientific standards. And for model development, that works. But the annotation built for iteration has a specific blind spot when it comes to regulatory defense, and it shows up in one place: what happens when your annotators disagree.

In most enterprise workflows, disagreements get resolved with a tiebreaker. A senior clinician picks one answer, the conflicting label disappears, and everyone moves on. Totally fine for R&D.

The FDA sees that differently. When they review your training data, they're asking how you arrived at your ground truth labels. What was the consensus methodology? Who participated? How was disagreement documented and resolved? If the answer is "a senior person picked one," you've essentially let one person's opinion override the others and quietly deleted the disagreement. That doesn't read as a consensus to a reviewer. It reads as a shortcut.

This is where enterprise annotation breaks. The annotators are qualified, and the labels are probably correct, but the resolution process was never built to be auditable. The FDA doesn't need to prove your labels are bad. They just need to ask how you got them and not like the answer.

What holds up under that scrutiny looks like multi-expert consensus with documented methodology, credentialed reviewers whose qualifications map to the clinical domain, and annotation provenance that traces every label back to who created it and how disagreements got resolved. Most teams already have pieces of this. The gap is usually in how disagreement resolution gets recorded, if it gets recorded at all.
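To make that concrete, here is a minimal sketch of what an audit-ready annotation record might look like. This is illustrative only, not Centaur.ai's actual schema or any FDA-mandated format: the class names, fields, and majority-vote resolver are hypothetical. The point is that every expert's label, their credential, and the resolution method stay on record even after consensus is reached.

```python
from dataclasses import dataclass, field

@dataclass
class ExpertLabel:
    """One expert's opinion, retained even if it loses the consensus vote."""
    annotator_id: str
    credential: str        # e.g. "board-certified radiologist"
    label: str
    timestamp: str

@dataclass
class AnnotationRecord:
    """Audit-ready record: all opinions and the resolution path survive."""
    item_id: str
    labels: list[ExpertLabel] = field(default_factory=list)
    resolution_method: str = ""   # e.g. "majority vote", "adjudication panel"
    final_label: str = ""

    def resolve_by_majority(self) -> str:
        """Reach consensus without deleting disagreement from the record."""
        counts: dict[str, int] = {}
        for lab in self.labels:
            counts[lab.label] = counts.get(lab.label, 0) + 1
        self.final_label = max(counts, key=counts.get)
        self.resolution_method = f"majority vote {counts}"
        return self.final_label

# A 2-vs-1 disagreement, resolved without discarding the minority opinion:
rec = AnnotationRecord(item_id="scan-001")
rec.labels.append(ExpertLabel("rad-1", "radiologist", "malignant", "2026-03-01T10:00Z"))
rec.labels.append(ExpertLabel("rad-2", "radiologist", "benign", "2026-03-01T10:05Z"))
rec.labels.append(ExpertLabel("rad-3", "radiologist", "malignant", "2026-03-01T10:09Z"))
print(rec.resolve_by_majority())  # "malignant"; the benign label stays in rec.labels
```

Contrast this with the tiebreaker workflow: there, only `final_label` would survive, and the question "how was disagreement documented and resolved?" has no answer in the data.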

What doesn't hold up is the stuff that's still standard at many companies: single-annotator pipelines, non-credentialed workforces, tiebreaker workflows where one person's judgment quietly becomes your ground truth. These are R&D tools applied to a regulatory problem.

At Centaur.ai, we've now contributed to 10 FDA clearances across 6 companies, and a few of those companies have come back for second and third submissions. The common thread across all of them: consensus methodology and annotation provenance were treated as regulatory infrastructure from day one, not patched in right before submission.

If you're building toward a submission and your annotation workflow still resolves disagreement by escalating to one person, that's the conversation worth having.


For a demonstration of how Centaur can facilitate your AI model training and evaluation with greater accuracy, scalability, and value, click here: https://centaur.ai/demo.
