— Via AIMed
The life-saving reason why medical annotation company Centaur Labs wants us all to start analyzing medical images
In 1987, Jack Treynor, a finance professor at the University of Southern California, conducted an experiment with his class in an effort to demonstrate market efficiency. Treynor asked each of his 56 students to estimate the number of jelly beans in a jar. The jar contained 850 beans, and the median of the students' estimates was 870.
Only one student gave an estimate closer to the true value than the group's median. The experiment became a classic illustration of the wisdom of crowds: the aggregate answer of a group tends to be more accurate than the answers of most of the individuals within it.
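The effect is easy to reproduce in a few lines of code. The sketch below is an illustrative simulation, not a reconstruction of Treynor's data: the class size (56), the true count (850), and the Gaussian noise model are assumptions chosen for demonstration.

```python
import random
import statistics

# Illustrative wisdom-of-crowds simulation. The class size, true count,
# and noise model are assumptions for demonstration only.
random.seed(0)

TRUE_COUNT = 850
# 56 noisy individual estimates of the jelly bean count.
guesses = [random.gauss(TRUE_COUNT, 200) for _ in range(56)]

crowd_estimate = statistics.median(guesses)
crowd_error = abs(crowd_estimate - TRUE_COUNT)

# Count how many individuals beat the aggregate estimate.
better_individuals = sum(abs(g - TRUE_COUNT) < crowd_error for g in guesses)
print(f"crowd median: {crowd_estimate:.0f}, error: {crowd_error:.0f}")
print(f"individuals closer than the median: {better_individuals} of {len(guesses)}")
```

Because the median sits in the middle of the distribution of guesses, roughly half of the individual errors cancel out, so the aggregate typically outperforms all but a handful of individuals.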
Erik Duhaime, Co-Founder and CEO of medical data labeling company Centaur.ai, knew the experiment. But when he revisited it during his PhD at the MIT Center for Collective Intelligence, he saw another potential use. The center studies how humans and computers can be connected so that, collectively, they act more intelligently.
“I was partly inspired by my wife, who was attending medical school and residency at that time,” says Duhaime. “My PhD research focused on how to aggregate the opinions of multiple experts. Particularly, overcoming the challenge of making the wisdom of crowds work for certain tasks like analyzing medical images, where some people might have the professional knowledge and skills to do so, while others do not.”
Read the full article at AIMed »
Edge case detection enables robots to adapt to real-world variability in manufacturing, from lighting shifts to unexpected obstacles. By combining human annotation with AI training, Centaur.ai helps manufacturers reduce downtime, prevent defects, and build trust in automation. The result is safer, smarter, and more resilient robotic systems.
AI is fast, but accuracy remains the real barrier to safe deployment. This post explains how poor data quality, collapsed expert disagreement, and weak evaluation create false confidence in production AI. It shows how collective intelligence, gold-standard labeling, and human-in-the-loop workflows at Centaur.ai build auditable, high-accuracy datasets for high-stakes applications.
Centaur.ai provided clinicians who evaluated AI-generated medical answers for the NIH’s MedAESQA dataset, verifying each statement’s accuracy and citation support. This expert-in-the-loop process ensures reliable, evidence-based benchmarks for healthcare AI. The project reflects Centaur.ai’s mission to improve AI through human oversight in high-stakes, precision-critical environments like medicine.