Copyright © 2025. All rights reserved by Centaur Labs.
Centaur.ai and Dandelion Health are partnering to give AI developers secure and ethical access to de-identified, annotated clinical data from more than four million patients across two health systems. The data includes images, waveforms, and structured health records. The goal is to help developers build better AI products that improve patient health.

Dandelion, a startup founded in 2020, aims to build the richest AI training dataset in the world. Its de-identified clinical data includes images, waveforms, streaming monitor data, full-text notes, structured health records, and more. Today, Dandelion holds more than 4.5 million patient records from two integrated delivery networks (IDNs), chosen specifically for their demographic and clinical diversity. The dataset is expected to grow to 10 to 15 million patient records from five IDNs in the coming years. AI developers use Dandelion data to train new models, seek FDA approval, and evaluate existing models so they can detect and minimize unwanted bias.
Centaur.ai is the leading scalable data annotation platform for the medical and life sciences industries. The platform has turned biomedical data annotation into a competitive sport. It generates 2 million high-quality annotations each week from a proprietary network of tens of thousands of doctors, medical students, and other professionals, who compete on the gamified platform to annotate data most accurately. Centaur.ai annotates a wide variety of data, including unstructured clinical notes, scientific papers, radiographic images, pathology slides, auscultation audio files, and more. Centaur.ai will enrich the Dandelion dataset by identifying and marking important features, such as lung nodules in a chest CT or drugs mentioned in clinical notes. AI developers can then use these annotated datasets to train algorithms that detect the same findings in medical data they have not yet seen.
Such algorithms could improve clinical workflows and could also help healthcare providers identify conditions that are difficult to detect. Dandelion's point-in-time data (e.g., a patient's radiology scans or clinical notes) are linked to longer-term quantitative outcomes (e.g., 10-year mortality). This means a mammography algorithm trained on Dandelion and Centaur.ai data could do two things: improve the accuracy of mammogram interpretation and predict a patient's 5-year risk of breast cancer.
Through this partnership, AI developers can now access annotated data spanning nearly every aspect of U.S. clinical data. This includes rare and acute conditions and every major type of medical imaging (e.g., CTs, MRIs, X-rays, ultrasounds, and biopsy slides). The result is better, more ambitious AI products that account for bias, advance science, and improve patient health.
“Dandelion Health’s north star is to improve patient care by accelerating widespread adoption of accurate, trustworthy, and equitable AI in clinical practice. By partnering with Centaur.ai, we can better address the data access and data labeling bottlenecks currently slowing progress towards this mission,” says Elliot Green, co-founder and CEO of Dandelion Health.
“We’ve seen many clients, from multinational medical device companies to early-stage startups, struggle to get timely access to the data they need to build their AI models,” says Erik Duhaime, co-founder and CEO of Centaur Labs. “We’re thrilled to partner with Dandelion Health to make it easier for clinical AI developers to get off the starting blocks more rapidly and get their models into production, where they can impact patient care.”