Copyright © 2025. All rights reserved by Centaur Labs.
The future of AI depends not just on more innovative models but on better data. That includes data that is clinically grounded, linguistically precise, and validated by domain experts. Our recent collaboration with Microsoft Research and the University of Alicante exemplifies this vision in action.
Together, these teams have released PadChest-GR, the world’s first multimodal, bilingual, sentence-level dataset for grounded radiology reporting. This pioneering dataset aligns structured clinical text with annotated chest X-ray imagery, enabling machine learning models to justify each diagnostic claim with an interpretable visual reference—a step change in transparency and reliability.
Most medical imaging datasets to date have supported image-level classification, for example labeling a chest X-ray as “showing signs of cardiomegaly” or “no abnormalities detected.” While useful, models trained on such datasets often lack transparency. They are prone to “hallucinations,” where generated reports fabricate findings unsupported by the image or fail to specify where a pathology is located. Grounded radiology reporting takes a different approach:
This approach requires a fundamentally different type of dataset—one where each radiological observation is not only labeled but also grounded in a specific part of the image and expressed in natural language.
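To make the idea concrete, here is a minimal sketch of what one grounded record could look like. The class and field names (`GroundedFinding`, `GroundedReport`, `sentence_en`, `sentence_es`, `boxes`) are hypothetical and chosen for illustration; this is not the published PadChest-GR schema. The sketch also assumes that a finding describing the absence of a pathology carries no bounding box.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class GroundedFinding:
    """One report sentence, its label, and the image regions that support it."""
    sentence_en: str          # the finding in English
    sentence_es: str          # the same finding in Spanish
    label: str                # illustrative finding label
    boxes: List[Box] = field(default_factory=list)  # empty if nothing to localize

    def is_grounded(self) -> bool:
        # A finding counts as grounded when at least one region is attached.
        return len(self.boxes) > 0


@dataclass
class GroundedReport:
    """All sentence-level findings for a single chest X-ray."""
    image_id: str
    findings: List[GroundedFinding]

    def ungrounded(self) -> List[GroundedFinding]:
        # Findings with no attached region (assumed here to be negative findings).
        return [f for f in self.findings if not f.is_grounded()]


# Made-up example data, for illustration only.
report = GroundedReport(
    image_id="example-001",
    findings=[
        GroundedFinding(
            sentence_en="Enlarged cardiac silhouette.",
            sentence_es="Silueta cardíaca aumentada.",
            label="cardiomegaly",
            boxes=[(0.30, 0.45, 0.75, 0.85)],
        ),
        GroundedFinding(
            sentence_en="No pleural effusion.",
            sentence_es="Sin derrame pleural.",
            label="pleural effusion (absent)",
        ),
    ],
)
```

Keeping the sentence, the label, and the regions in one record is what separates this from an image-level label: every claim in the report can be traced to a specific location, or is explicitly marked as having none.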
To create such a dataset, high-quality annotations are non-negotiable. That’s where Centaur.AI came in. Our HIPAA-compliant labeling platform enabled a team of trained radiologists at the University of Alicante to:
Unlike generic platforms, Centaur.AI was designed from the ground up for medical-grade annotation workflows. We support:
These features allowed the research team to focus on complex medical edge cases without sacrificing annotation throughput or data integrity.
PadChest-GR builds on the original PadChest dataset but adds critical new dimensions:
This enables more than classification. It supports explainable AI, localized report generation, and the training and evaluation of models for factual accuracy, all of which are essential to the safe deployment of AI in radiology.
In a clinical setting, a model’s ability to “explain itself” is more than a UX feature; it is a safety imperative. Physicians need to know not just what an AI system says, but why it says it. Grounded reporting lets clinicians verify that the AI is referencing the correct part of the image, which makes hallucinated or clinically implausible findings easier to catch. By collaborating with Microsoft Research and the University of Alicante on PadChest-GR, Centaur.AI helped build the kind of data curation pipeline that makes this level of accountability and interpretability possible.
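One common way to check whether a model is pointing at the right region is to compare its predicted bounding box with an expert-annotated one using intersection over union (IoU). The sketch below is illustrative only: the function names and the 0.5 threshold are assumptions, not part of PadChest-GR or of any evaluation protocol described in this post.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in any consistent unit.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so that disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_supported(predicted: Box, annotated: Box, threshold: float = 0.5) -> bool:
    """True when the predicted region overlaps the annotation enough.

    The 0.5 default is an assumed, illustrative cutoff.
    """
    return iou(predicted, annotated) >= threshold
```

A check like this only works when the dataset records where each finding is located, which is why sentence-level grounding matters for verifying model output.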
As noted in Microsoft Research’s announcement, Centaur.AI was a “significant enabler” of this work. We’re proud of that recognition—but more importantly, we’re proud of what it enables for the field at large.
As multimodal, multilingual, and clinically grounded AI systems become more common, the infrastructure for generating and validating high-quality data must keep pace. Centaur.AI is committed to meeting that challenge by:
This is what it looks like to operationalize responsible innovation in healthcare AI.
We’re honored to have played a part in PadChest-GR. We are excited about what it signals: a future where AI doesn’t just interpret medical images but does so transparently, accurately, and in full partnership with clinical expertise.