
The future of AI depends not just on more innovative models but on better data. That includes data that is clinically grounded, linguistically precise, and validated by domain experts. Our recent collaboration with Microsoft Research and the University of Alicante exemplifies this vision in action.
Together, these teams have released PadChest-GR, the world’s first multimodal, bilingual, sentence-level dataset for grounded radiology reporting. This pioneering dataset aligns structured clinical text with annotated chest X-ray imagery, enabling machine learning models to justify each diagnostic claim with an interpretable visual reference—a step change in transparency and reliability.
Most medical imaging datasets to date have supported image-level classification, e.g., labeling a chest X-ray as “showing signs of cardiomegaly” or “no abnormalities detected.” While useful, models trained this way often lack transparency: they are prone to “hallucinations,” fabricating findings that the image does not support, and they rarely specify where a pathology is located. Grounded radiology reporting takes a different approach: every finding is expressed as a natural-language sentence and tied to the specific region of the image that supports it.

This approach requires a fundamentally different type of dataset—one where each radiological observation is not only labeled but also grounded in a specific part of the image and expressed in natural language.
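To make the idea concrete, here is a minimal sketch of what a sentence-level, grounded annotation record could look like. The class and field names (GroundedFinding, sentence_en, box_xyxy, and so on) are illustrative assumptions, not the actual PadChest-GR schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GroundedFinding:
    """One sentence-level finding tied to a region of the image.

    Field names are illustrative, not the PadChest-GR schema.
    """
    sentence_en: str   # the finding expressed in English
    sentence_es: str   # the same finding in Spanish
    label: str         # structured pathology label
    box_xyxy: Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)

@dataclass
class GroundedReport:
    image_id: str                     # identifier of the chest X-ray study
    findings: List[GroundedFinding]   # every diagnostic claim is grounded in a region

# A toy example: one localized finding for a single study.
report = GroundedReport(
    image_id="example-chest-xray-001",
    findings=[
        GroundedFinding(
            sentence_en="Cardiomegaly is present.",
            sentence_es="Se observa cardiomegalia.",
            label="cardiomegaly",
            box_xyxy=(0.30, 0.45, 0.75, 0.85),
        )
    ],
)
```

Structuring the data this way means a model trained on it can be asked not only “what did you find?” but “where, exactly, in the image is the evidence for that sentence?”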
To create such a dataset, high-quality annotations are non-negotiable. That’s where Centaur.AI came in: our HIPAA-compliant labeling platform enabled a team of trained radiologists at the University of Alicante to produce the sentence-level, spatially grounded annotations the project required.
Unlike generic platforms, Centaur.AI was designed from the ground up for medical-grade annotation workflows, with the expert review, quality controls, and compliance safeguards that clinical data demands.
These features allowed the research team to focus on complex medical edge cases without sacrificing annotation throughput or data integrity.
PadChest-GR builds on the original PadChest dataset but adds critical new dimensions: sentence-level findings written in both Spanish and English, each grounded to an annotated region of the chest X-ray.
This enables more than classification; it supports explainable AI, localized report generation, and the training and testing of model factuality—all essential components in the safe deployment of AI in radiology.
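One simple way a grounded dataset supports factuality testing is by comparing the region a model cites against the region the radiologist annotated. The sketch below uses intersection-over-union (IoU) with an illustrative 0.5 threshold; this is an assumption for the example, not the benchmark’s official evaluation metric:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted finding counts as correctly grounded if its box overlaps the
# radiologist's annotation above some threshold (0.5 here, chosen for illustration).
predicted_box = (0.32, 0.48, 0.78, 0.83)
annotated_box = (0.30, 0.45, 0.75, 0.85)
score = iou(predicted_box, annotated_box)
print(f"IoU = {score:.2f}, correctly grounded: {score >= 0.5}")
```

Checks like this make hallucinated findings measurable: a sentence with no sufficiently overlapping region is flagged rather than silently accepted.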
In a clinical setting, a model’s ability to “explain itself” is more than a UX feature—it’s a safety imperative. Physicians need to know not just what an AI system says, but why it says it. Grounded reporting enables clinicians to verify that the AI is referencing the correct part of the image, thereby reducing the likelihood of generating hallucinated or clinically implausible findings. By collaborating with Microsoft Research and the University of Alicante on PadChest-GR, Centaur.AI helped support the kind of data curation pipeline that makes this level of accountability and interpretability possible.

As noted in Microsoft Research’s announcement, Centaur.AI was a “significant enabler” of this work. We’re proud of that recognition—but more importantly, we’re proud of what it enables for the field at large.
As multimodal, multilingual, and clinically grounded AI systems become more common, the infrastructure for generating and validating high-quality data must keep pace. Centaur.AI is committed to meeting that challenge by pairing domain experts with compliant, medical-grade annotation workflows at scale.
This is what it looks like to operationalize responsible innovation in healthcare AI.
We’re honored to have played a part in PadChest-GR. We are excited about what it signals: a future where AI doesn’t just interpret medical images but does so transparently, accurately, and in full partnership with clinical expertise.
For a demonstration of how we can facilitate your AI model training and evaluation with greater accuracy, scalability, and value, schedule a demo with Centaur.ai.
To learn more, read Microsoft Research’s announcement:
https://www.microsoft.com/en-us/research/blog/padchest-gr-a-bilingual-grounded-radiology-reporting-benchmark-for-chest-x-rays/