Copyright © 2025. All rights reserved by Centaur.ai

In late 2024, researchers from Microsoft Research and the University of Alicante released PadChest-GR (Grounded-Reporting), an innovative dataset designed to improve the quality of generative AI models for chest X-ray (CXR) imaging. The team used the Centaur.ai platform to complete all the annotations for this dataset, and we’re thrilled to have been able to contribute to their success. The dataset is now available to researchers globally.
Radiology Report Generation (RRG) aims to create free-text radiology reports from clinical images. Grounded Radiology Report Generation (GRRG) takes this a step further by also localizing individual findings in the image. The 2024 MAIRA-2 paper introduces the task of GRRG, its output (a "Grounded Radiology Report"), and MAIRA-2 itself, the first model to demonstrate the power of GRRG.
The MAIRA-2 research team defines a "grounded radiology report" as "a list of sentences from the Findings section [of a radiology report], each describing at most a single observation from the image(s), and associated with zero or more spatial annotations indicating the location of that observation if appropriate." An example of a "grounded radiology report" is below.
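To make the definition above concrete, here is a minimal Python sketch of how such a report could be represented in code. It is illustrative only and is not the PadChest-GR file format or the MAIRA-2 output schema: the class names `GroundedSentence` and `GroundedReport`, the choice of bounding boxes as the spatial annotation, the normalized `(x_min, y_min, x_max, y_max)` coordinate convention, and the example coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box format (not taken from the dataset):
# (x_min, y_min, x_max, y_max), each normalized to the range [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class GroundedSentence:
    """One Findings sentence describing at most a single observation."""

    text: str
    # Zero or more spatial annotations; empty when no location applies.
    boxes: List[Box] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject boxes that are inverted or fall outside the image.
        for x0, y0, x1, y1 in self.boxes:
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                raise ValueError(f"invalid box: {(x0, y0, x1, y1)}")

    @property
    def is_grounded(self) -> bool:
        """True when the sentence has at least one spatial annotation."""
        return len(self.boxes) > 0


@dataclass
class GroundedReport:
    """A grounded report: an ordered list of grounded sentences."""

    sentences: List[GroundedSentence]


# Hypothetical example; the coordinates are made up for illustration.
report = GroundedReport(
    sentences=[
        GroundedSentence("Left retrocardiac opacity.", [(0.55, 0.50, 0.80, 0.75)]),
        GroundedSentence("No pneumothorax."),  # no location to annotate
    ]
)

for sentence in report.sentences:
    print(sentence.text, sentence.boxes)
```

Keeping the box list on each sentence, rather than on the report as a whole, mirrors the "zero or more spatial annotations" wording in the definition: a sentence with no associated location simply carries an empty list.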

By spatially grounding radiological findings, AI teams will be able to more easily verify the quality of the draft radiology reports their models generate. This verification is essential, as model quality and explainability are critical to building both clinician and patient trust in AI, particularly in generative AI.
Today, there are many CXR image datasets that are labeled for diagnosis and finding classification tasks or that come with the associated text-based radiology reports for automated draft report generation. Some datasets also include spatial annotations to localize labels (for finding, anatomy, or device; e.g., ‘pneumothorax’) or single finding phrases, such as ‘left retrocardiac opacity.’
What has been missing, and what AI teams need in order to build GRRG models, are datasets that combine spatial annotations with direct links to the complete sets of descriptive sentences from the findings.
PadChest-GR is the first manually curated dataset for Grounded Radiology Report Generation (GRRG). It includes:
We collaborated closely with researchers from Microsoft Research, the University of Alicante, and the rest of the team to ensure seamless annotation of this novel dataset. Radiologist annotators used our HIPAA-compliant annotation platform to complete all data annotation.
Annotation was completed in two stages:
For both stages, every study or finding was analyzed independently by two professionals. The frontal image was always displayed beside the prior image (when available) so findings regarding progression could be identified.

The development of PadChest-GR was also supported by Microsoft Research, the Department of Radiology at University Hospital Sant Joan d’Alacant, Universitat d'Alacant, MedBravo, and the University of Cambridge. The research was financially supported by the University of Alicante-Microsoft research collaboration, which is funded by Microsoft.
For a demonstration of how Centaur can facilitate your AI model training and evaluation with greater accuracy, scalability, and value, click here: https://centaur.ai/demo
You can read the complete preprint about the PadChest-GR dataset here:
PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation