Case Study: Evaluating lung nodule auto-segmentation for RYVER’s synthetic medical data solution

Ali Devaney, Marketing
April 25, 2025

Medical AI teams struggle to create class-balanced training and test datasets with real-world medical data alone. RYVER makes it easier by leveraging the most advanced generative AI techniques to provide high-quality synthetic medical data. Centaur helped RYVER evaluate its new auto-segmentation capability.  

Leveraging synthetic data to fill data gaps

Developing high-performing medical AI models requires training data that reflects the full range of cases encountered in the production environment. However, achieving this level of representation is a challenge. Datasets are often imbalanced, overrepresenting certain scanner manufacturers while lacking sufficient examples of rarer tumor presentations. 

Traditionally, addressing these gaps has been time-consuming and costly. Developers must either establish complex data-sharing agreements with hospitals or rely on medical data brokers who often can’t meet specialized needs at the required volume. These limitations are why more medical AI teams are turning to RYVER.

RYVER, a startup advised by the former president of the American College of Radiology, specializes in generating high-quality synthetic medical data to help AI teams overcome data scarcity. Initially focused on synthetic abdomen and thoracic CT datasets—particularly for oncology applications—RYVER intends to expand to additional modalities and disease areas over time.

At the heart of RYVER’s platform is a suite of advanced generative CT models pre-trained on diverse, high-quality medical imaging datasets from global partners. AI teams can connect their proprietary data, fine-tune RYVER’s models, and generate rare or underrepresented cases quickly, cost-effectively, and securely. For teams lacking initial data, RYVER’s partner ecosystem helps source the necessary samples to get started.

The Challenge: Evaluating lung nodule auto-segmentations

Through their work providing synthetic lung CT datasets, the RYVER team consistently heard the same request from clients:

“Almost 100% of the time, our customers don’t just want the synthetic images—they also want pixel-level segmentations of the lesions of interest,” said Kathrin Khadra, RYVER’s technical co-founder.

To meet this need, RYVER developed models capable of generating high-quality synthetic CT scans and corresponding pixel-level segmentations and classifications of lesions. As they prepared to launch this new capability, they knew demonstrating the quality of these auto-segmentations would be key to gaining customer trust.

They decided to generate a synthetic dataset with auto-segmentations, then have an expert annotation company independently segment the same dataset. Once annotation was complete, they would calculate Dice overlap scores between the two sets of segmentations, benchmark the result against a well-regarded publicly available dataset, and make the findings available to prospective clients.
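The Dice score measures the overlap between two segmentation masks: twice the intersection divided by the sum of the two mask sizes. As a rough illustration of the comparison step, here is a minimal sketch in Python with NumPy; the mask arrays and function name are hypothetical, not RYVER's actual pipeline:

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / total

# Toy example: auto-segmentation vs. expert annotation on a 4x4 patch
auto = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
expert = np.array([[0, 1, 1, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
print(round(dice_score(auto, expert), 3))  # 2*3 / (4+3) -> 0.857
```

A score of 1.0 means identical masks; 0.0 means no overlap at all.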

They began by training their generative model with the Lung Image Database Consortium (LIDC) dataset—a well-known collection of over 1,000 thoracic CT scans annotated by four experienced thoracic radiologists. LIDC is a standard reference in AI research for lung imaging, making it an ideal dataset for training and benchmarking.

After generating 500 synthetic lung CT image patches with auto-segmentations, the next step was to find an expert provider to annotate the dataset. The team knew how resource-intensive it could be to source annotators, manage timelines, and ensure quality control, so they set out to find a provider that could deliver high-quality results quickly, with minimal lift from their lean team.

Our Solution: Building a high-quality lung nodule segmentation test dataset 

To create a robust test dataset for evaluating their model’s auto-segmentations, RYVER partnered with Centaur, the leading medical annotation platform. Centaur stood out for its unique collective intelligence methodology, which delivers multiple high-quality annotations per case as a standard process. Centaur also provides detailed quality metrics, including interrater agreement and agreement with gold standards, offering RYVER a clear way to substantiate the quality of the test dataset.

Speed was also critical to RYVER. With the ability to deliver thousands of segmentations per day, Centaur gave RYVER confidence that they could annotate the full dataset quickly, enabling them to complete their analysis in support of the upcoming product launch.

RYVER’s model generated 500 lung image patches containing nodules. Each patch consisted of 20 slices, and annotators were asked to draw a polygon around visible nodules.

Centaur delivered 10 qualified reads per case, collecting nearly 26,000 qualified reads and generating final annotations at a pace of 1,000 annotations per day. Quality metrics were strong across the board, with 76% agreement with gold standards and 84% interrater agreement.
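One simple way to think about interrater agreement with multiple reads per case is pairwise agreement: the fraction of read pairs on the same case that match. The sketch below is a hypothetical illustration of that idea for label-style reads; the post does not specify the exact metric Centaur computes:

```python
from itertools import combinations

def pairwise_agreement(reads: list[str]) -> float:
    """Fraction of read pairs on one case that give the same label."""
    pairs = list(combinations(reads, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical case with 10 reads: 8 say "nodule", 2 say "no nodule"
reads = ["nodule"] * 8 + ["no nodule"] * 2
print(round(pairwise_agreement(reads), 2))  # 29 of 45 pairs agree -> 0.64
```

For segmentations rather than labels, the same idea applies with an overlap measure such as Dice computed between pairs of reads.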

The Result: Validating RYVER’s high-quality lung nodule auto-segmentations

The collaboration with Centaur gave RYVER the control and insight needed to evaluate their auto-segmentation model confidently. The Centaur platform streamlined the annotation process and empowered RYVER to monitor quality closely and iterate quickly.

“Finding great medical data annotators is really hard. With Centaur, it’s easy,” said Khadra. “You can guide the annotation process yourself, run small test batches, update instructions, and immediately see the impact. You can check annotation quality at any time and really stay in control, which is incredibly important for us given the importance of quality.”

RYVER assessed model performance by computing Dice overlap scores between their model’s auto-segmentations and Centaur’s expert annotations. They then benchmarked that score against the Dice score for the LIDC dataset. The two were comparable, demonstrating that RYVER’s auto-segmentations were on par with expert consensus. The results of the analysis were compelling enough that the team decided to publish them; a full write-up of the study will be available as a preprint on arXiv in the coming months.

Unlock expert-quality annotations at scale with Centaur 

The RYVER team has an ambitious vision to expand its capabilities to meet the enormous demands of the medical AI ecosystem.

“Our vision is to build a suite of synthetic medical data models that support AI teams working across all imaging modalities—from brain MRIs to digital pathology—and across all disease areas,” said Khadra.

To bring this vision to life, RYVER knows expert-quality annotation will remain critical, from training generative medical data models with small datasets to validating model performance with high-quality test datasets. Centaur will continue to play a key role in this next phase, annotating multimodal data to support RYVER’s vision of making high-quality synthetic medical datasets available to every medical AI team.
