Open Source Datasets for Medical AI

The Centaur Blogging Team
December 3, 2020

We live and breathe medical datasets for AI

Transforming healthcare through AI depends on access to diverse, richly annotated, open-source datasets. Medical AI applications, from image segmentation to multimodal reasoning, are only as robust as their training data. Fortunately, recent years have produced high-quality, freely available datasets that researchers can use to build, test, and deploy models. We have collected some of our favorite open-source datasets into an organized list to share with the community.


In our list, you can explore dozens of datasets by size, category, modality (including X-ray, ultrasound, whole-slide images, CT scans, and ECGs), and more. Each entry also includes a brief description covering the specific abnormalities of interest, the balance of the data, and the annotations included, such as medical image classifications or segmentations.

Access the full collection here

If you know of any datasets that should be added to this list, please let us know.
