Open Source Datasets for Medical AI

The Centaur Blogging Team
December 3, 2020

We live and breathe medical datasets for AI

Transforming healthcare through AI hinges on access to diverse, richly annotated, open-source datasets. Medical AI applications, from image segmentation to multimodal reasoning, are only as robust as their training data. Fortunately, recent years have produced high-quality, freely available datasets that researchers can use to build, test, and deploy impactful models. We thought it would be helpful to organize some of our favorite open-source datasets into a list and share them with the community.


In our list, you can explore dozens of datasets by size, category, modality (including X-ray, ultrasound, whole-slide images, CT scans, and ECGs), and more. Each entry also includes a brief description that helps you quickly understand the specific abnormalities of interest, the balance of the data, and the annotations included, such as medical image classifications or segmentations.


Access the full collection here

If you know of any datasets that should be added to this list, please let us know.
