
Open source datasets for medical AI

The Centaur Blogging Team
December 3, 2020

We live and breathe medical datasets for AI

Transforming healthcare through AI depends on access to diverse, richly annotated, open source datasets. Medical AI applications, from image segmentation to multimodal reasoning, are only as robust as their training data. Fortunately, recent years have produced high-quality, freely available datasets that researchers can use to build, test, and deploy impactful models. To help the community find them, we have gathered some of our favorite open source datasets into one organized list.


In our list, you can explore dozens of datasets by size, category, modality (including X-ray, ultrasound, whole slide images, CT scans, and ECGs), and more. Each entry also includes a brief description to help you quickly understand the specific abnormalities of interest, the class balance of the data, and the annotations provided, such as image classification labels or segmentations.

Our collection of open source datasets for medical AI


Access the full collection here

If you know of any datasets that should be added to this list, please let us know.
