
Richer Medical Data Labels Through the Power of Metadata

The Centaur Blogging Team
October 20, 2020

Understand how Centaur Labs' data annotation platform offers richer results than traditional data labeling vendors to improve your medical AI pipeline

As demand for medical AI has grown rapidly, so has the need for accurate and scalable medical data labeling solutions. In recent years, three models for acquiring medical data labels have emerged:

  1. Hire board-certified physicians for $200/hour or more
  2. Recruit medical students and residents for tasks that do not require as much skill
  3. Work with teams of off-shore labelers that offer a ‘hybrid team’ consisting of one medical expert supported by a team of non-medical staff

These traditional models often fail to deliver highly accurate results. What’s more, the reasoning behind the labeling results is something of a ‘black box’: no information about the labels, such as confidence or case difficulty, is provided, which makes quality control (QC) of data labels extremely difficult.


In contrast, labels produced through Centaur Labs’ collective intelligence approach provide not only high accuracy but also deep insight into the labels themselves. The additional signal produced by collecting and aggregating multiple expert opinions opens the door to several beneficial use cases.


Flags improve accuracy of existing data labels

If you have a dataset with labels of uncertain quality because they were produced by a single expert or an outsourced labeling vendor, or pre-labeled by an AI model, we can analyze each label, and our network of experts will flag any that are incorrect. This use case was inspired by the joke among radiologists that “every time a radiologist looks at a scan, they find a new nodule.”
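As a rough illustration of the flagging idea, the sketch below compares each existing label against a set of independent expert reads and flags the cases where a clear majority of experts disagree with it. This is a minimal sketch under stated assumptions, not a description of Centaur Labs' actual pipeline: the function name, the data layout, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter


def flag_suspect_labels(existing_labels, expert_reads, min_agreement=0.6):
    """Return case IDs whose existing label conflicts with a clear expert majority.

    existing_labels: dict of case_id -> label from the original source
    expert_reads:    dict of case_id -> list of labels from independent experts
    min_agreement:   hypothetical threshold; the fraction of experts that must
                     agree on a different label before the case is flagged
    """
    flagged = []
    for case_id, label in existing_labels.items():
        reads = expert_reads.get(case_id, [])
        if not reads:
            continue  # no expert opinions were collected for this case
        top_label, top_count = Counter(reads).most_common(1)[0]
        if top_label != label and top_count / len(reads) >= min_agreement:
            flagged.append(case_id)
    return flagged


existing = {"a": "nodule", "b": "clear", "c": "nodule"}
reads = {
    "a": ["nodule", "nodule", "clear"],
    "b": ["nodule", "nodule", "nodule"],
    "c": ["clear", "nodule"],
}
flagged = flag_suspect_labels(existing, reads)
```

In this toy example only case "b" is flagged: the experts unanimously disagree with the existing label. Case "c" is left alone because the experts are split and no label reaches the threshold.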


Difficulty scores measure labeler agreement and case controversy

Since we collect multiple expert opinions for each case, we are able to develop a much richer understanding of the quality of the data. One particular insight we can measure is the level of agreement or disagreement among our experts on a specific case.
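One simple way to quantify agreement among several reads is the normalized entropy of the votes. The sketch below is only an illustration of that general idea; this post does not describe how the platform's score is actually computed, and the function name and the `num_choices` parameter are assumptions.

```python
import math
from collections import Counter


def disagreement_score(votes, num_choices):
    """Normalized Shannon entropy of the votes for one case.

    Returns 0.0 when every expert gave the same answer and 1.0 when the
    votes are spread evenly across all `num_choices` answer choices.
    This is an illustrative measure, not the platform's own formula.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if num_choices < 2:
        raise ValueError("num_choices must be at least 2")
    counts = Counter(votes)
    if len(counts) == 1:
        return 0.0  # unanimous
    total = len(votes)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_choices)


unanimous = disagreement_score(["benign"] * 5, num_choices=2)
split = disagreement_score(["benign", "benign", "malignant", "malignant"], num_choices=2)
lopsided = disagreement_score(["benign"] * 9 + ["malignant"], num_choices=2)
```

A unanimous case scores 0, an even split scores 1, and a 9-to-1 vote falls in between, so the score rises as the experts become more divided.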

Difficulty scores are reported on our platform for each case labeled.

Clients use the difficulty score in a few different ways, depending on their specific situation and goals:


  1. Incorporate difficulty as a feature in their model. By informing the model about which cases are difficult, our clients have realized improved model performance.
  2. Identify cases that are of poor quality. Images are often blurry or distorted, and audio files can contain a lot of background noise. These poor-quality cases can either be edited so they can be used or simply discarded.
  3. Identify cases that are highly controversial. As is common in medicine, there are times when experts will disagree. For specific cases that are hard to judge, clients can opt to get even more expert opinions and include the case if consensus can be reached or discard it if not.
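The second and third uses above amount to a triage step. A minimal sketch, assuming each case already has a difficulty score between 0 and 1: cases at or above a threshold are set aside for review (quality inspection or additional expert opinions), and the rest are kept. The function name and the `review_threshold` value are hypothetical; a sensible cut-off would depend on the dataset.

```python
def triage_cases(difficulty_by_case, review_threshold=0.8):
    """Split cases into those kept as-is and those sent for further review.

    difficulty_by_case: dict of case_id -> difficulty score in [0, 1]
    review_threshold:   hypothetical cut-off; cases scoring at or above it
                        are routed to review
    """
    keep, review = [], []
    for case_id, score in sorted(difficulty_by_case.items()):
        if score >= review_threshold:
            review.append(case_id)
        else:
            keep.append(case_id)
    return keep, review


keep, review = triage_cases({"c1": 0.10, "c2": 0.95, "c3": 0.80})
```

The score alone does not say whether a case is hard because the input is poor or because the finding is genuinely ambiguous, so the review bucket still needs a person to decide between fixing, discarding, and requesting more opinions.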

Label-level data for individual reads

Lastly, we provide customers with granular information about each labeler read. For classification tasks, we share the number of votes for each answer choice; for segmentation tasks, we share the coordinates of each annotation.

We also gather individual labeler accuracy scores. We know the correct answer to some portion of the cases viewed by each labeler, so we’re able to calculate each labeler’s accuracy on any given task.
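The accuracy calculation described above can be sketched as follows. This is an illustrative example only: the `(labeler_id, case_id, answer)` record layout and the function name are assumptions, and reads on cases without a known answer are simply skipped.

```python
from collections import defaultdict


def labeler_accuracy(reads, gold):
    """Compute per-labeler accuracy on cases with a known correct answer.

    reads: iterable of (labeler_id, case_id, answer) tuples (hypothetical layout)
    gold:  dict of case_id -> correct answer for the subset of known cases
    """
    correct = defaultdict(int)
    scored = defaultdict(int)
    for labeler_id, case_id, answer in reads:
        if case_id not in gold:
            continue  # no known answer, so this read cannot be scored
        scored[labeler_id] += 1
        correct[labeler_id] += int(answer == gold[case_id])
    return {labeler: correct[labeler] / scored[labeler] for labeler in scored}


reads = [
    ("ann", "c1", "yes"),
    ("ann", "c2", "no"),
    ("bob", "c1", "no"),
    ("bob", "c3", "yes"),  # c3 has no known answer and is ignored
]
gold = {"c1": "yes", "c2": "yes"}
accuracy = labeler_accuracy(reads, gold)
```

Labelers who never saw a case with a known answer do not appear in the result, which avoids reporting an accuracy that was never measured.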

Getting multiple opinions doesn’t just give you more accurate answers; it also gives you more insight into your data that can be used to optimize your data pipeline and improve the accuracy of your models.


To learn more, see the following:

  1. The Collective Intelligence Methodology
  2. Trusted, Accurate Medical Data Labels for AI
  3. Annotating Complex Data at Scale
