Blog

The Centaur Blogging Team

August 19, 2022

The volume of medical text data, especially digitized and unstructured narrative text, has taken off with the adoption of EHRs and other digital tools used by clinicians, researchers, and patients. All of this medical text is fueling a surge of NLP model development to turn those medical text datasets into insights that impact care delivery, biomedical research, and much more. As every healthcare, medical device, and pharmaceutical company deepens their investments in AI, we’re seeing a growing number of clients developing NLP and an increased need for medical text annotations.

‍

What is NLP, and how is it being applied in healthcare?

‍

Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. It combines computational linguistics with machine learning and deep learning models to process and analyze large amounts of natural language data—such as text or speech. NLP is being applied in most parts of the healthcare value chain.

‍

From accelerating subject recruitment in clinical trials to supporting decisions and diagnoses at the point of care to helping consumers find the accurate health information they need, there’s a lot of impact to be excited about, and we’ll give many detailed examples in a later post in this series.

‍

What is medical text?

‍

At Centaur Labs we use ‘medical text’ as shorthand for multiple types of health-related data stored as text. Medical text encompasses 3 subcategories of text: clinical text, biomedical text, and ‘other’ health text.

‍

‍Clinical text

‍

Clinical text is defined as text collected during the course of ongoing patient care in the formal healthcare system or as part of a formal clinical trial program. We often think of clinical text as written by clinicians themselves, for example, when making clinical notes in the electronic health record (EHR), in a prescription write-up, or in a pathology report.

‍

However, patients also generate an increasing volume of clinical text. They may describe the beginning of their healthcare journey, i.e., a discussion with a chatbot automating part of patient intake or a typical in-person clinical visit with a therapist. Patients may also describe the middle or end of their healthcare journey, i.e., a transcript of a follow-up call with a clinician, an insurance claim, or a discharge description.

‍

Clinical text can also be written by others in the healthcare ecosystem who participate in patient care, such as insurance claims processors or family members.

‍

Biomedical text

‍

Biomedical text is any data stored as text collected or created throughout the course of medical research. Often this text is written by research scientists in the form of scientific papers published in academic journals or as unpublished intellectual property. This unpublished text is owned and managed by the academic institution or company that employed the scientist and funded the research the paper summarizes.

‍

Other health text

‍

‍There are many types of health-related text generated outside patient care delivery and scientific research. For example, users of consumer wellness applications share health information in text formats as they interact with private groups or company ‘coaches’. Consumers also share health information in public forums like Twitter and Reddit. Government agencies publish public health-related text to their populations, and regulatory bodies publish guidelines and rulings to their constituents. Companies in the healthcare ecosystem train their workforces - whether clinicians or pharmaceutical sales reps - with platforms that deliver content containing health-related text.

‍

Medical text is difficult to interpret

‍

These sub-categories of medical text have one thing in common - they contain health concepts and terminology and subtle relationships between words and phrases that are difficult for an untrained eye to identify. Those with interest in medicine, healthcare, and biology - whether a seasoned clinician, a medical student, or a medical enthusiast - are best suited to interpret medical text. In this blog series, we’ll focus on NLP that leverages medical text - whether clinical, biomedical or ‘other’ - as this is where some of the most compelling innovation is taking place.

‍

Schedule a demo with Centaur.ai

‍

To learn more, see the following:

‍

December 14, 2022

Consensus Scientific Search Case Study | Centaur AI

Centaur Labs contributes high-quality data annotations to enhance Consensus’ scientific search algorithm, improving accuracy and boosting research capabilities.

August 15, 2024

Time Range Selection for Medical Video Annotation | Centaur AI

Know Centaur AI's new time range selection feature that speeds up medical video annotation, improving accuracy and efficiency in healthcare data processing.

October 23, 2025

Social Listening Annotation for Brand Health | Centaur AI

Multimodal social listening requires more than raw data. To truly understand brand health across text, image, and video, companies need high-quality annotated datasets. Centaur.ai combines synthetic, privacy-safe data with expert labeling to deliver precise, scalable insights that keep brands compliant, resilient, and prepared for real-time consumer sentiment shifts.