Blog

Natural Language Processing in Healthcare

Author Image
The Centaur Blogging Team
August 19, 2022

The volume of medical text data, especially digitized and unstructured narrative text, has taken off with the adoption of EHRs and other digital tools used by clinicians, researchers, and patients. All of this medical text is fueling a surge of NLP model development to turn those medical text datasets into insights that impact care delivery, biomedical research, and much more. As every healthcare, medical device, and pharmaceutical company deepens their investments in AI, we’re seeing a growing number of clients developing NLP and an increased need for medical text annotations.

What is NLP, and how is it being applied in healthcare?

Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. It combines computational linguistics with machine learning and deep learning models to process and analyze large amounts of natural language data—such as text or speech. NLP is being applied in most parts of the healthcare value chain.

From accelerating subject recruitment in clinical trials to supporting decisions and diagnoses at the point of care to helping consumers find the accurate health information they need, there’s a lot of impact to be excited about, and we’ll give many detailed examples in a later post in this series.

What is medical text?

At Centaur Labs we use ‘medical text’ as shorthand for multiple types of health-related data stored as text. Medical text encompasses 3 subcategories of text: clinical text, biomedical text, and ‘other’ health text. 

Clinical text

Clinical text is defined as text collected during the course of ongoing patient care in the formal healthcare system or as part of a formal clinical trial program. We often think of clinical text as written by clinicians themselves, for example, when making clinical notes in the electronic health record (EHR), in a prescription write-up, or in a pathology report.

However, patients also generate an increasing volume of clinical text. They may describe the beginning of their healthcare journey, i.e., a discussion with a chatbot automating part of patient intake or a typical in-person clinical visit with a therapist. Patients may also describe the middle or end of their healthcare journey, i.e., a transcript of a follow-up call with a clinician, an insurance claim, or a discharge description.

Clinical text can also be written by others in the healthcare ecosystem who participate in patient care, such as insurance claims processors or family members. 

Biomedical text

Biomedical text is any data stored as text collected or created throughout the course of medical research. Often this text is written by research scientists in the form of scientific papers published in academic journals or as unpublished intellectual property. This unpublished text is owned and managed by the academic institution or company that employed the scientist and funded the research the paper summarizes.

Other health text

There are many types of health-related text generated outside patient care delivery and scientific research. For example, users of consumer wellness applications share health information in text formats as they interact with private groups or company ‘coaches’. Consumers also share health information in public forums like Twitter and Reddit. Government agencies publish public health-related text to their populations, and regulatory bodies publish guidelines and rulings to their constituents. Companies in the healthcare ecosystem train their workforces - whether clinicians or pharmaceutical sales reps - with platforms that deliver content containing health-related text. 

Medical text is difficult to interpret

These sub-categories of medical text have one thing in common - they contain health concepts and terminology and subtle relationships between words and phrases that are difficult for an untrained eye to identify. Those with interest in medicine, healthcare, and biology - whether a seasoned clinician, a medical student, or a medical enthusiast - are best suited to interpret medical text. In this blog series, we’ll focus on NLP that leverages medical text - whether clinical, biomedical or ‘other’ - as this is where some of the most compelling innovation is taking place. 

For additional insights, see the following posts:

  1. Data Labeling for Startups: Overcoming Common Challenges
  2. Enhancing AI Models with Medical Data Annotation

Related posts

December 14, 2022

Consensus boosts search algorithm with Centaur Lab's annotations

Centaur Labs contributes high-quality data annotations to enhance Consensus’ scientific search algorithm, improving accuracy and boosting research capabilities.

Continue reading →
August 31, 2022

9 most common types of medical text datasets

From SMS to insurance claims, pathology reports, and scientific studies, this post explores the most common medical text datasets used for NLP in healthcare.

Continue reading →
July 26, 2024

Gamified Data Labeling Boosts Eight Sleep Model Accuracy to 93%

Gamified data labeling enhances model accuracy from 70% to 93% in a case study with Eight Sleep, demonstrating the effectiveness of multimodal annotation.

Continue reading →