Karl Swanson

UCSF

Hypernym Substitution for the Simplification of Biomedical Definitions

Phenomenon: The vocabulary of clinical notes has limited the OpenNotes effort to increase transparency and engage patients. We are developing a pipeline to bridge this gap between medical record availability and comprehension. We used three approaches to perform hypernym substitution, that is, replacing words and phrases considered both complex and biomedical with more general terms.

Approach 1: The first approach combined NLP models, which identify complex biomedical words and phrases, with programmatic rules, which substitute them. For identification, we used a seq2seq model that performs complex word identification and a named entity recognition model for biomedical terms. If a word or phrase is flagged as both complex and biomedical, it is replaced with an N-gram-matched hypernym from our custom hypernymy tree.

Approaches 2 and 3: We fine-tuned two models, T5 and GPT-J, on pairs in which the input is a biomedical definition and the gold-standard output is the same definition with manual hypernym substitutions.

We applied the approaches to 1,000 randomly sampled biomedical definitions from the Unified Medical Language System (UMLS). We computed four readability metrics for the preprocessed and post-processed definitions: the Flesch-Kincaid (FK) Score, FK Grade, Automated Readability Index (ARI), and Gunning-Fog Index (GFI). We then compared the results with t-tests. All approaches increased the FK score and reduced the required grade level on each grade-level index. The NLP plus programmatic approach and the GPT approach showed the largest improvements. On average, they lowered the required reading level from collegiate or higher for the preprocessed definitions to a late grammar school or early high school level. Despite these gains in readability metrics, the output of the NLP plus programmatic approach was clearly poor in qualitative understandability for human readers.
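The substitution rule in Approach 1 works in two steps. A span is substituted only when it is flagged as both complex and biomedical, and its replacement is then looked up by N-gram. A minimal Python sketch illustrates this logic. Every name here is an assumption made for illustration. `is_complex` and `is_biomedical` are hypothetical stand-ins for the seq2seq complex word identification model and the biomedical NER model. `HYPERNYMS` is a toy dictionary, not the custom hypernymy tree. Greedy longest-match lookup is one plausible N-gram matching strategy and not necessarily the one used in this work.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy hypernym lookup (NOT the authors' hypernymy tree):
# maps a lowercase n-gram, stored as a tuple of tokens, to a more general term.
HYPERNYMS: Dict[Tuple[str, ...], str] = {
    ("myocardial", "infarction"): "heart attack",
    ("hepatocyte",): "liver cell",
    ("erythrocyte",): "red blood cell",
}


def substitute_hypernyms(
    tokens: List[str],
    is_complex: Callable[[str], bool],
    is_biomedical: Callable[[str], bool],
    hypernyms: Dict[Tuple[str, ...], str] = HYPERNYMS,
    max_n: int = 3,
) -> List[str]:
    """Greedy longest-match replacement of complex biomedical n-grams.

    `is_complex` and `is_biomedical` are placeholders for a complex word
    identification model and a biomedical NER model, respectively.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        replaced = False
        # Try the longest n-gram first so multi-word terms win over their parts.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            key = tuple(t.lower() for t in tokens[i:i + n])
            # Substitute only if a hypernym exists AND the span is both
            # complex and biomedical.
            if key in hypernyms and is_complex(phrase) and is_biomedical(phrase):
                out.append(hypernyms[key])
                i += n
                replaced = True
                break
        if not replaced:
            out.append(tokens[i])
            i += 1
    return out


def toy_complex(phrase: str) -> bool:
    # Hypothetical stand-in for a complex word identification model.
    return any(len(word) >= 10 for word in phrase.split())


def toy_biomedical(phrase: str) -> bool:
    # Hypothetical stand-in for a biomedical NER model.
    return tuple(phrase.lower().split()) in HYPERNYMS


if __name__ == "__main__":
    sentence = "Acute myocardial infarction damages the hepatocyte"
    print(" ".join(substitute_hypernyms(sentence.split(), toy_complex, toy_biomedical)))
    # → Acute heart attack damages the liver cell
```

Trying the longest N-gram first means a multi-word term such as "myocardial infarction" is replaced as a unit rather than word by word. Whitespace tokenization keeps the sketch short, but punctuation attached to a word would prevent a match.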
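The four readability metrics are closed-form functions of word, sentence, syllable, and character counts, so a short sketch can show how they move when definitions are simplified. It rests on several assumptions. It reads "FK Score" as the Flesch Reading Ease score. It uses a rough vowel-group syllable counter and a simplified Gunning Fog definition of "complex word", so its numbers will differ somewhat from dedicated tools. `readability`, `count_syllables`, and `paired_t` are hypothetical helpers. The same definitions are measured before and after processing, so the sketch also computes a paired t statistic. The abstract does not specify the t-test design.

```python
import math
import re
import statistics
from typing import Dict, List, Sequence


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; dedicated libraries use better rules."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    # Drop a silent trailing 'e' ("rate") but keep "-le" endings ("table").
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> Dict[str, float]:
    # Naive sentence split on terminal punctuation (abbreviations over-split).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words: List[str] = re.findall(r"[A-Za-z]+", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    syllables = [count_syllables(w) for w in words]
    n_chars = sum(len(w) for w in words)
    # Simplified Gunning Fog "complex word": 3+ syllables, no exclusions.
    n_complex = sum(1 for s in syllables if s >= 3)
    wps = n_words / n_sent          # words per sentence
    spw = sum(syllables) / n_words  # syllables per word
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "fk_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "ari": 4.71 * (n_chars / n_words) + 0.5 * wps - 21.43,
        "gunning_fog": 0.4 * (wps + 100 * n_complex / n_words),
    }


def paired_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Paired t statistic; needs at least two pairs with non-zero spread."""
    diffs = [b - a for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    before = "Hepatocellular carcinoma is a malignant neoplasm originating in hepatocytes."
    after = "Liver cancer is a bad growth that starts in liver cells."
    for label, text in (("before", before), ("after", after)):
        print(label, {k: round(v, 1) for k, v in readability(text).items()})
```

To obtain a p-value as well, `scipy.stats.ttest_rel` computes the same paired statistic together with a two-sided p-value.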
Unlike the NLP plus programmatic approach, the GPT approach did not lose qualitative understandability. Our next steps include evaluating these two approaches with the Measure of Textual Lexical Diversity (MTLD) for lexical diversity and the Mean Dependency Distance (MDD) for syntactic complexity. We hypothesize that differences in these metrics between the approaches may explain the differences in human understandability. After completing this work, we will pursue human validation studies of the entire pipeline.
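Both proposed metrics are straightforward to compute once text is tokenized and parsed. The sketch below follows the standard MTLD procedure. It counts how many times the running type-token ratio falls to the usual threshold of 0.72, adds a partial factor for the remainder, and averages forward and backward passes. It defines MDD as the mean absolute distance between each word and its syntactic head, excluding the root. It assumes head indices in CoNLL-U style (1-based, 0 for the root) from some dependency parser. The function names are hypothetical choices for this illustration, and so is the fallback for texts that never accumulate a factor.

```python
from typing import Sequence


def _mtld_one_pass(tokens: Sequence[str], threshold: float) -> float:
    factors = 0.0
    types: set = set()
    count = 0
    for tok in tokens:
        types.add(tok)
        count += 1
        # A factor is complete once the type-token ratio falls to the threshold.
        if len(types) / count <= threshold:
            factors += 1
            types, count = set(), 0
    # Partial factor for the leftover segment.
    if count > 0:
        factors += (1 - len(types) / count) / (1 - threshold)
    # Convention chosen for this sketch: if no factor accumulated (e.g. every
    # token is unique), return the token count instead of dividing by zero.
    return len(tokens) / factors if factors > 0 else float(len(tokens))


def mtld(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """Measure of Textual Lexical Diversity: mean of forward and backward passes."""
    toks = [t.lower() for t in tokens]
    return (_mtld_one_pass(toks, threshold) + _mtld_one_pass(toks[::-1], threshold)) / 2


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """heads[k] is the 1-based head of token k+1; 0 marks the root (CoNLL-U style)."""
    distances = [abs((k + 1) - h) for k, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    words = "the cell divides and the cell grows".split()
    print(mtld(words))
    # Hypothetical parse of "The cat sat": The -> cat, cat -> sat, sat = root.
    print(mean_dependency_distance([2, 3, 0]))  # → 1.0
```

MTLD is known to be unstable on very short texts, so definition-length inputs may need pooling before the two approaches can be compared meaningfully.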

Bio: Karl Swanson is a Clinical Informatics fellow at UCSF. He completed his residency in internal medicine and works as a hospitalist at UCSF Medical Center during his clinical time. Before starting his medical training, he completed a Master of Science in computational biology, and he has since continued to work on data-science-adjacent projects. His interests now lie in developing, applying, and researching natural language processing tools in the clinical and public health domains.