Non-therapeutic Drug Use

Model Details

This model outputs a binary classification (1 for yes, 0 for no) indicating whether a drug term in a text snippet refers to non-therapeutic use. Before evaluation, the drug term being evaluated must be replaced with the placeholder “DRUGTERM”. The model was fine-tuned for this use case from bert-base-cased (https://huggingface.co/google-bert/bert-base-cased). As such, it is case-sensitive: do not convert the input text to lowercase before use.

Author: Nikki Adams
Developed at: National Center for Health Statistics, Centers for Disease Control and Prevention
Model Type: Text Classification
Language(s): English
License: Apache-2.0
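The card requires the drug term to be replaced with “DRUGTERM” without otherwise altering the text. Below is a minimal sketch of that substitution, assuming a simple regular-expression approach. The function name `substitute_drugterm` is hypothetical. The sketch replaces every whole-word occurrence, and the card does not say whether the training data did the same. The sketch also assumes that the drug term begins and ends with a letter or digit, so that the word-boundary anchors apply.

```python
import re


def substitute_drugterm(text: str, drug_term: str) -> str:
    """Replace every whole-word occurrence of drug_term with "DRUGTERM".

    The match ignores case, so "Oxycodone" and "OXYCODONE" are both caught,
    while the rest of the text keeps its original casing (the model is
    case-sensitive, so the text must not be lowercased).
    """
    # \b limits matches to whole words, so "meth" does not match "Method".
    # Assumes drug_term starts and ends with a letter or digit.
    pattern = re.compile(r"\b" + re.escape(drug_term) + r"\b", flags=re.IGNORECASE)
    return pattern.sub("DRUGTERM", text)


note = "Hx of Oxycodone misuse; denies meth. Method of tapering discussed."
print(substitute_drugterm(note, "oxycodone"))
# Hx of DRUGTERM misuse; denies meth. Method of tapering discussed.
print(substitute_drugterm(note, "meth"))
# Hx of Oxycodone misuse; denies DRUGTERM. Method of tapering discussed.
```

Matching case-insensitively while leaving the surrounding text untouched keeps the casing information that the cased BERT tokenizer relies on.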

Example Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification 
import torch 
import pandas as pd 
import numpy as np 

# Load the model and tokenizer, placing the model on the GPU if one is available

model_location = "NCHS/Non_Therapeutic_Drug_Use"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForSequenceClassification.from_pretrained(model_location).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_location)

# Example texts with DRUGTERM substituted for 
#'ADDERALL', 'meth', 'oxycodone', and 'oxycodone', respectively 

texts = [ 
  "Amphetamine-dextroamphetamine (DRUGTERM) extended release 10 mg PRN, order placed", 
  "Presents with cardiac arrhythmia and hx of DRUGTERM overdose", 
  "Repeatedly told patient that they could not have their DRUGTERM rx filled again due to overuse and that they had to see their primary care physician", 
  "There does not seem to be any history of DRUGTERM misuse"
] 

# Tokenize the batch, padding to the longest text and returning PyTorch tensors
encoded_texts = tokenizer(texts, add_special_tokens = True,
                          truncation = True, padding = True,
                          return_attention_mask = True, return_tensors = 'pt')

with torch.no_grad(): 
    sample_output = model(encoded_texts['input_ids'].to(device), 
                          token_type_ids = None, 
                          attention_mask = encoded_texts['attention_mask'].to(device)) 

    sample_logits = sample_output.logits.detach().cpu().numpy() 
    classifications = np.argmax(sample_logits, axis = 1) 

df = pd.DataFrame({'NON_THERAPEUTIC': classifications, 
                   'TEXTS': texts}) 

print(df)

The printed DataFrame should contain the following classifications:

NON_THERAPEUTIC	TEXTS
0	Amphetamine-dextroamphetamine (DRUGTERM) extended release 10 mg PRN, order placed
1	Presents with cardiac arrhythmia and hx of DRUGTERM overdose
1	Repeatedly told patient that they could not have their DRUGTERM rx filled again due to overuse and that they had to see their primary care physician
0	There does not seem to be any history of DRUGTERM misuse

Uses

This model is intended to detect non-therapeutic drug use in clinical notes. For this model, non-therapeutic use was broadly defined as the use of an illicit drug, the misuse of a prescription drug, or some indication of “dependence” or “abuse” (terms used in ICD-10-CM diagnosis descriptions) of an unspecified drug. The model was fine-tuned as part of a larger project to detect non-therapeutic stimulant and opioid use in hospital encounters, including clinical notes, so all text examples were found by searching for stimulant and opioid drug terms. The training text came from the National Hospital Care Survey and was supplemented with manually constructed or altered examples intended to fill data gaps.

For each drug term found, a span of approximately 70 characters on either side was taken, and the drug term was evaluated for non-therapeutic status within that snippet. Approximately 600 labeled texts were used to train the model and approximately 200 to evaluate it. Each set contained roughly equal numbers of positive and negative cases. The model was fine-tuned from bert-base-cased over four epochs. On the evaluation data, the positive class had a precision (positive predictive value) of 0.90, a recall (sensitivity) of 0.93, and an F1-score (the harmonic mean of precision and recall) of 0.92.
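The three reported metrics can be computed from paired true and predicted labels. Below is a minimal sketch in plain Python. The function name `positive_class_metrics` and the toy labels are hypothetical and are not the card's evaluation data or code.

```python
def positive_class_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0     # sensitivity
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical, not the card's evaluation data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(positive_class_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

With the toy labels there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics equal 0.75. Plugging the rounded reported values into the same formula gives 2 × 0.90 × 0.93 / (0.90 + 0.93) ≈ 0.915. The reported 0.92 is therefore consistent with slightly higher unrounded precision and recall.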

Though all the training data was found by searching for opioid and stimulant drug terms, the model should generally be usable on any drug type because the terms themselves were replaced with the word “DRUGTERM.”
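Because the model only sees “DRUGTERM”, the same preprocessing can be reused for any list of drug names. The sketch below assumes a plain-Python pipeline and mimics the snippet construction described above. For each mention of any listed term, it keeps about 70 characters on either side and replaces that mention with the placeholder. Several details are illustrative assumptions that the card does not specify:

* the function name `make_snippets`
* the example drug list
* cutting snippets at exact character offsets rather than at word boundaries
* replacing only the mention under evaluation, while leaving other drug names in the window untouched

```python
import re


def make_snippets(note: str, drug_terms: list[str], window: int = 70) -> list[dict]:
    """Return one snippet per mention of any term in drug_terms.

    Each snippet keeps up to `window` characters on either side of the
    mention, and only that mention is replaced with "DRUGTERM".
    """
    snippets = []
    for term in drug_terms:
        # Whole-word, case-insensitive search (assumes the term starts and
        # ends with a letter or digit, so the \b anchors apply)
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        for match in pattern.finditer(note):
            start = max(0, match.start() - window)
            end = min(len(note), match.end() + window)
            snippet = note[start:match.start()] + "DRUGTERM" + note[match.end():end]
            snippets.append({"term": match.group(0), "snippet": snippet})
    return snippets


note = "Hx of heroin use. Currently on buprenorphine for OUD."
# A small window keeps the demonstration output short
for item in make_snippets(note, ["heroin", "buprenorphine"], window=10):
    print(item["term"], "->", item["snippet"])
# heroin -> Hx of DRUGTERM use. Curr
# buprenorphine -> rently on DRUGTERM for OUD.
```

The `snippet` strings can be passed as `texts` to the example code in Example Use. Each mention then gets its own classification.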

Note that the model detects non-therapeutic drug use, not therapeutic drug use. A classification of 0 should not be taken to mean that the drug was used therapeutically, or even that it was used at all. Drug mentions that are not non-therapeutic may instead reflect therapeutic use, lab screenings, or questionnaires, among other possibilities.

Training Data

The model was fine-tuned on approximately 600 text snippets from the 2020 National Hospital Care Survey, supplemented by the manually constructed or altered examples described under Uses.

Training Procedure

Learning rate: 2e-5
Batch size: 32
Number of training epochs: 4
Number of labels: 2
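The card does not state which training tooling was used. The hypothetical sketch below assumes the Hugging Face Trainer API and shows how the listed hyperparameters would map onto `TrainingArguments`. It is a configuration fragment, not the author's training script. The `output_dir` value is a placeholder, and it assumes training on a single device, so the per-device batch size equals the overall batch size.

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Cased BERT base checkpoint with a two-label classification head
# ("Number of labels: 2")
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=2
)

# Hyperparameters from the list above; output_dir is a hypothetical path
training_args = TrainingArguments(
    output_dir="non_therapeutic_drug_use_finetune",
    learning_rate=2e-5,
    per_device_train_batch_size=32,  # overall batch size only on a single device
    num_train_epochs=4,
)
```

These objects would then be passed to `transformers.Trainer` together with tokenized training and evaluation datasets, which are not shown here.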
