import torch
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler
from transformers import BertForSequenceClassification, BertTokenizer

import gradio as gr
from typing import Dict


num_labels = 14
model = BertForSequenceClassification.from_pretrained(
    "owaiskha9654/Multi-Label-Classification-of-PubMed-Articles", num_labels=num_labels
)
tokenizer = BertTokenizer.from_pretrained(
    "owaiskha9654/Multi-Label-Classification-of-PubMed-Articles", do_lower_case=True
)

# max_length, batch_size, and device were referenced below but never defined;
# the values here are reasonable assumptions, not taken from the original.
max_length = 512   # BERT's maximum input length in tokens
batch_size = 32    # batch size for the inference DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

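# The index-to-name mapping for the 14 outputs lives in the model config; if the
# checkpoint does not define names, transformers falls back to "LABEL_0" ...
# "LABEL_13". Uncomment to inspect what this checkpoint ships with:
# print(model.config.id2label)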

def Multi_Label_Classification_of_Pubmed_Articles(model_input: str) -> Dict[str, float]:

    # Encode the input article as a batch of one (list(model_input) would have
    # split the string into single characters)
    Articles_test = [model_input]
    test_encodings = tokenizer.batch_encode_plus(
        Articles_test, max_length=max_length, padding=True, truncation=True
    )
    # Make tensors out of data
    test_inputs = torch.tensor(test_encodings['input_ids'])
    test_masks = torch.tensor(test_encodings['attention_mask'])

    # Create test dataloader (no labels exist at inference time)
    test_data = TensorDataset(test_inputs, test_masks)
    test_sampler = SequentialSampler(test_data)
    test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=batch_size)

    # Put model in evaluation mode
    model.eval()

    # Track predicted label probabilities
    pred_labels = []

    # Predict
    for batch in test_dataloader:
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask = batch
        with torch.no_grad():
            # Forward pass; sigmoid turns each logit into an independent
            # per-label probability, as required for multi-label classification
            outs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
            pred_label = torch.sigmoid(outs[0])
        pred_labels.append(pred_label.cpu().numpy())

    # Flatten batch outputs into one list of per-article probability vectors
    pred_labels = [item for sublist in pred_labels for item in sublist]
    probs = pred_labels[0]

    # Map each of the 14 probabilities to its label name from the model config
    id2label = model.config.id2label
    return {id2label[i]: float(p) for i, p in enumerate(probs)}

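# A minimal sanity check, run only when this file is executed directly (a sketch;
# the abstract below is a made-up example, not taken from the dataset). It shows
# that the function returns a dict of 14 label -> probability entries.
if __name__ == "__main__":
    scores = Multi_Label_Classification_of_Pubmed_Articles(
        "Treatment of rheumatoid arthritis with methotrexate was evaluated "
        "in a randomized controlled trial of 200 patients."
    )
    # Print the three highest-probability labels
    for name, prob in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]:
        print(f"{name}: {prob:.3f}")
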

model_input = gr.Textbox(placeholder="Input text here", show_label=False)
model_output = gr.Label(
    "Multi Label MeSH Result",
    num_top_classes=14,
    show_label=True,
    label="MeSH (Medical Subject Headings) labels assigned to this article",
)


examples = [
    (
        "Story of a man who has unnatural feelings for a pig. "
        "Starts out with a opening scene that is a terrific example of absurd comedy. "
        "A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. "
        "Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. "
        "Even those from the era should be turned off. "
        "The cryptic dialogue would make Shakespeare seem easy to a third grader. "
        "On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. "
        "Future stars Sally Kirkland and Frederic Forrest can be seen briefly."
    ),
    (
        "I came in in the middle of this film so I had no idea about any credits or even its title till I looked it up here, "
        "where I see that it has received a mixed reception by your commentators. "
        "I'm on the positive side regarding this film but one thing really caught my attention as I watched: "
        "the beautiful and sensitive score written in a Coplandesque Americana style. "
        "My surprise was great when I discovered the score to have been written by none other than John Williams himself. "
        "True he has written sensitive and poignant scores such as Schindler's List but one usually associates "
        "his name with such bombasticities as Star Wars. "
        "But in my opinion what Williams has written for this movie surpasses anything I've ever heard of his "
        "for tenderness, sensitivity and beauty, fully in keeping with the tender and lovely plot of the movie. "
        "And another recent score of his, for Catch Me if You Can, shows still more wit and sophistication. "
        "As to Stanley and Iris, I like education movies like How Green was my Valley and Konrack, "
        "that one with John Voigt and his young African American charges in South Carolina, "
        "and Danny deVito's Renaissance Man, etc. They tell a necessary story of intellectual and spiritual awakening, "
        "a story which can't be told often enough. This one is an excellent addition to that genre."
    )
]

title = "Multi-Label Classification of PubMed Articles"
description = "Traditional machine learning models are hard to train reliably when we do not have sufficient labeled data for the specific task or domain we care about. Transfer learning lets us deal with these scenarios by leveraging the labeled data that already exists for a related task or domain: the knowledge gained while solving the source task is stored and then applied to the problem of interest. In this work, I used transfer learning to fine-tune a BertForSequenceClassification model on the PubMed multi-label classification dataset."
article = (
    "Author: Owais Ahmad. "
    "Model trained on Kaggle: <a href=\"https://www.kaggle.com/code/owaiskhan9654/multi-label-classification-of-pubmed-articles\">Link</a>. "
    "Weights & Biases runs for the different models: <a href=\"https://wandb.ai/owaiskhan9515/Multi%20Label%20Classification%20of%20PubMed%20Articles%20(Paper%20Night%20Presentation)?\">Link</a>. "
    "Hugging Face model repo: <a href=\"https://huggingface.co/owaiskha9654/Multi-Label-Classification-of-PubMed-Articles\">Link</a>."
)


app = gr.Interface(
    fn=Multi_Label_Classification_of_Pubmed_Articles,
    inputs=model_input,
    outputs=model_output,
    examples=examples,
    title=title,
    description=description,
    article=article,
    allow_flagging='never',
    analytics_enabled=False,
)

app.launch(enable_queue=True)