Ihor commited on
Commit
8f0a5a8
·
verified ·
1 Parent(s): 2c72469

add examples with process function

Browse files
Files changed (1) hide show
  1. README.md +55 -16
README.md CHANGED
@@ -34,6 +34,55 @@ We recommend to use the model with transformers `ner` pipeline:
34
  from transformers import AutoTokenizer, AutoModelForTokenClassification
35
  from transformers import pipeline
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
  tokenizer = AutoTokenizer.from_pretrained("knowledgator/UTC-DeBERTa-small")
38
  model = AutoModelForTokenClassification.from_pretrained("knowledgator/UTC-DeBERTa-small")
39
 
@@ -52,9 +101,7 @@ text = """Apple was founded as Apple Computer Company on April 1, 1976, by Steve
52
  It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company's second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers.
53
  Apple went public in 1980 to instant financial success."""
54
 
55
- input_ = f"{prompt} {text}"
56
-
57
- results = nlp(input_)
58
 
59
  print(results)
60
  ```
@@ -69,7 +116,7 @@ During his career at Microsoft, Gates held the positions of chairman, chief exec
69
 
70
  input_ = f"{question} {text}"
71
 
72
- results = nlp(input_)
73
 
74
  print(results)
75
  ```
@@ -83,9 +130,7 @@ text = """The mechanism of action was characterized using native mass spectromet
83
  Similarly, compound 23R showed dose-dependent stabilization of the SARS-CoV-2 Mpro in the thermal shift binding assay with an apparent Kd value of 9.43 μM, a 9.3-fold decrease compared to ML188 (1) (Figure B). In the enzymatic kinetic studies, 23R was shown to be a noncovalent inhibitor with a Ki value of 0.07 μM (Figure C, D top and middle panels). In comparison, the Ki for the parent compound ML188 (1) is 2.29 μM.
84
  The Lineweaver–Burk or double-reciprocal plot with different compound concentrations yielded an intercept at the Y-axis, suggesting that 23R is a competitive inhibitor similar to ML188 (1) (Figure C, D bottom panel). Buy our T-shirts for the lowerst prices you can find!!! Overall, the enzymatic kinetic studies confirmed that compound 23R is a noncovalent inhibitor of SARS-CoV-2 Mpro."""
85
 
86
- input_ = f"{prompt}\n{text}"
87
-
88
- results = nlp(input_)
89
 
90
  print(results)
91
  ```
@@ -107,9 +152,7 @@ relation = "worked at"
107
 
108
  prompt = rex_prompt.format(relation, entity)
109
 
110
- input_ = f"{prompt}\n{text}"
111
-
112
- results = nlp(input_)
113
 
114
  print(results)
115
  ```
@@ -125,9 +168,7 @@ entity = "anethole"
125
 
126
  prompt = ent_prompt.format(entity)
127
 
128
- input_ = f"{prompt}\n{text}"
129
-
130
- results = nlp(input_)
131
 
132
  print(results)
133
  ```
@@ -141,9 +182,7 @@ text = """Apple was founded as Apple Computer Company on April 1, 1976, by Steve
141
  Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the world's largest technology company by revenue, with US$394.3 billion in 2022 revenue. As of March 2023, Apple is the world's biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer in the world. It is considered one of the Big Five American information technology companies, alongside Alphabet (parent company of Google), Amazon, Meta Platforms, and Microsoft.
142
  As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as "Wintel"). In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple's unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the "Think different" campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company's product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook"""
143
 
144
- input_ = f"{prompt}\n{text}"
145
-
146
- results = nlp(input_)
147
 
148
  print(results)
149
  ```
 
34
  from transformers import AutoTokenizer, AutoModelForTokenClassification
35
  from transformers import pipeline
36
 
37
+ def process(text, prompt, treshold=0.5):
38
+ """
39
+ Processes text by preparing prompt and adjusting indices.
40
+
41
+ Args:
42
+ text (str): The text to process
43
+ prompt (str): The prompt to prepend to the text
44
+
45
+ Returns:
46
+ list: A list of dicts with adjusted spans and scores
47
+ """
48
+
49
+ # Concatenate text and prompt for full input
50
+ input_ = f"{prompt}\n{text}"
51
+
52
+ results = nlp(input_) # Run NLP on full input
53
+
54
+ processed_results = []
55
+
56
+ prompt_length = len(prompt) # Get prompt length
57
+
58
+ for result in results:
59
+ # check whether score is higher than treshold
60
+ if result['score']<treshold:
61
+ continue
62
+ # Adjust indices by subtracting prompt length
63
+ start = result['start'] - prompt_length
64
+
65
+ # If indexes belongs to the prompt - continue
66
+ if start<0:
67
+ continue
68
+
69
+ end = result['end'] - prompt_length
70
+
71
+ # Extract span from original text using adjusted indices
72
+ span = text[start:end]
73
+
74
+ # Create processed result dict
75
+ processed_result = {
76
+ 'span': span,
77
+ 'start': start,
78
+ 'end': end,
79
+ 'score': result['score']
80
+ }
81
+
82
+ processed_results.append(processed_result)
83
+
84
+ return processed_results
85
+
86
  tokenizer = AutoTokenizer.from_pretrained("knowledgator/UTC-DeBERTa-small")
87
  model = AutoModelForTokenClassification.from_pretrained("knowledgator/UTC-DeBERTa-small")
88
 
 
101
  It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company's second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers.
102
  Apple went public in 1980 to instant financial success."""
103
 
104
+ results = process(text, prompt)
 
 
105
 
106
  print(results)
107
  ```
 
116
 
117
  input_ = f"{question} {text}"
118
 
119
+ results = process(text, question)
120
 
121
  print(results)
122
  ```
 
130
  Similarly, compound 23R showed dose-dependent stabilization of the SARS-CoV-2 Mpro in the thermal shift binding assay with an apparent Kd value of 9.43 μM, a 9.3-fold decrease compared to ML188 (1) (Figure B). In the enzymatic kinetic studies, 23R was shown to be a noncovalent inhibitor with a Ki value of 0.07 μM (Figure C, D top and middle panels). In comparison, the Ki for the parent compound ML188 (1) is 2.29 μM.
131
  The Lineweaver–Burk or double-reciprocal plot with different compound concentrations yielded an intercept at the Y-axis, suggesting that 23R is a competitive inhibitor similar to ML188 (1) (Figure C, D bottom panel). Buy our T-shirts for the lowerst prices you can find!!! Overall, the enzymatic kinetic studies confirmed that compound 23R is a noncovalent inhibitor of SARS-CoV-2 Mpro."""
132
 
133
+ results = process(text, prompt)
 
 
134
 
135
  print(results)
136
  ```
 
152
 
153
  prompt = rex_prompt.format(relation, entity)
154
 
155
+ results = process(text, prompt)
 
 
156
 
157
  print(results)
158
  ```
 
168
 
169
  prompt = ent_prompt.format(entity)
170
 
171
+ results = process(text, prompt)
 
 
172
 
173
  print(results)
174
  ```
 
182
  Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the world's largest technology company by revenue, with US$394.3 billion in 2022 revenue. As of March 2023, Apple is the world's biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer in the world. It is considered one of the Big Five American information technology companies, alongside Alphabet (parent company of Google), Amazon, Meta Platforms, and Microsoft.
183
  As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as "Wintel"). In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple's unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the "Think different" campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company's product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook"""
184
 
185
+ results = process(text, prompt)
 
 
186
 
187
  print(results)
188
  ```