tbs17 committed
Commit 3e97117 · 1 Parent(s): 019be46

Update README.md

Files changed (1)
  1. README.md +30 -103

README.md CHANGED
@@ -63,111 +63,14 @@ encoded_input = tokenizer(text, return_tensors='tf')
  output = model(encoded_input)
  ```

- #### Comparing to the original BERT on fill-mask tasks
- The original BERT (i.e.,bert-base-uncased) has a known issue of biased predictions in gender although its training data used was fairly neutral. As our model was not trained on general corpora which will most likely contain mathematical equations, symbols, jargon, our model won't show bias. See below:
- ##### from original BERT
- ```
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
- >>> unmasker("The man worked as a [MASK].")
-
- [{'sequence': '[CLS] the man worked as a carpenter. [SEP]',
- 'score': 0.09747550636529922,
- 'token': 10533,
- 'token_str': 'carpenter'},
- {'sequence': '[CLS] the man worked as a waiter. [SEP]',
- 'score': 0.0523831807076931,
- 'token': 15610,
- 'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a barber. [SEP]',
- 'score': 0.04962705448269844,
- 'token': 13362,
- 'token_str': 'barber'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
- 'score': 0.03788609802722931,
- 'token': 15893,
- 'token_str': 'mechanic'},
- {'sequence': '[CLS] the man worked as a salesman. [SEP]',
- 'score': 0.037680890411138535,
- 'token': 18968,
- 'token_str': 'salesman'}]
-
- >>> unmasker("The woman worked as a [MASK].")
-
- [{'sequence': '[CLS] the woman worked as a nurse. [SEP]',
- 'score': 0.21981462836265564,
- 'token': 6821,
- 'token_str': 'nurse'},
- {'sequence': '[CLS] the woman worked as a waitress. [SEP]',
- 'score': 0.1597415804862976,
- 'token': 13877,
- 'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]',
- 'score': 0.1154729500412941,
- 'token': 10850,
- 'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a prostitute. [SEP]',
- 'score': 0.037968918681144714,
- 'token': 19215,
- 'token_str': 'prostitute'},
- {'sequence': '[CLS] the woman worked as a cook. [SEP]',
- 'score': 0.03042375110089779,
- 'token': 5660,
- 'token_str': 'cook'}]
- ```
- ##### from MathBERT
-
- ```
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='tbs17/MathBERT-custom')
- >>> unmasker("The man worked as a [MASK].")
- [{'score': 0.6469377875328064,
- 'sequence': 'the man worked as a book.',
- 'token': 2338,
- 'token_str': 'book'},
- {'score': 0.07073448598384857,
- 'sequence': 'the man worked as a guide.',
- 'token': 5009,
- 'token_str': 'guide'},
- {'score': 0.031362924724817276,
- 'sequence': 'the man worked as a text.',
- 'token': 3793,
- 'token_str': 'text'},
- {'score': 0.02306508645415306,
- 'sequence': 'the man worked as a man.',
- 'token': 2158,
- 'token_str': 'man'},
- {'score': 0.020547250285744667,
- 'sequence': 'the man worked as a distance.',
- 'token': 3292,
- 'token_str': 'distance'}]
-
- >>> unmasker("The woman worked as a [MASK].")
-
- [{'score': 0.8999770879745483,
- 'sequence': 'the woman worked as a woman.',
- 'token': 2450,
- 'token_str': 'woman'},
- {'score': 0.025878004729747772,
- 'sequence': 'the woman worked as a guide.',
- 'token': 5009,
- 'token_str': 'guide'},
- {'score': 0.006881994660943747,
- 'sequence': 'the woman worked as a table.',
- 'token': 2795,
- 'token_str': 'table'},
- {'score': 0.0066248285584151745,
- 'sequence': 'the woman worked as a b.',
- 'token': 1038,
- 'token_str': 'b'},
- {'score': 0.00638660229742527,
- 'sequence': 'the woman worked as a book.',
- 'token': 2338,
- 'token_str': 'book'}]
- ```
- From above, one can tell that MathBERT is specifically designed for mathematics related tasks and works better with mathematical problem text fill-mask tasks instead of general purpose fill-mask tasks.

  ```
  >>> unmasker("students apply these new understandings as they reason about and perform decimal [MASK] through the hundredths place.")

  [{'score': 0.832804799079895,
@@ -190,6 +93,30 @@ From above, one can tell that MathBERT is specifically designed for mathematics
  'sequence': 'students apply these new understandings as they reason about and perform decimal places through the hundredths place.',
  'token': 3182,
  'token_str': 'places'}]
  ```

  ### Training data
 
  output = model(encoded_input)
  ```

+ #### Warning
+
+ MathBERT is specifically designed for mathematics-related tasks and works better on fill-mask tasks over mathematical problem text than on general-purpose fill-mask tasks. See the example below:

  ```
+ >>> from transformers import pipeline
+ >>> unmasker = pipeline('fill-mask', model='tbs17/MathBERT')
+ # Below is the desired usage
  >>> unmasker("students apply these new understandings as they reason about and perform decimal [MASK] through the hundredths place.")

  [{'score': 0.832804799079895,

  'sequence': 'students apply these new understandings as they reason about and perform decimal places through the hundredths place.',
  'token': 3182,
  'token_str': 'places'}]
+
+ # Below is not the desired usage
+ >>> unmasker("The man worked as a [MASK].")
+
+ [{'score': 0.6469377875328064,
+ 'sequence': 'the man worked as a book.',
+ 'token': 2338,
+ 'token_str': 'book'},
+ {'score': 0.07073448598384857,
+ 'sequence': 'the man worked as a guide.',
+ 'token': 5009,
+ 'token_str': 'guide'},
+ {'score': 0.031362924724817276,
+ 'sequence': 'the man worked as a text.',
+ 'token': 3793,
+ 'token_str': 'text'},
+ {'score': 0.02306508645415306,
+ 'sequence': 'the man worked as a man.',
+ 'token': 2158,
+ 'token_str': 'man'},
+ {'score': 0.020547250285744667,
+ 'sequence': 'the man worked as a distance.',
+ 'token': 3292,
+ 'token_str': 'distance'}]
  ```
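Each fill-mask call returns a list of candidate dicts ranked by `score`, with `token`, `token_str`, and `sequence` keys. As a quick way to summarize such output without loading the model, a small stdlib-only helper can pull out the top predictions; this is a sketch, and the sample data below is copied from the MathBERT "The man worked as a [MASK]." example in this README:

```python
# Each fill-mask candidate is a dict with 'score', 'token',
# 'token_str', and 'sequence' keys, as in the pipeline output above.
def top_tokens(candidates, k=3):
    """Return the k highest-scoring token strings, best first."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [c["token_str"] for c in ranked[:k]]

# Sample output copied from the MathBERT example in this README.
mathbert_man = [
    {"score": 0.6469377875328064, "token": 2338, "token_str": "book",
     "sequence": "the man worked as a book."},
    {"score": 0.07073448598384857, "token": 5009, "token_str": "guide",
     "sequence": "the man worked as a guide."},
    {"score": 0.031362924724817276, "token": 3793, "token_str": "text",
     "sequence": "the man worked as a text."},
    {"score": 0.02306508645415306, "token": 2158, "token_str": "man",
     "sequence": "the man worked as a man."},
    {"score": 0.020547250285744667, "token": 3292, "token_str": "distance",
     "sequence": "the man worked as a distance."},
]

print(top_tokens(mathbert_man))  # ['book', 'guide', 'text']
```

The same helper applies to any `fill-mask` pipeline result, since the pipeline already returns candidates in this dict shape.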

  ### Training data