Commit 2381ce9 by timpal0l · 1 Parent(s): 1b01058

Update README.md

Files changed (1)
  1. README.md +25 -1
README.md CHANGED
@@ -107,7 +107,31 @@ license: mit
 ## This model can be used for Extractive QA
 It has been finetuned for 3 epochs on [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/).
 
-
+## Evaluation on SQuAD2.0 dev set
+```
+{
+"epoch": 3.0,
+"eval_HasAns_exact": 79.65587044534414,
+"eval_HasAns_f1": 85.91387795001529,
+"eval_HasAns_total": 5928,
+"eval_NoAns_exact": 82.10260723296888,
+"eval_NoAns_f1": 82.10260723296888,
+"eval_NoAns_total": 5945,
+"eval_best_exact": 80.8809904826076,
+"eval_best_exact_thresh": 0.0,
+"eval_best_f1": 84.00551406448994,
+"eval_best_f1_thresh": 0.0,
+"eval_exact": 80.8809904826076,
+"eval_f1": 84.00551406449004,
+"eval_samples": 12508,
+"eval_total": 11873,
+"train_loss": 0.7729689576483615,
+"train_runtime": 9118.953,
+"train_samples": 134891,
+"train_samples_per_second": 44.377,
+"train_steps_per_second": 0.925
+}
+```
 ## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
 
 [DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data.
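Reviewer note on the evaluation block added in this commit: the overall `eval_exact` and `eval_f1` values appear to be the example-count-weighted averages of the `HasAns` and `NoAns` subsets. The sketch below is a quick consistency check under that assumption. The numbers are copied from the JSON in the diff; the helper name `weighted_overall` is illustrative and not part of any library.

```python
# Values copied from the evaluation JSON added to README.md in this commit.
metrics = {
    "eval_HasAns_exact": 79.65587044534414,
    "eval_HasAns_f1": 85.91387795001529,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 82.10260723296888,
    "eval_NoAns_f1": 82.10260723296888,
    "eval_NoAns_total": 5945,
    "eval_exact": 80.8809904826076,
    "eval_f1": 84.00551406449004,
    "eval_total": 11873,
}


def weighted_overall(m, kind):
    """Combine the HasAns and NoAns scores, weighting each by its example count.

    Assumption: the overall score is a plain weighted mean of the two subsets.
    """
    has_n = m["eval_HasAns_total"]
    no_n = m["eval_NoAns_total"]
    weighted_sum = m[f"eval_HasAns_{kind}"] * has_n + m[f"eval_NoAns_{kind}"] * no_n
    return weighted_sum / (has_n + no_n)


overall_exact = weighted_overall(metrics, "exact")
overall_f1 = weighted_overall(metrics, "f1")

print("recomputed exact:", overall_exact, "reported:", metrics["eval_exact"])
print("recomputed f1:   ", overall_f1, "reported:", metrics["eval_f1"])
```

If the recomputed values match the reported ones to within floating-point noise, the subset and overall figures in the README are internally consistent. Note that `eval_NoAns_exact` equals `eval_NoAns_f1`, which is expected for unanswerable questions, where a prediction is either fully right (empty answer) or fully wrong.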