## Overview
**Language model:** deepset/tinybert-6L-768D-squad2
**Language:** English
**Training data:** SQuAD 2.0 training set, augmented 20x, plus the unaugmented SQuAD 2.0 training set
**Eval data:** SQuAD 2.0 dev set
**Infrastructure:** 1x V100 GPU
**Published:** Dec 8th, 2021
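A minimal inference sketch with the Hugging Face Transformers question-answering pipeline (the question and context below are illustrative, not from the card):

```python
from transformers import pipeline

# Load the distilled model for extractive question answering
qa_pipeline = pipeline(
    "question-answering",
    model="deepset/tinybert-6L-768D-squad2",
    tokenizer="deepset/tinybert-6L-768D-squad2",
)

# Illustrative question/context pair
result = qa_pipeline(
    question="What hardware was the model trained on?",
    context="The model was trained on a single V100 GPU and published in December 2021.",
)
print(result["answer"])
```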
## Details
- Haystack's intermediate layer and prediction layer distillation features were used for training (based on [TinyBERT](https://arxiv.org/pdf/1909.10351.pdf)). deepset/bert-base-uncased-squad2 was used as the teacher model and huawei-noah/TinyBERT_General_6L_768D as the student model; a training sketch follows below.
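A minimal sketch of this two-stage distillation using Haystack's `FARMReader` distillation API (method names and arguments are recalled from Haystack 1.x's model distillation guide and may differ; data paths and filenames are placeholders, and the card does not say which data split fed each stage):

```python
from haystack.nodes import FARMReader

# Teacher: full-size model already fine-tuned on SQuAD 2.0
teacher = FARMReader(model_name_or_path="deepset/bert-base-uncased-squad2")
# Student: generally distilled 6-layer, 768-dim TinyBERT checkpoint
student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_6L_768D")

# Stage 1: intermediate layer distillation (student mimics the teacher's
# hidden states and attention distributions)
student.distil_intermediate_layers_from(
    teacher,
    data_dir="data/squad2",       # placeholder path
    train_filename="train.json",  # placeholder filename
    learning_rate=5e-5,
)

# Stage 2: prediction layer distillation (teacher logits as soft targets)
student.distil_prediction_layer_from(
    teacher,
    data_dir="data/squad2",       # placeholder path
    train_filename="train.json",  # placeholder filename
    learning_rate=3e-5,
    temperature=1,
    distillation_loss_weight=1.0,
)

student.save(directory="tinybert-6L-768D-squad2")
```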
## Hyperparameters
### Intermediate layer distillation
```
learning_rate = 5e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
```
### Prediction layer distillation
```
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
distillation_loss_weight = 1.0
```
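With `distillation_loss_weight = 1.0`, the prediction layer stage trains on the teacher's soft targets alone rather than mixing in the gold labels. A sketch of the standard formulation this weight refers to (assumed common knowledge-distillation form, not Haystack's exact implementation):

```python
import torch.nn.functional as F

def prediction_layer_loss(student_logits, teacher_logits, labels,
                          temperature=1.0, distillation_loss_weight=1.0):
    # Hard-label loss against the gold answer-span positions
    hard_loss = F.cross_entropy(student_logits, labels)
    # Soft-label loss against the temperature-scaled teacher distribution
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # With weight = 1.0 (as above), only the teacher's signal is used
    return ((1 - distillation_loss_weight) * hard_loss
            + distillation_loss_weight * soft_loss)
```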
## Performance
```