Commit 2f153c0 by littleworth
Parent(s): 505fdf5
Update README.md
README.md CHANGED
@@ -25,10 +25,12 @@ This model card describes the distilled version of [ProtGPT2](https://huggingfac
 
 <strong>Loss Formulation:</strong>
 <ul>
-  <li><strong>Soft Loss:</strong> <span>ℒ<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span
-  <li><strong>Hard Loss:</strong> <span>ℒ<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span></li>
-  <li><strong>Combined Loss:</strong> <span>ℒ = α ℒ<sub>hard</sub> + (1 - α) ℒ<sub>soft</sub></span
+  <li><strong>Soft Loss:</strong> <span>ℒ<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span>, where <em>s</em> are the logits from the student model, <em>t</em> are the logits from the teacher model, and <em>T</em> is the temperature used to soften the probabilities.</li>
+  <li><strong>Hard Loss:</strong> <span>ℒ<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span>, where <em>y<sub>i</sub></em> represents the true labels, and <em>s<sub>i</sub></em> are the logits from the student model corresponding to each label.</li>
+  <li><strong>Combined Loss:</strong> <span>ℒ = α ℒ<sub>hard</sub> + (1 - α) ℒ<sub>soft</sub></span>, where <em>α</em> (alpha) is the weight factor that balances the hard loss and soft loss.</li>
 </ul>
+<p><strong>Note:</strong> KL represents the Kullback-Leibler divergence, a measure used to quantify how one probability distribution diverges from a second, expected probability distribution.</p>
+
 
 
 ### Performance
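For readers who want to see the loss formulation above in code, here is a minimal PyTorch sketch of the combined distillation loss. It only illustrates the formulas added in this commit and is not the training code from this repository; the function name `distillation_loss` and the values `T=2.0` and `alpha=0.5` are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hypothetical helper illustrating the README's loss formulation;
    # T and alpha are placeholder values, not the ones used to train this model.

    # Soft loss: L_soft = KL(softmax(s/T), softmax(t/T)).
    # F.kl_div expects log-probabilities for the input and probabilities for the target.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )

    # Hard loss: L_hard = -sum_i y_i * log(softmax(s_i)),
    # i.e. ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Combined loss: L = alpha * L_hard + (1 - alpha) * L_soft.
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Example with dummy data: a batch of 8 positions over a 50-token vocabulary.
student_logits = torch.randn(8, 50)
teacher_logits = torch.randn(8, 50)
labels = torch.randint(0, 50, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```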