littleworth committed · Commit 64e5515 · 1 parent: 201dd74

Update README.md

Files changed (1): README.md (+4 -3)

README.md CHANGED
<strong>Loss Formulation:</strong>
<ul>
<li><strong>Soft Loss:</strong> <span>&#x2112;<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span>, where <em>s</em> are the logits from the student model, <em>t</em> are the logits from the teacher model, and <em>T</em> is the temperature used to soften the probabilities.</li>
<li><strong>Hard Loss:</strong> <span>&#x2112;<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span>, where <em>y<sub>i</sub></em> are the true labels and <em>s<sub>i</sub></em> are the student logits corresponding to each label.</li>
<li><strong>Combined Loss:</strong> <span>&#x2112; = α &#x2112;<sub>hard</sub> + (1 - α) &#x2112;<sub>soft</sub></span>, where <em>α</em> (alpha) is the weighting factor that balances the hard loss and the soft loss.</li>
</ul>

<p><strong>Note:</strong> KL denotes the Kullback-Leibler divergence, a measure of how one probability distribution diverges from a second, reference distribution.</p>
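
For intuition, here is a minimal PyTorch sketch of how the three terms above can be combined. This is an illustration, not the exact training code used for this model; the names `student_logits`, `teacher_logits`, and `labels`, and the defaults for `T` and `alpha`, are assumptions chosen to mirror the formulas.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative combined loss: L = alpha * L_hard + (1 - alpha) * L_soft.

    Assumed shapes: logits are (batch, vocab_size), labels are (batch,).
    T=2.0 and alpha=0.5 are hypothetical defaults, not this model's settings.
    """
    # Soft loss: KL divergence between the temperature-softened student and
    # teacher distributions. F.kl_div expects log-probabilities as its first
    # argument and treats its second argument as the target distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )

    # Hard loss: cross-entropy against the true labels,
    # i.e. -sum_i y_i * log(softmax(s_i)).
    hard_loss = F.cross_entropy(student_logits, labels)

    # Combined loss, weighted by alpha as in the formula above.
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Example call with dummy tensors (batch of 4, vocabulary of 50):
# loss = distillation_loss(torch.randn(4, 50), torch.randn(4, 50),
#                          torch.randint(0, 50, (4,)))
```

(Some distillation formulations also scale the soft term by T<sup>2</sup> to keep gradient magnitudes comparable across temperatures; the formula above does not include that factor, so the sketch omits it.)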
  ### Performance