Commit 2f153c0 by littleworth
Parent(s): 505fdf5
Update README.md
README.md CHANGED
@@ -25,10 +25,12 @@ This model card describes the distilled version of [ProtGPT2](https://huggingfac
 
 <strong>Loss Formulation:</strong>
 <ul>
-  <li><strong>Soft Loss:</strong> <span>ℒ<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span
-  <li><strong>Hard Loss:</strong> <span>ℒ<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span></li>
-  <li><strong>Combined Loss:</strong> <span>ℒ = α ℒ<sub>hard</sub> + (1 - α) ℒ<sub>soft</sub></span
+  <li><strong>Soft Loss:</strong> <span>ℒ<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span>, where <em>s</em> are the logits from the student model, <em>t</em> are the logits from the teacher model, and <em>T</em> is the temperature used to soften the probabilities.</li>
+  <li><strong>Hard Loss:</strong> <span>ℒ<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span>, where <em>y<sub>i</sub></em> represents the true labels, and <em>s<sub>i</sub></em> are the logits from the student model corresponding to each label.</li>
+  <li><strong>Combined Loss:</strong> <span>ℒ = α ℒ<sub>hard</sub> + (1 - α) ℒ<sub>soft</sub></span>, where <em>α</em> (alpha) is the weight factor that balances the hard loss and soft loss.</li>
 </ul>
+<p><strong>Note:</strong> KL represents the Kullback-Leibler divergence, a measure used to quantify how one probability distribution diverges from a second, expected probability distribution.</p>
+
 
 
 ### Performance
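For readers who want to see the loss formulation above in code, here is a minimal PyTorch sketch of the combined distillation loss. It only illustrates the formulas added in this commit and is not the training code from this repository; the function name `distillation_loss` and the values `T=2.0` and `alpha=0.5` are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hypothetical helper illustrating the README's loss formulation;
    # T and alpha are placeholder values, not the ones used to train this model.

    # Soft loss: L_soft = KL(softmax(s/T), softmax(t/T)).
    # F.kl_div expects log-probabilities for the input and probabilities for the target.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )

    # Hard loss: L_hard = -sum_i y_i * log(softmax(s_i)),
    # i.e. ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Combined loss: L = alpha * L_hard + (1 - alpha) * L_soft.
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Example with dummy data: a batch of 8 positions over a 50-token vocabulary.
student_logits = torch.randn(8, 50)
teacher_logits = torch.randn(8, 50)
labels = torch.randint(0, 50, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```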