littleworth committed
Commit: b66c778
Parent(s): 117d1f4
Update README.md
README.md CHANGED
@@ -15,10 +15,13 @@ This model card describes the distilled version of ProtGPT2, referred to as `protgpt2-distilled-tiny`.
 **Dataset Used:**
 - The model was distilled using a subset of the evaluation dataset provided by `nferruz/UR50_2021_04`.
 
-
-
-
-
+<strong>Loss Formulation:</strong>
+<ul>
+  <li><strong>Soft Loss:</strong> <span>ℒ<sub>soft</sub> = KL(softmax(s/T), softmax(t/T))</span></li>
+  <li><strong>Hard Loss:</strong> <span>ℒ<sub>hard</sub> = -∑<sub>i</sub> y<sub>i</sub> log(softmax(s<sub>i</sub>))</span></li>
+  <li><strong>Combined Loss:</strong> <span>ℒ = α ℒ<sub>hard</sub> + (1 - α) ℒ<sub>soft</sub></span></li>
+</ul>
+
 
 ### Performance
 The distilled model, `protgpt2-distilled-tiny`, exhibits a significant improvement in inference speed (up to 6 times faster than the pretrained version) while maintaining comparable perplexities.
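For reference, below is a minimal PyTorch sketch of the combined loss added in this commit. The function name, the default temperature `T`, and the mixing weight `alpha` are illustrative assumptions, not code or hyperparameters from the `protgpt2-distilled-tiny` repository.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combined loss L = alpha * L_hard + (1 - alpha) * L_soft from the card.

    T and alpha are illustrative defaults, not values taken from the
    protgpt2-distilled-tiny training run.
    """
    # Soft loss: KL divergence between the temperature-scaled student and
    # teacher distributions. F.kl_div expects log-probabilities as input
    # and probabilities as target. (A T**2 gradient-rescaling factor is
    # often multiplied in; the card's formula omits it, so it is omitted
    # here as well.)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )

    # Hard loss: standard cross-entropy of the student's logits against
    # the ground-truth token labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )

    return alpha * hard_loss + (1 - alpha) * soft_loss


# Usage with synthetic causal-LM-shaped tensors (batch, seq_len, vocab):
student = torch.randn(2, 16, 100)
teacher = torch.randn(2, 16, 100)
targets = torch.randint(0, 100, (2, 16))
loss = distillation_loss(student, teacher, targets)
```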