HuggingFaceTB/stack-edu-classifier-java

Text Classification

Generated from Trainer


loubnabnl (HF Staff) committed on Feb 19

Commit 697da12 · verified · 1 parent: 1395785

Update README.md

Files changed (1)

README.md +9 -1


@@ -73,7 +73,15 @@ While the macro F1 scores across the 1-5 rating scale are relatively low due to
 <div style="display: flex; justify-content: center; gap: 20px;">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/06siAXeIRjJGnlPehW562.png" width="600">
 </div>
-We validated these classifiers by filtering Stack v2 data and testing on an intermediate SmolLM2 checkpoint. Filtering with a threshold of 3 improved performance across most languages while maintaining adequate data volume, though Java showed better results with a threshold of 2.
+The table below shows Stack-Edu dataset statistics and MultiPL-E scores for the top 4 programming languages by size. We use HumanEval for Python evaluation. For the ablation, we started from a mid-training checkpoint of SmolLM2 at 3T tokens, trained primarily on web data, and performed linear annealing on 200B tokens distributed uniformly across 15 of the most commonly used programming languages (~14B tokens each).
+| Language   | Before filtering (B tokens) | After filtering (B tokens) | MultiPL-E (Original → Filtered) |
+|------------|-----------------------------|----------------------------|---------------------------------|
+| Python     | 50.6                        | 21.8                       | 20.7 → 25.6                     |
+| C++        | 69.7                        | 16.0                       | 16.7 → 24.8                     |
+| JavaScript | 45.3                        | 11.1                       | 18.2 → 22.4                     |
+| Java       | 45.6                        | 42.1                       | 17.6 → 22.7                     |
 ### Training hyperparameters
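The removed README line describes filtering Stack v2 by classifier score, with a threshold of 3 for most languages and 2 for Java. A minimal sketch of that rule follows. It assumes each file already carries a raw float score from the classifier. The clamp-to-0-5-and-round step, the helper names `int_score` and `keep`, and the sample scores are hypothetical illustrations, not taken from the model card.

```python
# Sketch of per-language threshold filtering on classifier scores.
# ASSUMPTIONS: raw scores are floats; clamping to [0, 5] and rounding to an
# integer is an illustrative choice; helper names and samples are hypothetical.

DEFAULT_THRESHOLD = 3     # threshold that worked for most languages
THRESHOLDS = {"java": 2}  # Java did better with a threshold of 2


def int_score(raw: float) -> int:
    """Clamp a raw score to [0, 5] and round it (Python rounds halves to even)."""
    return int(round(max(0.0, min(raw, 5.0))))


def keep(raw: float, language: str) -> bool:
    """Return True if a file's rounded score meets its language's threshold."""
    threshold = THRESHOLDS.get(language.lower(), DEFAULT_THRESHOLD)
    return int_score(raw) >= threshold


samples = [
    ("python", 3.4),  # rounds to 3, passes the default threshold
    ("python", 2.2),  # rounds to 2, filtered out
    ("java", 2.1),    # rounds to 2, passes Java's lower threshold
    ("java", 1.3),    # rounds to 1, filtered out
]

kept = [(lang, raw) for lang, raw in samples if keep(raw, lang)]
print(kept)
```

The lower Java threshold lets files scoring around 2 through, which is consistent with Java retaining far more of its tokens after filtering than the other languages in the table.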
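The ablation anneals the SmolLM2 checkpoint over a 200B-token budget. Assuming "linear annealing" means decaying the learning rate linearly to zero over that budget (as in the decay phase of a warmup-stable-decay schedule), the schedule can be sketched as below. `PEAK_LR` is a hypothetical value; the README does not state the learning rate.

```python
# Sketch of a linear annealing schedule over a token budget.
# ASSUMPTION: annealing = linear decay of the learning rate to 0.
# PEAK_LR is hypothetical; only the 200B-token budget comes from the README.

ANNEAL_TOKENS = 200e9  # annealing budget from the README
PEAK_LR = 1e-3         # hypothetical starting learning rate


def annealed_lr(tokens_seen: float, peak_lr: float = PEAK_LR,
                total: float = ANNEAL_TOKENS) -> float:
    """Decay linearly from peak_lr at 0 tokens to 0 at `total` tokens."""
    frac = min(max(tokens_seen / total, 0.0), 1.0)  # clamp progress to [0, 1]
    return peak_lr * (1.0 - frac)


for seen in (0, 50e9, 100e9, 200e9):
    print(f"{seen / 1e9:>5.0f}B tokens -> lr {annealed_lr(seen):.2e}")
```

Clamping the progress fraction keeps the rate at zero if training runs past the budget rather than letting it go negative.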
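Two quantities can be read off the table: the share of tokens that survives filtering and the score change from training on filtered data. The short script below recomputes both from the table values; the `TABLE` dict and the `summarize` helper are illustrative names.

```python
# Recompute token retention and score gain from the README table.
# Scores are MultiPL-E, except Python, which uses HumanEval per the README.

TABLE = {
    # language: (before_B_tokens, after_B_tokens, score_original, score_filtered)
    "Python": (50.6, 21.8, 20.7, 25.6),
    "C++": (69.7, 16.0, 16.7, 24.8),
    "JavaScript": (45.3, 11.1, 18.2, 22.4),
    "Java": (45.6, 42.1, 17.6, 22.7),
}


def summarize(table):
    """Return {language: (retention_percent, score_gain)}, each rounded to 0.1."""
    out = {}
    for lang, (before, after, orig, filt) in table.items():
        retention = round(100 * after / before, 1)
        gain = round(filt - orig, 1)
        out[lang] = (retention, gain)
    return out


for lang, (retention, gain) in summarize(TABLE).items():
    print(f"{lang:<10} kept {retention:5.1f}% of tokens, score +{gain}")
```

This gives 43.1%, 23.0%, 24.5%, and 92.3% retention with gains of +4.9, +8.1, +4.2, and +5.1 points for Python, C++, JavaScript, and Java. C++ loses the largest share of its data yet gains the most, while Java keeps most of its tokens under its lower threshold and still improves.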