Update README.md

README.md CHANGED
@@ -88,7 +88,13 @@ Use the code below to get started with the model.
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
+Fineweb-Edu 10B + OpenHermes 2.5 (chatml)
+
+Dataset proportions:
+Part 1: FWE 4,836,050 + OH 100,000 (2.03%) = 4,936,050
+Part 2: FWE 4,336,051 + OH 400,000 (8.45%) = 4,736,051
+Part 3: FWE 500,000 + OH 501,551 (50.08%) = 1,001,551
+Total documents: 10,669,024
 
 ### Training Procedure
 
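The mixture arithmetic in the hunk above can be verified with a short script (a sketch; the per-part counts and OpenHermes-share percentages are copied verbatim from the diff — note that the three part totals sum to 10,673,652, slightly above the stated total of 10,669,024):

```python
# Sanity-check the dataset mixture arithmetic from the diff above.
# Each part mixes Fineweb-Edu (FWE) and OpenHermes 2.5 (OH) documents;
# the percentage is the OH share of that part.
parts = [
    ("Part 1", 4_836_050, 100_000),   # (name, FWE docs, OH docs)
    ("Part 2", 4_336_051, 400_000),
    ("Part 3", 500_000, 501_551),
]

total = 0
for name, fwe, oh in parts:
    part_total = fwe + oh
    oh_share = 100 * oh / part_total
    print(f"{name}: {part_total:,} docs, OH share {oh_share:.2f}%")
    total += part_total

# The parts sum to 10,673,652, not the 10,669,024 stated in the README.
print(f"Sum of parts: {total:,}")
```

Running it reproduces the per-part totals and OH percentages (2.03%, 8.45%, 50.08%) exactly as stated.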
@@ -107,6 +113,13 @@ Use the code below to get started with the model.
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
+Params: 355M -> Checkpoint: 700MB
+
+Tokens: ~10B
+Total training time: 30hrs
+Hardware: 2x RTX4090
+MFU: 71%
+
 [More Information Needed]
 
 ## Evaluation
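The throughput figures in the second hunk can be cross-checked with quick arithmetic (a sketch under two assumptions not stated in the diff: that the 700MB checkpoint holds weights only, and that the token count is exactly 10B):

```python
# Cross-check the reported training stats.
params = 355e6        # reported parameter count (355M)
ckpt_bytes = 700e6    # reported checkpoint size (700MB)
tokens = 10e9         # reported ~10B training tokens
hours = 30            # reported total training time

# ~1.97 bytes per parameter, consistent with fp16/bf16 weights.
print(f"{ckpt_bytes / params:.2f} bytes/param")

# Aggregate throughput across both GPUs.
print(f"{tokens / (hours * 3600):,.0f} tokens/sec")
```

This gives roughly 2 bytes per parameter and about 93k tokens/sec across the two RTX 4090s (~46k per GPU).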