jrahn committed on
Commit
6d6296c
1 Parent(s): 7f55685

Update README.md

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -88,7 +88,13 @@ Use the code below to get started with the model.

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]
+ Fineweb-Edu 10B + OpenHermes 2.5 (chatml)
+
+ Dataset proportions:
+ Part 1: FWE 4,836,050 + OH 100,000 (2.03%) = 4,936,050
+ Part 2: FWE 4,336,051 + OH 400,000 (8.45%) = 4,736,051
+ Part 3: FWE 500,000 + OH 501,551 (50.08%) = 1,001,551
+ Total documents: 10,669,024

  ### Training Procedure

@@ -107,6 +113,13 @@ Use the code below to get started with the model.

  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ Params: 355M -> Checkpoint: 700MB
+
+ Tokens: ~10B
+ Total training time: 30hrs
+ Hardware: 2x RTX4090
+ MFU: 71%
+
  [More Information Needed]

  ## Evaluation
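
The figures added in this commit can be cross-checked with a few lines of arithmetic. The sketch below is not part of the commit. It recomputes the per-part document counts and OpenHermes shares, then derives a checkpoint size and throughput from the stated training numbers. Two things in it are assumptions rather than facts from the README: 2 bytes per parameter (16-bit weights) for the checkpoint estimate, and the common 6 * N * D approximation for training FLOPs. The per-part sums and percentages reproduce the README values, but the three parts add up to 10,673,652 documents, which is 4,628 more than the stated total of 10,669,024. The commit does not show which of the two is the intended figure.

```python
# Figures copied from the README lines added in this commit.
PARTS = {
    "Part 1": {"fwe": 4_836_050, "oh": 100_000},
    "Part 2": {"fwe": 4_336_051, "oh": 400_000},
    "Part 3": {"fwe": 500_000, "oh": 501_551},
}
STATED_TOTAL = 10_669_024

PARAMS = 355e6        # "Params: 355M"
TOKENS = 10e9         # "Tokens: ~10B" (approximate in the source)
HOURS = 30            # "Total training time: 30hrs"
GPUS = 2              # "Hardware: 2x RTX4090"
STATED_MFU = 0.71     # "MFU: 71%"
BYTES_PER_PARAM = 2   # ASSUMPTION: 16-bit weights; not stated in the README


def mix_summary(parts):
    """Return {name: (documents, OpenHermes share in percent)} per part."""
    summary = {}
    for name, counts in parts.items():
        docs = counts["fwe"] + counts["oh"]
        summary[name] = (docs, round(100 * counts["oh"] / docs, 2))
    return summary


summary = mix_summary(PARTS)
computed_total = sum(docs for docs, _ in summary.values())

# Checkpoint size in MB (10^6 bytes) under the 2-bytes-per-parameter assumption.
checkpoint_mb = PARAMS * BYTES_PER_PARAM / 1e6

# Average throughput over the whole run.
tokens_per_sec = TOKENS / (HOURS * 3600)

# ASSUMPTION: training cost is about 6 FLOPs per parameter per token
# (attention FLOPs ignored), split evenly across the GPUs.
flops_per_gpu = 6 * PARAMS * tokens_per_sec / GPUS

# Per-GPU peak that would make the estimate above equal the stated MFU.
implied_peak_per_gpu = flops_per_gpu / STATED_MFU

for name, (docs, oh_pct) in summary.items():
    print(f"{name}: {docs:,} documents, OpenHermes {oh_pct}%")
print(f"sum of parts: {computed_total:,} (README states {STATED_TOTAL:,})")
print(f"checkpoint estimate: {checkpoint_mb:.0f} MB")
print(f"throughput: {tokens_per_sec:,.0f} tokens/s")
print(f"model FLOP/s per GPU (6*N*D): {flops_per_gpu:.3e}")
print(f"implied peak per GPU at 71% MFU: {implied_peak_per_gpu:.3e}")
```

The implied peak is only as good as the 6 * N * D approximation. The README does not say which FLOP-counting convention or hardware peak the 71% figure was measured against, so a different convention would give a different number.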