Update README.md

README.md CHANGED
@@ -88,7 +88,13 @@ Use the code below to get started with the model.
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
+Fineweb-Edu 10B + OpenHermes 2.5 (chatml)
+
+Dataset proportions:
+Part 1: FWE 4,836,050 + OH 100,000 (2.03%) = 4,936,050
+Part 2: FWE 4,336,051 + OH 400,000 (8.45%) = 4,736,051
+Part 3: FWE 500,000 + OH 501,551 (50.08%) = 1,001,551
+Total documents: 10,669,024
 
 ### Training Procedure
 
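The mixture arithmetic in the hunk above can be verified with a short script (a sketch; the per-part counts and OpenHermes-share percentages are copied verbatim from the diff — note that the three part totals sum to 10,673,652, slightly above the stated total of 10,669,024):

```python
# Sanity-check the dataset mixture arithmetic from the diff above.
# Each part mixes Fineweb-Edu (FWE) and OpenHermes 2.5 (OH) documents;
# the percentage is the OH share of that part.
parts = [
    ("Part 1", 4_836_050, 100_000),   # (name, FWE docs, OH docs)
    ("Part 2", 4_336_051, 400_000),
    ("Part 3", 500_000, 501_551),
]

total = 0
for name, fwe, oh in parts:
    part_total = fwe + oh
    oh_share = 100 * oh / part_total
    print(f"{name}: {part_total:,} docs, OH share {oh_share:.2f}%")
    total += part_total

# The parts sum to 10,673,652, not the 10,669,024 stated in the README.
print(f"Sum of parts: {total:,}")
```

Running it reproduces the per-part totals and OH percentages (2.03%, 8.45%, 50.08%) exactly as stated.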
@@ -107,6 +113,13 @@ Use the code below to get started with the model.
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
+Params: 355M -> Checkpoint: 700MB
+
+Tokens: ~10B
+Total training time: 30hrs
+Hardware: 2x RTX4090
+MFU: 71%
+
 [More Information Needed]
 
 ## Evaluation
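The throughput figures in the second hunk can be cross-checked with quick arithmetic (a sketch under two assumptions not stated in the diff: that the 700MB checkpoint holds weights only, and that the token count is exactly 10B):

```python
# Cross-check the reported training stats.
params = 355e6        # reported parameter count (355M)
ckpt_bytes = 700e6    # reported checkpoint size (700MB)
tokens = 10e9         # reported ~10B training tokens
hours = 30            # reported total training time

# ~1.97 bytes per parameter, consistent with fp16/bf16 weights.
print(f"{ckpt_bytes / params:.2f} bytes/param")

# Aggregate throughput across both GPUs.
print(f"{tokens / (hours * 3600):,.0f} tokens/sec")
```

This gives roughly 2 bytes per parameter and about 93k tokens/sec across the two RTX 4090s (~46k per GPU).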