fix formatting
Browse files
README.md
CHANGED
@@ -14,25 +14,25 @@ tags:
|
|
14 |
|
15 |
# TamiLM Hebrew Nano
|
16 |
|
17 |
-
A Modern Hebrew specialized LLM based on the RWKVv6 Architecture
|
18 |
-
Trained only on Modern Hebrew datasets, with a custom vocabulary optimized for Modern Hebrew
|
19 |
|
20 |
Trained at [Tel Aviv Makers Hackerspace](https://wiki.telavivmakers.org/)
|
21 |
|
22 |
### Params
|
23 |
|
24 |
-
Layers
|
25 |
-
Depth
|
26 |
-
Head size
|
27 |
-
Train ctx_len
|
28 |
-
Train tokens
|
29 |
-
Vocab size
|
30 |
|
31 |
### Train Compute
|
32 |
|
33 |
-
All compute was performed on a single Nvidia P40 card
|
34 |
-
Experiments: 62 hours 52 Minutes
|
35 |
-
Training run: 208 hours 10 Minutes
|
36 |
|
37 |
### How to run
|
38 |
|
|
|
14 |
|
15 |
# TamiLM Hebrew Nano
|
16 |
|
17 |
+
A Modern Hebrew specialized LLM based on the RWKVv6 Architecture
|
18 |
+
Trained only on Modern Hebrew datasets, with a custom vocabulary optimized for Modern Hebrew
|
19 |
|
20 |
Trained at [Tel Aviv Makers Hackerspace](https://wiki.telavivmakers.org/)
|
21 |
|
22 |
### Params
|
23 |
|
24 |
+
Layers `12`
|
25 |
+
Depth `512`
|
26 |
+
Head size `64`
|
27 |
+
Train ctx_len `512`
|
28 |
+
Train tokens `6,841,411,389 (6 Billion)`
|
29 |
+
Vocab size `65536`
|
30 |
|
31 |
### Train Compute
|
32 |
|
33 |
+
All compute was performed on a single Nvidia P40 card
|
34 |
+
Experiments: `62 hours 52 Minutes (2.6 days)`
|
35 |
+
Training run: `208 hours 10 Minutes (8.6 days)`
|
36 |
|
37 |
### How to run
|
38 |
|