Update README.md
README.md
CHANGED
@@ -36,8 +36,8 @@ including Italian text.
 * [Minerva LLMs - website](https://nlp.uniroma1.it/minerva/)
 
 ## Description
-This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on 2.
-
+This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on almost 2.5 trillion tokens (1.14 trillion in Italian,
+1.14 trillion in English, and 200 billion in code).
 
 This model is part of the Minerva LLM family:
 
@@ -97,8 +97,8 @@ output = pipeline(
 
 ## Model Architecture
 
-Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture
-Please
+Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture.
+Please look at the configuration file for a detailed breakdown of the hyperparameters we chose for this model.
 
 The Minerva LLM family is composed of:
 
@@ -107,7 +107,7 @@ The Minerva LLM family is composed of:
 | Minerva-350M-base-v1.0 | 70B (35B it + 35B en) | 16 | 1152 | 16 | 4 | 2048 | 16384 |
 | Minerva-1B-base-v1.0 | 200B (100B it + 100B en) | 16 | 2048 | 16 | 4 | 2048 | 16384 |
 | Minerva-3B-base-v1.0 | 660B (330B it + 330B en) | 32 | 2560 | 32 | 8 | 2048 | 16384 |
-| Minerva-7B-base-v1.0 | 2.
+| Minerva-7B-base-v1.0 | 2.48T (1.14T it + 1.14T en + 200B code) | 32 | 4096 | 32 | 8 | None | 4096 |
 
 ## Model Training
 
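As a quick cross-check of the architecture figures added to the table, the released configuration file the new text points to can be inspected with the standard `transformers` `AutoConfig` API. This is a minimal sketch, not part of the change itself; it assumes the model is published under the Hugging Face id `sapienzanlp/Minerva-7B-base-v1.0` (not stated in this diff), so adjust the id if the repository lives elsewhere.

```python
# Sketch: inspect the Minerva-7B-base-v1.0 configuration and compare it with the
# table columns above. The repository id below is an assumption.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("sapienzanlp/Minerva-7B-base-v1.0")

print(config.model_type)               # Mistral-based architecture -> "mistral"
print(config.num_hidden_layers)        # layers: 32 in the table
print(config.hidden_size)              # hidden size: 4096 in the table
print(config.num_attention_heads)      # attention heads: 32 in the table
print(config.num_key_value_heads)      # KV heads: 8 in the table
print(config.sliding_window)           # sliding window: None in the table
print(config.max_position_embeddings)  # max context length: 4096 in the table
```

If the model uses the stock Mistral configuration schema, these fields map one-to-one onto the table columns, so the printed values should match the row added for Minerva-7B-base-v1.0.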