riccorl committed (verified)
Commit: d5e9855 · Parent: e0de6a5

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -36,8 +36,8 @@ including Italian text.
 * [Minerva LLMs - website](https://nlp.uniroma1.it/minerva/)
 
 ## Description
-This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on 2.2T billion tokens (1T billion in Italian,
-1T billion in English, and 200 billion in code).
+This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on almost 2.5 trillion tokens (1.14 trillion in Italian,
+1.14 trillion in English, and 200 billion in code).
 
 This model is part of the Minerva LLM family:
 
@@ -97,8 +97,8 @@ output = pipeline(
 
 ## Model Architecture
 
-Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture, where the number of layers, number of heads, and the hidden states dimension are modified to reach 3B parameters.
-Please, take a look at the configuration file for a detailed breakdown of the hyperparameters we chose for this model.
+Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture.
+Please look at the configuration file for a detailed breakdown of the hyperparameters we chose for this model.
 
 The Minerva LLM family is composed of:
 
@@ -107,7 +107,7 @@ The Minerva LLM family is composed of:
 | Minerva-350M-base-v1.0 | 70B (35B it + 35B en) | 16 | 1152 | 16 | 4 | 2048 | 16384 |
 | Minerva-1B-base-v1.0 | 200B (100B it + 100B en) | 16 | 2048 | 16 | 4 | 2048 | 16384 |
 | Minerva-3B-base-v1.0 | 660B (330B it + 330B en) | 32 | 2560 | 32 | 8 | 2048 | 16384 |
-| Minerva-7B-base-v1.0 | 2.2T (1T it + 1T en + 200B code) | 32 | 4096 | 32 | 8 | None | 4096 |
+| Minerva-7B-base-v1.0 | 2.48T (1.14T it + 1.14T en + 200B code) | 32 | 4096 | 32 | 8 | None | 4096 |
 
 ## Model Training
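
The second hunk's context line (`output = pipeline(`) comes from the README's usage snippet, which this commit does not touch. For orientation, a text-generation call with the `transformers` pipeline typically looks like the sketch below; the prompt and generation settings are illustrative, not the README's exact snippet, and the repository id is the same assumption as above:

```python
import torch
from transformers import pipeline

model_id = "sapienzanlp/Minerva-7B-base-v1.0"  # assumed repository id

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # reduces the 7B model's memory footprint
    device_map="auto",
)

# Minerva-7B-base-v1.0 is a base (non-instruct) model, so prompt it
# with plain text for the model to complete
output = generator("La capitale dell'Italia è", max_new_tokens=20)
print(output[0]["generated_text"])
```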