Update README.md
README.md
CHANGED
@@ -36,8 +36,8 @@ including Italian text.
 * [Minerva LLMs - website](https://nlp.uniroma1.it/minerva/)
 
 ## Description
-This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on 2.
-
+This is the model card for **Minerva-7B-base-v1.0**, a 7 billion parameter model trained on almost 2.5 trillion tokens (1.14 trillion in Italian,
+1.14 trillion in English, and 200 billion in code).
 
 This model is part of the Minerva LLM family:
 
@@ -97,8 +97,8 @@ output = pipeline(
 
 ## Model Architecture
 
-Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture
-Please
+Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture.
+Please look at the configuration file for a detailed breakdown of the hyperparameters we chose for this model.
 
 The Minerva LLM family is composed of:
 
@@ -107,7 +107,7 @@ The Minerva LLM family is composed of:
 | Minerva-350M-base-v1.0 | 70B (35B it + 35B en) | 16 | 1152 | 16 | 4 | 2048 | 16384 |
 | Minerva-1B-base-v1.0 | 200B (100B it + 100B en) | 16 | 2048 | 16 | 4 | 2048 | 16384 |
 | Minerva-3B-base-v1.0 | 660B (330B it + 330B en) | 32 | 2560 | 32 | 8 | 2048 | 16384 |
-| Minerva-7B-base-v1.0 | 2.
+| Minerva-7B-base-v1.0 | 2.48T (1.14T it + 1.14T en + 200B code) | 32 | 4096 | 32 | 8 | None | 4096 |
 
 ## Model Training
 
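As a quick cross-check of the architecture figures added to the table, the released configuration file the new text points to can be inspected with the standard `transformers` `AutoConfig` API. This is a minimal sketch, not part of the change itself; it assumes the model is published under the Hugging Face id `sapienzanlp/Minerva-7B-base-v1.0` (not stated in this diff), so adjust the id if the repository lives elsewhere.

```python
# Sketch: inspect the Minerva-7B-base-v1.0 configuration and compare it with the
# table columns above. The repository id below is an assumption.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("sapienzanlp/Minerva-7B-base-v1.0")

print(config.model_type)               # Mistral-based architecture -> "mistral"
print(config.num_hidden_layers)        # layers: 32 in the table
print(config.hidden_size)              # hidden size: 4096 in the table
print(config.num_attention_heads)      # attention heads: 32 in the table
print(config.num_key_value_heads)      # KV heads: 8 in the table
print(config.sliding_window)           # sliding window: None in the table
print(config.max_position_embeddings)  # max context length: 4096 in the table
```

If the model uses the stock Mistral configuration schema, these fields map one-to-one onto the table columns, so the printed values should match the row added for Minerva-7B-base-v1.0.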