<img src="https://raw.githubusercontent.com/Pleias/logos/d6152d7943905da32a1e04fdfd7708ed9c7eed5e/PleIAs%201_0%20Full%20Logo%20(Black).png" style="width: 80%; margin: 0 auto; display: inline-block;"/>
</div>

**Pleias-nano-1.2b-Preview** is an early preview of a 1.21 billion parameter base model trained by [Pleias](https://huggingface.co/PleIAs) with [Tracto AI](https://tracto.ai/) on [Common Corpus](https://huggingface.co/datasets/PleIAs/common_corpus).

Like all the base and specialized models from Pleias, Pleias-nano-1.2b-Preview has only been trained on open data that is out of copyright (public domain) or under a permissive license.

## Description
Pleias-nano-1.2b-Preview is a transformer base model, pretrained entirely from scratch, with an architecture similar to Llama/GPT-NeoX for easier deployment and inference.

It includes the following features, which would apply to any responsibly trained variant:
* Only trained on open data under a permissive license and in compliance with the European AI Act. By design, all Pleias models are unable to output copyrighted content.
* A new tokenizer designed for enhanced document processing tasks and better multilingual support.
* Extremely low levels of toxicity and problematic content.

Pleias-nano-1.2b-Preview has demonstrated unusually strong multilingual generation abilities for its size range. Fully supported languages include English, French, Spanish, German, Italian, Dutch, Latin and Portuguese.

Given its size, Pleias-nano-1.2b-Preview can run on CPU without any compression loss. We provide a first GGUF variant as part of our release.
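As a rough sketch of CPU inference with that GGUF build, the snippet below uses llama-cpp-python; the file name, context size, and prompt are placeholders for illustration rather than part of the official release.

```python
# Minimal CPU inference sketch with llama-cpp-python (pip install llama-cpp-python).
# "pleias-nano-1.2b-preview.gguf" is a placeholder: point model_path at the GGUF
# file actually shipped with the release.
from llama_cpp import Llama

llm = Llama(
    model_path="pleias-nano-1.2b-preview.gguf",  # placeholder path
    n_ctx=2048,      # context window; adjust to the model's real limit
    n_threads=8,     # CPU threads to use
)

# Base model: give it a continuation prompt, not an instruction.
completion = llm(
    "La Révolution française a commencé en",
    max_tokens=128,
    temperature=0.0,     # low/zero temperature, as recommended under "Recommended use"
    repeat_penalty=1.2,  # slight repetition penalty, as recommended under "Recommended use"
)
print(completion["choices"][0]["text"])
```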
## Recommended use
As a base model, Pleias-nano-1.2b-Preview is only able to run continuation prompts.

Text generation is currently able to support a range of creative writing tasks in multiple European languages. For more consistent results, we recommend using a low or zero temperature with a slight repetition penalty (1.2).

Pleias-nano-1.2b-Preview has been successfully adapted for continuous pretraining and full fine-tuning on document processing tasks such as RAG, translation, or OCR correction. Given the small size of the model, we do not recommend fine-tuning methods based on LoRA.
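As an illustration of the full fine-tuning route (rather than LoRA), the sketch below uses the Hugging Face Trainer; the repo id, dataset file, and hyperparameters are assumptions made for the example, not settings we ship.

```python
# Minimal full fine-tuning sketch (no LoRA / adapters) with the Hugging Face Trainer.
# Assumptions: the model is available under the repo id below, and the training
# data is a plain-text file with one document per line.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "PleIAs/Pleias-nano-1.2b-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during collation
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical text dataset with a single "text" column.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="pleias-nano-1.2b-ft",   # where checkpoints are written
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,                 # illustrative value only
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```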
## Example
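A minimal continuation-prompt sketch with the transformers library, using the sampling settings recommended above; the repo id is assumed to match this model card and the prompt is purely illustrative.

```python
# Minimal continuation-prompt sketch with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PleIAs/Pleias-nano-1.2b-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base model: provide text to continue, not an instruction.
prompt = "The Industrial Revolution began in"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,          # greedy decoding, i.e. a "null" temperature
    repetition_penalty=1.2,   # slight repetition penalty, as recommended
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```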
## Training
Pleias-nano-1.2b-Preview was fully pretrained on TractoAI, on the ISEG GPU cluster from Nebius AI, using 192 H100s for 5 days. Pretraining code relied on [the fork of Nanotron developed by TractoAI](https://github.com/tractoai/nanotron). We provide the complete settings as a YAML file as part of our release.

The training schedule includes 518,000 steps (batch size 1,024) over three epochs (nearly 5 trillion tokens):
* A lightly filtered version of Common Corpus (1.6 trillion tokens)
* A repeat of the previous set.

## Update
Pleias-nano-1.2b-Preview is currently released as an early preview.

The model will undergo several more rounds of post-training to enhance reasoning capacities and fine-tunability, as well as in anticipation of a generalist instruct version.