joanllop committed · Commit 6c89d92 · verified · 1 Parent(s): eab1da2

Update README.md

Files changed (1)
  1. README.md +13 -15
README.md CHANGED
@@ -48,16 +48,14 @@ language:
 >
 > The weights will be promptly updated as soon as the training process is complete.

- # Salamandra ALIA Model Card
+ # ALIA-40b Model Card

- Salamandra ALIA is a highly multilingual model pre-trained from scratch that comes in three different
- sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants.
- This model card corresponds to the 40B base version.
+ ALIA-40b is a highly multilingual model pre-trained from scratch that will come with its respective base and instruction-tuned variants.

- To visit the model cards of other Salamandra ALIA versions, please refer to the [Model Index](#model-index).
+ To visit the model cards of other ALIA versions, please refer to the [Model Index](#model-index).

- The entire Salamandra ALIA family is released under a permissive [Apache 2.0 license]((https://www.apache.org/licenses/LICENSE-2.0)).
- Along with the open weights, all training scripts and configuration files are made publicly available in [this GitHub repository](https://github.com/langtech-bsc/salamandra).
+ This model is released under a permissive [Apache 2.0 license]((https://www.apache.org/licenses/LICENSE-2.0)).
+ Along with the open weights, all training scripts and configuration files are made publicly available in [this GitHub repository](https://github.com/langtech-bsc/alia).

 ---

@@ -70,7 +68,7 @@ The pre-training corpus contains text in 35 European languages and code.

 ### Hyperparameters

- The full list of hyperparameters for each model can be found [here](https://github.com/langtech-bsc/salamandra/tree/main/configs).
+ The full list of hyperparameters can be found [here](https://github.com/langtech-bsc/alia/blob/main/configs/bsc_40b.yaml).

 ### Architecture

@@ -145,7 +143,7 @@ This section offers examples of how to perform inference using various methods.
 You'll find different techniques for running inference, including Huggingface's Text Generation Pipeline, multi-GPU configurations, and vLLM for scalable and efficient generation.

 #### Inference with Huggingface's Text Generation Pipeline
- The Huggingface Text Generation Pipeline provides a straightforward way to run inference using the Salamandra-40b model.
+ The Huggingface Text Generation Pipeline provides a straightforward way to run inference using the ALIA-40b model.

 ```bash
 pip install transformers torch accelerate sentencepiece protobuf
@@ -156,7 +154,7 @@ pip install transformers torch accelerate sentencepiece protobuf
 ```python
 from transformers import pipeline, set_seed

- model_id = "BSC-LT/salamandra-40b"
+ model_id = "BSC-LT/ALIA-40b"

 # Sample prompts
 prompts = [
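
The hunk above shows only the head of the README's pipeline snippet. For orientation, a minimal sketch of how such a snippet can be completed with the updated model id follows; the prompt is reused from the README's multi-GPU example, while the dtype, device placement, and generation settings are illustrative assumptions rather than the README's own values.

```python
from transformers import pipeline, set_seed
import torch

model_id = "BSC-LT/ALIA-40b"

# Prompt reused from the README's multi-GPU example; any text works here.
prompts = ["El mercat del barri és"]

# device_map="auto" lets accelerate shard the 40B weights across the
# available GPUs; bfloat16 is an assumed precision, not a stated requirement.
generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
set_seed(1)

for prompt in prompts:
    output = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)
    print(output[0]["generated_text"])
```
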
@@ -202,7 +200,7 @@ pip install transformers torch accelerate sentencepiece protobuf
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch

- model_id = "BSC-LT/salamandra-40b"
+ model_id = "BSC-LT/ALIA-40b"

 # Input text
 text = "El mercat del barri és"
@@ -246,7 +244,7 @@ pip install vllm
 ```python
 from vllm import LLM, SamplingParams

- model_id = "BSC-LT/salamandra-40b"
+ model_id = "BSC-LT/ALIA-40b"

 # Sample prompts
 prompts = [
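
As before, the hunk only shows the opening lines of the vLLM example. A self-contained sketch with the updated model id might look like the following; the sampling parameters and tensor_parallel_size are assumptions, chosen only to make the example runnable on a multi-GPU node.

```python
from vllm import LLM, SamplingParams

model_id = "BSC-LT/ALIA-40b"

# Sample prompt (reused from the README's other examples)
prompts = ["El mercat del barri és"]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=50)

# tensor_parallel_size should match the number of GPUs available; 4 is an
# arbitrary illustrative value for a model of this size.
llm = LLM(model=model_id, tensor_parallel_size=4)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt + output.outputs[0].text)
```
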
@@ -444,7 +442,7 @@ We provide an extense Datasheet section following the best practices defined by

 **For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.**

- The purpose of creating this dataset is to pre-train the Salamandra family of multilingual models with high performance in a large number of
+ The purpose of creating this dataset is to pre-train a family of multilingual models with high performance in a large number of
 European languages (35) and code (including 92 different programming languages). In addition, we aim to represent especially the co-official
 languages of Spain: Spanish, Catalan, Galician, and Basque. This is the reason why we carry out an oversampling of these languages.

@@ -630,7 +628,7 @@ and the [Ungoliant](https://github.com/oscar-project/ungoliant) pipeline was use

 **Has the dataset been used for any tasks already? If so, please provide a description.**

- Pre-train the Salamandra model family.
+ Pre-train the ALIA model and the Salamandra model family.

 **What (other) tasks could the dataset be used for?**

@@ -1066,4 +1064,4 @@ Technical report coming soon.
 |:---:|:---:|:---:|
 |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
 |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
- |40B| [Link](https://huggingface.co/BSC-LT/salamandra-40b) | WiP |
+ |40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
 