Tags: Text Generation · Transformers · Safetensors · Finnish · llama · finnish · conversational · text-generation-inference
aapot committed
Commit: 665e556
1 parent: cd0dc30

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: text-generation
 
 # Ahma-3b for Finnish
 
- Ahma is 3B parameter decoder-only transformer model based on Llama architecture pretrained on Finnish language. Original Llama model architecture was introduced in
+ Ahma is 3B parameter decoder-only transformer model based on Meta's Llama (v1) architecture pretrained on Finnish language. Original Llama model architecture was introduced in
 [this paper](https://arxiv.org/abs/2302.13971)
 and first released at [this page](https://github.com/facebookresearch/llama).
 
@@ -26,7 +26,7 @@ What does Ahma mean? Ahma is the Finnish word for wolverine! In the Finnish Lapl
 There are two different sized Ahma models, all pretrained from scratch for 139B tokens:
 
 | Model | Context length | Layers | Dim | Heads | Params |
- |---------------------------------------------------------------------------------|----------------|--------|------|-------|--------|
+ |:--------------------------------------------------------------------------------|:---------------|:-------|:-----|:------|:-------|
 | [Ahma-3B](https://huggingface.co/Finnish-NLP/Ahma-3B) | 2048 | 26 | 3200 | 32 | 3.6B |
 | [Ahma-7B](https://huggingface.co/Finnish-NLP/Ahma-7B) | 2048 | 32 | 4096 | 32 | 7.0B |
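These dimensions can be sanity-checked straight from the published model config. A minimal sketch, assuming the repo ships a standard Llama-style `config.json` (the field names below are standard `LlamaConfig` attributes, not something stated in this diff):

```python
from transformers import AutoConfig

# Fetch only the configuration, no weights.
config = AutoConfig.from_pretrained("Finnish-NLP/Ahma-3B")

# Should mirror the Ahma-3B table row above: 26 layers, dim 3200,
# 32 heads, 2048-token context.
print(config.num_hidden_layers)        # layers
print(config.hidden_size)              # dim
print(config.num_attention_heads)      # heads
print(config.max_position_embeddings)  # context length
```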
 
@@ -36,7 +36,7 @@ This model was pretrained only in a self-supervised way, without any supervised
 
 ### How to use
 
- If you want to use this model for instruction-following, you need to use the same prompt format we used in the second stage of the pretraining (basically the same format what Meta used in their Llama2 models). Note: do not use "LlamaTokenizer" from transformers library but always use the AutoTokenizer instead, or use the plain sentencepiece tokenizer. Here is an example using the instruction-following prompt format, with some generation arguments you can modify for your use:
+ If you want to use this model for instruction-following, you need to use the same prompt format we used in the second stage of the pretraining (basically the same format what Meta used in their Llama2 models). **Note: do not use "LlamaTokenizer" from transformers library but always use the AutoTokenizer instead, or use the plain sentencepiece tokenizer.** Here is an example using the instruction-following prompt format, with some generation arguments you can modify for your use:
 
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
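# (The diff's three-line context window cuts the README's example off
# after the import. The lines below are an illustrative sketch only:
# the repo id comes from the model table above, but the [INST] wrapper
# and every generation argument are assumptions, not the README's
# authoritative template.)
tokenizer = AutoTokenizer.from_pretrained("Finnish-NLP/Ahma-3B")  # AutoTokenizer, per the note above
model = AutoModelForCausalLM.from_pretrained("Finnish-NLP/Ahma-3B")

# Assumed Llama-2-style instruction wrapper; consult the full README
# for the exact second-stage prompt format.
prompt = "[INST] Kerro kolme faktaa Suomesta. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,      # illustrative generation arguments
    do_sample=True,
    temperature=0.6,
    repetition_penalty=1.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```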
@@ -122,7 +122,7 @@ The final training dataset had 23 billion words (calculated with regex "\w+") an
 
 The first stage:
 |Dataset | Words | Ratio |
- |------------------------------|-------------|--------------|
+ |:-----------------------------|:------------|:-------------|
 |CulturaX | 12.820B | 59.88\% |
 |HPLT v1.2 | 5.034B | 23.51\% |
 |Suomi24 | 3.018B | 14.09\% |
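The hunk header above notes that dataset sizes count "words" as matches of the regex `\w+`. That method is small enough to restate exactly:

```python
import re

def count_words(text: str) -> int:
    # "Words" as in the dataset tables: maximal runs of word characters.
    return len(re.findall(r"\w+", text))

count_words("Ahma on suomen kielen sana.")  # -> 5
```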
@@ -135,7 +135,7 @@ The first stage:
 
 The second stage:
 |Dataset | Words | Ratio |
- |---------------------------------------------------------------|-------------|-------------|
+ |:--------------------------------------------------------------|:------------|:------------|
 |CulturaX (cleaner sample using KenLM perplexity score) | 2.252B | 55.48\% |
 |Wikipedia | 0.095B | 2.34\% |
 |STT | 0.253B | 6.23\% |
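The first row mentions a "cleaner sample using KenLM perplexity score". A hedged sketch of what such perplexity filtering generally looks like with the `kenlm` Python bindings; the model file and threshold are invented placeholders, not values from the Ahma pipeline:

```python
import kenlm  # Python bindings from https://github.com/kpu/kenlm

# Placeholder path: some n-gram LM trained on clean Finnish text.
lm = kenlm.Model("finnish_5gram.arpa")

def keep(line: str, max_perplexity: float = 1000.0) -> bool:
    # Low perplexity under a clean-text LM suggests fluent, well-formed text.
    return lm.perplexity(line) < max_perplexity

docs = ["Tämä on tavallista suomea.", "qwerty 12345 !!! asdf"]
clean_sample = [d for d in docs if keep(d)]
```

Keeping only low-perplexity lines biases the sample toward fluent text, which is presumably what "cleaner sample" refers to here.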
@@ -179,7 +179,7 @@ Thanks to the WSD learning rate schedule, you can more easily experiment with di
 This Ahma model was primarily evaluated using [FIN-bench by TurkuNLP](https://github.com/TurkuNLP/FIN-bench), and the same evaluation was carried out for other relevant Finnish models for comparison. Below are the results with 0-shot and 3-shot settings in FIN-bench:
 
 | Benchmark | Ahma 3B (instruct prompt format) 0-shot | Ahma 7B (instruct prompt format) 0-shot | FinGPT 8B 0-shot | Viking 7B 0-shot | Poro 34B (8bit quant) 0-shot |
- |----------------------------|-----------------------------------------|-----------------------------------------|------------------|------------------|------------------------------|
+ |:---------------------------|:----------------------------------------|:----------------------------------------|:-----------------|:-----------------|:-----------------------------|
 | Analogies | 50.77 | TBA | 49.23 | 40.00 | 54.62 |
 | Arithmetic | 27.64 | TBA | 33.15 | 30.16 | 30.34 |
 | Cause and Effect | 59.48 | TBA | 66.01 | 58.82 | 62.74 |
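The hunk header references the WSD (warmup-stable-decay) learning-rate schedule. As a rough illustration of its shape only (all step counts and rates below are made-up, not Ahma's training values):

```python
def wsd_lr(step: int,
           max_lr: float = 3e-4,
           min_lr: float = 3e-5,
           warmup_steps: int = 2_000,
           stable_steps: int = 100_000,
           decay_steps: int = 10_000) -> float:
    """Warmup-Stable-Decay: linear warmup, long constant plateau,
    short final decay. Illustrative numbers only."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps          # linear warmup
    if step < warmup_steps + stable_steps:
        return max_lr                                # stable plateau
    t = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return max_lr + t * (min_lr - max_lr)            # linear decay
```

The long flat plateau is what makes data experiments cheap: a plateau checkpoint can be branched and decayed on a different data mix without rerunning the whole schedule.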
@@ -197,7 +197,7 @@ This Ahma model was primarily evaluated using [FIN-bench by TurkuNLP](https://gi
 
 
 | Benchmark | Ahma 3B (instruct prompt format) 3-shot | Ahma 7B (instruct prompt format) 3-shot | FinGPT 8B 3-shot | Viking 7B 3-shot | Poro 34B (8bit quant) 3-shot |
- |----------------------------|-----------------------------------------|-----------------------------------------|------------------|------------------|------------------------------|
+ |:---------------------------|:----------------------------------------|:----------------------------------------|:-----------------|:-----------------|:-----------------------------|
 | Analogies | 52.31 | TBA | 40.77 | 54.62 | 76.92 |
 | Arithmetic | 44.59 | TBA | 43.63 | 45.78 | 53.68 |
 | Cause and Effect | 61.44 | TBA | 64.05 | 58.17 | 67.32 |
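For readers comparing the 0-shot and 3-shot tables: in the few-shot setting the prompt is simply prefixed with worked examples. A generic sketch (the formatting below is invented for illustration; FIN-bench's actual templates are defined by the evaluation harness):

```python
def few_shot_prompt(question: str, shots: list[tuple[str, str]]) -> str:
    # Each solved (question, answer) pair becomes an in-context example.
    prefix = "".join(f"Kysymys: {q}\nVastaus: {a}\n\n" for q, a in shots)
    return f"{prefix}Kysymys: {question}\nVastaus:"

shots = [("2 + 2?", "4"), ("3 + 5?", "8"), ("10 - 7?", "3")]
prompt = few_shot_prompt("6 + 6?", shots)  # 3-shot; shots=[] gives 0-shot
```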
@@ -224,7 +224,7 @@ In a 3-shot setting, the results are more mixed. The poorer performance of Ahma
 This Ahma model was also evaluated using [MTBench Finnish by LumiOpen](https://github.com/LumiOpen/FastChat/tree/main/fastchat/llm_judge) even though this Ahma model is not fine-tuned for chat. Since the MTBench evaluates also multi-turn chats while Ahma models were only pretrained with single-turn instruction following examples, we have reported MTBench Finnish results separately for their single-turn and multi-turn evaluation examples. [Poro 34B Chat](https://huggingface.co/LumiOpen/Poro-34B-chat) model's results are copied from their model card for comparison.
 
 | Benchmark | Ahma 3B (instruct prompt format) single-turn | Ahma 3B (instruct prompt format) multi-turn | Ahma 7B (instruct prompt format) single-turn | Ahma 7B (instruct prompt format) multi-turn | Poro 34B Chat multi-turn |
- |---------------------|----------------------------------------------|---------------------------------------------|----------------------------------------------|---------------------------------------------|--------------------------|
+ |:--------------------|:---------------------------------------------|:--------------------------------------------|:---------------------------------------------|:--------------------------------------------|:-------------------------|
 | Coding | 1.00 | 1.00 | TBA | TBA | 3.05 |
 | Extraction | 2.00 | 1.55 | TBA | TBA | 6.05 |
 | Humanities | 4.05 | 3.25 | TBA | TBA | 9.6 |
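On the single-turn versus multi-turn split discussed above: schematically, a multi-turn evaluation item just extends the conversation with a follow-up user turn. The structure below is generic, not MTBench's internal format, and the Finnish prompts are invented examples:

```python
single_turn = [
    {"role": "user", "content": "Kirjoita runo talvesta."},
]
multi_turn = single_turn + [
    {"role": "assistant", "content": "<model's first answer>"},
    {"role": "user", "content": "Käännä runo englanniksi."},  # follow-up turn
]
```

Since Ahma was pretrained only on single-turn instruction examples, the second user turn falls outside its training distribution, which is why the two settings are reported separately.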
 