Update README.md
This was achieved *without the loss of core competencies that typically occurs when a model trained mainly in English is fine-tuned in another language.*
Our approach ensures that the model retains its original strengths while acquiring a profound understanding of German, **setting a new benchmark in bilingual language model proficiency.**
# Table of Contents
1. [Overview of all HerO models](#all-hero-models)
2. [Model Details](#model-details)
   - [Training Dataset](#training-dataset)
   - [Merge Procedure](#merge-procedure)
   - [Prompt Template](#prompt-template)
3. [Evaluation](#evaluation)
   - [MT-Bench (German)](#mt-bench-german)
   - [MT-Bench (English)](#mt-bench-english)
   - [Language Model Evaluation Harness](#language-model-evaluation-harness)
   - [BigBench (BBH)](#bbh)
   - [GPT4ALL](#gpt4all)
   - [Additional German Benchmark Results](#additional-german-benchmark-results)
4. [Disclaimer](#disclaimer)
5. [Contact](#contact)
6. [Collaborations](#collaborations)
7. [Acknowledgement](#acknowledgement)

## All HerO Models
## Model Details
**SauerkrautLM-7b-HerO**

- **Model Type:** SauerkrautLM-7b-HerO is an auto-regressive language model based on the transformer architecture
- **Language(s):** English, German
- **License:** Apache 2.0
- **Contact:** [Website](https://vago-solutions.de/#Kontakt), [David Golchinfar](mailto:[email protected])

### Training Dataset

SauerkrautLM-7b-HerO was trained on a mix of augmented German data and translated data.
We found that a simple translation of the training data alone can lead to unnatural German phrasings.
Data augmentation techniques were therefore used to ensure grammatical and syntactical correctness and a more natural German wording in our training data.
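The two data sources described above can be pictured as a simple mix of record pools. The sketch below is purely illustrative: the field names and `source` tags are assumptions for exposition, not the card's actual dataset schema.

```python
# Illustrative mix of translated and augmented German records;
# field names and "source" tags are assumptions, not the real schema.
translated = [
    {"text": "Wie geht es Ihnen heute?", "source": "translated"},
]
augmented = [
    {"text": "Wie geht es dir heute?", "source": "augmented"},
]

# Blending both pools exposes the model to natural phrasings alongside
# literal translations, counteracting stilted "translationese".
training_mix = translated + augmented
```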

### Merge Procedure

SauerkrautLM-7b-HerO was merged on a single A100 with [mergekit](https://github.com/cg123/mergekit).
The merged model combines [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) and [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca).
We applied the gradient SLERP merge method.
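As a rough illustration of what such a merge computes, here is a minimal NumPy sketch of SLERP (spherical linear interpolation) between two weight tensors. The toy tensors and the per-layer interpolation schedule are illustrative assumptions; mergekit's actual implementation differs in detail.

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float) -> np.ndarray:
    """Spherically interpolate between two weight tensors at fraction t."""
    a, b = w_a.flatten(), w_b.flatten()
    # Angle between the tensors, measured on their unit directions.
    cos_omega = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Nearly parallel tensors: plain linear interpolation is stable.
        return ((1.0 - t) * a + t * b).reshape(w_a.shape)
    so = np.sin(omega)
    out = (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b
    return out.reshape(w_a.shape)

# Toy example with two small "weight matrices" (illustrative values).
w_a = np.array([[1.0, 0.0], [0.5, 0.5]])
w_b = np.array([[0.0, 1.0], [0.25, 0.75]])
merged = slerp(w_a, w_b, 0.5)

# A *gradient* merge varies t across layers rather than using one global
# value, e.g. letting deeper layers lean more toward the second model.
layer_ts = np.linspace(0.0, 1.0, num=4)
```

At `t = 0` the result equals the first model's weights and at `t = 1` the second's; interpolating along the sphere rather than the straight line between tensors tends to preserve each model's weight geometry better than plain averaging.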

### Prompt Template

```
<|im_start|>system
Du bist Sauerkraut-HerO, ein großes Sprachmodell, das höflich und kompetent antwortet. Schreibe deine Gedanken Schritt für Schritt auf, um Probleme sinnvoll zu lösen.
...
<|im_start|>assistant
```
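The template follows the ChatML convention used by the OpenHermes base model. A minimal helper for assembling it might look like the following sketch; the `build_prompt` helper and the user turn are illustrative, and the `<|im_end|>` delimiters are assumed from the ChatML format rather than shown above.

```python
# Sketch of assembling a ChatML-style prompt; the helper name and user
# turn are illustrative, <|im_end|> delimiters follow ChatML convention.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "Du bist Sauerkraut-HerO, ein großes Sprachmodell, das höflich und "
    "kompetent antwortet. Schreibe deine Gedanken Schritt für Schritt auf, "
    "um Probleme sinnvoll zu lösen.",
    "Wie funktioniert die Zusammenführung von Sprachmodellen?",
)
```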

## Evaluation

### MT-Bench (German)

```
########## First turn ##########
score
...
SauerkrautLM-3b-v1 2.581250
open_llama_3b_v2 1.456250
Llama-2-7b 1.181250
```

### MT-Bench (English)

```
########## First turn ##########
score
...
neural-chat-7b-v3-1 6.812500
```

### Language Model Evaluation Harness

Compared to Aleph Alpha Luminous models:

![Harness](images/luminouscompare.PNG "SauerkrautLM-7b-HerO Harness")

*performed with the latest Language Model Evaluation Harness

### BBH

![BBH](images/bbh.PNG "SauerkrautLM-7b-HerO BBH")

*performed with the latest Language Model Evaluation Harness

### GPT4ALL

Compared to Aleph Alpha Luminous models, LeoLM and EM_German:

![GPT4ALL diagram](images/gpt4alldiagram.PNG "SauerkrautLM-7b-HerO GPT4ALL Diagram")

![GPT4ALL table](images/gpt4alltable.PNG "SauerkrautLM-7b-HerO GPT4ALL Table")

### Additional German Benchmark Results

![GermanBenchmarks](images/germanbench.PNG "SauerkrautLM-7b-HerO German Benchmarks")

*performed with the latest Language Model Evaluation Harness

## Disclaimer