---
base_model: google/gemma-2-2b-jpn-it
language:
- multilingual
datasets:
  - mlabonne/orpo-dpo-mix-40k
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
- nlp
- code
quantized_by: ymcki
widget:
- messages:
  - role: user
    content: Can you provide ways to eat combinations of bananas and dragonfruits?
---

Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

## Prompt format

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```

Note that this model does not support a System prompt.
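The turn markers above can be assembled with plain string formatting. A minimal sketch (note that `tokenizer.apply_chat_template`, shown later in this card, produces the same layout apart from a leading `<bos>` token):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma-2 turn markers (no system role)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("Write a hello world program")
print(prompt)
```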

Since [gemma-2-2b-jpn-it-abliterated-18](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-abliterated-18) is slightly brain-damaged compared to the original [gemma-2-2b-jpn-it](https://huggingface.co/google/gemma-2-2b-jpn-it), I decided to try ORPO fine-tuning to see if it can be healed.

Using the [gemma-2-2b base model](https://huggingface.co/google/gemma-2-2b), I employed the ORPO method described by [mlabonne](https://towardsdatascience.com/fine-tune-llama-3-with-orpo-56cfab2f9ada), but the input model was loaded into VRAM by [unsloth](https://github.com/unslothai/unsloth) so that training on the full 40k dataset could run on a single 3090.

Ten epochs were run. The smallest eval_loss was achieved at epoch 7.00.
The checkpoint at epoch 7.00 was used to obtain a model adapter, which was then
applied to [gemma-2-2b-jpn-it-abliterated-18](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-abliterated-18) to obtain [gemma-2-2b-ORPO-jpn-it-abliterated-18](https://huggingface.co/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18).

| Epoch | loss | eval_loss | eval_logps/rejected | eval_logps/chosen |
| ----- | ---- | --------- | ------------------- | ----------------- |
| 1.00 | 0.9754 | 1.0344 | -1.1506 | -0.7516 |
| 2.00 | 0.9629 | 1.0173 | -1.2694 | -0.7351 |
| 3.00 | 0.7435 | 1.0087 | -1.4922 | -0.7388 |
| 4.00 | 1.0595 | 1.0026 | -1.5920 | -0.7310 |
| 5.00 | 1.0525 | 1.0000 | -1.6313 | -0.7311 |
| 6.00 | 1.1628 | 1.0014 | -1.7263 | -0.7393 |
| 7.00 | 0.8994 | 0.9971 | -1.7264 | -0.7324 |
| 8.00 | 0.7448 | 1.0056 | -1.7790 | -0.7482 |
| 9.00 | 0.6801 | 1.0028 | -1.7794 | -0.7429 |
| 10.00 | 0.9868 | 1.0069 | -1.8065 | -0.7505 |

Then I followed Rombodawg's [suggestion](https://www.reddit.com/r/LocalLLaMA/comments/1fyx27y/im_pretty_happy_with_how_my_method_worked_out/) to merge [gemma-2-2b](https://huggingface.co/google/gemma-2-2b), [gemma-2-2b-ORPO-jpn-it-abliterated-18](https://huggingface.co/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18) and [gemma-2-2b-jpn-it-abliterated-18](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-abliterated-18) to obtain this model.
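A three-way merge like this is typically done with [mergekit](https://github.com/arcee-ai/mergekit). The exact merge method and weights used for this model are not stated, so the config below is only an illustration of what such a recipe might look like:

```yaml
# Hypothetical mergekit config; the actual method/weights for this model are not published.
models:
  - model: ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18
    parameters:
      density: 1.0
      weight: 1.0
  - model: ymcki/gemma-2-2b-jpn-it-abliterated-18
    parameters:
      density: 1.0
      weight: 1.0
merge_method: ties
base_model: google/gemma-2-2b
parameters:
  normalize: true
dtype: bfloat16
```

With mergekit installed, a config like this is run with `mergekit-yaml config.yaml ./output-dir`.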

This model is uploaded here to be evaluated by the Open LLM Leaderboard. Further ORPO fine-tuning is currently underway to see if it can regain its sanity. You can play with this model now or wait until I am done with the fine-tuning.

## Benchmark (100.0*raw scores only)

Click on a model name to go to the raw-score JSON generated by the Open LLM Leaderboard.

| Model | Average | IFEval | BBH | Math Lv5 | GPQA | MUSR | MMLU-PRO |
| ----- | ------- | ------ | --- | -------- | ---- | ---- | -------- |
| [gemma-2-2b-jpn-it](https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/google/gemma-2-2b-jpn-it/results_2024-10-15T15-21-39.173019.json) | 30.82 | 54.11 | 41.43 | 0.0 | 27.52 | 37.17 | 24.67 |
| [gemma-2-2b-ORPO-jpn-it-abliterated-18-merge (5 epoches)](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge/results_2024-10-30T17-06-58.119904.json) | 29.26 | 49.16 | 38.15 | 2.49 | 28.19 | 33.07 | 24.51 |
| [gemma-2-2b-ORPO-jpn-it-abliterated-18-merge (10 epoches)](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge/results_2024-11-18T07-53-54.972969.json) | 30.65 | 53.81 | 41.21 | 0.83 | 28.36 | 35.05 | 24.61 |
| [gemma-2-2b-ORPO-jpn-it-abliterated-18 (5 epoches)](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18/results_2024-10-30T22-19-29.202883.json) | 29.57 | 48.05 | 41.26 | 0.0 | 27.18 | 36.51 | 24.43 |
| [gemma-2-2b-ORPO-jpn-it-abliterated-18 (10 epoches)](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18/results_2024-11-18T08-02-58.149334.json) | 29.68 | 47.76 | 40.20 | 0.38 | 28.86 | 37.43 | 23.45 |
| [gemma-2-2b-jpn-it-abliterated-17](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-17/results_2024-10-18T15-18-46.821674.json) | 30.29 | 52.65 | 40.46 | 0.0 | 27.18 | 36.90 | 24.55 |
| [gemma-2-2b-jpn-it-abliterated-18](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-18/results_2024-10-18T15-41-42.399571.json) | 30.61 | 53.02 | 40.96 | 0.0 | 27.35 | 37.30 | 25.05 |
| [gemma-2-2b-jpn-it-abliterated-24](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-24/results_2024-10-25T16-29-46.542899.json) | 30.61 | 51.37 | 40.77 | 0.0 | 27.77 | 39.02 | 24.73 |
| [gemma-2-2b-jpn-it-abliterated-17-18-24](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-17-18-24/results_2024-11-06T19-05-49.169139.json) | 29.17 | 51.33 | 37.82 | 0.0 | 28.10 | 34.92 | 22.82 |

The abliterated-18-merge model is slightly better than the abliterated-18 model but slightly worse than the original instruct model.

## How to run this model

```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize the formatted prompt and generate a reply
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```
huggingface-cli download ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge --include "*" --local-dir ./
```

## Credits

Thank you mlabonne for describing the ORPO fine tuning method.

Thank you FullOf_Bad_Ideas from LocalLlama for the suggestion of using unsloth to save VRAM.