---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
datasets:
- Thermostatic/flowers
- jondurbin/truthy-dpo-v0.1
- Intel/orca_dpo_pairs
- glaiveai/glaive-function-calling-v2
---
# Gemma Orchid 7b
<div align="center">
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/7pqiroePJW0WWm6JxwBoO.webp)
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
</div>
This model is the second checkpoint of an upcoming project. It is capable of function calling and has a strong conversational base.
This model has been finetuned on roughly 80k samples so far.
# Training
+ Time to complete: ~20 hours
+ Datasets: Thermostatic/flowers, Intel/orca_dpo_pairs, jondurbin/truthy-dpo-v0.1, glaiveai/glaive-function-calling-v2
+ Evaluation loss: 0.69
+ Method: LoRA
+ Prompt Format: ChatML
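ChatML wraps every turn in `<|im_start|>` and `<|im_end|>` markers. A minimal prompt in this format looks like the following (the system message is illustrative):
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
```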
Thermostatic/flowers is a blend of open source model generations formatted in ShareGPT. It also includes the entirety of the Capybara dataset.
This model has been exposed to a wide variety of data. [macadeliccc/gemma-function-calling-7b](https://huggingface.co/macadeliccc/gemma-function-calling-7b) is a suitable base for further finetuning with a dataset of your choosing, as sketched below.
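A minimal LoRA setup for further finetuning with `peft` might look like this sketch (the rank, alpha, and target modules are illustrative defaults, not the configuration used to train this model):
```python
# pip install peft transformers accelerate
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-function-calling-7b", device_map="auto")

# Illustrative LoRA hyperparameters; tune these for your dataset
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```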
#### Running the model on a CPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Running the model on a single / multi GPU
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", device_map="auto")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Running the model on a GPU using different precisions
* _Using `torch.float16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", device_map="auto", torch_dtype=torch.float16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", device_map="auto", torch_dtype=torch.bfloat16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", quantization_config=quantization_config)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Using 4-bit precision_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", quantization_config=quantization_config)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
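The 4-bit path also accepts finer-grained settings. A sketch using the NF4 quant type with a bfloat16 compute dtype (these particular choices are illustrative, not settings validated for this model):
```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with double quantization; compute in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", quantization_config=quantization_config)
```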
#### Other optimizations
* _Flash Attention 2_
First, make sure `flash-attn` is installed in your environment: `pip install flash-attn`
```diff
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.float16,
+ attn_implementation="flash_attention_2"
).to(0)
```
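Put together, a full load with Flash Attention 2 enabled might look like the following sketch (assumes a GPU supported by `flash-attn`):
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "macadeliccc/gemma-orchid-7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)  # place the model on GPU 0
```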
### Inputs and outputs
* **Input:** Text string, such as a question, a prompt, or a document to be
summarized.
* **Output:** Generated English-language text in response to the input, such
as an answer to a question, or a summary of a document.
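Since the prompt format is ChatML, inputs can also be built with the tokenizer's chat template. A minimal sketch, assuming the bundled chat template matches the ChatML format described above:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/gemma-orchid-7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/gemma-orchid-7b-dpo", device_map="auto")

messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```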
## Evaluations
In progress