---
tags:
- merge
- mergekit
- lazymergekit
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
base_model:
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
---
# AeolusBlend-7B-slerp
AeolusBlend-7B-slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218)
* [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
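To make the `slerp` method and the per-filter `t` lists more concrete, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors, plus one plausible way a five-value list could be spread across the 32 layers. The helper names `slerp` and `layer_t` are hypothetical, and the linear spacing of anchor values over the layer index is an assumption; mergekit's own implementation is the authority on the exact behavior.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t=0 returns v0 (the base model's weights), t=1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp  # direction undefined for a zero vector
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 0.9995:
        return lerp  # nearly colinear: plain linear interpolation is stable
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer_idx, num_layers):
    """Assumed scheme: anchors are evenly spaced over the layer stack
    and linearly interpolated in between."""
    if len(anchors) == 1:
        return float(anchors[0])
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 2)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[lo + 1]


SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]  # from the config above
MLP_T = [1, 0.5, 0.7, 0.3, 0]        # from the config above

for idx in (0, 8, 16, 24, 31):
    print(idx, round(layer_t(SELF_ATTN_T, idx, 32), 3), round(layer_t(MLP_T, idx, 32), 3))
```

Under this reading, the self-attention weights start at the base model (`t = 0`) in the first layer and end at the other model (`t = 1`) in the last layer, while the MLP weights follow the opposite schedule. All remaining tensors use the constant `t = 0.5`.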
## 💻 Usage
```python
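# Notebook cell: the leading "!" runs a shell command.
# Outside a notebook, run the pip command in a terminal instead.
!pip install -qU transformers accelerate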
from transformers import AutoTokenizer
import transformers
import torch
model = "lxyuan/AeolusBlend-7B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
>>>
[RESP] A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data to understand and generate human-like language. These models are typically made up of neural networks, which are a type of machine learning algorithm that can learn to recognize patterns in data and make predictions based on those patterns.
The term "large" in this context refers to the size of the model, which is measured by the number of parameters or connections in the neural network. Large language models can have billions or even trillions of parameters, making them capable of processing and generating extremely complex text.
Some examples of large language models include Google's BERT, OpenAI's GPT-3, and Facebook's Transformer-XL. These models have been trained on huge datasets such as books, articles, and web pages, allowing them to understand the nuances of language and generate text that is not only grammatically correct but also natural and fluent.
Large language models have a wide range of potential applications, including natural language processing (NLP) tasks such as text generation, translation, and summarization, as well as chatbots and virtual assistants that can
```
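The `apply_chat_template` call turns the `messages` list into a single prompt string. As a rough illustration of what that string may look like, the sketch below renders messages in a Mistral-style `[INST]` layout, which is what the generated text in the 4-bit example appears to show. The function `build_prompt` is hypothetical and the exact spacing in multi-turn conversations is an assumption; the template shipped with the tokenizer is authoritative.

```python
def build_prompt(messages, bos="<s>", eos="</s>"):
    """Render chat messages in an assumed Mistral-style [INST] layout."""
    parts = [bos]
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if msg["role"] == "user":
            # User turns are wrapped in instruction markers.
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # Assistant turns are closed with the end-of-sequence token.
            parts.append(f"{msg['content']}{eos}")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
print(build_prompt(messages))
```

In practice, prefer `tokenizer.apply_chat_template` as used above, since it always follows the template stored with the model.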
```python
messages = [{"role": "user", "content": "Who is Lee Kuan Yew? Summarize your answer in point form format"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
>>>
[REP] Lee Kuan Yew was a Singaporean statesman who served as the first Prime Minister of Singapore from 1959 to 1990. Here are some key points about him:
- Born in 1923 in Singapore, Lee was educated in England and was a lawyer by profession.
- He was a founding member of the People's Action Party (PAP), which he led for many years.
- Under Lee's leadership, Singapore achieved independence from Malaysia in 1965 and became a sovereign nation.
- Lee was known for his pragmatic and authoritarian style of governance, emphasizing economic growth, law and order, and meritocracy.
- He played a significant role in Singapore's rapid development, transforming the country from a poor and undeveloped nation into a modern and prosperous city-state.
- Lee passed away in 2015, at the age of 91.
- He was widely regarded as one of the most influential leaders of the 20th century and a key figure in the history of Singapore.
```
### 4-bit Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import transformers
import torch
#!nvidia-smi
"""
Wed Feb 7 12:51:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:00:1E.0 Off | 0 |
| N/A 41C P0 44W / 300W | 4950MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
"""
model_id = "lxyuan/AeolusBlend-7B-slerp"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
>>>
<s>[INST] What is a large language model? [/INST]
A large language model is a type of artificial intelligence system that has been trained on vast amounts of
text data, enabling it to generate human-like responses to a wide range of written prompts. These models are
designed to learn the patterns and rules of language, and as a result, they can perform various natural
language processing tasks, such as translation, summarization, and question-answering, with a high degree
of accuracy. Large language models are typically powered by deep learning algorithms and can have billions
or trillions of parameters, making them capable of processing and understanding complex language structures
and nuances. Some well-known examples of large language models include GPT-3, BERT, and T5.
```
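As a back-of-the-envelope check on the memory footprint of the 4-bit setup above, the sketch below estimates the storage needed for the quantized weights alone. The block sizes (64 weights per block, 256 scales per second-level group) follow the figures reported for QLoRA-style double quantization, and the parameter count of roughly 7.24 billion is an assumption for a Mistral-7B-sized model; neither number comes from this model card.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, second_block=256):
    """Approximate storage cost per weight for blockwise 4-bit quantization."""
    if double_quant:
        # One 8-bit scale per block, plus one 32-bit constant per group of scales.
        overhead = 8 / block_size + 32 / (block_size * second_block)
    else:
        # One 32-bit scale per block.
        overhead = 32 / block_size
    return 4 + overhead


def weight_mib(num_params, bits_per_param):
    """Convert a parameter count and per-parameter bit cost to MiB."""
    return num_params * bits_per_param / 8 / 2**20


ASSUMED_PARAMS = 7.24e9  # assumption: approximate size of a Mistral-7B-style model

bits = nf4_bits_per_param()
print(f"{bits:.3f} bits per parameter")
print(f"{weight_mib(ASSUMED_PARAMS, bits):.0f} MiB for quantized weights")
```

This estimate covers the quantized weights only. If the `nvidia-smi` reading of 4950MiB in the example was taken with the model loaded, the remainder would plausibly be the CUDA context, layers kept in higher precision, and activation buffers.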
- The 4-bit inference example notebook is available [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Inference_4bit_AeolusBlend.ipynb).
- A text-to-graph example with AeolusBlend is available [here](https://github.com/LxYuan0420/nlp/blob/117f09cf7f09e3284d6a1eed475652ef90bb8545/notebooks/Inference_AeolusBlend_KnowledgeGraph.ipynb).