Add citation and make the tutorial easier to read
README.md
@@ -106,29 +106,59 @@ NOTE: Things that we had to modify in order for BLOOMChat to work:
@@ -397,3 +427,14 @@ We are grateful to the various researchers and open-source projects that have co
Modifications for `inference_server/models/hf_accelerate.py`:

```diff
diff --git a/inference_server/models/hf_accelerate.py b/inference_server/models/hf_accelerate.py
index 9be3c3f..a8ecb1d 100644
--- a/inference_server/models/hf_accelerate.py
+++ b/inference_server/models/hf_accelerate.py
@@ -1,4 +1,5 @@
 from argparse import Namespace
+from accelerate.utils.modeling import get_max_memory
 
 import torch
 
@@ -12,6 +13,12 @@ class HFAccelerateModel(Model):
 
         kwargs = {"pretrained_model_name_or_path": args.model_name, "device_map": "auto"}
 
+        original_max_memory_dict = get_max_memory()
+
+        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
+
+        kwargs["max_memory"] = reduce_max_memory_dict
+
         if get_world_size() > 1:
             kwargs["device_map"] = "balanced_low_0"
```
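This patch caps every device's memory budget at 85% of what Accelerate reports as available, leaving headroom for activations and generation-time buffers. The scaling step can be sketched as follows; the memory map and the `reduce_max_memory` helper name here are our illustration, standing in for the real `get_max_memory()` result:

```python
# Sketch of the 85% memory-cap logic from the hf_accelerate.py patch.
# In the real patch the input dict comes from
# accelerate.utils.modeling.get_max_memory().
def reduce_max_memory(max_memory: dict, fraction: float = 0.85) -> dict:
    """Scale every device's memory budget down to the given fraction."""
    return {device: int(limit * fraction) for device, limit in max_memory.items()}

# Hypothetical map: two GPUs plus CPU RAM, limits in bytes.
example = {0: 80 * 1024**3, 1: 80 * 1024**3, "cpu": 512 * 1024**3}
reduced = reduce_max_memory(example)
```

Lowering the fraction further (e.g. to 0.80) trades model capacity for extra safety margin against CUDA out-of-memory errors.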
Modifications for `inference_server/cli.py`:

```diff
diff --git a/inference_server/cli.py b/inference_server/cli.py
index fc903d5..5450236 100644
--- a/inference_server/cli.py
+++ b/inference_server/cli.py
@@ -22,6 +22,9 @@ def main() -> None:
     while True:
         input_text = input("Input text: ")
 
+        input_text = input_text.strip()
+        modified_input_text = f"<human>: {input_text}\n<bot>:"
+
         if input("change generate_kwargs? [y/n] ") == "y":
             while True:
                 try:
@@ -33,7 +36,7 @@ def main() -> None:
                 print("message =", e_message)
                 continue
 
-        response = model.generate(text=[input_text], generate_kwargs=generate_kwargs)
+        response = model.generate(text=[modified_input_text], generate_kwargs=generate_kwargs)
 
         print_rank_0("Output text:", response.text[0])
         print_rank_0("Generated tokens:", response.num_generated_tokens[0])
```
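The `cli.py` patch wraps each raw user turn in BLOOMChat's `<human>`/`<bot>` chat template before calling `generate`, so the model sees input in the same format it was fine-tuned on. A minimal sketch of that formatting step (the `build_prompt` helper name is ours, not part of the repository):

```python
def build_prompt(user_input: str) -> str:
    """Strip stray whitespace and wrap the turn in the <human>/<bot> template,
    mirroring the two lines added to inference_server/cli.py."""
    return f"<human>: {user_input.strip()}\n<bot>:"

# The model's reply is expected to continue after the trailing "<bot>:" tag.
prompt = build_prompt("  What is the capital of France?  ")
```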
Running command for bf16
We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [BigScience](https://bigscience.huggingface.co/) for their essential benchmarking contributions, which were very helpful in evaluating BLOOMChat's performance. We also drew inspiration from the wave of recent open-source chat models, including [OpenAssistant-30B](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor), [LLaMA-Adapter-V2-65B](https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_chat65b), [Vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0), [Koala-13b](https://huggingface.co/TheBloke/koala-13B-HF), [OASST-Pythia-12b](https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b), [Alpaca-13b](https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g), [ChatGLM-6b](https://github.com/THUDM/ChatGLM-6B), [FastChat-T5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0), [Dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), [LLaMA-13b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), [StableLM-Tuned-Alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b), [RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1), [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1), and [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat). We look forward to witnessing the continued growth and success of open-source chat-based models.

We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of BLOOMChat, and we hope that our model can contribute to further advancements in the field.
## Citation

@software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
  author = {SambaNova Systems, Together Computer},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
  month = {5},
  year = {2023},
  version = {1.0},
}
|