jayr014 committed
Commit 4d18223 · 1 Parent(s): 6da99f4

adding in citation and making it easier to read tutorial

Files changed (1):
  1. README.md +58 -17
README.md CHANGED
@@ -106,29 +106,59 @@ NOTE: Things that we had to modify in order for BLOOMChat to work:
 
 Modifications for `inference_server/models/hf_accelerate.py`:
 
-```python
-from accelerate.utils.modeling import get_max_memory
-...
-class HFAccelerateModel(Model):
-    def __init__(self, args: Namespace) -> None:
-        ...
-        original_max_memory_dict = get_max_memory()
-
-        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
-
-        kwargs["max_memory"] = reduce_max_memory_dict
+```diff
+diff --git a/inference_server/models/hf_accelerate.py b/inference_server/models/hf_accelerate.py
+index 9be3c3f..a8ecb1d 100644
+--- a/inference_server/models/hf_accelerate.py
++++ b/inference_server/models/hf_accelerate.py
+@@ -1,4 +1,5 @@
+ from argparse import Namespace
++from accelerate.utils.modeling import get_max_memory
+ 
+ import torch
+ 
+@@ -12,6 +13,12 @@ class HFAccelerateModel(Model):
+ 
+         kwargs = {"pretrained_model_name_or_path": args.model_name, "device_map": "auto"}
+ 
++        original_max_memory_dict = get_max_memory()
++
++        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
++
++        kwargs["max_memory"] = reduce_max_memory_dict
++
+         if get_world_size() > 1:
+             kwargs["device_map"] = "balanced_low_0"
 ```
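
For readers following along, this change caps accelerate's per-device memory budget at 85% of what it reports, presumably to leave headroom during generation. A minimal standalone sketch of the same computation (assuming only that `accelerate` is installed; the final `print` is illustrative and not part of the repo):

```python
# Sketch: reproduce the 85% max-memory cap outside the server.
from accelerate.utils.modeling import get_max_memory

# Maps each visible device (GPU index or "cpu") to the maximum number
# of bytes accelerate is willing to use on it.
original_max_memory_dict = get_max_memory()

# Keep 15% headroom on every device.
reduce_max_memory_dict = {
    device_key: int(original_max_memory_dict[device_key] * 0.85)
    for device_key in original_max_memory_dict
}

# In hf_accelerate.py this dict is passed through kwargs["max_memory"],
# bounding what device_map="auto" may place on each device.
print(reduce_max_memory_dict)
```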
 
 Modifications for `inference_server/cli.py`:
 
-```python
-def main() -> None:
-    ...
-    while True:
-        input_text = input("Input text: ")
-
-        input_text = input_text.strip()
-        modified_input_text = f"<human>: {input_text}\n<bot>:"
+```diff
+diff --git a/inference_server/cli.py b/inference_server/cli.py
+index fc903d5..5450236 100644
+--- a/inference_server/cli.py
++++ b/inference_server/cli.py
+@@ -22,6 +22,9 @@ def main() -> None:
+     while True:
+         input_text = input("Input text: ")
+ 
++        input_text = input_text.strip()
++        modified_input_text = f"<human>: {input_text}\n<bot>:"
++
+         if input("change generate_kwargs? [y/n] ") == "y":
+             while True:
+                 try:
+@@ -33,7 +36,7 @@ def main() -> None:
+                 print("message =", e_message)
+                 continue
+ 
+-        response = model.generate(text=[input_text], generate_kwargs=generate_kwargs)
++        response = model.generate(text=[modified_input_text], generate_kwargs=generate_kwargs)
+ 
+         print_rank_0("Output text:", response.text[0])
+         print_rank_0("Generated tokens:", response.num_generated_tokens[0])
 ```
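
The second change is prompt formatting: BLOOMChat expects dialog framed with `<human>:` and `<bot>:` tags, so the CLI wraps each raw input in that template before calling `model.generate`. A minimal sketch of the wrapping (the `format_prompt` helper is hypothetical, introduced here only for illustration):

```python
# Sketch: the prompt template the modified cli.py applies to user input.
def format_prompt(input_text: str) -> str:
    # Strip stray whitespace, frame the text as a human turn, and leave
    # a trailing bot tag for the model to complete.
    return f"<human>: {input_text.strip()}\n<bot>:"

# Prints: <human>: What is the capital of France?
#         <bot>:
print(format_prompt("  What is the capital of France?  "))
```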
 
 Running command for bf16
@@ -397,3 +427,14 @@ We are grateful to the various researchers and open-source projects that have co
 We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [BigScience](https://bigscience.huggingface.co/) for their essential benchmarking contributions, which are very helpful in evaluating BLOOMChat's performance. We appreciate the inspiration from the wave of various recent open-source chat models, including [OpenAssistant-30B](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor), [LLaMA-Adapter-V2-65B](https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_chat65b), [Vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0), [Koala-13b](https://huggingface.co/TheBloke/koala-13B-HF), [OASST-Pythia-12b](https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b), [Alpaca-13b](https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g), [ChatGLM-6b](https://github.com/THUDM/ChatGLM-6B), [FastChat-T5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0), [Dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), [LLaMA-13b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), [StableLM-Tuned-Alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b), [RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1), [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1), [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat), and so on. We look forward to witnessing the continued growth and success of open-source chat-based models.
 
 We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of BLOOMChat, and we hope that our model can contribute to further advancements in the field.
+
+## Citation
+
+@software{bloomchat,
+    title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
+    author = {SambaNova Systems, Together Computer},
+    url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
+    month = {5},
+    year = {2023},
+    version = {1.0},
+}