jayr014 committed
Commit 4d18223 · 1 Parent(s): 6da99f4

adding in citation and making it easier to read tutorial

Files changed (1):
  1. README.md +58 -17
README.md CHANGED
@@ -106,29 +106,59 @@ NOTE: Things that we had to modify in order for BLOOMChat to work:
 
 Modifications for `inference_server/models/hf_accelerate.py`:
 
-```python
-from accelerate.utils.modeling import get_max_memory
-...
-class HFAccelerateModel(Model):
-    def __init__(self, args: Namespace) -> None:
-        ...
-        original_max_memory_dict = get_max_memory()
-
-        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
-
-        kwargs["max_memory"] = reduce_max_memory_dict
+```diff
+diff --git a/inference_server/models/hf_accelerate.py b/inference_server/models/hf_accelerate.py
+index 9be3c3f..a8ecb1d 100644
+--- a/inference_server/models/hf_accelerate.py
++++ b/inference_server/models/hf_accelerate.py
+@@ -1,4 +1,5 @@
+ from argparse import Namespace
++from accelerate.utils.modeling import get_max_memory
+ 
+ import torch
+ 
+@@ -12,6 +13,12 @@ class HFAccelerateModel(Model):
+ 
+         kwargs = {"pretrained_model_name_or_path": args.model_name, "device_map": "auto"}
+ 
++        original_max_memory_dict = get_max_memory()
++
++        reduce_max_memory_dict = {device_key: int(original_max_memory_dict[device_key] * 0.85) for device_key in original_max_memory_dict}
++
++        kwargs["max_memory"] = reduce_max_memory_dict
++
+         if get_world_size() > 1:
+             kwargs["device_map"] = "balanced_low_0"
 ```
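
For readers following along, this change caps accelerate's per-device memory budget at 85% of what it reports, presumably to leave headroom during generation. A minimal standalone sketch of the same computation (assuming only that `accelerate` is installed; the final `print` is illustrative and not part of the repo):

```python
# Sketch: reproduce the 85% max-memory cap outside the server.
from accelerate.utils.modeling import get_max_memory

# Maps each visible device (GPU index or "cpu") to the maximum number
# of bytes accelerate is willing to use on it.
original_max_memory_dict = get_max_memory()

# Keep 15% headroom on every device.
reduce_max_memory_dict = {
    device_key: int(original_max_memory_dict[device_key] * 0.85)
    for device_key in original_max_memory_dict
}

# In hf_accelerate.py this dict is passed through kwargs["max_memory"],
# bounding what device_map="auto" may place on each device.
print(reduce_max_memory_dict)
```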
 
 Modifications for `inference_server/cli.py`:
 
-```python
-def main() -> None:
-    ...
-    while True:
-        input_text = input("Input text: ")
-
-        input_text = input_text.strip()
-        modified_input_text = f"<human>: {input_text}\n<bot>:"
+```diff
+diff --git a/inference_server/cli.py b/inference_server/cli.py
+index fc903d5..5450236 100644
+--- a/inference_server/cli.py
++++ b/inference_server/cli.py
+@@ -22,6 +22,9 @@ def main() -> None:
+     while True:
+         input_text = input("Input text: ")
+ 
++        input_text = input_text.strip()
++        modified_input_text = f"<human>: {input_text}\n<bot>:"
++
+         if input("change generate_kwargs? [y/n] ") == "y":
+             while True:
+                 try:
+@@ -33,7 +36,7 @@ def main() -> None:
+                 print("message =", e_message)
+                 continue
+ 
+-        response = model.generate(text=[input_text], generate_kwargs=generate_kwargs)
++        response = model.generate(text=[modified_input_text], generate_kwargs=generate_kwargs)
+ 
+         print_rank_0("Output text:", response.text[0])
+         print_rank_0("Generated tokens:", response.num_generated_tokens[0])
 ```
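
The second change is prompt formatting: BLOOMChat expects dialog framed with `<human>:` and `<bot>:` tags, so the CLI wraps each raw input in that template before calling `model.generate`. A minimal sketch of the wrapping (the `format_prompt` helper is hypothetical, introduced here only for illustration):

```python
# Sketch: the prompt template the modified cli.py applies to user input.
def format_prompt(input_text: str) -> str:
    # Strip stray whitespace, frame the text as a human turn, and leave
    # a trailing bot tag for the model to complete.
    return f"<human>: {input_text.strip()}\n<bot>:"

# Prints: <human>: What is the capital of France?
#         <bot>:
print(format_prompt("  What is the capital of France?  "))
```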
 
 Running command for bf16
@@ -397,3 +427,14 @@ We are grateful to the various researchers and open-source projects that have co
 We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [BigScience](https://bigscience.huggingface.co/) for their essential benchmarking contributions, which are very helpful in evaluating BLOOMChat's performance. We appreciate the inspiration from the wave of various recent open-source chat models, including [OpenAssistant-30B](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor), [LLaMA-Adapter-V2-65B](https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_chat65b), [Vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0), [Koala-13b](https://huggingface.co/TheBloke/koala-13B-HF), [OASST-Pythia-12b](https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b), [Alpaca-13b](https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g), [ChatGLM-6b](https://github.com/THUDM/ChatGLM-6B), [FastChat-T5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0), [Dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), [LLaMA-13b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), [StableLM-Tuned-Alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b), [RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1), [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1), [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat), and so on. We look forward to witnessing the continued growth and success of open-source chat-based models.
 
 We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of BLOOMChat, and we hope that our model can contribute to further advancements in the field.
+
+## Citation
+
+@software{bloomchat,
+    title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
+    author = {SambaNova Systems, Together Computer},
+    url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
+    month = {5},
+    year = {2023},
+    version = {1.0},
+}