jayr014 commited on
Commit
86f34fc
·
1 Parent(s): ac22e40

adding in first draft of modelcard, still missing some sections

Browse files
Files changed (1) hide show
  1. README.md +131 -1
README.md CHANGED
@@ -1,3 +1,133 @@
1
  ---
 
 
2
  license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
3
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
4
  license: apache-2.0
5
+ ---
6
+
7
+ # BloomChat V1.0
8
+
9
+ <!-- Provide a quick summary of what the model is/does. -->
10
+
11
+ BloomChat-v1.0 is based on [BigScience Group Bloom-176 model](https://huggingface.co/bigscience/bloom), and is instruction-tuned on a subset of 100k datapoints per data source from the [OIG dataset](https://huggingface.co/datasets/laion/OIG) provided by laion. Then aligned using [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1).
12
+
13
+ ## Model Details
14
+
15
+ ### Model Description
16
+
17
+ <!-- Provide a longer summary of what this model is. -->
18
+
19
+ - **Developed by:** [SambaNova Systems](https://sambanova.ai/) and [Together Computer](https://www.together.xyz/)
20
+ - **Model type:** Language Model
21
+ - **Language(s):** Multiple; see [training data from Bloom-176B](https://huggingface.co/bigscience/bloom#training-data)
22
+ - **License:** apache-2.0
23
+ - **Instruction Tuned from model:** [BigScience Group Bloom-176B](https://huggingface.co/bigscience/bloom)
24
+
25
+ ### Additional Information
26
+
27
+ <!-- Provide the basic links for the model. -->
28
+
29
+ - **Blogpost:** [More Information Needed]
30
+
31
+ ## Uses
32
+
33
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
34
+
35
+ ### Direct Use
36
+
37
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
38
+
39
+ [More Information Needed]
40
+
41
+ ### Downstream Use [optional]
42
+
43
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
44
+
45
+ [More Information Needed]
46
+
47
+ ### Out-of-Scope Use
48
+
49
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
50
+
51
+ [More Information Needed]
52
+
53
+ ## Bias, Risks, and Limitations
54
+
55
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
56
+
57
+ Like all LLMs, BloomChat has certain limitations:
58
+ - Hallucination: BloomChat may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
59
+ - Code Switching: The model might unintentionally switch between languages or dialects within a single response, affecting the coherence and understandability of the output.
60
+ - Repetition: BloomChat may produce repetitive phrases or sentences, leading to less engaging and informative responses.
61
+ - Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
62
+ - Toxicity: BloomChat may inadvertently generate responses containing inappropriate or harmful content.
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ - [OIG dataset](https://huggingface.co/datasets/laion/OIG)
83
+ - [Dolly 2.0](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
84
+ - [Oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
85
+
86
+ ### Training Procedure
87
+
88
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
89
+
90
+ We trained BloomChat with SambaStudio, a platform built on SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from [Bloom-176B](https://huggingface.co/bigscience/bloom), an OSS multilingual 176B GPT model pretrained by the [BigScience group](https://huggingface.co/bigscience).
91
+
92
+ ### Hyperparameters
93
+
94
+ **Instruction-tuned Training on OIG**
95
+
96
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
97
+ - Optimizer: AdamW
98
+ - Grad accumulation: 1
99
+ - Epochs: 1
100
+ - Global Batch size: 128
101
+ - Batch tokens: 128 * 2048 = 262,144 tokens
102
+ - LR: 1e-5
103
+ - Weight decay: 0.1
104
+
105
+ **Instruction-tuned Training on Dolly 2.0 and Oasst1**
106
+
107
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
108
+ - Optimizer: AdamW
109
+ - Grad accumulation: 1
110
+ - Epochs: 3
111
+ - Global Batch size: 128
112
+ - Batch tokens: 128 * 2048 = 262,144 tokens
113
+ - LR: 1e-5
114
+ - Weight decay: 0.1
115
+
116
+
117
+ ## Evaluation
118
+
119
+ <!-- This section describes the evaluation protocols and provides the results. -->
120
+
121
+ ![HELM core-scenarios](HELM_core-senarios_CNN+MS_Marco_WIP.png)
122
+
123
+ ![Multilingual scores French and hindi](Multilinguality_WMT-14_on_French+Hindi.png)
124
+
125
+ ![Multilingual scores Chinese](Multilinguality_WMT-14_on_Simplified_Chinese.png)
126
+
127
+ ![Mean Win Rate on HELM](Open_source_model_Mean_Win_Rate_on_HELM_core_scenarios.png)
128
+
129
+ ## Community
130
+
131
+ [Link to discord server]
132
+
133
+