---
license: bsd-3-clause
---
# codegen-16B-action

<!-- Provide a quick summary of what the model is/does. -->

codegen-16B-action is a 16-billion-parameter model for API-based action generation. It is instruction-tuned from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) on API-based action generation datasets.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** English
- **License:** bsd-3-clause
- **Finetuned from model:** [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono)

### Basic Information

<!-- Provide the basic links for the model. -->
- **Paper**: [Link]
- **Github**: [Link]

### Licensing

TBD

## Uses
<details>
<summary>Click to expand</summary>
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model is intended for commercial and research use.


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->


codegen-16B-action should NOT be used for purposes other than API-based action generation.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed at the bottom of the page.

</details>


---
## How to Get Started with the Model

<details>
<summary>Click to expand</summary>

### Loading the model with Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-action")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/codegen-16b-action", device_map="auto", torch_dtype="auto")
```

### Suggested Inference Parameters
- do_sample: False
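
For example, a minimal greedy-decoding call (`do_sample=False`) using the `tokenizer` and `model` loaded above might look like the sketch below. The prompt string and the `max_new_tokens` value are illustrative placeholders, not official examples from this card.

```python
# Minimal greedy-decoding sketch (do_sample=False). The prompt is a
# hypothetical placeholder, not an official example prompt for this model.
prompt = "Book a meeting room for tomorrow at 10am."  # hypothetical input

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=False,                       # suggested inference setting from this card
    max_new_tokens=128,                    # illustrative length cap
    pad_token_id=tokenizer.eos_token_id,   # codegen tokenizers have no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```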

### Suggested Prompts To Try in GPU Tutorial
```
Input text: Fenglu, can you add some?
```

```
Input text: 十七岁的风是什么颜色的? (What color is the wind at seventeen?)
```


</details>

---

## Training Details

<details>
<summary>Click to expand</summary>

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [Fenglu to add](https://huggingface.co/datasets/laion/OIG)


### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We trained codegen-16b-action on four 80 GB A100 GPUs, starting from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) and fine-tuning it on the XXX dataset.
All of the code used to prepare the datasets and the scripts to run training and inference are open-sourced and freely available at [githublink here](dummy link).


### Prompting Style Used For Training
```

```

### Hyperparameters

- Hardware: A100 GPU
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 8
- Global Batch size: 16
- Batch tokens: 16 * 2048 = 32,768 tokens
- Learning Rate: 1e-5
- Learning Rate Scheduler: Fixed LR
- Weight decay: 0.1
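
The training scripts themselves are not reproduced here, but as a rough illustration of how the hyperparameters listed above could be expressed with the Hugging Face `Trainer` API, a hedged sketch follows. The output path, precision setting, and the `model` and `train_dataset` objects are assumptions, and the actual training code may differ.

```python
# Hedged sketch only: maps the listed hyperparameters onto Hugging Face
# TrainingArguments. The actual SambaNova training scripts may differ.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="codegen-16b-action-ft",   # hypothetical output path
    num_train_epochs=8,                   # Epochs: 8
    per_device_train_batch_size=4,        # 4 GPUs x 4 = global batch size 16
    gradient_accumulation_steps=1,        # Grad accumulation: 1
    learning_rate=1e-5,                   # Learning Rate: 1e-5
    lr_scheduler_type="constant",         # Fixed LR
    weight_decay=0.1,                     # Weight decay: 0.1
    optim="adamw_torch",                  # Optimizer: AdamW
    bf16=True,                            # assumption: mixed precision on A100
)

trainer = Trainer(
    model=model,                  # codegen-16B-mono base loaded for fine-tuning
    args=args,
    train_dataset=train_dataset,  # assumed pre-tokenized action-generation dataset
)
trainer.train()
```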

**Instruction-tuned Training on Dolly 2.0 and Oasst1**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 3
- Global Batch size: 128
- Batch tokens: 128 * 2048 = 262,144 tokens
- Learning Rate: 1e-5
- Learning Rate Scheduler: Cosine Schedule with Warmup
- Warmup Steps: 0
- End Learning Ratio: 0.1
- Weight decay: 0.1
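
A cosine schedule with warmup that decays to an end learning-rate ratio of 0.1 is not a stock scheduler name in every framework, so as a hedged sketch of the intended learning-rate curve (peak 1e-5 decaying along a cosine to 10% of peak, with zero warmup steps), something like the following could be used; the `total_steps` value and `model` object are placeholders.

```python
# Hedged sketch of a cosine decay to 10% of the peak LR with optional warmup;
# the production RDU training stack likely implements this differently.
import math
import torch

def cosine_with_floor(step, total_steps, warmup_steps=0, end_ratio=0.1):
    """Multiplier applied to the peak learning rate at a given step."""
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return end_ratio + (1.0 - end_ratio) * cosine

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: cosine_with_floor(step, total_steps=1000),  # illustrative step count
)
```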

</details>



## Acknowledgment


## Cite codegen-16b-action
```
@software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
  author = {SambaNova Systems, Together Computer},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
  month = {5},
  year = {2023},
  version = {1.0},
}
```