Commit 6e70a89 (parent d43b85c): Update README.md

---
license: apache-2.0
datasets:
- nicholasKluge/fine-tuning-instruct-aira
- Dahoas/synthetic-instruct-gptj-pairwise
language:
- en
metrics:
- bleu
library_name: transformers
tags:
- alignment
- instruction tuned
- text generation
- conversation
- assistant
pipeline_tag: text-generation
---

# Aira-Instruct-124M

`Aira-Instruct-124M` is an instruction-tuned GPT-style model based on [GPT-2](https://huggingface.co/gpt2). The model was trained on a dataset of `prompt` and `completion` pairs generated via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework. Instruction tuning of `Aira-Instruct-124M` was achieved via conditional text generation.
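
Concretely, conditional text generation here means that each `prompt` and `completion` pair is serialized into a single sequence delimited by special tokens, on which the model is then trained, presumably with the standard causal language-modeling objective. Below is a minimal sketch of that serialization; the exact preprocessing lives in the training notebook, and the token strings used here are the demo format shown under Usage:

```python
# Illustrative sketch only: the real preprocessing is in the training
# notebook. The token strings match the demo format shown under Usage.
prompt = "What is a language model?"
completion = "A language model is a probability distribution over a vocabulary."

# One training example: the prompt and its completion in a single sequence,
# with <|endoftext|> closing each turn.
example = "<|startoftext|>" + prompt + "<|endoftext|>" + completion + "<|endoftext|>"
```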

The dataset used to train this model combines two main sources: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset focused on Ethics, AI, AI safety, and related topics. The dataset is available in both Portuguese and English.

## Details

- **Size:** 124,441,344 total parameters
- **Dataset:** [Instruct-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/fine-tuning-instruct-aira)
- **Number of epochs:** 5
- **Batch size:** 32
- **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
- **GPU:** 1 NVIDIA A100-SXM4-40GB

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.16884 | 0.66058 |
| 2 | 0.647947 | 0.622228 |
| 3 | 0.588665 | 0.605857 |
| 4 | 0.545835 | 0.596193 |
| 5 | 0.512876 | 0.595261 |

> Note: This repository contains the notebook used to train this model.
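
As a rough illustration, the optimizer settings listed under Details could be wired up as follows. This is a sketch, not the notebook's code: the linear warmup schedule (`get_linear_schedule_with_warmup`) and the `total_steps` value are assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, get_linear_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hyperparameters from the Details section above.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)

# Assumed: linear warmup over the first 100 steps (warmup_steps = 1e2).
# total_steps is hypothetical; in the real run it would be
# (batches per epoch) * 5 epochs.
total_steps = 10_000
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=total_steps
)
```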

## Usage

Two special tokens are used to mark the user side of the interaction and the model's response:

`<|startoftext|>` What is a language model?`<|endoftext|>`A language model is a probability distribution over a vocabulary. `<|endoftext|>`

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2Tokenizer.from_pretrained('nicholasKluge/Aira-Instruct-124M')
aira = GPT2LMHeadModel.from_pretrained('nicholasKluge/Aira-Instruct-124M')

aira.to(device)
aira.eval()

question = input("Enter your question: ")

# Wrap the question in the special tokens the model was trained on
inputs = tokenizer(tokenizer.bos_token + question + tokenizer.eos_token, return_tensors="pt").to(device)

responses = aira.generate(
    **inputs,
    bos_token_id=tokenizer.bos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=50,
    max_length=200,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=2,
)

print(f"Question: 👤 {question}\n")

for i, response in enumerate(responses):
    # Print only the response, stripping the echoed question
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
```

The model will output something like:

```markdown
>>> Question: 👤 Hello! What is your name?

>>> Response 1: 🤖 Hi there! I am Aira, a chatbot designed to answer questions about AI ethics and AI safety. If you need assistance navigating our conversation, please feel free to ask!
>>> Response 2: 🤖 Hi there! My name is Aira, and I'm a chatbot designed to answer questions related to AI ethics and AI Safety. If you need assistance, feel free to ask, and I'll be happy to help you out.
```
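
Since `do_sample=True`, the responses differ from run to run. For reproducible outputs you could fix the random seed before generating (a small addition, not part of the original example):

```python
from transformers import set_seed

# Fixing the seed makes the sampled responses reproducible across runs.
set_seed(42)
```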

## License

`Aira-Instruct-124M` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.