---
title: Conversation
description: Conversation format for supervised fine-tuning.
order: 1
---
## Formats
### sharegpt
conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt)
```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
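For example, a record with the optional leading `system` turn might look like this (the message text is purely illustrative):
```{.json filename="data.jsonl"}
{"conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "human", "value": "What is the capital of France?"}, {"from": "gpt", "value": "The capital of France is Paris."}]}
```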
Note: `type: sharegpt` opens up a dedicated `conversation:` config option that enables conversion to many Conversation types. See [the docs](../docs/config.qmd) for all config options.
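A minimal sketch of using that option (the dataset path and the `chatml` template name are illustrative):
```{.yaml filename="config.yaml"}
datasets:
  - path: repo
    type: sharegpt
    conversation: chatml
```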
### pygmalion
```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```
### sharegpt.load_role
conversations where `role` is used instead of `from`
```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```
### sharegpt.load_guanaco
conversations where `from` is `prompter`/`assistant` instead of the default sharegpt roles
```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
### sharegpt_jokes
creates a chat where the bot is asked to tell a joke, then explain why the joke is funny
```{.json filename="data.jsonl"}
{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
```
## How to add custom prompts for instruction-tuning
For a dataset that is preprocessed for instruction purposes:
```{.json filename="data.jsonl"}
{"input": "...", "output": "..."}
```
You can use this example in your YAML config:
```{.yaml filename="config.yaml"}
datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
```
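As a rough sketch of how this mapping applies (the record below is illustrative), a row such as
```{.json filename="data.jsonl"}
{"input": "What is the capital of France?", "output": "The capital of France is Paris."}
```
would have its `input` field substituted for `{instruction}`, producing a prompt along the lines of `[INST] What is the capital of France? [/INST]`, with the `output` text used as the completion.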
See the full config options [here](../docs/config.qmd).