--- title: Conversation description: Conversation format for supervised fine-tuning. order: 1 --- ## Formats ### sharegpt conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt) ```{.json filename="data.jsonl"} {"conversations": [{"from": "...", "value": "..."}]} ``` Note: `type: sharegpt` opens a special config `conversation:` that enables conversions to many Conversation types. See [the docs](../docs/config.qmd) for all config options. ### pygmalion ```{.json filename="data.jsonl"} {"conversations": [{"role": "...", "value": "..."}]} ``` ### sharegpt.load_role conversations where `role` is used instead of `from` ```{.json filename="data.jsonl"} {"conversations": [{"role": "...", "value": "..."}]} ``` ### sharegpt.load_guanaco conversations where `from` is `prompter` `assistant` instead of default sharegpt ```{.json filename="data.jsonl"} {"conversations": [{"from": "...", "value": "..."}]} ``` ### sharegpt_jokes creates a chat where bot is asked to tell a joke, then explain why the joke is funny ```{.json filename="data.jsonl"} {"conversations": [{"title": "...", "text": "...", "explanation": "..."}]} ``` ## How to add custom prompts for instruction-tuning For a dataset that is preprocessed for instruction purposes: ```{.json filename="data.jsonl"} {"input": "...", "output": "..."} ``` You can use this example in your YAML config: ```{.yaml filename="config.yaml"} datasets: - path: repo type: system_prompt: "" field_system: system field_instruction: input field_output: output format: "[INST] {instruction} [/INST]" no_input_format: "[INST] {instruction} [/INST]" ``` See full config options under [here](../docs/config.qmd).