---
title: Conversation
description: Conversation format for supervised fine-tuning.
order: 1
---
## Formats
### sharegpt
conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt)
```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
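For example, a record with the optional leading `system` turn might look like this (the message text is purely illustrative):
```{.json filename="data.jsonl"}
{"conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "human", "value": "What is the capital of France?"}, {"from": "gpt", "value": "The capital of France is Paris."}]}
```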
Note: `type: sharegpt` opens up a dedicated `conversation:` config option that enables conversion to many Conversation types. See [the docs](../docs/config.qmd) for all config options.
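A minimal sketch of using that option (the dataset path and the `chatml` template name are illustrative):
```{.yaml filename="config.yaml"}
datasets:
  - path: repo
    type: sharegpt
    conversation: chatml
```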
### pygmalion
```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```
### sharegpt.load_role
conversations where `role` is used instead of `from`
```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```
### sharegpt.load_guanaco
conversations where `from` is `prompter`/`assistant` instead of the default sharegpt roles
```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
### sharegpt_jokes
creates a chat where the bot is asked to tell a joke, then explain why the joke is funny
```{.json filename="data.jsonl"}
{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
```
## How to add custom prompts for instruction-tuning
For a dataset that is preprocessed for instruction purposes:
```{.json filename="data.jsonl"}
{"input": "...", "output": "..."}
```
You can use this example in your YAML config:
```{.yaml filename="config.yaml"}
datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
```
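As a rough sketch of how this mapping applies (the record below is illustrative), a row such as
```{.json filename="data.jsonl"}
{"input": "What is the capital of France?", "output": "The capital of France is Paris."}
```
would have its `input` field substituted for `{instruction}`, producing a prompt along the lines of `[INST] What is the capital of France? [/INST]`, with the `output` text used as the completion.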
See the full config options [here](../docs/config.qmd).