Update README.md
---
license: apache-2.0
language:
- en
datasets:
- jondurbin/truthy-dpo-v0.1
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
---

Gazelle v0.2 is the mid-March release from [Tincans](https://tincans.ai) of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model - audio in, text out.

The datasets used were [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset) (first iteration) and [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1?row=0). We trained for 2 epochs with max_lr=3e-4, batch size 32, 10 warmup steps, and cosine decay.

We can see tell-tale signs of preference modeling at play, particularly longer replies, which do not appear in the base instruction-tuned model. Overall, we view the quality as mixed; we welcome experimentation but do not suggest production use.

Please see [this notebook](https://github.com/tincans-ai/gazelle/blob/2939d7034277506171d61a7a1001f535426faa71/examples/infer.ipynb) for an inference example.
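For readers unfamiliar with DPO: the finetune optimizes the standard Direct Preference Optimization objective over (chosen, rejected) completion pairs. Below is a minimal pure-Python sketch of the per-pair loss; the `beta` default and the log-probability inputs are illustrative only (the card does not state the actual `beta` or how log-probs were pooled).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is a summed log-probability of the chosen/rejected
    completion under the trained policy or the frozen reference model.
    beta=0.1 is a common default, not a value stated in this card.
    """
    # Implicit reward margins: how much more the policy favors each
    # completion than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)): small when the policy prefers the chosen answer.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the loss sits at `log 2`; it drops as the policy learns to rank the chosen completion above the rejected one.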
|
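The stated schedule (max_lr=3e-4, 10 warmup steps, cosine decay) can be sketched as a plain function of the step index. `max_lr` and `warmup_steps` come from the card; `total_steps` and `min_lr` are assumptions, since the card does not state them.

```python
import math

def lr_at_step(step, total_steps, max_lr=3e-4, warmup_steps=10, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay toward min_lr.

    total_steps and min_lr are illustrative assumptions; only max_lr
    and warmup_steps are taken from the model card.
    """
    if step < warmup_steps:
        # Linear warmup: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

This mirrors the common warmup-plus-cosine shape (as in `transformers`' cosine schedule with warmup): the rate peaks at step 10 and anneals smoothly to the floor by the final step.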