Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ When compared to the base model, the DPO version answers questions more **accura
#### 🤖 **What is Direct Preference Optimization (DPO)?**
Direct Preference Optimization is a technique used to align a model’s behavior with human preferences. The process works by showing the model several possible answers to a question and training it to favor the response preferred by humans. This leads to more reliable and truthful responses, as the model learns not only from raw data but also from user feedback. DPO helps to **minimize hallucinations** and improves the **quality** and **accuracy** of the model’s answers.

-### 🚀 **
+### 🚀 **Model demo:** [TRaVisionLM-DPO-Demo](https://huggingface.co/spaces/ucsahin/TraVisionLM-Demo)

### 📚 **Visual Language Model DPO Training Notebook:** [Colab Notebook](https://colab.research.google.com/drive/1ypEPQ3RBX3_X7m9qfmU-Op-vGgOjab_z?usp=sharing)
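For readers who want a concrete picture of the preference-tuning step described in the README, below is a minimal sketch of a DPO run using Hugging Face TRL's `DPOTrainer`. The checkpoint name, the toy preference pairs, and the hyperparameters are illustrative assumptions, not values taken from this repository or the linked Colab notebook.

```python
# Minimal DPO sketch with the TRL library (assumed setup, not the notebook's exact code).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical base checkpoint name, used here only for illustration.
model_name = "ucsahin/TraVisionLM-base"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Each example pairs a prompt with a human-preferred ("chosen") and a
# rejected answer; DPO trains the model to rank the chosen answer higher.
preference_data = Dataset.from_dict({
    "prompt": ["Describe the image briefly."],
    "chosen": ["A cat is sleeping on a red sofa."],
    "rejected": ["A dog is playing in the park."],
})

training_args = DPOConfig(
    output_dir="travisionlm-dpo",   # illustrative output directory
    beta=0.1,                       # strength of the preference constraint
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                    # a frozen reference copy is created internally
    args=training_args,
    train_dataset=preference_data,
    processing_class=tokenizer,     # recent TRL versions; older ones use tokenizer=
)
trainer.train()
```

In practice the preference pairs come from human comparisons of candidate answers, and a vision-language setup additionally feeds image inputs through the model's processor; the Colab notebook linked above walks through that end-to-end VLM DPO workflow.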