Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ When compared to the base model, the DPO version answers questions more **accura
#### 🤖 **What is Direct Preference Optimization (DPO)?**
Direct Preference Optimization is a technique used to align a model’s behavior with human preferences. The process works by showing the model several possible answers to a question and training it to favor the response preferred by humans. This leads to more reliable and truthful responses, as the model learns not only from raw data but also from user feedback. DPO helps to **minimize hallucinations** and improves the **quality** and **accuracy** of the model’s answers.

-### 🚀 **
+### 🚀 **Model demo:** [TRaVisionLM-DPO-Demo](https://huggingface.co/spaces/ucsahin/TraVisionLM-Demo)

### 📚 **Visual Language Model DPO Training Notebook:** [Colab Notebook](https://colab.research.google.com/drive/1ypEPQ3RBX3_X7m9qfmU-Op-vGgOjab_z?usp=sharing)
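For readers who want a concrete picture of the preference-tuning step described in the README, below is a minimal sketch of a DPO run using Hugging Face TRL's `DPOTrainer`. The checkpoint name, the toy preference pairs, and the hyperparameters are illustrative assumptions, not values taken from this repository or the linked Colab notebook.

```python
# Minimal DPO sketch with the TRL library (assumed setup, not the notebook's exact code).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical base checkpoint name, used here only for illustration.
model_name = "ucsahin/TraVisionLM-base"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Each example pairs a prompt with a human-preferred ("chosen") and a
# rejected answer; DPO trains the model to rank the chosen answer higher.
preference_data = Dataset.from_dict({
    "prompt": ["Describe the image briefly."],
    "chosen": ["A cat is sleeping on a red sofa."],
    "rejected": ["A dog is playing in the park."],
})

training_args = DPOConfig(
    output_dir="travisionlm-dpo",   # illustrative output directory
    beta=0.1,                       # strength of the preference constraint
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                    # a frozen reference copy is created internally
    args=training_args,
    train_dataset=preference_data,
    processing_class=tokenizer,     # recent TRL versions; older ones use tokenizer=
)
trainer.train()
```

In practice the preference pairs come from human comparisons of candidate answers, and a vision-language setup additionally feeds image inputs through the model's processor; the Colab notebook linked above walks through that end-to-end VLM DPO workflow.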