ucsahin committed on
Commit 9f5581a · verified · 1 Parent(s): 3de99d8

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ When compared to the base model, the DPO version answers questions more **accura
  #### 🤖 **What is Direct Preference Optimization (DPO)?**
  Direct Preference Optimization is a technique used to align a model’s behavior with human preferences. The process works by showing the model several possible answers to a question and training it to favor the response preferred by humans. This leads to more reliable and truthful responses, as the model learns not only from raw data but also from user feedback. DPO helps to **minimize hallucinations** and improves the **quality** and **accuracy** of the model’s answers.

- ### 🚀 **Check out the model here:** [TRaVisionLM-DPO-Demo](https://huggingface.co/spaces/ucsahin/TraVisionLM-Demo)
+ ### 🚀 **Model demo:** [TRaVisionLM-DPO-Demo](https://huggingface.co/spaces/ucsahin/TraVisionLM-Demo)

  ### 📚 **Visual Language Model DPO Training Notebook:** [Colab Notebook](https://colab.research.google.com/drive/1ypEPQ3RBX3_X7m9qfmU-Op-vGgOjab_z?usp=sharing)
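
The README touched by this commit describes DPO in prose and links a Colab notebook for visual language model DPO training. As a rough illustration of the preference-pair idea described above, here is a minimal, text-only sketch using Hugging Face TRL's `DPOTrainer`; the model id, dataset rows, and hyperparameters are hypothetical placeholders rather than the notebook's actual code, and argument names (e.g. `processing_class`) may differ between TRL versions.

```python
# Hypothetical, minimal DPO fine-tuning sketch with Hugging Face TRL.
# Model id, preference pairs, and hyperparameters are illustrative only.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "ucsahin/TraVisionLM-base"  # placeholder identifier for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO learns from preference pairs: a prompt, the human-preferred answer ("chosen"),
# and a less preferred or hallucinated answer ("rejected").
preference_data = Dataset.from_dict({
    "prompt": ["Describe the image in one sentence."],
    "chosen": ["A brown dog runs across a grassy field."],
    "rejected": ["A cat sleeps on a red sofa."],  # hallucinated answer to be dispreferred
})

training_args = DPOConfig(
    output_dir="travisionlm-dpo",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,  # strength of the constraint keeping the model close to its reference
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=preference_data,
    processing_class=tokenizer,
)
trainer.train()
```

In the actual notebook the pairs would also carry images and an image processor, but the training loop follows the same pattern: the model is pushed toward the chosen response and away from the rejected one for each prompt.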