Cyrile committed on
Commit ca38345
1 Parent(s): 82b87a0

Update README.md

Files changed (1)
  1. README.md +9 -17
README.md CHANGED
@@ -7,10 +7,10 @@ language:
  - fr
  ---

- ## bloomz-3b-dpo-chat Model Card
+ # bloomz-3b-dpo-chat Model Card


- ### Model Overview
+ ## Model Overview

  The bloomz-3b-dpo-chat is a conversational model fine-tuned using Direct Preference Optimization (DPO) from the base bloomz-3b-sft-chat model. This model aims to
  provide high-quality conversational abilities in both English and French, leveraging the pre-trained strengths of its SFT (Supervised Fine-Tuning) predecessor.
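The DPO objective mentioned above optimizes the policy directly on preference pairs, using the SFT checkpoint as a frozen reference model instead of a separately trained reward model. As background only (the actual training code and hyperparameters are not part of this README), a minimal sketch of the loss looks like:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Inputs are summed log-probabilities of the chosen / rejected responses
    under the policy being trained and under the frozen reference model
    (here, the SFT checkpoint). beta controls the strength of the implicit
    KL constraint toward the reference model.
    """
    # Implicit rewards are the scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```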
@@ -19,24 +19,24 @@ provide high-quality conversational abilities in both English and French, levera

  ---

- ### Model Description
+ ## Model Description

  The bloomz-3b-dpo-chat model builds upon the solid foundation of the bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient
  tokenization strategy. The DPO fine-tuning process enhances the model's ability to generate more human-preferred responses in conversational contexts.

- ### Multilingual Capabilities
+ ## Multilingual Capabilities

  The model was initially trained on both French and English datasets, ensuring high efficiency and performance in these languages. Due to the DPO process and potential
  data type changes (from float16 to bfloat16), the model's multilingual capabilities might not be as robust as its SFT predecessor, but fine-tuning can help in restoring
  performance in other languages.

- ### Model Applications
+ ## Model Applications

  This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is
  essential.


- ### Dataset
+ ## Dataset

  The bloomz-3b-dpo-chat model was trained using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, which includes:

@@ -45,15 +45,7 @@ The bloomz-3b-dpo-chat model was trained using the [Anthropic/hh-rlhf](https://h
  - **Purpose:** To train preference models for Reinforcement Learning from Human Feedback (RLHF), not for supervised training of dialogue agents.
  - **Source:** Data from context-distilled language models, rejection sampling, and an iterated online process.

- 2. **Red Teaming Data:**
- - **Description:** Transcripts of conversations between human adversaries (red team members) and AI assistants, annotated for harmfulness.
- - **Purpose:** To study and mitigate harmful behaviors in AI models, not for fine-tuning or preference modeling.
- - **Content:** Transcripts, harmlessness scores, model parameters, success ratings, and red team attack descriptions.
-
- **Disclaimer:** The dataset contains sensitive and potentially upsetting content. It is intended for research to make AI models safer. Engage with caution.
-
-
- ### Evaluation
+ ## Evaluation

  Evaluation of the model was conducted using the PoLL (Pool of LLM) technique, assessing performance on 100 French questions with scores aggregated from six evaluations
  (two per evaluator). The evaluators included GPT-4o, Gemini-1.5-pro, and Claude3.5-sonnet.
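For reference, each record in the hh-rlhf preference data described in the Dataset section above is a pair of complete Human/Assistant dialogues sharing the same prompt, one preferred and one rejected by annotators. A minimal sketch of inspecting it with the `datasets` library (illustrative; not part of the README):

```python
from datasets import load_dataset

# Helpfulness/harmlessness preference pairs from Anthropic/hh-rlhf.
hh = load_dataset("Anthropic/hh-rlhf", split="train")

example = hh[0]
print(example["chosen"])    # preferred dialogue (Human/Assistant turns)
print(example["rejected"])  # dispreferred dialogue for the same prompt
```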
@@ -75,7 +67,7 @@ The bloomz-3b-dpo-chat model demonstrates improved performance over its SFT coun
  production environments.


- ### Usage
+ ## Usage

  To utilize the bloomz-3b-dpo-chat model, format the prompt for chatbot interactions as follows:
  ```
@@ -100,7 +92,7 @@ result
  ```


- ### Citation
+ ## Citation

  ```bibtex
  @online{DeBloomzChat,
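The prompt template the model expects is the one given in the Usage section of the full README, which this diff does not show. As a rough sketch only, assuming the model is published under the repository id cmarkea/bloomz-3b-dpo-chat and loaded in bfloat16 as the README suggests, generation with transformers might look like:

```python
import torch
from transformers import pipeline

# Model id and dtype are assumptions for illustration; use the id and
# prompt template documented in the repository's README.
chat = pipeline(
    "text-generation",
    model="cmarkea/bloomz-3b-dpo-chat",
    torch_dtype=torch.bfloat16,
)

# Placeholder prompt: the real input must follow the README's chat format.
result = chat("Bonjour, pouvez-vous me présenter le modèle ?", max_new_tokens=128)
print(result[0]["generated_text"])
```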
 