language:
- en
- fr
---

### bloomz-3b-dpo-chat Model Card

**Model Overview**

The bloomz-3b-dpo-chat is a conversational model fine-tuned using Direct Preference Optimization (DPO) from the base bloomz-3b-sft-chat model. This model aims to provide high-quality conversational abilities in both English and French, leveraging the pre-trained strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

**Model Description**

The bloomz-3b-dpo-chat model builds on the foundation of bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient tokenization strategy. DPO fine-tuning improves the model's ability to generate responses that humans prefer in conversational contexts.

**Multilingual Capabilities**

The model was initially trained on both French and English datasets, ensuring high efficiency and performance in these two languages. Because of the DPO process and the change of data type (from float16 to bfloat16), its multilingual capabilities may be less robust than those of its SFT predecessor, but further fine-tuning can help restore performance in other languages.

**Model Applications**

This model is suitable for chatbot applications, customer-service automation, and other conversational AI systems where bilingual (French and English) support is essential.

**Dataset**

The training dataset for the bloomz-3b-dpo-chat model consists of interactions between individuals and third parties, balanced equally between French and English. A total of 0.9 billion tokens were used, with translations produced via the Google Translate API to maintain balance and quality.

**Evaluation**

Evaluation of the model was conducted using the PoLL (Pool of LLM) technique, assessing performance on 100 French questions with scores aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini-1.5-pro, and Claude-3.5-Sonnet.

**Performance scores (on a scale of 5):**

| Model                                        | Score   |
|---------------------------------------------:|:--------|
| gpt-4o                                       | 4.13    |
| mistralai/Mixtral-8x7B-Instruct-v0.1         | 3.71    |
| gpt-3.5-turbo                                | 3.66    |
| cmarkea/bloomz-7b1-mt-sft-chat               | 1.69    |
| cmarkea/bloomz-3b-dpo-chat                   | 1.68    |
| cmarkea/bloomz-3b-sft-chat                   | 1.51    |
| croissantllm/CroissantLLMChat-v0.1           | 1.19    |
| cmarkea/bloomz-560m-sft-chat                 | 1.04    |
| OpenLLM-France/Claire-Mistral-7B-0.1         | 0.38    |

The bloomz-3b-dpo-chat model demonstrates improved performance over its SFT counterpart, particularly in zero-shot contexts, making it a competitive choice for production environments.
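
The aggregation step of this evaluation can be illustrated as follows; the per-question scores below are made-up numbers for illustration, not actual judgments from the evaluation:

```python
from statistics import mean

# Hypothetical 0-5 scores for one question: three evaluators, two passes each,
# matching the six-judgment PoLL setup described above.
judgments = {
    "gpt-4o": [4.0, 4.5],
    "gemini-1.5-pro": [3.5, 4.0],
    "claude-3.5-sonnet": [4.0, 4.0],
}

# A question's final score is the mean of all six judgments.
question_score = mean(s for scores in judgments.values() for s in scores)
print(round(question_score, 2))  # 4.0
```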


**Usage**

To use the bloomz-3b-dpo-chat model, format chatbot prompts as follows:

```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
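
For multi-turn conversations, the same pattern extends turn by turn. A minimal sketch of a helper that builds this format (the `build_prompt` name and the example history are illustrative, not part of the model's API):

```python
def build_prompt(turns):
    """Format alternating human/bot turns for bloomz-*-sft/dpo-chat.

    Human turns get a leading </s>, bot answers a leading <s>; a trailing
    <s> cues the model to produce the next bot answer.
    """
    prompt = ""
    for i, turn in enumerate(turns):
        prompt += ("</s>" if i % 2 == 0 else "<s>") + turn
    return prompt + "<s>"

# One human turn, one bot answer, then a follow-up question:
history = ["Qu'est-ce que le NLP ?",
           "Le traitement automatique du langage naturel.",
           "Donne un exemple."]
print(build_prompt(history))
# </s>Qu'est-ce que le NLP ?<s>Le traitement automatique du langage naturel.</s>Donne un exemple.<s>
```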
Example code to load the model using Hugging Face's `pipeline`:

```python
from transformers import pipeline

# Load the model; weights are downloaded on first use.
model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")

# The prompt ends with <s> to cue the bot's turn.
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)

result
[{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
en profondeur est un sous-ensemble de l'apprentissage automatique qui
utilise des réseaux de neurones artificiels pour apprendre à partir de
données. Ces réseaux sont conçus pour reconnaître des modèles dans les
données et peuvent être utilisés pour des tâches telles que la reconnaissance
d'images, le traitement du langage naturel et la reconnaissance vocale."}]
```
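
Note that `generated_text` echoes the prompt before the completion. A small helper (illustrative, not part of the transformers API) to recover just the bot's answer:

```python
def extract_answer(generated_text, prompt):
    """Return only the completion, stripping the echoed prompt if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text

prompt = "</s>C'est quoi le deep learning ?<s>"
# Simulated pipeline output: prompt echoed, then the completion.
output = prompt + "L'apprentissage en profondeur est un sous-ensemble de l'apprentissage automatique."
print(extract_answer(output, prompt))
# L'apprentissage en profondeur est un sous-ensemble de l'apprentissage automatique.
```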


**Citation**

```bibtex
@online{DeBloomzChat,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```