---
license: bigscience-bloom-rail-1.0
datasets:
- Anthropic/hh-rlhf
language:
- en
- fr
---

# bloomz-3b-dpo-chat Model Card

## Model Overview

bloomz-3b-dpo-chat is a conversational model obtained by fine-tuning the bloomz-3b-sft-chat model with Direct Preference Optimization (DPO). It aims to provide high-quality conversational abilities in both English and French, building on the strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

## Model Description

bloomz-3b-dpo-chat builds on bloomz-3b-sft-chat, which stands out for its chatbot-specific training and efficient tokenization strategy. DPO fine-tuning then optimizes the model directly on pairs of preferred and rejected responses, so that it generates answers humans are more likely to prefer in conversation.
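
DPO trains on preference pairs: for each prompt it raises the likelihood of the chosen response relative to the rejected one, measured against a frozen reference model. The sketch below computes the standard per-pair DPO loss from Rafailov et al. (2023), `-log(sigmoid(beta * margin))`, in plain Python. It assumes the reference model is the SFT starting point and uses `beta=0.1`, a common default; neither the reference choice nor the `beta` used for this model is documented here, and `dpo_loss` is a hypothetical name.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response given the
    prompt, under the trained policy or the frozen reference model.
    beta=0.1 is an assumed value, not the one used to train this model.
    """
    # How much more the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable form of -log(sigmoid(z)) == log(1 + exp(-z))
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

While the policy still matches the reference (margin 0), the loss equals log 2; it drops below that as the policy shifts probability toward the chosen response.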

## Multilingual Capabilities

The model was initially trained on both French and English data, which gives it good efficiency and performance in these two languages. Because of the DPO stage and a possible change of data type (from float16 to bfloat16), its capabilities in other languages may be less robust than those of its SFT predecessor; further fine-tuning can help restore performance in those languages.
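
Since the paragraph above mentions the float16 to bfloat16 change, here is a minimal sketch for loading the model in bfloat16. It assumes `torch` and `transformers` are installed and that the hardware supports bfloat16; the helper name `load_chat_pipeline` is hypothetical, and choosing bfloat16 is an assumption rather than an official recommendation.

```python
def load_chat_pipeline(model_id: str = "cmarkea/bloomz-3b-dpo-chat"):
    """Return a text-generation pipeline with weights loaded in bfloat16.

    Hypothetical helper: torch and transformers are imported inside the
    function, so defining it does not require either package.
    """
    import torch
    from transformers import pipeline

    # torch_dtype sets the precision the weights are loaded in
    return pipeline("text-generation", model=model_id, torch_dtype=torch.bfloat16)
```

The returned pipeline is called exactly like the one in the Usage section, e.g. `load_chat_pipeline()("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)`.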

## Model Applications

This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is essential.

## Dataset

The bloomz-3b-dpo-chat model was trained on the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, which includes the following human preference data:

- **Description:** Annotations of helpfulness and harmlessness; each entry contains a "chosen" and a "rejected" text.
- **Purpose:** Training preference models for Reinforcement Learning from Human Feedback (RLHF), not supervised training of dialogue agents.
- **Source:** Data from context-distilled language models, rejection sampling, and an iterated online process.
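
Each hh-rlhf record stores a whole conversation as one string of `\n\nHuman: ...` and `\n\nAssistant: ...` turns, and `chosen` and `rejected` typically differ only in the final assistant reply. The sketch below shows one way to turn such a pair into a prompt/chosen/rejected triple in this model's `</s>[human]<s>[bot]` chat format (see Usage). This preprocessing is an assumption for illustration: the card does not document how the authors converted the data, and all function names are hypothetical.

```python
import re

# Turn markers used in hh-rlhf transcripts
_TURN = re.compile(r"\n\n(Human|Assistant): ")


def parse_turns(transcript: str) -> list[tuple[str, str]]:
    """Split an hh-rlhf transcript into (role, text) pairs."""
    parts = _TURN.split(transcript)
    # parts = [prefix, role1, text1, role2, text2, ...]; prefix is normally empty
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def to_chat_prompt(turns: list[tuple[str, str]]) -> str:
    """Render turns in the </s>[human]<s>[bot] chat format."""
    return "".join(f"</s>{text}<s>" if role == "Human" else text
                   for role, text in turns)


def to_dpo_example(chosen: str, rejected: str) -> dict[str, str]:
    """Build a prompt/chosen/rejected triple from one hh-rlhf record."""
    c_turns, r_turns = parse_turns(chosen), parse_turns(rejected)
    for turns in (c_turns, r_turns):
        if not turns or turns[-1][0] != "Assistant":
            raise ValueError("transcripts must end with an Assistant turn")
    # Only the last assistant reply should differ between the two texts
    if c_turns[:-1] != r_turns[:-1]:
        raise ValueError("chosen and rejected must share the same context")
    return {
        "prompt": to_chat_prompt(c_turns[:-1]),  # ends with "<s>", open bot turn
        "chosen": c_turns[-1][1],
        "rejected": r_turns[-1][1],
    }
```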

## Evaluation

The model was evaluated with the PoLL (Panel of LLM evaluators) technique on **100 French questions**, each score being aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet.
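
The aggregation described above can be sketched as follows, assuming each question's score is the plain average of its six ratings (two runs from each of three judges) and the model's score is the average over all questions. The card does not spell out the exact pooling rule, so this averaging scheme and the name `poll_score` are assumptions.

```python
from statistics import mean


def poll_score(ratings_per_question: list[list[float]]) -> float:
    """Pool judge ratings into one model score out of 5.

    Each inner list holds every rating one question received (assumed: two
    runs from each of three judges, i.e. six values). Plain averaging is an
    assumption, not the documented PoLL configuration for this card.
    """
    if not ratings_per_question or any(not r for r in ratings_per_question):
        raise ValueError("every question needs at least one rating")
    # Average within each question, then across questions
    return mean(mean(r) for r in ratings_per_question)
```

With the same number of ratings per question, this equals the mean of all ratings pooled together.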

**Performance Scores (out of 5):**

| Model                                | Score | # params |
|-------------------------------------:|:-----:|:--------:|
| gpt-4o                               | 4.13  | N/A      |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 3.71  | 46.7b    |
| gpt-3.5-turbo                        | 3.66  | 175b     |
| mistralai/Mistral-7B-Instruct-v0.2   | 1.98  | 7.25b    |
| cmarkea/bloomz-7b1-mt-sft-chat       | 1.69  | 7.1b     |
| cmarkea/bloomz-3b-dpo-chat           | 1.68  | 3b       |
| cmarkea/bloomz-3b-sft-chat           | 1.51  | 3b       |
| croissantllm/CroissantLLMChat-v0.1   | 1.19  | 1.3b     |
| cmarkea/bloomz-560m-sft-chat         | 1.04  | 0.56b    |
| OpenLLM-France/Claire-Mistral-7B-0.1 | 0.38  | 7.25b    |

In this evaluation, bloomz-3b-dpo-chat outperforms its SFT counterpart (1.68 vs. 1.51) and nearly matches bloomz-7b1-mt-sft-chat (1.69) with less than half as many parameters. These gains, particularly in zero-shot settings, make it a competitive choice for production environments.

## Usage

To use the bloomz-3b-dpo-chat model, format the prompt for chatbot interactions as follows:

```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
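
For multi-turn conversations, the prompt is every previous exchange concatenated in this format, followed by the new user message and an open `<s>` bot turn. A minimal sketch of that formatting follows; `build_prompt` is a hypothetical helper, not part of any library.

```python
def build_prompt(history: list[tuple[str, str]], user_message: str) -> str:
    """Format a conversation as </s>[human]<s>[bot]...</s>[new human]<s>.

    history holds (human_prompt, bot_answer) pairs from earlier turns.
    """
    past = "".join(f"</s>{human}<s>{bot}" for human, bot in history)
    # End with an open bot turn so the model generates the next answer
    return f"{past}</s>{user_message}<s>"
```

For example, `build_prompt([("Bonjour", "Bonjour !")], "Ça va ?")` returns `"</s>Bonjour<s>Bonjour !</s>Ça va ?<s>"`, which can be passed directly to the pipeline in the example that follows.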

Example code to load the model with the Hugging Face `pipeline`:

```python
from transformers import pipeline

# Load the DPO chat model as a text-generation pipeline
model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")

# Prompt in the chat format above (French for "What is deep learning?")
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)
print(result)
# Example output (may vary between runs):
# [{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
#   en profondeur est un sous-ensemble de l'apprentissage automatique qui
#   utilise des réseaux de neurones artificiels pour apprendre à partir de
#   données. Ces réseaux sont conçus pour reconnaître des modèles dans les
#   données et peuvent être utilisés pour des tâches telles que la
#   reconnaissance d'images, le traitement du langage naturel et la
#   reconnaissance vocale."}]
```

## Citation

```bibtex
@online{DeBloomzChat,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```