---
license: bigscience-bloom-rail-1.0
datasets:
- Anthropic/hh-rlhf
language:
- en
- fr
---

# bloomz-3b-dpo-chat Model Card


## Model Overview

bloomz-3b-dpo-chat is a conversational model fine-tuned with Direct Preference Optimization (DPO) from the bloomz-3b-sft-chat model. It aims to
provide high-quality conversational abilities in both English and French, building on the strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

## Model Description

The bloomz-3b-dpo-chat model builds upon the solid foundation of the bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient
tokenization strategy. The DPO fine-tuning process enhances the model's ability to generate more human-preferred responses in conversational contexts.
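
As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities under the fine-tuned policy and a frozen reference model (a role the SFT parent would naturally play here). The function name, the argument names, and `beta=0.1` are assumptions for illustration; the hyperparameters actually used for this model are not stated in this card.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    beta is a hypothetical value, not the one used to train this model.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the frozen reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, which is the sense in which DPO pushes the model toward human-preferred answers.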

## Multilingual Capabilities

The model was initially trained on both French and English datasets, ensuring high efficiency and performance in these languages. Due to the DPO process and potential
data type changes (from float16 to bfloat16), the model's multilingual capabilities might not be as robust as those of its SFT predecessor, but further fine-tuning can help
restore performance in other languages.

## Model Applications

This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is
essential. 


## Dataset

The bloomz-3b-dpo-chat model was trained using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, which includes:

Human Preference Data:
   - **Description:** Annotations of helpfulness and harmlessness, with each entry containing "chosen" and "rejected" text pairs.
   - **Purpose:** To train preference models for Reinforcement Learning from Human Feedback (RLHF), not for supervised training of dialogue agents.
   - **Source:** Data from context-distilled language models, rejection sampling, and an iterated online process.
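
To make the "chosen"/"rejected" structure concrete, here is a minimal sketch that splits one record into its shared dialogue context and the two competing final answers. It assumes the transcripts use the `\n\nHuman:` and `\n\nAssistant:` turn markers and that both texts share the same context; the record below is a made-up example, and `split_turns` and `preference_pair` are hypothetical helper names, not part of any library or of the author's training code.

```python
import re


def split_turns(transcript):
    """Split a transcript into a list of (speaker, text) turns."""
    parts = re.split(r"\n\n(Human|Assistant):", transcript)
    # parts[0] is whatever precedes the first marker (usually empty);
    # after that, speakers and texts alternate.
    return [(parts[i], parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]


def preference_pair(record):
    """Return the shared context and the two competing final answers."""
    chosen = split_turns(record["chosen"])
    rejected = split_turns(record["rejected"])
    return {
        "context": chosen[:-1],      # every turn before the final answer
        "chosen": chosen[-1][1],     # preferred final answer
        "rejected": rejected[-1][1]  # dispreferred final answer
    }


# Hypothetical record, written for illustration only.
record = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello, how can I help?",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}
pair = preference_pair(record)
print(pair)
```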

## Evaluation

Evaluation of the model was conducted using the PoLL (Pool of LLM) technique, assessing performance on **100 French questions** with scores aggregated from six evaluations
(two per evaluator). The evaluators included GPT-4o, Gemini-1.5-pro, and Claude3.5-sonnet.
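
The aggregation can be sketched as a plain average over every judge, run, and question. This is an assumption: the card does not say whether scores were averaged or combined in another way. The judge names and score values below are hypothetical placeholders, and only three questions are shown instead of 100.

```python
from statistics import mean


def poll_score(scores_by_judge):
    """Average all scores across judges, repeated runs, and questions.

    scores_by_judge maps a judge name to a list of runs; each run is a
    list with one score per question.
    """
    all_scores = [
        score
        for runs in scores_by_judge.values()
        for run in runs
        for score in run
    ]
    return mean(all_scores)


# Hypothetical scores: three judges, two runs each, three questions per run.
scores = {
    "judge_a": [[2, 1, 3], [2, 2, 3]],
    "judge_b": [[1, 1, 2], [1, 2, 2]],
    "judge_c": [[2, 2, 3], [3, 2, 3]],
}
final_score = poll_score(scores)
print(round(final_score, 2))
```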

**Performance Scores (out of 5):**

| Model                                        | Score   | # params |
|---------------------------------------------:|:-------:|:--------:|
| gpt-4o                                       | 4.13    | N/A      |
| mistralai/Mixtral-8x7B-Instruct-v0.1         | 3.71    | 46.7b    |
| gpt-3.5-turbo                                | 3.66    | 175b     |
| mistralai/Mistral-7B-Instruct-v0.2           | 1.98    | 7.25b    |
| cmarkea/bloomz-7b1-mt-sft-chat               | 1.69    | 7.1b     |
| cmarkea/bloomz-3b-dpo-chat                   | 1.68    | 3b       |
| cmarkea/bloomz-3b-sft-chat                   | 1.51    | 3b       |
| croissantllm/CroissantLLMChat-v0.1           | 1.19    | 1.3b     |
| cmarkea/bloomz-560m-sft-chat                 | 1.04    | 0.56b    |
| OpenLLM-France/Claire-Mistral-7B-0.1         | 0.38    | 7.25b    |

On this benchmark, the bloomz-3b-dpo-chat model improves on its SFT counterpart (1.68 versus 1.51) and comes close to the larger bloomz-7b1-mt-sft-chat (1.69),
particularly in zero-shot contexts, making it a competitive choice for production environments.


## Usage

To use the bloomz-3b-dpo-chat model, format the prompt for chatbot interactions as follows:
```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
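
For multi-turn conversations, the template above can be assembled with a small helper. The sketch below is one possible way to do it; `build_prompt` is a hypothetical name, not part of the model's or the library's API.

```python
def build_prompt(history, new_message):
    """Build a prompt in the </s>[human]<s>[bot] chat format.

    history is a list of (human_prompt, bot_answer) pairs from earlier
    turns; new_message is the latest human prompt awaiting an answer.
    """
    prompt = ""
    for human, bot in history:
        prompt += f"</s>{human}<s>{bot}"
    # End with an open <s> so the model generates the next bot answer.
    return prompt + f"</s>{new_message}<s>"


single_turn = build_prompt([], "C'est quoi le deep learning ?")
multi_turn = build_prompt([("Bonjour", "Bonjour !")], "Ça va ?")
print(single_turn)
print(multi_turn)
```

The resulting string can be passed directly as the input text of a text-generation pipeline.
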
Example code to load the model using HuggingFace's pipeline:

```python
from transformers import pipeline

# Load the model as a text-generation pipeline.
model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")

# Wrap the user message in the chat format: </s>[human prompt]<s>
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)

print(result)
# Example output:
# [{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
#    en profondeur est un sous-ensemble de l'apprentissage automatique qui
#    utilise des réseaux de neurones artificiels pour apprendre à partir de
#    données. Ces réseaux sont conçus pour reconnaître des modèles dans les
#    données et peuvent être utilisés pour des tâches telles que la
#    reconnaissance d'images, le traitement du langage naturel et la
#    reconnaissance vocale."}]
```


## Citation

```bibtex
@online{DeBloomzChat,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```