Jeronymous committed
Commit 162f8b7 · Parent: 0aeb68b

Add links to dataset and code

Files changed (1):
  1. README.md (+4 -4)
README.md CHANGED
@@ -139,7 +139,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset will be made available soon.
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distribution:
 
@@ -147,10 +147,10 @@ Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distribution:
 |-------------------------------|------------|------------------------------|-----------------------------------------------------|
 | Parliamentary Proceedings | 135M | 35% | Assemblée Nationale |
 | Theatre | 16M | 18% | Théâtre Classique, Théâtre Gratuit |
-| Interviews | 6.4M | 29% | TCOF, CFPP, CFPB, ACSYNT, PFC, Valibel (ORFEO), ESLO |
+| Interviews | 6.4M | 29% | TCOF, CFPP, CFPB (ORFEO), ACSYNT, PFC, Valibel (ORFEO), ESLO |
 | Free Conversations | 2.2M | 10% | CRFP (ORFEO), OFROM (ORFEO), CID, Rhapsodie, ParisStories, PFC, CLAPI, C-ORAL-ROM (ORFEO), LinTO, ESLO |
 | Meetings | 1.2M | 5% | SUMM-RE, LinTO, Réunions de travail (ORFEO) |
-| Debates | 402k | <2% | FreDSum, ESLO |
+| Debates | 402k | <2% | FREDSum, ESLO |
 | Assistance | 159k | <1% | Fleuron (ORFEO), Accueil UBS, OTG, ESLO |
 | Presentation, Formal Address | 86k | <0.5% | Valibel (ORFEO), LinTO, ESLO |
 
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code will be made available soon.
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-Mistral-7B-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) for more details.
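For reference, the dataset linked in the first hunk can be pulled with the Hugging Face `datasets` library. A minimal sketch, assuming the default configuration; split and column names are inspected at runtime rather than taken from the dataset card:

```python
# Minimal sketch: load the training dataset linked above.
# Assumes only the standard `datasets` API; split and column names
# are not hard-coded, since this diff does not specify them.
from datasets import load_dataset

dataset = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1")
print(dataset)  # shows the available splits and their columns

# Peek at one record from the first available split.
first_split = next(iter(dataset.values()))
print(first_split[0])
```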
 
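The causal language modeling objective mentioned in the last hunk (predict the next token) can be exercised directly with `transformers`. A minimal sketch, assuming the model is hosted under a repo id matching its name; the speaker-tagged prompt is illustrative, not a format documented in this diff:

```python
# Minimal sketch of next-token prediction with the fine-tuned model.
# The repo id is assumed to match the model name in the card, and the
# dialogue-style prompt below is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-France/Claire-Mistral-7B-0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "[Intervenant 1:] Bonjour, vous allez bien ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Sample a continuation token by token (causal LM decoding).
    output = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```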