Jeronymous committed
Commit 162f8b7 · Parent: 0aeb68b

Add links to dataset and code

Files changed (1):
  1. README.md (+4 -4)
README.md CHANGED
@@ -139,7 +139,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset will be made available soon.
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distribution:
 
@@ -147,10 +147,10 @@ Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distribution:
 |-------------------------------|------------|------------------------------|-----------------------------------------------------|
 | Parliamentary Proceedings | 135M | 35% | Assemblée Nationale |
 | Theatre | 16M | 18% | Théâtre Classique, Théâtre Gratuit |
-| Interviews | 6.4M | 29% | TCOF, CFPP, CFPB, ACSYNT, PFC, Valibel (ORFEO), ESLO |
+| Interviews | 6.4M | 29% | TCOF, CFPP, CFPB (ORFEO), ACSYNT, PFC, Valibel (ORFEO), ESLO |
 | Free Conversations | 2.2M | 10% | CRFP (ORFEO), OFROM (ORFEO), CID, Rhapsodie, ParisStories, PFC, CLAPI, C-ORAL-ROM (ORFEO), LinTO, ESLO |
 | Meetings | 1.2M | 5% | SUMM-RE, LinTO, Réunions de travail (ORFEO) |
-| Debates | 402k | <2% | FreDSum, ESLO |
+| Debates | 402k | <2% | FREDSum, ESLO |
 | Assistance | 159k | <1% | Fleuron (ORFEO), Accueil UBS, OTG, ESLO |
 | Presentation, Formal Address | 86k | <0.5% | Valibel (ORFEO), LinTO, ESLO |
 
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code will be made available soon.
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-Mistral-7B-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) for more details.
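For reference, the dataset linked in the first hunk can be pulled with the Hugging Face `datasets` library. A minimal sketch, assuming the default configuration; split and column names are inspected at runtime rather than taken from the dataset card:

```python
# Minimal sketch: load the training dataset linked above.
# Assumes only the standard `datasets` API; split and column names
# are not hard-coded, since this diff does not specify them.
from datasets import load_dataset

dataset = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1")
print(dataset)  # shows the available splits and their columns

# Peek at one record from the first available split.
first_split = next(iter(dataset.values()))
print(first_split[0])
```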
 
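The causal language modeling objective mentioned in the last hunk (predict the next token) can be exercised directly with `transformers`. A minimal sketch, assuming the model is hosted under a repo id matching its name; the speaker-tagged prompt is illustrative, not a format documented in this diff:

```python
# Minimal sketch of next-token prediction with the fine-tuned model.
# The repo id is assumed to match the model name in the card, and the
# dialogue-style prompt below is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-France/Claire-Mistral-7B-0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "[Intervenant 1:] Bonjour, vous allez bien ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Sample a continuation token by token (causal LM decoding).
    output = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```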