wissamantoun committed
Commit ecef327 (verified) · Parent(s): 9b27c51

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,278 @@
---
language: fr
license: mit
tags:
- deberta-v2
- text-classification
- nli
base_model: almanach/camembertav2-base
datasets:
- FLUE-XNLI
metrics:
- accuracy
pipeline_tag: text-classification
library_name: transformers
model-index:
- name: almanach/camembertav2-base-xnli
  results:
  - task:
      type: text-classification
      name: Natural Language Inference
    dataset:
      type: flue-XNLI
      name: FLUE-XNLI
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.85582
      verified: false
---

# Model Card for almanach/camembertav2-base-xnli

almanach/camembertav2-base-xnli is a deberta-v2 model for text classification. It was fine-tuned on the FLUE-XNLI dataset for Natural Language Inference and achieves an accuracy of 0.85582 on that dataset.

The model is part of the family of fine-tuned models based on almanach/camembertav2-base.

## Model Details

### Model Description

- **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
- **Model type:** deberta-v2
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** almanach/camembertav2-base

### Model Sources

- **Repository:** https://github.com/WissamAntoun/camemberta
- **Paper:** https://arxiv.org/abs/2411.08868

## Uses

The model can be used for Natural Language Inference on French premise/hypothesis pairs.

## Bias, Risks, and Limitations

The model may reproduce biases present in its training data, and it may not generalize well to datasets, domains, or tasks that differ from the data it was fine-tuned on.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model = AutoModelForSequenceClassification.from_pretrained("almanach/camembertav2-base-xnli")
tokenizer = AutoTokenizer.from_pretrained("almanach/camembertav2-base-xnli")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# NLI inputs are sentence pairs: the premise goes in "text",
# the hypothesis in "text_pair".
classifier({
    "text": "Le livre est très intéressant et j'ai appris beaucoup de choses.",
    "text_pair": "Le livre est très ennuyeux et je n'ai rien appris.",
})
```
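
For this premise/hypothesis pair the classifier should predict `contradiction`; the pipeline returns the label together with a confidence score, e.g. `{'label': 'contradiction', 'score': 0.98}` (the exact score will vary). The three possible labels are `entailment`, `neutral`, and `contradiction`.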

## Training Details

### Training Data

The model is trained on the FLUE-XNLI dataset.

- Dataset Name: FLUE-XNLI
- Dataset Size:
  - Train: 49399
  - Dev: 1988
  - Test: 2000
+
94
+
95
+ ### Training Procedure
96
+
97
+ Model trained with the run_xnli.py script from the huggingface repository.
98
+
99
+
100
+
101
+ #### Training Hyperparameters
102
+
103
+ ```yml
104
+ accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'':
105
+ True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'':
106
+ None}'
107
+ adafactor: false
108
+ adam_beta1: 0.9
109
+ adam_beta2: 0.999
110
+ adam_epsilon: 1.0e-08
111
+ auto_find_batch_size: false
112
+ base_model: camembertv2
113
+ base_model_name: camembertav2-base-bf16-p2-17000
114
+ batch_eval_metrics: false
115
+ bf16: false
116
+ bf16_full_eval: false
117
+ data_seed: 1.0
118
+ dataloader_drop_last: false
119
+ dataloader_num_workers: 0
120
+ dataloader_persistent_workers: false
121
+ dataloader_pin_memory: true
122
+ dataloader_prefetch_factor: .nan
123
+ ddp_backend: .nan
124
+ ddp_broadcast_buffers: .nan
125
+ ddp_bucket_cap_mb: .nan
126
+ ddp_find_unused_parameters: .nan
127
+ ddp_timeout: 1800
128
+ debug: '[]'
129
+ deepspeed: .nan
130
+ disable_tqdm: false
131
+ dispatch_batches: .nan
132
+ do_eval: true
133
+ do_predict: false
134
+ do_train: true
135
+ epoch: 10.0
136
+ eval_accumulation_steps: 4
137
+ eval_accuracy: 0.8558232931726908
138
+ eval_delay: 0
139
+ eval_do_concat_batches: true
140
+ eval_loss: 0.4229515790939331
141
+ eval_on_start: false
142
+ eval_runtime: 5.5233
143
+ eval_samples: 2490
144
+ eval_samples_per_second: 450.82
145
+ eval_steps: .nan
146
+ eval_steps_per_second: 56.488
147
+ eval_strategy: epoch
148
+ eval_use_gather_object: false
149
+ evaluation_strategy: epoch
150
+ fp16: false
151
+ fp16_backend: auto
152
+ fp16_full_eval: false
153
+ fp16_opt_level: O1
154
+ fsdp: '[]'
155
+ fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'':
156
+ False}'
157
+ fsdp_min_num_params: 0
158
+ fsdp_transformer_layer_cls_to_wrap: .nan
159
+ full_determinism: false
160
+ gradient_accumulation_steps: 4
161
+ gradient_checkpointing: false
162
+ gradient_checkpointing_kwargs: .nan
163
+ greater_is_better: true
164
+ group_by_length: false
165
+ half_precision_backend: auto
166
+ hub_always_push: false
167
+ hub_model_id: .nan
168
+ hub_private_repo: false
169
+ hub_strategy: every_save
170
+ hub_token: <HUB_TOKEN>
171
+ ignore_data_skip: false
172
+ include_inputs_for_metrics: false
173
+ include_num_input_tokens_seen: false
174
+ include_tokens_per_second: false
175
+ jit_mode_eval: false
176
+ label_names: .nan
177
+ label_smoothing_factor: 0.0
178
+ learning_rate: 1.0e-05
179
+ length_column_name: length
180
+ load_best_model_at_end: true
181
+ local_rank: 0
182
+ log_level: debug
183
+ log_level_replica: warning
184
+ log_on_each_node: true
185
+ logging_dir: /scratch/camembertv2/runs/results/xnli/camembertav2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-polynomial-warmup_steps-0.1/SEED-1/logs
186
+ logging_first_step: false
187
+ logging_nan_inf_filter: true
188
+ logging_steps: 100
189
+ logging_strategy: steps
190
+ lr_scheduler_kwargs: '{}'
191
+ lr_scheduler_type: polynomial
192
+ max_grad_norm: 1.0
193
+ max_steps: -1
194
+ metric_for_best_model: accuracy
195
+ mp_parameters: .nan
196
+ name: camembertv2/runs/results/xnli/camembertav2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-polynomial-warmup_steps-0.1
197
+ neftune_noise_alpha: .nan
198
+ no_cuda: false
199
+ num_train_epochs: 10.0
200
+ optim: adamw_torch
201
+ optim_args: .nan
202
+ optim_target_modules: .nan
203
+ output_dir: /scratch/camembertv2/runs/results/xnli/camembertav2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-polynomial-warmup_steps-0.1/SEED-1
204
+ overwrite_output_dir: false
205
+ past_index: -1
206
+ per_device_eval_batch_size: 8
207
+ per_device_train_batch_size: 8
208
+ per_gpu_eval_batch_size: .nan
209
+ per_gpu_train_batch_size: .nan
210
+ prediction_loss_only: false
211
+ push_to_hub: false
212
+ push_to_hub_model_id: .nan
213
+ push_to_hub_organization: .nan
214
+ push_to_hub_token: <PUSH_TO_HUB_TOKEN>
215
+ ray_scope: last
216
+ remove_unused_columns: true
217
+ report_to: '[''tensorboard'']'
218
+ restore_callback_states_from_checkpoint: false
219
+ resume_from_checkpoint: .nan
220
+ run_name: /scratch/camembertv2/runs/results/xnli/camembertav2-base-bf16-p2-17000/max_seq_length-160-gradient_accumulation_steps-4-precision-fp32-learning_rate-1e-05-epochs-10-lr_scheduler-polynomial-warmup_steps-0.1/SEED-1
221
+ save_on_each_node: false
222
+ save_only_model: false
223
+ save_safetensors: true
224
+ save_steps: 500
225
+ save_strategy: epoch
226
+ save_total_limit: .nan
227
+ seed: 1
228
+ skip_memory_metrics: true
229
+ split_batches: .nan
230
+ tf32: .nan
231
+ torch_compile: true
232
+ torch_compile_backend: inductor
233
+ torch_compile_mode: .nan
234
+ torch_empty_cache_steps: .nan
235
+ torchdynamo: .nan
236
+ total_flos: 1.6169329248859574e+17
237
+ tpu_metrics_debug: false
238
+ tpu_num_cores: .nan
239
+ train_loss: 0.2765254594356482
240
+ train_runtime: 31314.1287
241
+ train_samples: 392702
242
+ train_samples_per_second: 125.407
243
+ train_steps_per_second: 3.919
244
+ use_cpu: false
245
+ use_ipex: false
246
+ use_legacy_prediction_loop: false
247
+ use_mps_device: false
248
+ warmup_ratio: 0.1
249
+ warmup_steps: 0
250
+ weight_decay: 0.0
251
+
252
+ ```
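
In plain terms: the run used an effective train batch size of 32 (per-device batch size 8 × gradient accumulation 4, assuming the single-GPU setup implied by `local_rank: 0`), a learning rate of 1.0e-05 with a polynomial schedule and 10% warmup, 10 epochs in fp32, and, per the run name, a maximum sequence length of 160 tokens.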

#### Results

**Accuracy:** 0.85582

## Technical Specifications

### Model Architecture and Objective

A deberta-v2 encoder with a sequence-classification head over the three NLI labels (entailment, neutral, contradiction).
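
Both the architecture and the label mapping can be read back from the `config.json` shipped in this commit:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("almanach/camembertav2-base-xnli")
print(config.architectures)  # ['DebertaV2ForSequenceClassification']
print(config.id2label)       # {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
```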

## Citation

**BibTeX:**

```bibtex
@misc{antoun2024camembert20smarterfrench,
      title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
      author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
      year={2024},
      eprint={2411.08868},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.08868},
}
```
all_results.json ADDED
@@ -0,0 +1,15 @@
{
    "epoch": 10.0,
    "eval_accuracy": 0.8558232931726908,
    "eval_loss": 0.4229515790939331,
    "eval_runtime": 5.5233,
    "eval_samples": 2490,
    "eval_samples_per_second": 450.82,
    "eval_steps_per_second": 56.488,
    "total_flos": 1.6169329248859574e+17,
    "train_loss": 0.2765254594356482,
    "train_runtime": 31314.1287,
    "train_samples": 392702,
    "train_samples_per_second": 125.407,
    "train_steps_per_second": 3.919
}
config.json ADDED
@@ -0,0 +1,52 @@
{
  "_name_or_path": "/scratch/camembertv2/runs/models/camembertav2-base-bf16/post/ckpt-p2-17000/pt/discriminator/",
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "conv_act": "gelu",
  "conv_kernel_size": 0,
  "embedding_size": 768,
  "eos_token_id": 2,
  "finetuning_task": "xnli",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "neutral",
    "2": "contradiction"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "contradiction": 2,
    "entailment": 0,
    "neutral": 1
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 1024,
  "max_relative_positions": -1,
  "model_name": "camembertav2-base-bf16",
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 0,
  "vocab_size": 32768
}
eval_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 10.0,
    "eval_accuracy": 0.8558232931726908,
    "eval_loss": 0.4229515790939331,
    "eval_runtime": 5.5233,
    "eval_samples": 2490,
    "eval_samples_per_second": 450.82,
    "eval_steps_per_second": 56.488
}
logs/events.out.tfevents.1724688665.nefgpu47.443271.0 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:491896fe7e592d18209d695d32e5834cad106afb3a6d248155deaf5e5738067a
size 272855
logs/events.out.tfevents.1724719985.nefgpu47.443271.1 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d451b2ab94b2202f6e45af5134074148690f544bd7649ab294fe7543e245e760
size 369
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:202a9488c158606656127f8347c0bebc788dddcd996bf129a6cfa07c437a6fa9
size 444862452
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": false,
  "eos_token": "[SEP]",
  "errors": "replace",
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "[UNK]"
}
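
Note that `tokenizer_config.json` leaves `model_max_length` at the library's "unset" sentinel value, so the tokenizer will not truncate on its own; a minimal sketch of truncating explicitly (the `max_length=160` mirrors the fine-tuning setting and is an assumption, not a hard limit of the model, whose `max_position_embeddings` is 1024):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("almanach/camembertav2-base-xnli")
print(tokenizer.cls_token, tokenizer.sep_token)  # [CLS] [SEP]

# Pair encoding with explicit truncation, since model_max_length is unset.
encoded = tokenizer("Le livre est intéressant.", "Le livre est ennuyeux.",
                    truncation=True, max_length=160)
print(encoded["input_ids"][:10])
```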
train_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 10.0,
    "total_flos": 1.6169329248859574e+17,
    "train_loss": 0.2765254594356482,
    "train_runtime": 31314.1287,
    "train_samples": 392702,
    "train_samples_per_second": 125.407,
    "train_steps_per_second": 3.919
}
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a6d0753c3f04e1bd007a6f65c6a9d5f962d36212202182e63d1b9eb784fa4be
size 5560
vocab.txt ADDED
The diff for this file is too large to render. See raw diff