wissamantoun committed
Commit
cf96f54
1 Parent(s): c1700a4

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ eval_nbest_predictions.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,274 @@
---
language: fr
license: mit
tags:
- deberta-v2
- question-answering
base_model: almanach/camembertav2-base
datasets:
- FQuAD
metrics:
- f1
- exact_match
pipeline_tag: question-answering
library_name: transformers
model-index:
- name: almanach/camembertav2-base-fquad
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      type: FQuAD
      name: FQuAD
    metrics:
    - name: f1
      type: f1
      value: 83.36016
      verified: false
---

# Model Card for almanach/camembertav2-base-fquad

almanach/camembertav2-base-fquad is a deberta-v2 model finetuned for extractive question answering. It is trained on the FQuAD dataset and achieves an F1-score of 83.36 (exact match: 64.43) on the FQuAD dev set.

The model is part of the almanach/camembertav2-base family of finetuned models.

## Model Details

### Model Description

- **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
- **Model type:** deberta-v2
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** almanach/camembertav2-base

### Model Sources

- **Repository:** https://github.com/WissamAntoun/camemberta
- **Paper:** https://arxiv.org/abs/2411.08868

## Uses

The model can be used for extractive question answering in French.

## Bias, Risks, and Limitations

The model may exhibit biases inherited from its training data and may not generalize well to other datasets, domains, or tasks.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

model = AutoModelForQuestionAnswering.from_pretrained("almanach/camembertav2-base-fquad")
tokenizer = AutoTokenizer.from_pretrained("almanach/camembertav2-base-fquad")

qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

qa_pipeline(question="Quelle est la capitale de la France ?", context="La capitale de la France est Paris.")
```
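Under the hood, an extractive QA head produces a start score and an end score for every token, and the answer is the span maximizing their sum under a length cap (this finetune used max_answer_length 30). The following is a minimal, illustrative sketch of that selection step, not the pipeline's actual implementation (which also handles context chunking and, for SQuAD-v2-style data, null answers):

```python
def best_span(start_scores, end_scores, max_answer_length=30):
    """Return the (start, end) token indices maximizing
    start_scores[s] + end_scores[e] over spans with
    s <= e < s + max_answer_length (illustrative sketch)."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions within the allowed answer length.
        for e in range(s, min(s + max_answer_length, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```

The selected token indices are then mapped back to a character span in the original context via the tokenizer's offset mapping.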


## Training Details

### Training Data

The model is trained on the FQuAD dataset.

- Dataset Name: FQuAD
- Dataset Size:
  - Train: 20731
  - Dev: 3188

### Training Procedure

The model was finetuned with the run_qa.py example script from the Hugging Face transformers repository.

#### Training Hyperparameters

```yml
'Unnamed: 0': /scratch/camembertv2/runs/results/fquad/camembertav2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-2-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-1/all_results.json
accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'': True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'': None}'
adafactor: false
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1.0e-08
auto_find_batch_size: false
base_model: camembertv2
base_model_name: camembertav2-base-bf16-p2-17000
batch_eval_metrics: false
bf16: false
bf16_full_eval: false
data_seed: 1.0
dataloader_drop_last: false
dataloader_num_workers: 0
dataloader_persistent_workers: false
dataloader_pin_memory: true
dataloader_prefetch_factor: .nan
ddp_backend: .nan
ddp_broadcast_buffers: .nan
ddp_bucket_cap_mb: .nan
ddp_find_unused_parameters: .nan
ddp_timeout: 1800
debug: '[]'
deepspeed: .nan
disable_tqdm: false
dispatch_batches: .nan
do_eval: true
do_predict: false
do_train: true
epoch: 6.0
eval_accumulation_steps: 1
eval_delay: 0
eval_do_concat_batches: true
eval_exact_match: 64.42910915934755
eval_f1: 83.36016013340664
eval_on_start: false
eval_runtime: 45.7589
eval_samples: 3188.0
eval_samples_per_second: 69.669
eval_steps: .nan
eval_steps_per_second: 1.093
eval_strategy: epoch
eval_use_gather_object: false
evaluation_strategy: epoch
fp16: false
fp16_backend: auto
fp16_full_eval: false
fp16_opt_level: O1
fsdp: '[]'
fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'': False}'
fsdp_min_num_params: 0
fsdp_transformer_layer_cls_to_wrap: .nan
full_determinism: false
gradient_accumulation_steps: 2
gradient_checkpointing: false
gradient_checkpointing_kwargs: .nan
greater_is_better: true
group_by_length: false
half_precision_backend: auto
hub_always_push: false
hub_model_id: .nan
hub_private_repo: false
hub_strategy: every_save
hub_token: <HUB_TOKEN>
ignore_data_skip: false
include_inputs_for_metrics: false
include_num_input_tokens_seen: false
include_tokens_per_second: false
jit_mode_eval: false
label_names: .nan
label_smoothing_factor: 0.0
learning_rate: 3.0e-05
length_column_name: length
load_best_model_at_end: true
local_rank: 0
log_level: debug
log_level_replica: warning
log_on_each_node: true
logging_dir: /scratch/camembertv2/runs/results/fquad/camembertav2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-2-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-1/logs
logging_first_step: false
logging_nan_inf_filter: true
logging_steps: 100
logging_strategy: steps
lr_scheduler_kwargs: '{}'
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_steps: -1
metric_for_best_model: exact_match
mp_parameters: .nan
name: camembertv2/runs/results/fquad/camembertav2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-2-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-cosine-warmup_steps-0
neftune_noise_alpha: .nan
no_cuda: false
num_train_epochs: 6.0
optim: adamw_torch
optim_args: .nan
optim_target_modules: .nan
output_dir: /scratch/camembertv2/runs/results/fquad/camembertav2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-2-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-1
overwrite_output_dir: false
past_index: -1
per_device_eval_batch_size: 64
per_device_train_batch_size: 8
per_gpu_eval_batch_size: .nan
per_gpu_train_batch_size: .nan
prediction_loss_only: false
push_to_hub: false
push_to_hub_model_id: .nan
push_to_hub_organization: .nan
push_to_hub_token: <PUSH_TO_HUB_TOKEN>
ray_scope: last
remove_unused_columns: true
report_to: '[''tensorboard'']'
restore_callback_states_from_checkpoint: false
resume_from_checkpoint: .nan
run_name: camembertav2-base-bf16-p2-17000
save_on_each_node: false
save_only_model: false
save_safetensors: true
save_steps: 500
save_strategy: epoch
save_total_limit: .nan
seed: 1
skip_memory_metrics: true
split_batches: .nan
tf32: .nan
torch_compile: true
torch_compile_backend: inductor
torch_compile_mode: .nan
torch_empty_cache_steps: .nan
torchdynamo: .nan
total_flos: 2.0394634246921464e+16
tpu_metrics_debug: false
tpu_num_cores: .nan
train_loss: 0.5145930189164087
train_runtime: 3736.1381
train_samples: 20731
train_samples_per_second: 33.293
train_steps_per_second: 2.081
use_cpu: false
use_ipex: false
use_legacy_prediction_loop: false
use_mps_device: false
warmup_ratio: 0.0
warmup_steps: 0
weight_decay: 0.0
```
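The max_seq_length-896 / doc_stride-128 settings above mean long contexts are split into overlapping windows before encoding, so every answer span falls entirely inside at least one window. A minimal sketch of that chunking idea (the real run_qa.py delegates this to the tokenizer's `return_overflowing_tokens`/`stride` mechanism rather than doing it by hand):

```python
def chunk_with_stride(tokens, max_len=896, stride=128):
    """Split a token list into windows of at most max_len tokens,
    with consecutive windows overlapping by `stride` tokens
    (illustrative sketch of the doc-stride strategy)."""
    windows, start = [], 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        # Advance by max_len - stride so windows overlap by `stride` tokens.
        start += max_len - stride
    return windows
```

At prediction time, candidate answers from all windows covering the same context are merged and the highest-scoring span is kept.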

#### Results

**F1-Score:** 83.36016

**Exact Match:** 64.42911
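For reference, these numbers follow the SQuAD-style evaluation used by run_qa.py: exact match checks string equality, while F1 measures token overlap between predicted and gold answers. A simplified sketch of the two metrics (it omits the answer normalization, such as lowercasing and punctuation stripping, that the official evaluation script applies):

```python
import collections

def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the predicted answer matches the gold answer exactly, else 0.0."""
    return float(prediction == ground_truth)

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the predicted and gold answer strings."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # Multiset intersection: counts each shared token at most as often
    # as it appears in both answers.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are the averages of these per-question scores (times 100), taking the best match over gold answers when several are provided.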

## Technical Specifications

### Model Architecture and Objective

deberta-v2 for extractive question answering in French.

## Citation

**BibTeX:**

```bibtex
@misc{antoun2024camembert20smarterfrench,
      title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
      author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
      year={2024},
      eprint={2411.08868},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.08868},
}
```
all_results.json ADDED
@@ -0,0 +1,15 @@
{
    "epoch": 6.0,
    "eval_exact_match": 64.42910915934755,
    "eval_f1": 83.36016013340664,
    "eval_runtime": 45.7589,
    "eval_samples": 3188,
    "eval_samples_per_second": 69.669,
    "eval_steps_per_second": 1.093,
    "total_flos": 2.0394634246921464e+16,
    "train_loss": 0.5145930189164087,
    "train_runtime": 3736.1381,
    "train_samples": 20731,
    "train_samples_per_second": 33.293,
    "train_steps_per_second": 2.081
}
config.json ADDED
@@ -0,0 +1,41 @@
{
    "_name_or_path": "/scratch/camembertv2/runs/models/camembertav2-base-bf16/post/ckpt-p2-17000/pt/discriminator/",
    "architectures": [
        "DebertaV2ForQuestionAnswering"
    ],
    "attention_probs_dropout_prob": 0.1,
    "bos_token_id": 1,
    "conv_act": "gelu",
    "conv_kernel_size": 0,
    "embedding_size": 768,
    "eos_token_id": 2,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-07,
    "max_position_embeddings": 1024,
    "max_relative_positions": -1,
    "model_name": "camembertav2-base-bf16",
    "model_type": "deberta-v2",
    "norm_rel_ebd": "layer_norm",
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pad_token_id": 0,
    "pooler_dropout": 0,
    "pooler_hidden_act": "gelu",
    "pooler_hidden_size": 768,
    "pos_att_type": [
        "p2c",
        "c2p"
    ],
    "position_biased_input": false,
    "position_buckets": 256,
    "relative_attention": true,
    "share_att_key": true,
    "torch_dtype": "float32",
    "transformers_version": "4.44.2",
    "type_vocab_size": 0,
    "vocab_size": 32768
}
eval_nbest_predictions.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ef22d30d9113db2a3e121ec66eb96ad133d2c57666af0ca02aa148b9e20860b
size 14616776
eval_predictions.json ADDED
The diff for this file is too large to render. See raw diff
 
eval_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 6.0,
    "eval_exact_match": 64.42910915934755,
    "eval_f1": 83.36016013340664,
    "eval_runtime": 45.7589,
    "eval_samples": 3188,
    "eval_samples_per_second": 69.669,
    "eval_steps_per_second": 1.093
}
logs/events.out.tfevents.1724425624.nefgpu51.72711.0 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f16d7346cf66e52238b68ca67654b64e55c5384a67ee75a2fefc509f7bd96a72
size 24280
logs/events.out.tfevents.1724429414.nefgpu51.72711.1 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9fcae142b47db7158bb33c13bc846063082e5c1e8193efeae1753e0d3c599a3a
size 364
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a7f916f253e3a2d4b6c6311e7ac4f0137711af98793abeb56a7ed77d6f88419
size 442496848
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
    "bos_token": {
        "content": "[CLS]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "cls_token": {
        "content": "[CLS]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "eos_token": {
        "content": "[SEP]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "mask_token": {
        "content": "[MASK]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "pad_token": {
        "content": "[PAD]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "sep_token": {
        "content": "[SEP]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "unk_token": {
        "content": "[UNK]",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
{
    "add_prefix_space": true,
    "added_tokens_decoder": {
        "0": {
            "content": "[PAD]",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "1": {
            "content": "[CLS]",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "2": {
            "content": "[SEP]",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "3": {
            "content": "[UNK]",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "4": {
            "content": "[MASK]",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false,
            "special": true
        }
    },
    "bos_token": "[CLS]",
    "clean_up_tokenization_spaces": true,
    "cls_token": "[CLS]",
    "eos_token": "[SEP]",
    "errors": "replace",
    "mask_token": "[MASK]",
    "model_max_length": 1000000000000000019884624838656,
    "pad_token": "[PAD]",
    "sep_token": "[SEP]",
    "tokenizer_class": "RobertaTokenizer",
    "trim_offsets": true,
    "unk_token": "[UNK]"
}
train_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 6.0,
    "total_flos": 2.0394634246921464e+16,
    "train_loss": 0.5145930189164087,
    "train_runtime": 3736.1381,
    "train_samples": 20731,
    "train_samples_per_second": 33.293,
    "train_steps_per_second": 2.081
}
trainer_state.json ADDED
@@ -0,0 +1,635 @@
1
+ {
2
+ "best_metric": 64.42910915934755,
3
+ "best_model_checkpoint": "/scratch/camembertv2/runs/results/fquad/camembertav2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-2-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-1/checkpoint-6480",
4
+ "epoch": 6.0,
5
+ "eval_steps": 500,
6
+ "global_step": 7776,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.07716049382716049,
13
+ "grad_norm": 23.665807723999023,
14
+ "learning_rate": 2.998775977415799e-05,
15
+ "loss": 3.705,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.15432098765432098,
20
+ "grad_norm": 20.01573944091797,
21
+ "learning_rate": 2.9951059073049117e-05,
22
+ "loss": 1.6565,
23
+ "step": 200
24
+ },
25
+ {
26
+ "epoch": 0.23148148148148148,
27
+ "grad_norm": 12.386185646057129,
28
+ "learning_rate": 2.988995779332273e-05,
29
+ "loss": 1.4648,
30
+ "step": 300
31
+ },
32
+ {
33
+ "epoch": 0.30864197530864196,
34
+ "grad_norm": 18.621143341064453,
35
+ "learning_rate": 2.980455565410724e-05,
36
+ "loss": 1.243,
37
+ "step": 400
38
+ },
39
+ {
40
+ "epoch": 0.38580246913580246,
41
+ "grad_norm": 15.412405967712402,
42
+ "learning_rate": 2.96949920342655e-05,
43
+ "loss": 1.1382,
44
+ "step": 500
45
+ },
46
+ {
47
+ "epoch": 0.46296296296296297,
48
+ "grad_norm": 18.231019973754883,
49
+ "learning_rate": 2.95614457449243e-05,
50
+ "loss": 1.1534,
51
+ "step": 600
52
+ },
53
+ {
54
+ "epoch": 0.5401234567901234,
55
+ "grad_norm": 16.341306686401367,
56
+ "learning_rate": 2.940413473764923e-05,
57
+ "loss": 1.0392,
58
+ "step": 700
59
+ },
60
+ {
61
+ "epoch": 0.6172839506172839,
62
+ "grad_norm": 12.782474517822266,
63
+ "learning_rate": 2.9223315748741146e-05,
64
+ "loss": 1.0394,
65
+ "step": 800
66
+ },
67
+ {
68
+ "epoch": 0.6944444444444444,
69
+ "grad_norm": 15.871848106384277,
70
+ "learning_rate": 2.9019283880234828e-05,
71
+ "loss": 0.9878,
72
+ "step": 900
73
+ },
74
+ {
75
+ "epoch": 0.7716049382716049,
76
+ "grad_norm": 8.874638557434082,
77
+ "learning_rate": 2.879237211828353e-05,
78
+ "loss": 1.0025,
79
+ "step": 1000
80
+ },
81
+ {
82
+ "epoch": 0.8487654320987654,
83
+ "grad_norm": 19.52781867980957,
84
+ "learning_rate": 2.8542950789715587e-05,
85
+ "loss": 0.9396,
86
+ "step": 1100
87
+ },
88
+ {
89
+ "epoch": 0.9259259259259259,
90
+ "grad_norm": 11.584712982177734,
91
+ "learning_rate": 2.8271426957649866e-05,
92
+ "loss": 0.9625,
93
+ "step": 1200
94
+ },
95
+ {
96
+ "epoch": 1.0,
97
+ "eval_exact_match": 61.91969887076537,
98
+ "eval_f1": 81.01142429829103,
99
+ "eval_runtime": 46.1272,
100
+ "eval_samples_per_second": 69.113,
101
+ "eval_steps_per_second": 1.084,
102
+ "step": 1296
103
+ },
104
+ {
105
+ "epoch": 1.0030864197530864,
106
+ "grad_norm": 7.031412601470947,
107
+ "learning_rate": 2.7978243757156497e-05,
108
+ "loss": 0.9411,
109
+ "step": 1300
110
+ },
111
+ {
112
+ "epoch": 1.0802469135802468,
113
+ "grad_norm": 11.493968963623047,
114
+ "learning_rate": 2.7663879672047095e-05,
115
+ "loss": 0.6781,
116
+ "step": 1400
117
+ },
118
+ {
119
+ "epoch": 1.1574074074074074,
120
+ "grad_norm": 5.261780738830566,
121
+ "learning_rate": 2.732884775397477e-05,
122
+ "loss": 0.6742,
123
+ "step": 1500
124
+ },
125
+ {
126
+ "epoch": 1.2345679012345678,
127
+ "grad_norm": 13.310417175292969,
128
+ "learning_rate": 2.6973694785118392e-05,
129
+ "loss": 0.7004,
130
+ "step": 1600
131
+ },
132
+ {
133
+ "epoch": 1.3117283950617284,
134
+ "grad_norm": 17.390798568725586,
135
+ "learning_rate": 2.65990003858176e-05,
136
+ "loss": 0.6512,
137
+ "step": 1700
138
+ },
139
+ {
140
+ "epoch": 1.3888888888888888,
141
+ "grad_norm": 14.149763107299805,
142
+ "learning_rate": 2.620537606861494e-05,
143
+ "loss": 0.7074,
144
+ "step": 1800
145
+ },
146
+ {
147
+ "epoch": 1.4660493827160495,
148
+ "grad_norm": 5.80026912689209,
149
+ "learning_rate": 2.5793464240249014e-05,
150
+ "loss": 0.6629,
151
+ "step": 1900
152
+ },
153
+ {
154
+ "epoch": 1.5432098765432098,
155
+ "grad_norm": 17.484006881713867,
156
+ "learning_rate": 2.536393715322732e-05,
157
+ "loss": 0.6993,
158
+ "step": 2000
159
+ },
160
+ {
161
+ "epoch": 1.6203703703703702,
162
+ "grad_norm": 21.80438804626465,
163
+ "learning_rate": 2.49174958086899e-05,
164
+ "loss": 0.6408,
165
+ "step": 2100
166
+ },
167
+ {
168
+ "epoch": 1.6975308641975309,
169
+ "grad_norm": 19.416994094848633,
170
+ "learning_rate": 2.4454868812354406e-05,
171
+ "loss": 0.6592,
172
+ "step": 2200
173
+ },
174
+ {
175
+ "epoch": 1.7746913580246915,
176
+ "grad_norm": 5.628990650177002,
177
+ "learning_rate": 2.3976811185409607e-05,
178
+ "loss": 0.622,
179
+ "step": 2300
180
+ },
181
+ {
182
+ "epoch": 1.8518518518518519,
183
+ "grad_norm": 14.69774341583252,
184
+ "learning_rate": 2.3484103132298082e-05,
185
+ "loss": 0.645,
186
+ "step": 2400
187
+ },
188
+ {
189
+ "epoch": 1.9290123456790123,
190
+ "grad_norm": 13.895576477050781,
191
+ "learning_rate": 2.297754876739905e-05,
192
+ "loss": 0.6746,
193
+ "step": 2500
194
+ },
195
+ {
196
+ "epoch": 2.0,
197
+ "eval_exact_match": 63.26850690087829,
198
+ "eval_f1": 82.69346269167056,
199
+ "eval_runtime": 45.8773,
200
+ "eval_samples_per_second": 69.49,
201
+ "eval_steps_per_second": 1.09,
202
+ "step": 2592
203
+ },
204
+ {
205
+ "epoch": 2.006172839506173,
206
+ "grad_norm": 7.386751651763916,
207
+ "learning_rate": 2.2457974802689542e-05,
208
+ "loss": 0.6472,
209
+ "step": 2600
210
+ },
211
+ {
212
+ "epoch": 2.0833333333333335,
213
+ "grad_norm": 12.677722930908203,
214
+ "learning_rate": 2.192622919852551e-05,
215
+ "loss": 0.4365,
216
+ "step": 2700
217
+ },
218
+ {
219
+ "epoch": 2.1604938271604937,
220
+ "grad_norm": 7.171934127807617,
221
+ "learning_rate": 2.138317977974501e-05,
222
+ "loss": 0.4287,
223
+ "step": 2800
224
+ },
225
+ {
226
+ "epoch": 2.2376543209876543,
227
+ "grad_norm": 10.016780853271484,
228
+ "learning_rate": 2.082971281935195e-05,
229
+ "loss": 0.4462,
230
+ "step": 2900
231
+ },
232
+ {
233
+ "epoch": 2.314814814814815,
234
+ "grad_norm": 21.301910400390625,
235
+ "learning_rate": 2.0266731592091834e-05,
236
+ "loss": 0.4425,
237
+ "step": 3000
238
+ },
239
+ {
240
+ "epoch": 2.3919753086419755,
241
+ "grad_norm": 21.78326988220215,
242
+ "learning_rate": 1.969515490028019e-05,
243
+ "loss": 0.425,
244
+ "step": 3100
245
+ },
246
+ {
247
+ "epoch": 2.4691358024691357,
248
+ "grad_norm": 17.772539138793945,
249
+ "learning_rate": 1.9115915574289523e-05,
250
+ "loss": 0.4181,
251
+ "step": 3200
252
+ },
253
+ {
254
+ "epoch": 2.5462962962962963,
255
+ "grad_norm": 7.547439098358154,
256
+ "learning_rate": 1.8529958950142064e-05,
257
+ "loss": 0.4233,
258
+ "step": 3300
259
+ },
260
+ {
261
+ "epoch": 2.623456790123457,
262
+ "grad_norm": 9.031538963317871,
263
+ "learning_rate": 1.7938241326692906e-05,
264
+ "loss": 0.4691,
265
+ "step": 3400
266
+ },
267
+ {
268
+ "epoch": 2.700617283950617,
269
+ "grad_norm": 9.722735404968262,
270
+ "learning_rate": 1.734172840492147e-05,
271
+ "loss": 0.4498,
272
+ "step": 3500
273
+ },
274
+ {
275
+ "epoch": 2.7777777777777777,
276
+ "grad_norm": 9.985281944274902,
277
+ "learning_rate": 1.6741393711878455e-05,
278
+ "loss": 0.4388,
279
+ "step": 3600
280
+ },
281
+ {
282
+ "epoch": 2.8549382716049383,
283
+ "grad_norm": 9.514204978942871,
284
+ "learning_rate": 1.6138217011860335e-05,
285
+ "loss": 0.4501,
286
+ "step": 3700
287
+ },
288
+ {
289
+ "epoch": 2.932098765432099,
290
+ "grad_norm": 16.88687515258789,
291
+ "learning_rate": 1.5533182707404563e-05,
292
+ "loss": 0.4172,
293
+ "step": 3800
294
+ },
295
+ {
296
+ "epoch": 3.0,
297
+ "eval_exact_match": 63.676286072772896,
298
+ "eval_f1": 82.60726439387956,
299
+ "eval_runtime": 45.9653,
300
+ "eval_samples_per_second": 69.357,
301
+ "eval_steps_per_second": 1.088,
302
+ "step": 3888
303
+ },
304
+ {
305
+ "epoch": 3.009259259259259,
306
+ "grad_norm": 4.132925033569336,
307
+ "learning_rate": 1.4927278232714974e-05,
308
+ "loss": 0.3689,
309
+ "step": 3900
310
+ },
311
+ {
312
+ "epoch": 3.0864197530864197,
313
+ "grad_norm": 9.779620170593262,
314
+ "learning_rate": 1.4321492442139406e-05,
315
+ "loss": 0.2905,
316
+ "step": 4000
317
+ },
318
+ {
319
+ "epoch": 3.1635802469135803,
320
+ "grad_norm": 7.350837230682373,
321
+ "learning_rate": 1.371681399632967e-05,
322
+ "loss": 0.2937,
323
+ "step": 4100
324
+ },
325
+ {
326
+ "epoch": 3.240740740740741,
327
+ "grad_norm": 6.923620223999023,
328
+ "learning_rate": 1.3114229748717562e-05,
329
+ "loss": 0.2922,
330
+ "step": 4200
331
+ },
332
+ {
333
+ "epoch": 3.317901234567901,
334
+ "grad_norm": 16.84642791748047,
335
+ "learning_rate": 1.2514723134940363e-05,
336
+ "loss": 0.28,
337
+ "step": 4300
338
+ },
339
+ {
340
+ "epoch": 3.3950617283950617,
341
+ "grad_norm": 22.180021286010742,
342
+ "learning_rate": 1.191927256784427e-05,
343
+ "loss": 0.2907,
344
+ "step": 4400
345
+ },
346
+ {
347
+ "epoch": 3.4722222222222223,
348
+ "grad_norm": 2.5661354064941406,
349
+ "learning_rate": 1.1328849840685143e-05,
350
+ "loss": 0.2806,
351
+ "step": 4500
352
+ },
353
+ {
354
+ "epoch": 3.549382716049383,
355
+ "grad_norm": 11.584675788879395,
356
+ "learning_rate": 1.0744418541132676e-05,
357
+ "loss": 0.2963,
358
+ "step": 4600
359
+ },
360
+ {
361
+ "epoch": 3.626543209876543,
362
+ "grad_norm": 5.476423740386963,
363
+ "learning_rate": 1.0166932478666293e-05,
364
+ "loss": 0.3199,
365
+ "step": 4700
366
+ },
367
+ {
368
+ "epoch": 3.7037037037037037,
369
+ "grad_norm": 9.405366897583008,
370
+ "learning_rate": 9.597334127929346e-06,
371
+ "loss": 0.3107,
372
+ "step": 4800
373
+ },
374
+ {
375
+ "epoch": 3.7808641975308643,
376
+ "grad_norm": 8.880900382995605,
377
+ "learning_rate": 9.036553090582144e-06,
378
+ "loss": 0.2991,
379
+ "step": 4900
380
+ },
381
+ {
382
+ "epoch": 3.8580246913580245,
383
+ "grad_norm": 3.8892629146575928,
384
+ "learning_rate": 8.485504578164017e-06,
385
+ "loss": 0.2716,
386
+ "step": 5000
387
+ },
388
+ {
389
+ "epoch": 3.935185185185185,
390
+ "grad_norm": 5.704967498779297,
391
+ "learning_rate": 7.945087918440563e-06,
392
+ "loss": 0.2688,
393
+ "step": 5100
394
+ },
395
+ {
396
+ "epoch": 4.0,
397
+ "eval_exact_match": 64.2409033877039,
398
+ "eval_f1": 83.135484930466,
399
+ "eval_runtime": 45.9298,
400
+ "eval_samples_per_second": 69.41,
401
+ "eval_steps_per_second": 1.089,
402
+ "step": 5184
403
+ },
404
+ {
405
+ "epoch": 4.012345679012346,
406
+ "grad_norm": 16.147579193115234,
407
+ "learning_rate": 7.416185087673616e-06,
408
+ "loss": 0.2919,
409
+ "step": 5200
410
+ },
411
+ {
412
+ "epoch": 4.089506172839506,
413
+ "grad_norm": 13.380005836486816,
414
+ "learning_rate": 6.899659271209459e-06,
415
+ "loss": 0.2068,
416
+ "step": 5300
417
+ },
418
+ {
419
+ "epoch": 4.166666666666667,
420
+ "grad_norm": 9.491084098815918,
421
+ "learning_rate": 6.3963534547343126e-06,
422
+ "loss": 0.2009,
423
+ "step": 5400
424
+ },
425
+ {
426
+ "epoch": 4.243827160493828,
427
+ "grad_norm": 14.11040210723877,
428
+ "learning_rate": 5.907089048496351e-06,
429
+ "loss": 0.2124,
430
+ "step": 5500
431
+ },
432
+ {
433
+ "epoch": 4.320987654320987,
434
+ "grad_norm": 12.674304962158203,
435
+ "learning_rate": 5.4326645467394085e-06,
436
+ "loss": 0.2173,
437
+ "step": 5600
438
+ },
439
+ {
440
+ "epoch": 4.398148148148148,
441
+ "grad_norm": 5.682621955871582,
442
+ "learning_rate": 4.973854224536363e-06,
443
+ "loss": 0.213,
444
+ "step": 5700
445
+ },
446
+ {
447
+ "epoch": 4.4753086419753085,
448
+ "grad_norm": 5.133475303649902,
449
+ "learning_rate": 4.5314068741488615e-06,
450
+ "loss": 0.2,
451
+ "step": 5800
452
+ },
453
+ {
454
+ "epoch": 4.552469135802469,
455
+ "grad_norm": 6.370384693145752,
456
+ "learning_rate": 4.1060445829758305e-06,
457
+ "loss": 0.197,
458
+ "step": 5900
459
+ },
460
+ {
461
+ "epoch": 4.62962962962963,
462
+ "grad_norm": 16.37765884399414,
463
+ "learning_rate": 3.6984615550850894e-06,
464
+ "loss": 0.2051,
465
+ "step": 6000
466
+ },
467
+ {
468
+ "epoch": 4.70679012345679,
469
+ "grad_norm": 11.54761791229248,
470
+ "learning_rate": 3.3093229782514023e-06,
471
+ "loss": 0.1733,
472
+ "step": 6100
473
+ },
474
+ {
475
+ "epoch": 4.783950617283951,
476
+ "grad_norm": 22.175281524658203,
477
+ "learning_rate": 2.939263938350012e-06,
478
+ "loss": 0.2003,
479
+ "step": 6200
480
+ },
481
+ {
482
+ "epoch": 4.861111111111111,
483
+ "grad_norm": 1.2753137350082397,
484
+ "learning_rate": 2.588888382877342e-06,
485
+ "loss": 0.194,
486
+ "step": 6300
487
+ },
488
+ {
489
+ "epoch": 4.938271604938271,
490
+ "grad_norm": 32.319236755371094,
491
+ "learning_rate": 2.2587681352905404e-06,
492
+ "loss": 0.2149,
493
+ "step": 6400
494
+ },
495
+ {
496
+ "epoch": 5.0,
497
+ "eval_exact_match": 64.42910915934755,
498
+ "eval_f1": 83.36016013340664,
499
+ "eval_runtime": 45.8927,
500
+ "eval_samples_per_second": 69.466,
501
+ "eval_steps_per_second": 1.089,
502
+ "step": 6480
503
+ },
504
+ {
505
+ "epoch": 5.015432098765432,
506
+ "grad_norm": 9.053484916687012,
507
+ "learning_rate": 1.9494419617743312e-06,
508
+ "loss": 0.198,
509
+ "step": 6500
510
+ },
511
+ {
512
+ "epoch": 5.092592592592593,
513
+ "grad_norm": 11.60733699798584,
514
+ "learning_rate": 1.6614146919584094e-06,
515
+ "loss": 0.1512,
516
+ "step": 6600
517
+ },
518
+ {
519
+ "epoch": 5.169753086419753,
520
+ "grad_norm": 9.327279090881348,
521
+ "learning_rate": 1.3951563950202656e-06,
522
+ "loss": 0.167,
523
+ "step": 6700
524
+ },
525
+ {
526
+ "epoch": 5.246913580246914,
527
+ "grad_norm": 4.551391124725342,
528
+ "learning_rate": 1.1511016125181445e-06,
529
+ "loss": 0.1315,
530
+ "step": 6800
531
+ },
532
+ {
533
+ "epoch": 5.324074074074074,
534
+ "grad_norm": 4.2411274909973145,
535
+ "learning_rate": 9.296486492061334e-07,
536
+ "loss": 0.1532,
537
+ "step": 6900
538
+ },
539
+ {
540
+ "epoch": 5.401234567901234,
541
+ "grad_norm": 5.1193461418151855,
542
+ "learning_rate": 7.311589229888083e-07,
543
+ "loss": 0.1624,
544
+ "step": 7000
545
+ },
546
+ {
547
+ "epoch": 5.478395061728395,
548
+ "grad_norm": 1.735378384590149,
549
+ "learning_rate": 5.55956375076332e-07,
550
+ "loss": 0.1688,
551
+ "step": 7100
552
+ },
553
+ {
554
+ "epoch": 5.555555555555555,
555
+ "grad_norm": 3.7359869480133057,
556
+ "learning_rate": 4.043269413026429e-07,
557
+ "loss": 0.148,
558
+ "step": 7200
559
+ },
560
+ {
561
+ "epoch": 5.632716049382716,
562
+ "grad_norm": 6.915912628173828,
563
+ "learning_rate": 2.7651808546956646e-07,
564
+ "loss": 0.1822,
565
+ "step": 7300
566
+ },
567
+ {
568
+ "epoch": 5.709876543209877,
569
+ "grad_norm": 5.2128448486328125,
570
+ "learning_rate": 1.727383954784373e-07,
571
+ "loss": 0.163,
572
+ "step": 7400
573
+ },
574
+ {
575
+ "epoch": 5.787037037037037,
576
+ "grad_norm": 3.4656639099121094,
577
+ "learning_rate": 9.315724290836047e-08,
578
+ "loss": 0.1716,
579
+ "step": 7500
580
+ },
581
+ {
582
+ "epoch": 5.864197530864198,
583
+ "grad_norm": 21.396251678466797,
584
+ "learning_rate": 3.790450659670097e-08,
585
+ "loss": 0.1694,
586
+ "step": 7600
587
+ },
588
+ {
589
+ "epoch": 5.9413580246913575,
590
+ "grad_norm": 14.200843811035156,
591
+ "learning_rate": 7.070360672907228e-09,
592
+ "loss": 0.1618,
593
+ "step": 7700
594
+ },
595
+ {
596
+ "epoch": 6.0,
597
+ "eval_exact_match": 64.0840652446675,
598
+ "eval_f1": 83.12314115625247,
599
+ "eval_runtime": 45.7565,
600
+ "eval_samples_per_second": 69.673,
601
+ "eval_steps_per_second": 1.093,
602
+ "step": 7776
603
+ },
604
+ {
605
+ "epoch": 6.0,
606
+ "step": 7776,
607
+ "total_flos": 2.0394634246921464e+16,
608
+ "train_loss": 0.5145930189164087,
609
+ "train_runtime": 3736.1381,
610
+ "train_samples_per_second": 33.293,
611
+ "train_steps_per_second": 2.081
612
+ }
613
+ ],
614
+ "logging_steps": 100,
615
+ "max_steps": 7776,
616
+ "num_input_tokens_seen": 0,
617
+ "num_train_epochs": 6,
618
+ "save_steps": 500,
619
+ "stateful_callbacks": {
620
+ "TrainerControl": {
621
+ "args": {
622
+ "should_epoch_stop": false,
623
+ "should_evaluate": false,
624
+ "should_log": false,
625
+ "should_save": true,
626
+ "should_training_stop": true
627
+ },
628
+ "attributes": {}
629
+ }
630
+ },
631
+ "total_flos": 2.0394634246921464e+16,
632
+ "train_batch_size": 8,
633
+ "trial_name": null,
634
+ "trial_params": null
635
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b9787a37b5a87abea534c694a63b96bc13b2ab599a8e1ee426033703ba558db
size 5688
vocab.txt ADDED
The diff for this file is too large to render. See raw diff