Pubudu committed
Commit cab3af5 · verified · parent 6e3dd8d

Model save
README.md ADDED
@@ -0,0 +1,74 @@
+ ---
+ license: mit
+ base_model: facebook/mbart-large-50
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: dataset-1700
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # dataset-1700
+
+ This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 3.2549
+ - Gen Len: 16.3667
+ - Rouge-1: 36.0238
+ - Rouge-2: 18.9307
+ - Rouge-l: 35.0228
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: polynomial
+ - lr_scheduler_warmup_steps: 1000
+ - num_epochs: 50
+ - label_smoothing_factor: 0.1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Gen Len | Rouge-1 | Rouge-2 | Rouge-l |
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:-------:|
+ | No log        | 1.0   | 214  | 3.8357          | 37.1467 | 26.95   | 11.9204 | 25.5728 |
+ | No log        | 2.0   | 428  | 3.4294          | 32.64   | 28.5045 | 14.5187 | 26.6976 |
+ | No log        | 3.0   | 642  | 3.3246          | 17.4933 | 27.9551 | 14.0601 | 26.9049 |
+ | No log        | 4.0   | 856  | 3.2771          | 15.4    | 28.2521 | 13.6616 | 27.9303 |
+ | No log        | 5.0   | 1070 | 3.1305          | 20.2333 | 34.3539 | 18.0221 | 33.5067 |
+ | No log        | 6.0   | 1284 | 3.0782          | 16.9267 | 32.743  | 16.171  | 32.2637 |
+ | No log        | 7.0   | 1498 | 3.0556          | 17.1    | 33.9666 | 17.3623 | 33.5188 |
+ | No log        | 8.0   | 1712 | 3.0948          | 16.1067 | 35.7842 | 19.0957 | 35.1125 |
+ | No log        | 9.0   | 1926 | 3.1146          | 16.3133 | 33.9124 | 18.8415 | 33.2149 |
+ | No log        | 10.0  | 2140 | 3.1464          | 15.8467 | 35.1778 | 18.874  | 34.3936 |
+ | No log        | 11.0  | 2354 | 3.1760          | 16.4467 | 35.6329 | 19.0674 | 34.9167 |
+ | No log        | 12.0  | 2568 | 3.2549          | 16.3667 | 36.0238 | 18.9307 | 35.0228 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.2
+ - Pytorch 2.2.1+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.15.2
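
The card lists `lr_scheduler_type: polynomial` with 1000 warmup steps and a peak learning rate of 1e-4. As a rough illustration, the schedule can be sketched in plain Python; this mirrors the behaviour of Transformers' polynomial scheduler under its defaults (power 1.0, end learning rate 0.0), and the total step count of 10700 is an assumption derived from the table above (50 epochs × 214 steps per epoch).

```python
# Sketch of a polynomial-decay schedule with linear warmup, using the
# card's values: lr=1e-4, warmup_steps=1000. The defaults lr_end=0.0 and
# power=1.0 (i.e. linear decay) are assumptions, as is total_steps=10700.
def lr_at_step(step, base_lr=1e-4, warmup_steps=1000,
               total_steps=10700, lr_end=0.0, power=1.0):
    if step < warmup_steps:
        return base_lr * step / warmup_steps      # linear warmup phase
    if step > total_steps:
        return lr_end                             # decay finished
    remaining = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - lr_end) * remaining ** power + lr_end

print(lr_at_step(0))        # 0.0
print(lr_at_step(1000))     # 0.0001 (peak lr)
print(lr_at_step(10700))    # 0.0
```

With power 1.0 this reduces to the familiar linear warmup/decay triangle; a different `power` would bend the decay curve while keeping the same endpoints.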
mbart-large-50_par_bn_rf_16_dinamina_1700/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "config": {
+     "adapter_residual_before_ln": false,
+     "cross_adapter": false,
+     "factorized_phm_W": true,
+     "factorized_phm_rule": false,
+     "hypercomplex_nonlinearity": "glorot-uniform",
+     "init_weights": "mam_adapter",
+     "inv_adapter": null,
+     "inv_adapter_reduction_factor": null,
+     "is_parallel": true,
+     "learn_phm": true,
+     "leave_out": [],
+     "ln_after": false,
+     "ln_before": false,
+     "mh_adapter": false,
+     "non_linearity": "relu",
+     "original_ln_after": true,
+     "original_ln_before": false,
+     "output_adapter": true,
+     "phm_bias": true,
+     "phm_c_init": "normal",
+     "phm_dim": 4,
+     "phm_init_range": 0.0001,
+     "phm_layer": false,
+     "phm_rank": 1,
+     "reduction_factor": 16,
+     "residual_before_ln": true,
+     "scaling": 4.0,
+     "shared_W_phm": false,
+     "shared_phm_rule": true,
+     "use_gating": false
+   },
+   "config_id": "df0cbc77437caacb",
+   "hidden_size": 1024,
+   "model_class": "MBartForConditionalGeneration",
+   "model_name": "facebook/mbart-large-50",
+   "model_type": "mbart",
+   "name": "mbart-large-50_par_bn_rf_16_dinamina_1700",
+   "version": "0.1.1"
+ }
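
A quick back-of-the-envelope check of this config (an illustration, not part of the repo): with `hidden_size` 1024 and `reduction_factor` 16, each parallel bottleneck adapter projects 1024 → 64 → 1024. Assuming one output adapter in each of mBART-large's 24 layers (12 encoder + 12 decoder), the parameter count lands very close to the 12.7 MB `pytorch_adapter.bin` at float32.

```python
# Values copied from adapter_config.json above.
hidden_size = 1024
reduction_factor = 16
bottleneck = hidden_size // reduction_factor   # 64

# One adapter module: down-projection (weights + bias) and
# up-projection (weights + bias); the ReLU adds no parameters.
per_adapter = (hidden_size * bottleneck + bottleneck) \
            + (bottleneck * hidden_size + hidden_size)

# Assumption: one adapter per transformer layer, 24 layers in mBART-large.
num_layers = 12 + 12
total_params = per_adapter * num_layers

print(per_adapter)   # 132160
print(total_params)  # 3171840
print(total_params * 4)  # ≈12.7 MB at float32; checkpoint is 12726422 bytes
```

The small gap between this estimate and the actual file size is plausibly serialization overhead plus any extra per-adapter tensors the library stores; the order of magnitude confirms only the adapter weights (not the 600M-parameter base model) are in this checkpoint.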
mbart-large-50_par_bn_rf_16_dinamina_1700/head_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "config": null,
+   "hidden_size": 1024,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "model_class": "MBartForConditionalGeneration",
+   "model_name": "facebook/mbart-large-50",
+   "model_type": "mbart",
+   "name": null,
+   "num_labels": 3,
+   "version": "0.1.1"
+ }
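
Note that `label2id` here is the generic 3-label placeholder mapping, despite the model class being seq2seq; nothing in the card suggests a classification task. When inspecting such a head config, a common sanity check (purely illustrative) is to invert the mapping and confirm it agrees with `num_labels`:

```python
# Values copied from head_config.json above.
head_config = {
    "label2id": {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2},
    "num_labels": 3,
}

# id2label is the inverse mapping; its size should match num_labels.
id2label = {idx: label for label, idx in head_config["label2id"].items()}
assert len(id2label) == head_config["num_labels"]
print(id2label)  # {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}
```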
mbart-large-50_par_bn_rf_16_dinamina_1700/pytorch_adapter.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02ada8aba314d077ab2a85048002093fa9b20eabed0e7e3b790b797c4e56b37d
+ size 12726422
mbart-large-50_par_bn_rf_16_dinamina_1700/pytorch_model_head.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adcc43a196d39e07adb0d2ce5a89621565caf0187b28c81222c9bb08b8fa37f7
+ size 1025227034
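
The two `.bin` entries above are Git LFS pointer files (spec v1): the weights themselves live out-of-band and are identified by a SHA-256 object id. A downloaded file can be checked against its pointer's `oid` line with stdlib Python alone; the local path below is an assumption for illustration.

```python
import hashlib

def lfs_oid(path, chunk_size=1 << 20):
    """Compute the sha256 OID that a Git LFS pointer records for a file.

    Reads in 1 MiB chunks so large weight files are not loaded at once.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage sketch (path and check commented out; the weights are not local):
# expected = "02ada8aba314d077ab2a85048002093fa9b20eabed0e7e3b790b797c4e56b37d"
# assert lfs_oid("pytorch_adapter.bin") == expected
```

The pointer's `size` line can be cross-checked the same way with `os.path.getsize`, which catches truncated downloads before the slower hash pass.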