Add new SentenceTransformer model.
- README.md +34 -31
- model.safetensors +1 -1
README.md
CHANGED
@@ -420,13 +420,19 @@ You can finetune this model on your own dataset.
|
|
420 |
```
|
421 |
|
422 |
### Training Hyperparameters
|
423 |
|
424 |
#### All Hyperparameters
|
425 |
<details><summary>Click to expand</summary>
|
426 |
|
427 |
-
- `overwrite_output_dir`:
|
428 |
- `do_predict`: False
|
429 |
-
- `eval_strategy`:
|
430 |
- `prediction_loss_only`: True
|
431 |
- `per_device_train_batch_size`: 8
|
432 |
- `per_device_eval_batch_size`: 8
|
@@ -441,7 +447,7 @@ You can finetune this model on your own dataset.
|
|
441 |
- `adam_beta2`: 0.999
|
442 |
- `adam_epsilon`: 1e-08
|
443 |
- `max_grad_norm`: 1.0
|
444 |
-
- `num_train_epochs`:
|
445 |
- `max_steps`: -1
|
446 |
- `lr_scheduler_type`: linear
|
447 |
- `lr_scheduler_kwargs`: {}
|
@@ -481,7 +487,7 @@ You can finetune this model on your own dataset.
|
|
481 |
- `disable_tqdm`: False
|
482 |
- `remove_unused_columns`: True
|
483 |
- `label_names`: None
|
484 |
-
- `load_best_model_at_end`:
|
485 |
- `ignore_data_skip`: False
|
486 |
- `fsdp`: []
|
487 |
- `fsdp_min_num_params`: 0
|
@@ -540,33 +546,30 @@ You can finetune this model on your own dataset.
|
|
540 |
</details>
|
541 |
|
542 |
### Training Logs
|
543 |
-
| Epoch
|
544 |
-
|
545 |
-
| 0.
|
546 |
-
| 0.
|
547 |
-
| 0.
|
548 |
-
| 0.
|
549 |
-
| 0.
|
550 |
-
| 0.
|
551 |
-
| 0.
|
552 |
-
| 0.
|
553 |
-
|
|
554 |
-
|
|
555 |
-
|
|
556 |
-
|
|
557 |
-
|
|
558 |
-
|
|
559 |
-
|
|
560 |
-
|
|
561 |
-
|
|
562 |
-
|
|
563 |
-
|
|
564 |
-
|
|
565 |
-
|
566 |
-
|
567 |
-
| 2.7981 | 11500 | 1.0904 |
|
568 |
-
| 2.9197 | 12000 | 0.9434 |
|
569 |
-
|
570 |
|
571 |
### Framework Versions
|
572 |
- Python: 3.11.9
|
|
|
420 |
```
|
421 |
|
422 |
### Training Hyperparameters
|
423 |
+
#### Non-Default Hyperparameters
|
424 |
+
|
425 |
+
- `overwrite_output_dir`: True
|
426 |
+
- `eval_strategy`: steps
|
427 |
+
- `num_train_epochs`: 5
|
428 |
+
- `load_best_model_at_end`: True
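
As a rough illustration of what the four non-default values above amount to, here is a minimal sketch that collects them in a dictionary and passes them to `SentenceTransformerTrainingArguments`. The names `NON_DEFAULT_ARGS` and `build_training_args` and the `output_dir` parameter are assumptions made for this example; the diff does not show the actual training script, and the argument names are assumed to match the installed library version.

```python
# Values copied from the "Non-Default Hyperparameters" list in the model card.
NON_DEFAULT_ARGS = {
    "overwrite_output_dir": True,
    "eval_strategy": "steps",
    "num_train_epochs": 5,
    "load_best_model_at_end": True,
}


def build_training_args(output_dir: str):
    """Build training arguments with the four overrides applied.

    The import is deferred so the dictionary above can be inspected
    without sentence-transformers installed.
    """
    from sentence_transformers import SentenceTransformerTrainingArguments

    # `output_dir` is a hypothetical path chosen by the caller; the diff
    # does not say where checkpoints were written.
    return SentenceTransformerTrainingArguments(
        output_dir=output_dir, **NON_DEFAULT_ARGS
    )
```

Every other argument keeps its library default. With `load_best_model_at_end` enabled, the underlying `transformers` trainer expects the save strategy to match the evaluation strategy, which holds here as long as the save strategy is left at its step-based default.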
|
429 |
|
430 |
#### All Hyperparameters
|
431 |
<details><summary>Click to expand</summary>
|
432 |
|
433 |
+
- `overwrite_output_dir`: True
|
434 |
- `do_predict`: False
|
435 |
+
- `eval_strategy`: steps
|
436 |
- `prediction_loss_only`: True
|
437 |
- `per_device_train_batch_size`: 8
|
438 |
- `per_device_eval_batch_size`: 8
|
|
|
447 |
- `adam_beta2`: 0.999
|
448 |
- `adam_epsilon`: 1e-08
|
449 |
- `max_grad_norm`: 1.0
|
450 |
+
- `num_train_epochs`: 5
|
451 |
- `max_steps`: -1
|
452 |
- `lr_scheduler_type`: linear
|
453 |
- `lr_scheduler_kwargs`: {}
|
|
|
487 |
- `disable_tqdm`: False
|
488 |
- `remove_unused_columns`: True
|
489 |
- `label_names`: None
|
490 |
+
- `load_best_model_at_end`: True
|
491 |
- `ignore_data_skip`: False
|
492 |
- `fsdp`: []
|
493 |
- `fsdp_min_num_params`: 0
|
|
|
546 |
</details>
|
547 |
|
548 |
### Training Logs
|
549-572 |
+
| Epoch | Step | Training Loss | all-nli-triplet loss | stsb loss | natural-questions loss | quora loss |
|:----------:|:--------:|:-------------:|:--------------------:|:----------:|:----------------------:|:----------:|
| 0.0487 | 200 | 2.0928 | - | - | - | - |
| 0.0973 | 400 | 2.2013 | - | - | - | - |
| 0.1460 | 600 | 1.7404 | - | - | - | - |
| 0.1946 | 800 | 1.9134 | - | - | - | - |
| **0.2433** | **1000** | **2.043** | **0.5161** | **6.2815** | **0.1172** | **0.0192** |
| 0.2920 | 1200 | 1.8817 | - | - | - | - |
| 0.3406 | 1400 | 1.7734 | - | - | - | - |
| 0.3893 | 1600 | 1.5935 | - | - | - | - |
| 0.4380 | 1800 | 1.6762 | - | - | - | - |
| 0.4866 | 2000 | 1.7031 | 0.4555 | 6.3907 | 0.0726 | 0.0198 |
| 0.5353 | 2200 | 1.8561 | - | - | - | - |
| 0.5839 | 2400 | 1.6742 | - | - | - | - |
| 0.6326 | 2600 | 1.456 | - | - | - | - |
| 0.6813 | 2800 | 1.6122 | - | - | - | - |
| 0.7299 | 3000 | 1.8851 | 0.4975 | 6.1758 | 0.0841 | 0.0208 |
| 0.7786 | 3200 | 1.5684 | - | - | - | - |
| 0.8273 | 3400 | 1.6535 | - | - | - | - |
| 0.8759 | 3600 | 1.5043 | - | - | - | - |
| 0.9246 | 3800 | 1.4768 | - | - | - | - |
| 0.9732 | 4000 | 1.686 | 0.4912 | 6.1600 | 0.0795 | 0.0170 |

* The bold row denotes the saved checkpoint.
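
The Epoch and Step columns of the training log above can be used to back out how many training steps make up one epoch. The sketch below assumes that the Epoch column is simply the step count divided by the steps per epoch, rounded to four decimals, and that the run was scheduled for the full `num_train_epochs` (with `max_steps` left at -1). The names `EVAL_ROWS` and `estimate_steps_per_epoch` are introduced here for illustration only.

```python
# (epoch, step) pairs copied from the evaluation rows of the training log.
EVAL_ROWS = [
    (0.2433, 1000),
    (0.4866, 2000),
    (0.7299, 3000),
    (0.9732, 4000),
]
NUM_TRAIN_EPOCHS = 5  # from the hyperparameter list


def estimate_steps_per_epoch(rows):
    """Estimate training steps per epoch from the last logged row.

    The last row is used because the four-decimal rounding of the
    Epoch column has the smallest relative effect there.
    """
    epoch, step = max(rows, key=lambda row: row[1])
    return round(step / epoch)


steps_per_epoch = estimate_steps_per_epoch(EVAL_ROWS)
planned_steps = steps_per_epoch * NUM_TRAIN_EPOCHS
print(steps_per_epoch, planned_steps)  # 4110 20550
```

Under these assumptions the log shown here covers a little under one of the five scheduled epochs.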
|
573 |
|
574 |
### Framework Versions
|
575 |
- Python: 3.11.9
|
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 594668880
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:bbc5e57f3e543b2aa7f15a158a3a5bb351bb99a79235706212199447b9614a3e
|
3 |
size 594668880
|