SentenceTransformer based on neuralmind/bert-large-portuguese-cased

This is a sentence-transformers model finetuned from neuralmind/bert-large-portuguese-cased. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: neuralmind/bert-large-portuguese-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
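
The Pooling module averages the token embeddings produced by the BERT encoder (pooling_mode_mean_tokens) into a single 1024-dimensional sentence vector. As an illustration only, the sketch below reproduces that mean pooling with the plain transformers API; loading the model through SentenceTransformer (see Usage) is the intended path, and loading the checkpoint with AutoModel is an assumption that holds for standard sentence-transformers repositories.

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "SenhorDasMoscas/acho2-ptbr-e4-lr3e-05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # the underlying BertModel

batch = tokenizer(["livro ficcao"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # [1, seq_len, 1024]

# Mean pooling over tokens, ignoring padding positions via the attention mask
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])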

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SenhorDasMoscas/acho2-ptbr-e4-lr3e-05")
# Run inference
sentences = [
    'livro ficcao',
    'produto basico arroz feijao massa item mercearia snack alimento congelar dia dia situacoes emergencial',
    'produto voltar publico adulto brinquedo sexual jogo adulto',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
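
The same embeddings can also back a small semantic-search step, for example matching a query against a set of category descriptions. The query and corpus below reuse strings that appear in the training data later in this card and are purely illustrative.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("SenhorDasMoscas/acho2-ptbr-e4-lr3e-05")

query = "racao cachorro pedigree"
corpus = [
    "produto basico arroz feijao massa item mercearia snack alimento congelar dia dia situacoes emergencial",
    "tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual",
    "artigo esportivo bola raquete acessorio academia roupa esportiva equipamento esporte outdoor escalada ciclismo",
]

query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)

# Rank the corpus entries by cosine similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx].item():.4f}  {corpus[idx]}")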

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9024
spearman_cosine 0.8404
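
These values are the Pearson and Spearman correlations between the model's cosine similarity scores and the gold labels on held-out (text1, text2, label) pairs. A minimal sketch of that computation, using the three evaluation samples listed further below (scipy is an extra dependency here):

from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("SenhorDasMoscas/acho2-ptbr-e4-lr3e-05")

# (text1, text2, label) pairs taken from the evaluation samples below
pairs = [
    ("carvao", "tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual", 1.0),
    ("telha fibrocimento", "produto basico arroz feijao massa item mercearia snack alimento congelar dia dia situacoes emergencial", 0.1),
    ("racao cachorro pedigree", "loja decoracao baloe paineis decorativo item tematico casamento aniversario luminaria bandeirola vela acessorio transformar ambiente festa ocasioes especial", 0.1),
]

embeddings1 = model.encode([t1 for t1, _, _ in pairs])
embeddings2 = model.encode([t2 for _, t2, _ in pairs])
labels = [label for _, _, label in pairs]

# Cosine similarity of each aligned pair, then correlation with the gold labels
cosine_scores = model.similarity(embeddings1, embeddings2).diagonal().tolist()
print("pearson_cosine: ", pearsonr(cosine_scores, labels)[0])
print("spearman_cosine:", spearmanr(cosine_scores, labels)[0])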

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,822 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
    • text1: string; min: 3 tokens, mean: 7.16 tokens, max: 15 tokens
    • text2: string; min: 11 tokens, mean: 25.08 tokens, max: 36 tokens
    • label: float; min: 0.1, mean: 0.53, max: 1.0
  • Samples (text1 | text2 | label):
    • tenis nike | artigo esportivo bola raquete acessorio academia roupa esportiva equipamento esporte outdoor escalada ciclismo | 1.0
    • tapete Sao Carlos | tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual | 0.1
    • kit sensual lua Mel | produto voltar publico adulto brinquedo sexual jogo adulto | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
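
CosineSimilarityLoss embeds text1 and text2, takes the cosine similarity of the two vectors, and regresses it onto label with the MSE objective above. A minimal sketch of how the training columns plug into it, using the three sample rows listed above (the real split has 10,822 pairs):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

train_dataset = Dataset.from_dict({
    "text1": ["tenis nike", "tapete Sao Carlos", "kit sensual lua Mel"],
    "text2": [
        "artigo esportivo bola raquete acessorio academia roupa esportiva equipamento esporte outdoor escalada ciclismo",
        "tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual",
        "produto voltar publico adulto brinquedo sexual jogo adulto",
    ],
    "label": [1.0, 0.1, 1.0],
})

model = SentenceTransformer("neuralmind/bert-large-portuguese-cased")
loss = CosineSimilarityLoss(model)  # loss_fct defaults to torch.nn.MSELoss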
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,203 evaluation samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
    • text1: string; min: 3 tokens, mean: 7.09 tokens, max: 14 tokens
    • text2: string; min: 11 tokens, mean: 25.62 tokens, max: 36 tokens
    • label: float; min: 0.1, mean: 0.57, max: 1.0
  • Samples (text1 | text2 | label):
    • carvao | tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual | 1.0
    • telha fibrocimento | produto basico arroz feijao massa item mercearia snack alimento congelar dia dia situacoes emergencial | 0.1
    • racao cachorro pedigree | loja decoracao baloe paineis decorativo item tematico casamento aniversario luminaria bandeirola vela acessorio transformar ambiente festa ocasioes especial | 0.1
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
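
The eval-similarity_spearman_cosine column in the training logs below most likely comes from an EmbeddingSimilarityEvaluator over these pairs (the column prefix matches an evaluator named "eval-similarity"); this is an inference from the metric name, not recorded in the card. A minimal sketch, using the three sample rows above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("SenhorDasMoscas/acho2-ptbr-e4-lr3e-05")

dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["carvao", "telha fibrocimento", "racao cachorro pedigree"],
    sentences2=[
        "tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual",
        "produto basico arroz feijao massa item mercearia snack alimento congelar dia dia situacoes emergencial",
        "loja decoracao baloe paineis decorativo item tematico casamento aniversario luminaria bandeirola vela acessorio transformar ambiente festa ocasioes especial",
    ],
    scores=[1.0, 0.1, 0.1],
    name="eval-similarity",
)
print(dev_evaluator(model))  # reports pearson/spearman metrics prefixed with the evaluator name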
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 3e-05
  • weight_decay: 0.1
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • warmup_steps: 135
  • fp16: True
  • load_best_model_at_end: True
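
A minimal sketch of a training run that mirrors these settings with the sentence-transformers v3 trainer. The tiny inline datasets, the output_dir, and the eval/save cadence of 200 steps (the interval at which validation rows appear in the training logs below) are illustrative assumptions; the actual run used the 10,822/1,203-pair splits described above.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("neuralmind/bert-large-portuguese-cased")
loss = CosineSimilarityLoss(model)

# Tiny stand-ins for the real train/eval splits (text1, text2, label columns)
train_dataset = Dataset.from_dict({
    "text1": ["tenis nike", "tapete Sao Carlos"],
    "text2": [
        "artigo esportivo bola raquete acessorio academia roupa esportiva equipamento esporte outdoor escalada ciclismo",
        "tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual",
    ],
    "label": [1.0, 0.1],
})
eval_dataset = Dataset.from_dict({
    "text1": ["carvao"],
    "text2": ["tinta cimento ferramenta construcao material reforma piso azulejo equipamento protecao individual"],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="acho2-ptbr-e4-lr3e-05",  # illustrative
    eval_strategy="steps",
    eval_steps=200,
    save_steps=200,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=3e-05,
    weight_decay=0.1,
    num_train_epochs=4,
    warmup_ratio=0.1,
    warmup_steps=135,
    fp16=True,  # requires a CUDA GPU
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()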

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.1
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 135
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss eval-similarity_spearman_cosine
0.0147 5 0.2323 - -
0.0295 10 0.2056 - -
0.0442 15 0.2203 - -
0.0590 20 0.1947 - -
0.0737 25 0.1811 - -
0.0885 30 0.1526 - -
0.1032 35 0.1511 - -
0.1180 40 0.1543 - -
0.1327 45 0.1529 - -
0.1475 50 0.1296 - -
0.1622 55 0.1212 - -
0.1770 60 0.1023 - -
0.1917 65 0.1011 - -
0.2065 70 0.1047 - -
0.2212 75 0.1077 - -
0.2360 80 0.0909 - -
0.2507 85 0.0913 - -
0.2655 90 0.1045 - -
0.2802 95 0.0761 - -
0.2950 100 0.0705 - -
0.3097 105 0.086 - -
0.3245 110 0.0753 - -
0.3392 115 0.0652 - -
0.3540 120 0.0663 - -
0.3687 125 0.0862 - -
0.3835 130 0.085 - -
0.3982 135 0.0803 - -
0.4130 140 0.088 - -
0.4277 145 0.0569 - -
0.4425 150 0.0689 - -
0.4572 155 0.0746 - -
0.4720 160 0.069 - -
0.4867 165 0.0665 - -
0.5015 170 0.0778 - -
0.5162 175 0.0513 - -
0.5310 180 0.0525 - -
0.5457 185 0.0817 - -
0.5605 190 0.0731 - -
0.5752 195 0.0704 - -
0.5900 200 0.0742 0.0651 0.8003
0.6047 205 0.0722 - -
0.6195 210 0.0894 - -
0.6342 215 0.0679 - -
0.6490 220 0.0532 - -
0.6637 225 0.0877 - -
0.6785 230 0.2859 - -
0.6932 235 0.3122 - -
0.7080 240 0.1166 - -
0.7227 245 0.0785 - -
0.7375 250 0.0636 - -
0.7522 255 0.0613 - -
0.7670 260 0.0648 - -
0.7817 265 0.0597 - -
0.7965 270 0.0597 - -
0.8112 275 0.0662 - -
0.8260 280 0.0581 - -
0.8407 285 0.0685 - -
0.8555 290 0.0629 - -
0.8702 295 0.0694 - -
0.8850 300 0.055 - -
0.8997 305 0.0647 - -
0.9145 310 0.0634 - -
0.9292 315 0.0724 - -
0.9440 320 0.0658 - -
0.9587 325 0.0594 - -
0.9735 330 0.053 - -
0.9882 335 0.0622 - -
1.0029 340 0.0622 - -
1.0177 345 0.0593 - -
1.0324 350 0.0541 - -
1.0472 355 0.0493 - -
1.0619 360 0.0504 - -
1.0767 365 0.0539 - -
1.0914 370 0.0439 - -
1.1062 375 0.0613 - -
1.1209 380 0.0432 - -
1.1357 385 0.0617 - -
1.1504 390 0.0546 - -
1.1652 395 0.0427 - -
1.1799 400 0.0674 0.0488 0.8279
1.1947 405 0.055 - -
1.2094 410 0.0393 - -
1.2242 415 0.0561 - -
1.2389 420 0.0531 - -
1.2537 425 0.0374 - -
1.2684 430 0.0374 - -
1.2832 435 0.0369 - -
1.2979 440 0.0408 - -
1.3127 445 0.0508 - -
1.3274 450 0.0558 - -
1.3422 455 0.0566 - -
1.3569 460 0.0466 - -
1.3717 465 0.0363 - -
1.3864 470 0.0489 - -
1.4012 475 0.0535 - -
1.4159 480 0.0502 - -
1.4307 485 0.0429 - -
1.4454 490 0.0541 - -
1.4602 495 0.057 - -
1.4749 500 0.0402 - -
1.4897 505 0.0464 - -
1.5044 510 0.0405 - -
1.5192 515 0.0469 - -
1.5339 520 0.0519 - -
1.5487 525 0.0338 - -
1.5634 530 0.0476 - -
1.5782 535 0.0385 - -
1.5929 540 0.0442 - -
1.6077 545 0.0379 - -
1.6224 550 0.0477 - -
1.6372 555 0.0525 - -
1.6519 560 0.0487 - -
1.6667 565 0.0499 - -
1.6814 570 0.0344 - -
1.6962 575 0.0503 - -
1.7109 580 0.0568 - -
1.7257 585 0.0465 - -
1.7404 590 0.0325 - -
1.7552 595 0.0479 - -
1.7699 600 0.046 0.0466 0.8309
1.7847 605 0.0482 - -
1.7994 610 0.0546 - -
1.8142 615 0.0465 - -
1.8289 620 0.049 - -
1.8437 625 0.0422 - -
1.8584 630 0.0358 - -
1.8732 635 0.0519 - -
1.8879 640 0.0416 - -
1.9027 645 0.0344 - -
1.9174 650 0.0339 - -
1.9322 655 0.0365 - -
1.9469 660 0.038 - -
1.9617 665 0.0417 - -
1.9764 670 0.0521 - -
1.9912 675 0.0242 - -
2.0059 680 0.0405 - -
2.0206 685 0.0233 - -
2.0354 690 0.0299 - -
2.0501 695 0.0194 - -
2.0649 700 0.0424 - -
2.0796 705 0.0245 - -
2.0944 710 0.0374 - -
2.1091 715 0.0295 - -
2.1239 720 0.0236 - -
2.1386 725 0.0477 - -
2.1534 730 0.0211 - -
2.1681 735 0.0306 - -
2.1829 740 0.0265 - -
2.1976 745 0.0398 - -
2.2124 750 0.0468 - -
2.2271 755 0.0252 - -
2.2419 760 0.0329 - -
2.2566 765 0.0317 - -
2.2714 770 0.035 - -
2.2861 775 0.0387 - -
2.3009 780 0.037 - -
2.3156 785 0.0285 - -
2.3304 790 0.0377 - -
2.3451 795 0.0344 - -
2.3599 800 0.0335 0.0431 0.8360
2.3746 805 0.0296 - -
2.3894 810 0.0357 - -
2.4041 815 0.0244 - -
2.4189 820 0.0373 - -
2.4336 825 0.0295 - -
2.4484 830 0.0353 - -
2.4631 835 0.0303 - -
2.4779 840 0.0206 - -
2.4926 845 0.0284 - -
2.5074 850 0.0293 - -
2.5221 855 0.035 - -
2.5369 860 0.0295 - -
2.5516 865 0.0349 - -
2.5664 870 0.0195 - -
2.5811 875 0.0265 - -
2.5959 880 0.0298 - -
2.6106 885 0.0321 - -
2.6254 890 0.0321 - -
2.6401 895 0.0299 - -
2.6549 900 0.0216 - -
2.6696 905 0.02 - -
2.6844 910 0.0277 - -
2.6991 915 0.0381 - -
2.7139 920 0.0296 - -
2.7286 925 0.0339 - -
2.7434 930 0.035 - -
2.7581 935 0.0293 - -
2.7729 940 0.038 - -
2.7876 945 0.0291 - -
2.8024 950 0.0411 - -
2.8171 955 0.0377 - -
2.8319 960 0.0282 - -
2.8466 965 0.0388 - -
2.8614 970 0.0286 - -
2.8761 975 0.0177 - -
2.8909 980 0.0352 - -
2.9056 985 0.0329 - -
2.9204 990 0.0265 - -
2.9351 995 0.0363 - -
2.9499 1000 0.021 0.0404 0.8374
2.9646 1005 0.0342 - -
2.9794 1010 0.0415 - -
2.9941 1015 0.0232 - -
3.0088 1020 0.0251 - -
3.0236 1025 0.0317 - -
3.0383 1030 0.0344 - -
3.0531 1035 0.021 - -
3.0678 1040 0.0271 - -
3.0826 1045 0.021 - -
3.0973 1050 0.0151 - -
3.1121 1055 0.0222 - -
3.1268 1060 0.0186 - -
3.1416 1065 0.0357 - -
3.1563 1070 0.0179 - -
3.1711 1075 0.0291 - -
3.1858 1080 0.0313 - -
3.2006 1085 0.0349 - -
3.2153 1090 0.0181 - -
3.2301 1095 0.0294 - -
3.2448 1100 0.0216 - -
3.2596 1105 0.0334 - -
3.2743 1110 0.0256 - -
3.2891 1115 0.026 - -
3.3038 1120 0.0176 - -
3.3186 1125 0.0231 - -
3.3333 1130 0.0164 - -
3.3481 1135 0.0226 - -
3.3628 1140 0.0286 - -
3.3776 1145 0.02 - -
3.3923 1150 0.0229 - -
3.4071 1155 0.0231 - -
3.4218 1160 0.0289 - -
3.4366 1165 0.0188 - -
3.4513 1170 0.0313 - -
3.4661 1175 0.0179 - -
3.4808 1180 0.0157 - -
3.4956 1185 0.0252 - -
3.5103 1190 0.019 - -
3.5251 1195 0.0251 - -
3.5398 1200 0.021 0.0399 0.8404
3.5546 1205 0.0154 - -
3.5693 1210 0.0187 - -
3.5841 1215 0.0221 - -
3.5988 1220 0.0148 - -
3.6136 1225 0.0168 - -
3.6283 1230 0.0236 - -
3.6431 1235 0.0194 - -
3.6578 1240 0.0245 - -
3.6726 1245 0.0171 - -
3.6873 1250 0.0235 - -
3.7021 1255 0.0243 - -
3.7168 1260 0.0325 - -
3.7316 1265 0.0196 - -
3.7463 1270 0.0362 - -
3.7611 1275 0.0188 - -
3.7758 1280 0.0151 - -
3.7906 1285 0.0189 - -
3.8053 1290 0.0286 - -
3.8201 1295 0.0266 - -
3.8348 1300 0.0216 - -
3.8496 1305 0.0218 - -
3.8643 1310 0.0214 - -
3.8791 1315 0.0224 - -
3.8938 1320 0.0213 - -
3.9086 1325 0.0302 - -
3.9233 1330 0.0196 - -
3.9381 1335 0.0218 - -
3.9528 1340 0.0226 - -
3.9676 1345 0.0204 - -
3.9823 1350 0.0215 - -
3.9971 1355 0.0258 - -
  • The saved checkpoint corresponds to the row at epoch 3.5398 (step 1200), which has the lowest validation loss (0.0399) and the spearman_cosine of 0.8404 reported under Evaluation above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.14.4
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}