SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
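
The Pooling and Normalize modules above can be illustrated with a minimal NumPy sketch. It assumes the Transformer module has already produced per-token vectors; the function name and the toy inputs are hypothetical and not part of the library, which implements these steps in PyTorch.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   0/1 array of shape (batch, seq_len)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # padded positions contribute 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # guard against division by zero
    pooled = summed / counts                         # pooling_mode_mean_tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)      # Normalize(): unit-length rows

# Toy input: 2 sequences, 4 token positions, 384 dimensions; the second is half padding
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)  # (2, 384)
```

Because every output row has unit length, the cosine similarity of two embeddings equals their dot product.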

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ANGKJ1995/all-MiniLM-L6-v2-job-description")
# Run inference
sentences = [
    'Composer/Orchestrator writes musical compositions such as symphonies, sonatas or operas. He/she translates compositions into standard musical signs and symbols on scored music paper. He/she may write words to accompany music. He/she adapts melodies to suit the type and style of orchestras or bands and to produce various kinds of effects. He/she determines instruments to be employed, writes musical scores to produce the desired musical effect, rewrites music written for one instrument or purpose into suitable forms for other instruments or purposes.',
    'Conduct, direct, plan, and lead instrumental or vocal performances by musical artists or groups, such as orchestras, bands, choirs, and glee clubs; or create original works of music.',
    'Evaluate materials and develop machinery and processes to manufacture materials for use in products that must meet specialized design and performance specifications. Develop new uses for known materials. Includes those engineers working with composite materials or specializing in one type of material, such as graphite, metal and metal alloys, ceramics and glass, plastics and polymers, and naturally occurring materials. Includes metallurgists and metallurgical engineers, ceramic engineers, and welding engineers.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
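
For semantic search, embeddings like those returned by model.encode can be ranked against a query by cosine similarity. The following is a minimal sketch only: rank_by_cosine is a hypothetical helper, and the three-dimensional vectors are placeholders standing in for real 384-dimensional embeddings.

```python
import numpy as np

def rank_by_cosine(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar, with their scores."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity of each candidate to the query
    order = np.argsort(-scores)         # descending similarity
    return order, scores[order]

# Placeholder vectors (assumption): stand-ins for model.encode(...) output
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [0.9, 0.1, 0.0],    # nearly parallel to the query
    [-1.0, 0.0, 0.0],   # opposite to the query
])
order, scores = rank_by_cosine(query, candidates)
print(order.tolist())  # [1, 0, 2]
```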

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.7289
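
As a rough illustration of what this metric measures, the sketch below counts a triplet as correct when the anchor is more cosine-similar to the positive than to the negative. The function name and toy vectors are hypothetical; the library's own evaluator may differ in detail.

```python
import numpy as np

def cosine_triplet_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        return (a * b).sum(axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        )
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Toy triplets (assumption): the first is ranked correctly, the second is not
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[1.0, 0.1], [1.0, 0.0]])
negatives = np.array([[0.0, 1.0], [0.0, 1.0]])
print(cosine_triplet_accuracy(anchors, positives, negatives))  # 0.5
```

If the reported value was measured on the 225 evaluation samples, 0.7289 would correspond to 164 correctly ranked triplets (164 / 225 ≈ 0.7289).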

Training Details

Training Dataset

Unnamed Dataset

  • Size: 897 training samples
  • Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
  • Approximate statistics based on the first 897 samples:
    • SSOC_DESCRIPTION (string): min 14 tokens, mean 66.05 tokens, max 166 tokens
    • ONET_DESCRIPTION (string): min 9 tokens, mean 44.67 tokens, max 161 tokens
    • shuffled_ONET_DESCRIPTION (string): min 7 tokens, mean 44.52 tokens, max 161 tokens
  • Samples:
    Sample 1
    • SSOC_DESCRIPTION: Consumer audio/video equipment/radar broadcasting/transmitting equipment fitter/mechanic fits, adjusts, installs and repairs radio, television, transmitters, receivers and radar equipment in factory, workshop or place of use. He/she specialises in television transmitters/receivers, radar equipment, radio transmitters/receivers and two way radio communications equipment. He/she examines drawings and wiring diagrams, and diagnoses faults with aid of testing equipment.
    • ONET_DESCRIPTION: Repair, test, adjust, or install electronic equipment, such as industrial controls, transmitters, and antennas.
    • shuffled_ONET_DESCRIPTION: Conduct programs of compensation and benefits and job analysis for employer. May specialize in specific areas, such as position classification and pension programs.
    Sample 2
    • SSOC_DESCRIPTION: Window cleaner washes and polishes windows and other glass fittings. He/she uses cleaning tools such as sponges and detergents to clean and polish windows, mirrors and other glass surfaces of buildings, both on the interior and exterior. He/she uses specific ladders to clean taller buildings with safety belts for support.
    • ONET_DESCRIPTION: Keep buildings in clean and orderly condition. Perform heavy cleaning duties, such as cleaning floors, shampooing rugs, washing walls and glass, and removing rubbish. Duties may include tending furnace and boiler, performing routine maintenance activities, notifying management of need for repairs, and cleaning snow or debris from sidewalk.
    • shuffled_ONET_DESCRIPTION: Service automobiles, buses, trucks, boats, and other automotive or marine vehicles with fuel, lubricants, and accessories. Collect payment for services and supplies. May lubricate vehicle, change motor oil, refill antifreeze, or replace lights or other accessories, such as windshield wiper blades or fan belts. May repair or replace tires.
    Sample 3
    • SSOC_DESCRIPTION: Instrumentalist plays one or more musical instruments as a soloist, accompanist or member of an orchestra, band or other musical group. He/she studies and rehearses scores, tunes instruments to the proper pitch, plays music by manipulating keys, bows, valves, strings or percussion devices, depending on the type of instrument being played. He/she may improvise or transpose music or compose or arrange music. In an orchestra, he/she is usually designated according to the instrument played such as violinist, drummer or pianist.
    • ONET_DESCRIPTION: Play one or more musical instruments or sing. May perform on stage, for broadcasting, or for sound or video recording.
    • shuffled_ONET_DESCRIPTION: Drive a light vehicle, such as a truck or van, with a capacity of less than 26,001 pounds Gross Vehicle Weight (GVW), primarily to pick up merchandise or packages from a distribution center and deliver. May load and unload vehicle.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
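
The loss configured above can be sketched in NumPy as the mean of max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d and margin 5. This is an illustrative re-implementation under those assumptions, not the library's code, and the toy vectors are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Mean hinge loss over triplets using Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)   # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)   # anchor-to-negative distance
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy unit vectors (assumption): positive identical to anchor, negative opposite
anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.0]])
negative = np.array([[-1.0, 0.0]])
print(triplet_loss(anchor, positive, negative))  # 3.0
```

Since the architecture ends in Normalize(), embeddings have unit length and pairwise Euclidean distances lie in [0, 2]. Under that assumption a margin of 5 cannot be fully satisfied, so this sketch bottoms out at 3.0 even for the ideal triplet shown.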
    

Evaluation Dataset

Unnamed Dataset

  • Size: 225 evaluation samples
  • Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
  • Approximate statistics based on the first 225 samples:
    • SSOC_DESCRIPTION (string): min 16 tokens, mean 64.88 tokens, max 130 tokens
    • ONET_DESCRIPTION (string): min 7 tokens, mean 43.49 tokens, max 161 tokens
    • shuffled_ONET_DESCRIPTION (string): min 9 tokens, mean 44.06 tokens, max 161 tokens
  • Samples:
    Sample 1
    • SSOC_DESCRIPTION: Salesperson (door-to-door) describes, demonstrates and sells goods and services and solicits business for establishments by approaching or visiting potential customers, usually residents in private homes, by going from door to door. He/she gives details of what establishment can supply and quotes prices and terms.
    • ONET_DESCRIPTION: Contact new or existing customers to determine their solar equipment needs, suggest systems or equipment, or estimate costs.
    • shuffled_ONET_DESCRIPTION: Recruit, screen, interview, or place individuals within an organization. May perform other activities in multiple human resources areas.
    Sample 2
    • SSOC_DESCRIPTION: Secretary performs a variety of administrative tasks to help keep an organisation running smoothly. He/she answers telephone calls, drafts and sends e-mails, maintains diaries, arranges appointments, takes messages, files documents, organises and services meetings, and manages databases.
    • ONET_DESCRIPTION: Perform secretarial duties using specific knowledge of medical terminology and hospital, clinic, or laboratory procedures. Duties may include scheduling appointments, billing patients, and compiling and recording medical charts, reports, and correspondence.
    • shuffled_ONET_DESCRIPTION: Set up, operate, or tend forging machines to taper, shape, or form metal or plastic parts.
    Sample 3
    • SSOC_DESCRIPTION: Purchasing agent buys machinery, equipment, raw materials, services and other supplies for use by the enterprise. He/she ascertains the requirements of the enterprise and studies market information on varieties and qualities available. He/she interviews vendors to ascertain their ability to meet the organisation’s specific requirements for design, performance, price and delivery. He/she may approve bills for payment.
    • ONET_DESCRIPTION: Purchase machinery, equipment, tools, parts, supplies, or services necessary for the operation of an establishment. Purchase raw or semifinished materials for manufacturing. May negotiate contracts.
    • shuffled_ONET_DESCRIPTION: Evaluate and treat musculoskeletal injuries or illnesses. Provide preventive, therapeutic, emergency, and rehabilitative care.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 1e-05
  • num_train_epochs: 16
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
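
To make these settings concrete, the sketch below works out the step counts and a linear warmup-then-decay learning rate from the values listed here and the 897-sample training set. It assumes a single device and no gradient accumulation, and it is an arithmetic illustration rather than the trainer's own scheduler code; lr_at is a hypothetical helper.

```python
import math

train_samples = 897        # training set size from this card
batch_size = 16            # per_device_train_batch_size
epochs = 16                # num_train_epochs
base_lr = 1e-5             # learning_rate
warmup_ratio = 0.1         # warmup_ratio

steps_per_epoch = math.ceil(train_samples / batch_size)   # last batch is partial
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)

print(steps_per_epoch, total_steps, warmup_steps)  # 57 912 92
```

The 57 steps per epoch agree with the training logs, which record step 57 at epoch 1.0.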

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 16
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Validation Loss  job-description-eval_cosine_accuracy
-1     -1    -                0.1867
1.0    57    4.5738           0.4844
2.0    114   4.3775           0.7022
3.0    171   4.2681           0.7289

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 22.7M params (Safetensors, tensor type F32)