SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
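The last two modules listed above (mean-token pooling followed by normalization) can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy vectors are 2-dimensional rather than 384-dimensional.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]

# Toy input: three token vectors, the last one being padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)     # mean of the first two vectors
embedding = l2_normalize(pooled)     # unit-length sentence embedding
```

Because of the final normalization step, every embedding has length 1, so cosine similarity between two embeddings reduces to their dot product.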
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ANGKJ1995/all-MiniLM-L6-v2-job-description")
# Run inference
sentences = [
'Composer/Orchestrator writes musical compositions such as symphonies, sonatas or operas. He/she translates compositions into standard musical signs and symbols on scored music paper. He/she may write words to accompany music. He/she adapts melodies to suit the type and style of orchestras or bands and to produce various kinds of effects. He/she determines instruments to be employed, writes musical scores to produce the desired musical effect, rewrites music written for one instrument or purpose into suitable forms for other instruments or purposes.',
'Conduct, direct, plan, and lead instrumental or vocal performances by musical artists or groups, such as orchestras, bands, choirs, and glee clubs; or create original works of music.',
'Evaluate materials and develop machinery and processes to manufacture materials for use in products that must meet specialized design and performance specifications. Develop new uses for known materials. Includes those engineers working with composite materials or specializing in one type of material, such as graphite, metal and metal alloys, ceramics and glass, plastics and polymers, and naturally occurring materials. Includes metallurgists and metallurgical engineers, ceramic engineers, and welding engineers.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
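For matching one description against several candidates, the similarity scores can be turned into a ranking. The sketch below is a pure-Python illustration on toy vectors, so it runs without downloading the model. The helpers `cosine` and `rank_candidates` are hypothetical names, not part of the Sentence Transformers API, and the vectors stand in for real 384-dimensional embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [(cosine(query, c), i) for i, c in enumerate(candidates)]
    ordered = sorted(scores, key=lambda pair: pair[0], reverse=True)
    return [index for _, index in ordered]

# Toy stand-ins for an embedded query and three embedded candidates.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
]
ranking = rank_candidates(query, candidates)
```

With real data, `query` and `candidates` would be rows of the array returned by `model.encode`.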
Evaluation
Metrics
Triplet
- Dataset: job-description-eval
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.7289 |
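As a rough guide to reading this metric, the sketch below computes a cosine accuracy as the fraction of triplets in which the anchor is more similar to the positive than to the negative. It assumes a zero margin and uses toy 2-dimensional vectors. The function names are hypothetical, and the column roles (presumably SSOC_DESCRIPTION as anchor, ONET_DESCRIPTION as positive, shuffled_ONET_DESCRIPTION as negative) are an assumption, since the card does not state the mapping explicitly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = 0
    for a, p, n in zip(anchors, positives, negatives):
        if cosine(a, p) > cosine(a, n):
            correct += 1
    return correct / len(anchors)

# Four toy triplets; only the last one is ordered incorrectly.
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
positives = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9], [0.0, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [1.0, 0.1]]
accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
```

Under this reading, and with the 225 evaluation samples listed under Evaluation Dataset, a value of 0.7289 would correspond to 164 correctly ordered triplets (164 / 225 ≈ 0.7289).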
Training Details
Training Dataset
Unnamed Dataset
- Size: 897 training samples
- Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
- Approximate statistics based on the first 897 samples:
Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
SSOC_DESCRIPTION | string | 14 | 66.05 | 166 |
ONET_DESCRIPTION | string | 9 | 44.67 | 161 |
shuffled_ONET_DESCRIPTION | string | 7 | 44.52 | 161 |
- Samples:
SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
---|---|---|
Consumer audio/video equipment/radar broadcasting/transmitting equipment fitter/mechanic fits, adjusts, installs and repairs radio, television, transmitters, receivers and radar equipment in factory, workshop or place of use. He/she specialises in television transmitters/receivers, radar equipment, radio transmitters/receivers and two way radio communications equipment. He/she examines drawings and wiring diagrams, and diagnoses faults with aid of testing equipment. | Repair, test, adjust, or install electronic equipment, such as industrial controls, transmitters, and antennas. | Conduct programs of compensation and benefits and job analysis for employer. May specialize in specific areas, such as position classification and pension programs. |
Window cleaner washes and polishes windows and other glass fittings. He/she uses cleaning tools such as sponges and detergents to clean and polish windows, mirrors and other glass surfaces of buildings, both on the interior and exterior. He/she uses specific ladders to clean taller buildings with safety belts for support. | Keep buildings in clean and orderly condition. Perform heavy cleaning duties, such as cleaning floors, shampooing rugs, washing walls and glass, and removing rubbish. Duties may include tending furnace and boiler, performing routine maintenance activities, notifying management of need for repairs, and cleaning snow or debris from sidewalk. | Service automobiles, buses, trucks, boats, and other automotive or marine vehicles with fuel, lubricants, and accessories. Collect payment for services and supplies. May lubricate vehicle, change motor oil, refill antifreeze, or replace lights or other accessories, such as windshield wiper blades or fan belts. May repair or replace tires. |
Instrumentalist plays one or more musical instruments as a soloist, accompanist or member of an orchestra, band or other musical group. He/she studies and rehearses scores, tunes instruments to the proper pitch, plays music by manipulating keys, bows, valves, strings or percussion devices, depending on the type of instrument being played. He/she may improvise or transpose music or compose or arrange music. In an orchestra, he/she is usually designated according to the instrument played such as violinist, drummer or pianist. | Play one or more musical instruments or sing. May perform on stage, for broadcasting, or for sound or video recording. | Drive a light vehicle, such as a truck or van, with a capacity of less than 26,001 pounds Gross Vehicle Weight (GVW), primarily to pick up merchandise or packages from a distribution center and deliver. May load and unload vehicle. |
- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
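To make these parameters concrete, here is a minimal sketch of a per-triplet loss of the form max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d and a margin of 5. The function names and the toy 2-dimensional vectors are illustrative assumptions. The library version operates on batches of tensors and averages over the batch.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on (distance to positive) - (distance to negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [1.0, 0.0]

# Best case for unit vectors: positive identical, negative diametrically opposite.
best_case = triplet_loss(anchor, [1.0, 0.0], [-1.0, 0.0])

# A less clean case: positive somewhat off, negative orthogonal.
typical = triplet_loss(anchor, [0.6, 0.8], [0.0, 1.0])
```

One observation under these assumptions: since the architecture ends with Normalize(), embeddings are unit vectors and their Euclidean distances lie in [0, 2]. With a margin of 5 the hinge therefore cannot reach zero and the per-triplet loss is at least 3, which is consistent with the validation losses above 4 reported in the Training Logs section.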
Evaluation Dataset
Unnamed Dataset
- Size: 225 evaluation samples
- Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
- Approximate statistics based on the first 225 samples:
Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
SSOC_DESCRIPTION | string | 16 | 64.88 | 130 |
ONET_DESCRIPTION | string | 7 | 43.49 | 161 |
shuffled_ONET_DESCRIPTION | string | 9 | 44.06 | 161 |
- Samples:
SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
---|---|---|
Salesperson (door-to-door) describes, demonstrates and sells goods and services and solicits business for establishments by approaching or visiting potential customers, usually residents in private homes, by going from door to door. He/she gives details of what establishment can supply and quotes prices and terms. | Contact new or existing customers to determine their solar equipment needs, suggest systems or equipment, or estimate costs. | Recruit, screen, interview, or place individuals within an organization. May perform other activities in multiple human resources areas. |
Secretary performs a variety of administrative tasks to help keep an organisation running smoothly. He/she answers telephone calls, drafts and sends e-mails, maintains diaries, arranges appointments, takes messages, files documents, organises and services meetings, and manages databases. | Perform secretarial duties using specific knowledge of medical terminology and hospital, clinic, or laboratory procedures. Duties may include scheduling appointments, billing patients, and compiling and recording medical charts, reports, and correspondence. | Set up, operate, or tend forging machines to taper, shape, or form metal or plastic parts. |
Purchasing agent buys machinery, equipment, raw materials, services and other supplies for use by the enterprise. He/she ascertains the requirements of the enterprise and studies market information on varieties and qualities available. He/she interviews vendors to ascertain their ability to meet the organisation’s specific requirements for design, performance, price and delivery. He/she may approve bills for payment. | Purchase machinery, equipment, tools, parts, supplies, or services necessary for the operation of an establishment. Purchase raw or semifinished materials for manufacturing. May negotiate contracts. | Evaluate and treat musculoskeletal injuries or illnesses. Provide preventive, therapeutic, emergency, and rehabilitative care. |
- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 1e-05
- num_train_epochs: 16
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
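The arithmetic implied by these settings can be worked through in a short sketch. It assumes a single device, no gradient accumulation, and no dropped last batch (matching gradient_accumulation_steps: 1 and dataloader_drop_last: False in the full list), and it describes the planned schedule for the configured 16 epochs. The function `linear_schedule_lr` is a hypothetical helper that mirrors a linear warmup followed by linear decay, not a library call.

```python
import math

# Values taken from this card.
train_samples = 897
per_device_train_batch_size = 16
num_train_epochs = 16
warmup_ratio = 0.1
learning_rate = 1e-05

# Optimizer steps per epoch: the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * (remaining / (total_steps - warmup_steps))

peak_lr = linear_schedule_lr(warmup_steps)   # learning rate right after warmup
final_lr = linear_schedule_lr(total_steps)   # learning rate at the last planned step
```

The 57 steps per epoch computed here agree with the step counts in the Training Logs table (57, 114, 171).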
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 16
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Validation Loss | job-description-eval_cosine_accuracy |
---|---|---|---|
-1 | -1 | - | 0.1867 |
1.0 | 57 | 4.5738 | 0.4844 |
2.0 | 114 | 4.3775 | 0.7022 |
3.0 | 171 | 4.2681 | 0.7289 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}