SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model fine-tuned from answerdotai/ModernBERT-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
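The Pooling module above enables only pooling_mode_mean_tokens, so each sentence embedding is the average of ModernBERT's 1024-dimensional token embeddings over the non-padding positions. The sketch below shows that computation in plain Python on toy 2-dimensional vectors; the function name mean_pool is hypothetical, and the real module operates on PyTorch tensors rather than lists.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            totals = [t + v for t, v in zip(totals, vector)]
            count += 1
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [t / count for t in totals]


# Two real tokens and one padding token: the padding vector is ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```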

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("m7n/discipline-bert-modern-large_01")
# Run inference
sentences = [
    'normal subjects, residents of the Ural region, were examined by a dichromatic bone densitometer of "GE/Lunar" firm (USA). After that they were divided according to their somatotype: normosthenics, hypersthenics and asthenics. Age-related groups in girls were formed from the age of years, in youths from years, up to years every other year, after years every years up to the age of years. The somatotype has been revealed to influence the mineral density (MD) of skeleton, the mass of muscular, connective and fatty tissues: MD in girls has been formed at the age of years, in youths at that of years. In normosthenics and asthenics MD at the same age was % and %, respectively. At the age of years MD in women with hypersthenia was % less than peak bone mass, in those with normosthenia it was % less and in women with asthenia % less. In men these measurements were and %, respectively.',
    'x-ray images of patients with posttraumatic defects of forearm bones have been analyzed using DiaMorph computer-assisted complex. Mean optical density of regenerated bone shadows has been evaluated for the purpose of studying the dynamics of osteogenesis and mineralization of newly formed bone tissue during osteosynthesis. By planimetry of distraction regenerated bones it was established that osteogenesis developed by normoplastic type. Typical distraction regenerated bones were formed while filling defect-diastases; the regenerated bones lost their zonal structure at the end of fixation period. During formation of wedge-shaped regenerated bones clear zonal structure of newly formed tissue was not traced, the area of interlayer occupied significantly less part than it was in case of filling the defects of forearm bones by fragment lengthening and formation of typical distraction regenerated bone.',
    'Early adult changes in the facial profile were studied longitudinally from to years of age in a Swedish Caucasian sample of female and male dental students. Lateral cephalometric radiographs were analysed by the conventional point-based method and by the structure-based method of superimposing serial films, adapted for computerized numerical analysis. Skeletal and soft tissue changes were described by linear and angular variables. The magnitude of linear dimensional changes was similar in the two sexes. The largest changes were found in the vertical dimensions. Total anterior facial height increased by about mm in the -year period, suggesting that the major part of the increase in vertical facial dimensions during the third decade of life takes place in the first half of this decade. Sagittal jaw relationship increased by about in both sexes. Soft tissue changes reflected those of the vertical skeletal dimensions.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
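model.similarity scores embeddings with the model's similarity function, which is cosine similarity for this model, so the 3 x 3 result holds the cosine of every sentence pair, with values near 1.0 on the diagonal. The sketch below reproduces such a matrix and a simple top-k ranking for semantic search in plain Python; the helper names (cosine, similarity_matrix, top_k) are hypothetical, and the 2-dimensional vectors stand in for real 1024-dimensional embeddings. For real embeddings, sentence_transformers.util.semantic_search performs the ranking step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities, analogous to model.similarity(embeddings, embeddings)."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def top_k(query, corpus, k=2):
    """Indices of the k corpus vectors most similar to the query, best first."""
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(query, corpus[i]), reverse=True)
    return ranked[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(similarity_matrix(corpus)))  # 3
print(top_k([1.0, 0.1], corpus))  # [0, 2]
```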

Evaluation

Metrics

Triplet

  • Datasets: modernBERT and modernBERT_disciplines
  • Evaluated with TripletEvaluator
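Metric | modernBERT | modernBERT_disciplines
cosine_accuracy | 0.6726 | 0.6756

The modernBERT score is the last one logged during training (step 5800); modernBERT_disciplines was evaluated once, after the final step (5871).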
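TripletEvaluator's cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor is more similar to the positive than to the negative under cosine similarity. A minimal plain-Python sketch of that metric follows; triplet_cosine_accuracy is a hypothetical helper name, and the toy vectors are illustrative rather than real model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(anchors, positives, negatives):
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    hits = sum(
        cosine(a, p) > cosine(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)


# The first triplet is ranked correctly, the second is not, so accuracy is 0.5.
anchors = [[1.0, 0.0], [1.0, 0.0]]
positives = [[1.0, 0.1], [0.0, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 0.5
```

Read this way, a cosine_accuracy of 0.6726 means the model ranked the positive above the negative for roughly two out of three evaluation triplets.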

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,828 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    anchor | string | 82 tokens | 236.35 tokens | 620 tokens
    positive | string | 82 tokens | 237.05 tokens | 663 tokens
    negative | string | 82 tokens | 247.65 tokens | 653 tokens
  • Samples:
    anchor: Implementing management systems in organisations of all types and sizes often raises the following question: "What benefits will this bring?" Initial resistance and criticism are common as potential challenges are identified during the implementation process. To address this, it is essential to highlight the advantages of these systems and engage stakeholders in supporting management efforts. While the planning, implementation, use, maintenance, auditing, and improvement of management systems are generally voluntary, certification is frequently driven by external factors, particularly customer demands. Employees also stand to gain significantly, with knowledge and information serving as valuable resources, especially for leveraging artificial intelligence. This article explores the management's readiness to adopt and fully utilise two management systems based on international standards: the ISO Knowledge management system (KMS) and the ISO/IEC Artificial intelligence management system ...
    positive: Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases are comprised of data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk...
    negative: This study aims to obtain user satisfaction factors for a knowledge management system so that a questionnaire can be made for evaluation or measurement. The SECI method is used with the CISE sequence which consists of four knowledge creation steps, namely C-combination, I-nternalization, S-socialization, and ending E-externalization. The stage begins with literature studies and then modifications are made with the selection, addition, and incorporation of existing models. From understanding and analyzing several models, discussions or brainstorming with colleagues were then carried out so that a final model was obtained to compile a list of keywords and statements as a questionnaire based on indicators related to knowledge management and the satisfaction of knowledge management system users. The results obtained there are eight user satisfaction factors divided into technical aspects (knowledge quality, knowledge sharing, system quality, service quality) and social aspects (management ...

    anchor: This study examines the effect of alloying elements of Ni and W on the repassivation properties of stainless steel (SS) as evaluated by a rapid scratching electrode technique and stress corrosion cracking (SCC) test. The SS specimens were grouped into two different grades according to Ni content (00Cr-0Ni duplex, Type 000LMN [UNS S00000] austenitic SS). Major considerations regarding alloy design were Ni content and the substitution of W for Mo. However, a similar pitting resistance equivalent number (PREN) of to was maintained for all specimens. The main factors for evaluation of repassivation properties are the peak current for the scratched surface and repassivation rate. In M magnesium chloride (MgCl0) and N sulfuric acid containing chloride ions (H0SO0 + % Cl) solution, repassivation test results showed that repassivation properties decreased as Ni content increased. However, W substitution was effective on the repassivation process and increased the resistance of SCC property for...
    positive: Abstract High-nitrogen (N) stainless steels (SS) are receiving increased attention because of their strength advantages over carbon (C)-alloyed materials, but they have been found susceptible to dichromium nitride (Cr0N) precipitation during thermal exposure between 000C and ,000C. Sensitization susceptibility of a high-N, low-C austenitic SS by Cr0N precipitation at 000C and 000C was determined using the single-loop electrochemical potentiokinetic reactivation (EPR) test. High-N SS was found susceptible to sensitization caused by grain boundary (GB) precipitation of Cr0N, with the degree of sensitization increasing systematically with aging time at 000C. Sensitization of high-N materials did not require the concomitant precipitation of chromium (Cr)-rich metal carbide (M00C0). Materials aged at 000C were not sensitized, although the rate of precipitation was greater than at 000C. This indicated the minimum Cr level in the Cr-depleted zone of the matrix associated with nitride precipit...
    negative: The anodic dissolution characteristics of nickel, molybdenum, and stainless steel have been examined in pure and eutectic melt. Molybdenum and nickel show Tafeltype dissolution kinetics in pure eutectic which permit estimates of longterm corrosion rates as a function of voltage. Nickel exhibits a sharp threshold potential for dissolution in melt, forming a nonpassivating layer. Comparative voltammetry and opencircuit potential measurements with iron in this melt suggest that care may be required in using nickel as an iron sulfide current collector. The anodic dissolution of stainless steel in melt appears to be rate limited by diffusion through a reaction layer, showing a dependence that may be applicable to longterm corrosion predictions. Dissolution is strongly inhibited by dissolved , apparently by formation of a protective anodic oxide layer. Molybdenum appears to owe its excellent anodic corrosion resistance in melt both to a chemically formed prepassive film and to a welldefined ...

    anchor: FY-0E WindRAD (Fengyun-0E Wind Radar) is a dual-frequency rotating fan-beam scatterometer. Its data characteristics, NOC (NWP Ocean Calibration), and wind retrieval performance are investigated in this paper. The diversity of the radar view geometry varies across the swaths, with maximum diversity in the sweet swaths and limited diversity in the outer and nadir swaths. When NOC backscatter calibration coefficients are computed as a function of incidence angle only (NOCint), a smooth correction is found. However, when relative antenna azimuth angle is included (NOCant), it appears that the corrections as a function of relative azimuth angle vary harmonically and substantially for a specific incidence angle. NOCant corrections yield a better fit of the measurements to the GMF (Geophysical Model Function). Hence, NOCant is applied for the analysis of wind retrieval from the Ku-band and C-band. An extra engineering correction of dB and dB is applied on Ku-band and C-band backscatter values...
    positive: Spaceborne synthetic aperture radar (SAR) represents a powerful source of data for enhancing maritime domain awareness (MDA). Wakes generated by traveling vessels hold a crucial role in MDA since they can be exploited both for ship route and velocity estimation and as a marker of ship presence. Even if deep learning (DL) has led to an impressive performance boost on a variety of computer vision tasks, its usage for automatic target recognition (ATR) in SAR images to support MDA is still limited to the detection of ships rather than ship wakes. A dataset is presented in this paper and several state-of-the-art object detectors based on convolutional neural networks (CNNs) are tested with different backbones. The dataset, including more than wake chips, is realized by visually inspecting Sentinel- images over highly trafficked maritime sites. Extensive experiments are shown to characterize CNNs for the wake detection task. For the first time, a deep-learning approach is implemented to spe...
    negative: With the publication of Part Wind Actions of the South African Loading Code SANS : , several issues concerning adjustments from the reference standard Eurocode EN - - : could not be resolved due to lack of sufficient updated background information on South African conditions. The need for updating the map for the free field wind speed is related also to the improved representation of the mixed and complex strong wind climate of the country. Furthermore, strong wind probability models are used for the reliability assessment and calibration of wind design procedures. Updating of the reliability provisions for the revised wind loading process was a further need identified at the time. This paper provides a review of the historical development of the representation of the free field wind, used as input to design wind loading procedures for South Africa. The review considers: (i) the historical representations of the geographic distribution of free field wind, (ii) the climatic influences c...
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
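With distance_metric set to cosine and triplet_margin 0.05, TripletLoss penalizes each triplet by max(d(a, p) - d(a, n) + 0.05, 0), where d(x, y) = 1 - cos(x, y), and averages the result over the batch. The plain-Python sketch below computes the same quantity on toy vectors; the function names are hypothetical. In the library, this configuration corresponds to losses.TripletLoss(model, distance_metric=losses.TripletDistanceMetric.COSINE, triplet_margin=0.05).

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity, i.e. the COSINE distance metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def triplet_loss(anchors, positives, negatives, margin=0.05):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over the batch."""
    losses = [
        max(cosine_distance(a, p) - cosine_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Well-separated triplet: zero loss. Reversed triplet: 1 - 0 + 0.05 = 1.05.
print(triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]))  # 0.0
print(triplet_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]]))
```

When the positive and negative sit at about the same distance from the anchor, the loss is close to the margin itself, which is consistent with the loss values of roughly 0.05 at the start of the training logs.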
    

Evaluation Dataset

Unnamed Dataset

  • Size: 391 evaluation samples
  • Columns: anchor, positive, and negative
  • Statistics based on all 391 samples (the full evaluation set):
    column | type | min | mean | max
    anchor | string | 85 tokens | 239.86 tokens | 592 tokens
    positive | string | 87 tokens | 229.31 tokens | 542 tokens
    negative | string | 93 tokens | 239.99 tokens | 592 tokens
  • Samples:
    anchor: Industrial Relations: A Journal of Economy and SocietyVolume , Issue p. - Internet Resources Selected by the Institute for Research on Labor and Employment Library University of California, Berkeley TERENCE K. HUWE, TERENCE K. HUWE Director of Library & Information ResourcesSearch for more papers by this authorJANICE KIMBALL, JANICE KIMBALL Library AssistantSearch for more papers by this author TERENCE K. HUWE, TERENCE K. HUWE Director of Library & Information ResourcesSearch for more papers by this authorJANICE KIMBALL, JANICE KIMBALL Library AssistantSearch for more papers by this author First published: April the full textAboutPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a ...
    positive: Industrial Relations: A Journal of Economy and SocietyVolume , Issue p. - Recent Publications Selected by the Institute for Research on Labor and Employment Library University of California, Berkeley Terence K. Huwe, Terence K. Huwe Director of Library & Information ResourcesSearch for more papers by this authorJanice Kimball, Janice Kimball Library AssistantSearch for more papers by this author Terence K. Huwe, Terence K. Huwe Director of Library & Information ResourcesSearch for more papers by this authorJanice Kimball, Janice Kimball Library AssistantSearch for more papers by this author First published: September the full textAboutPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to sha...
    negative: This paper suggests that all the models of industrial relations, not just the more statist ones, have been characterized throughout their history by complex and sometimes troublesome relationships with the state. These models have always been conditioned, and in certain sense shaped, by the latter's more or less direct intervention at the moment of their formation and as they have expanded or declined. An intervention which is also influenced by the nature of economic problems that national political economies have to cope with. Such difficulties of relationship are to a large extent due to the fact that political regulation and regulation through industrial relations only partially overlap in their goals and contents. More frequently they compete with each other and have methods and logics that tend to diverge. Whereas decisions are taken by majority principle in the political sphere, in industrial relations they can only be taken unamimously - and especially so in collective bargaini...

    anchor: Poor response rates to follow-up questionnaires can adversely affect the progress of a randomised controlled trial and the validity of its results. This embedded 'study within a trial' aimed to investigate the impact of including a pen with the postal -month questionnaire completed by the trial participants on the response rates to this questionnaire.This study was a two-armed randomised controlled trial nested in the Gentle Years Yoga (GYY) trial. Participants in the intervention group of the GYY trial were allocated : using simple randomisation to either receive a pen (intervention) or no pen with their -month questionnaire (control). The primary outcome was the proportion of participants sent a -month questionnaire who returned it. Secondary outcomes were time taken to return the questionnaire, proportion of participants sent a reminder to return the questionnaire, and completeness of the questionnaire. Binary outcomes were analysed using logistic regression, time to return by Cox P...
    positive: Background Poor response rates to follow-up questionnaires can adversely affect the progress of a randomised controlled trial and the validity of its results. This embedded 'study within a trial' aimed to investigate the impact of including a pen with the postal -month questionnaire completed by the trial participants on the response rates to this questionnaire. Methods This study was a two-armed randomised controlled trial nested in the Gentle Years Yoga (GYY) trial. Participants in the intervention group of the GYY trial were allocated : using simple randomisation to either receive a pen (intervention) or no pen with their -month questionnaire (control). The primary outcome was the proportion of participants sent a -month questionnaire who returned it. Secondary outcomes were time taken to return the questionnaire, proportion of participants sent a reminder to return the questionnaire, and completeness of the questionnaire. Binary outcomes were analysed using logistic regression, tim...
    negative: Patients' failure to adhere on tuberculosis (TB) treatment leads to drug resistance, relapse and death. Non-adherence to TB treatment is higher during continuation treatment phase. The study aimed to evaluate effectiveness of combined pill refilling and medication reminders on adherence to TB treatment.A two-arm randomised controlled trial on adult patients with TB was used during continuation treatment phase. In the first arm, in addition to usual care, participants will receive cellphone-based daily medication and weekly pill refilling reminders. In the control arm, participants will receive only usual care. The study will use a covariate adaptive randomisation technique to balance covariates during allocation. The primary outcome is patients' adherence to TB treatment and secondary outcomes are attendance to clinic and treatment outcomes. We apply intention to treat with generalised linear mixed model.Ethical approval was obtained from Institutional Review Board of University of Gon...

    anchor: EthologyVolume , Issue p. i-i Front CoverFree Access A male Swainson's Spurfowl, Pternistis swainsonii, calling out a raucous 'krrrraaak-krrrraaak-krrrraaak' in the bushveld of Kruger National Park, South Africa. Photograph reproduced by permission of Emmanuel Do Linh San - First published: June ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinked InRedditWechat No abstract is available for this article. Volume000, Issue0July 0000Pages i-i RelatedInformation
    positive: EthologyVolume , Issue p. i-i Front CoverFree Access Breeding male Southern Masked-Weaver, Ploceus velatus, building a nest in Addo Elephant National Park, South Africa. Photograph reproduced by permission of Emmanuel Do Linh San First published: July ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinkedInRedditWechat No abstract is available for this article. Volume000, Issue0August 0000Pages i-i RelatedInformation
    negative: IbisVolume , Issue p. - Do male Chaffinches Fringilla coelebs copy song sequencing and bout length from their tutors? Katharina Riebel, Corresponding Author Katharina Riebel School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKBehavioural Biology, Institute of Evolutionary and Ecology Sciences, PO Box , RA Leiden, The Nederlands. Email: for more papers by this authorPeter J. B. Slater, Peter J. B. Slater School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKSearch for more papers by this author Katharina Riebel, Corresponding Author Katharina Riebel School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKBehavioural Biology, Institute of Evolutionary and Ecology Sciences, PO Box , RA Leiden, The Nederlands. Email: for more papers by this authorPeter J. B. Slater, Peter J. B. Slater School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS...
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
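These settings fix the length of the run. With 7,828 training triplets, a per-device batch size of 4, and gradient_accumulation_steps 1 and num_train_epochs 3 (both listed under All Hyperparameters), training takes 1,957 steps per epoch and 5,871 steps in total, which matches the final step in the training logs. The sketch below reproduces that arithmetic and the linear warmup/decay schedule implied by lr_scheduler_type linear and warmup_ratio 0.1. It assumes a single device and follows the usual Hugging Face convention of rounding warmup steps up; the function names are hypothetical.

```python
import math


def run_length(num_samples=7828, batch_size=4, epochs=3, grad_accum=1, warmup_ratio=0.1):
    """Return (steps_per_epoch, total_steps, warmup_steps) for a single-device run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    steps_per_epoch = batches_per_epoch // grad_accum
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, total_steps, warmup_steps, peak_lr=1e-5):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps_per_epoch, total_steps, warmup_steps = run_length()
print(steps_per_epoch, total_steps, warmup_steps)  # 1957 5871 588
print(round(100 / steps_per_epoch, 4))  # 0.0511, the epoch value logged at step 100
```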

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
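batch_sampler no_duplicates keeps any text from appearing twice in the same batch. A simplified plain-Python sketch of the idea follows, assuming each row is an (anchor, positive, negative) tuple of strings; it is an illustration, not the library's NoDuplicatesBatchSampler implementation, and the function name is hypothetical. Because the sketch closes a batch whenever the remaining rows conflict with it, it can emit batches smaller than batch_size.

```python
import random


def no_duplicate_batches(rows, batch_size, seed=42):
    """Group row indices into batches in which no text value repeats (simplified sketch)."""
    remaining = list(range(len(rows)))
    random.Random(seed).shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for index in remaining:
            texts = set(rows[index])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(index)
                seen |= texts
            else:
                deferred.append(index)  # retry this row in a later batch
        batches.append(batch)
        remaining = deferred
    return batches


# Rows 0 and 1 share the anchor "a", so they never land in the same batch.
rows = [("a", "b", "c"), ("a", "d", "e"), ("f", "g", "h"), ("i", "j", "k")]
print(no_duplicate_batches(rows, batch_size=2))
```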

Training Logs

Epoch | Step | Training Loss | Validation Loss | modernBERT_cosine_accuracy | modernBERT_disciplines_cosine_accuracy
0 | 0 | - | - | 0.4783 | -
0.0511 | 100 | 0.0534 | 0.0495 | 0.5090 | -
0.1022 | 200 | 0.0502 | 0.0474 | 0.5243 | -
0.1533 | 300 | 0.0486 | 0.0465 | 0.5499 | -
0.2044 | 400 | 0.0465 | 0.0457 | 0.5831 | -
0.2555 | 500 | 0.0468 | 0.0467 | 0.5754 | -
0.3066 | 600 | 0.0465 | 0.0444 | 0.6113 | -
0.3577 | 700 | 0.0426 | 0.0467 | 0.5831 | -
0.4088 | 800 | 0.0445 | 0.0454 | 0.5857 | -
0.4599 | 900 | 0.0441 | 0.0441 | 0.6215 | -
0.5110 | 1000 | 0.0432 | 0.0423 | 0.6189 | -
0.5621 | 1100 | 0.0433 | 0.0417 | 0.6189 | -
0.6132 | 1200 | 0.0395 | 0.0416 | 0.6240 | -
0.6643 | 1300 | 0.0408 | 0.0403 | 0.6419 | -
0.7154 | 1400 | 0.0414 | 0.0414 | 0.6445 | -
0.7665 | 1500 | 0.044 | 0.0423 | 0.6343 | -
0.8176 | 1600 | 0.0436 | 0.0418 | 0.6292 | -
0.8687 | 1700 | 0.0392 | 0.0402 | 0.6624 | -
0.9198 | 1800 | 0.039 | 0.0434 | 0.6419 | -
0.9709 | 1900 | 0.0413 | 0.0439 | 0.5959 | -
1.0220 | 2000 | 0.0396 | 0.0437 | 0.6087 | -
1.0731 | 2100 | 0.0402 | 0.0414 | 0.6266 | -
1.1242 | 2200 | 0.0402 | 0.0411 | 0.6496 | -
1.1753 | 2300 | 0.0362 | 0.0415 | 0.6419 | -
1.2264 | 2400 | 0.0371 | 0.0393 | 0.6496 | -
1.2775 | 2500 | 0.0353 | 0.0396 | 0.6445 | -
1.3286 | 2600 | 0.0322 | 0.0418 | 0.6496 | -
1.3797 | 2700 | 0.0329 | 0.0412 | 0.6394 | -
1.4308 | 2800 | 0.0311 | 0.0400 | 0.6445 | -
1.4819 | 2900 | 0.0318 | 0.0385 | 0.6573 | -
1.5330 | 3000 | 0.0306 | 0.0387 | 0.6726 | -
1.5841 | 3100 | 0.0273 | 0.0387 | 0.6803 | -
1.6352 | 3200 | 0.0285 | 0.0384 | 0.6803 | -
1.6863 | 3300 | 0.0299 | 0.0375 | 0.6675 | -
1.7374 | 3400 | 0.0304 | 0.0378 | 0.6522 | -
1.7885 | 3500 | 0.03 | 0.0388 | 0.6496 | -
1.8396 | 3600 | 0.028 | 0.0383 | 0.6803 | -
1.8906 | 3700 | 0.0264 | 0.0380 | 0.6957 | -
1.9417 | 3800 | 0.0275 | 0.0388 | 0.6573 | -
1.9928 | 3900 | 0.0314 | 0.0378 | 0.6803 | -
2.0439 | 4000 | 0.03 | 0.0388 | 0.6777 | -
2.0950 | 4100 | 0.0308 | 0.0380 | 0.6752 | -
2.1461 | 4200 | 0.0263 | 0.0382 | 0.6598 | -
2.1972 | 4300 | 0.0215 | 0.0391 | 0.6573 | -
2.2483 | 4400 | 0.017 | 0.0413 | 0.6471 | -
2.2994 | 4500 | 0.0173 | 0.0398 | 0.6726 | -
2.3505 | 4600 | 0.0183 | 0.0393 | 0.6752 | -
2.4016 | 4700 | 0.0189 | 0.0399 | 0.6957 | -
2.4527 | 4800 | 0.0123 | 0.0407 | 0.6803 | -
2.5038 | 4900 | 0.0155 | 0.0405 | 0.6803 | -
2.5549 | 5000 | 0.0108 | 0.0413 | 0.6726 | -
2.6060 | 5100 | 0.0112 | 0.0416 | 0.6650 | -
2.6571 | 5200 | 0.0134 | 0.0414 | 0.6777 | -
2.7082 | 5300 | 0.0133 | 0.0406 | 0.6624 | -
2.7593 | 5400 | 0.0109 | 0.0408 | 0.6701 | -
2.8104 | 5500 | 0.0121 | 0.0408 | 0.6726 | -
2.8615 | 5600 | 0.0124 | 0.0408 | 0.6752 | -
2.9126 | 5700 | 0.012 | 0.0407 | 0.6752 | -
2.9637 | 5800 | 0.0127 | 0.0406 | 0.6726 | -
3.0 | 5871 | - | - | - | 0.6756
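Because load_best_model_at_end is False, the model at the end of training is the final-step model rather than the checkpoint with the best validation score. Across the full log, validation loss is lowest at step 3300 (0.0375) and modernBERT_cosine_accuracy peaks at 0.6957 at steps 3700 and 4700, while training loss keeps falling through the third epoch, which suggests some overfitting late in training. The sketch below parses rows copied from the log and picks out those steps; the helper names and the LOG_ROWS constant are hypothetical, and only five rows are included for brevity, so the picks are relative to those rows.

```python
LOG_ROWS = """
1.6863 3300 0.0299 0.0375 0.6675 -
1.8906 3700 0.0264 0.0380 0.6957 -
2.2483 4400 0.017 0.0413 0.6471 -
2.4016 4700 0.0189 0.0399 0.6957 -
2.9637 5800 0.0127 0.0406 0.6726 -
"""


def parse_value(text):
    """'-' marks a metric that was not computed at that step."""
    return None if text == "-" else float(text)


def parse_log(text):
    """Parse log rows; accepts space- or pipe-separated columns."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, val_loss, accuracy, _ = line.replace("|", " ").split()
        rows.append({
            "epoch": float(epoch),
            "step": int(step),
            "train_loss": parse_value(train_loss),
            "val_loss": parse_value(val_loss),
            "accuracy": parse_value(accuracy),
        })
    return rows


rows = parse_log(LOG_ROWS)
scored = [r for r in rows if r["val_loss"] is not None and r["accuracy"] is not None]
best_loss = min(scored, key=lambda r: r["val_loss"])
best_accuracy = max(scored, key=lambda r: r["accuracy"])  # the first row wins a tie
print(best_loss["step"], best_accuracy["step"])  # 3300 3700
```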

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

  • Model size: 395M params
  • Tensor type: F32