metadata
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:183
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: >-
14 William O. Douglas, quoted in Charles Hurd, Film Booking Issue Ordered
Reopened,” New York Times, May 4, 1948, 1. 15 Movie Crisis Laid to Video
Inroads And Dwindling of Foreign Market, New York Times, February 27,
1949, F1. For details on the lawsuit and its effects, see Arthur De Vany
and Henry McMillan, Was the Antitrust Action that Broke Up the Movie
Studios Good for the Movies? Evidence from the Stock Market. American Law
and Economics Review 6, no. 1 (2004): 135-53; and J.C. Strick, The
Economics of the Motion Picture Industry: A Survey, Philosophy of the
Social Sciences 8, no. 4 (December 1978): 406-17. 16 The Hollywood feature
films for which Eisler provided music are Hangmen Also Die (1942), None
But the Lonely Heart (1944), Jealousy (1945), The Spanish Main (1945); A
Scandal in Paris (1946), Deadline at Dawn (1946), Woman on the Beach
(1947), and So Well Remembered (1947). Most of these are
middle-of-the-road genre pieces, but the first NOTES 267
sentences:
- >-
What is the opinion of Ernest Irving, a pioneer of British film music,
on the overall quality of American film music?
- >-
What is the title of the 2007 film directed by David Fincher, produced
by Michael Medavoy, and featuring a storyline based on a real-life
serial killer, as mentioned in the provided context information?
- >-
What was the primary reason behind the lawsuit that led to the breakup
of the movie studios, as suggested by the article in the New York Times
on February 27, 1949?
- source_sentence: >-
But Gorbman (who like Flinn and Kalinak approached film music from a
formal background not in musicology but in literary criticism) was
certainly not the first scholar engaged in so-called film studies44 to
address the role that extra-diegetic music played in classical-style
films. Two years before Gorbman's book was published, the trio of
Bordwell, Staiger, and Thompson brought out their monumental The Classical
Hollywood Cinema: Film Style and Production to 1960. As noted above, and
apropos of its title, the book focuses on filmic narrative style and the
technical devices that made this style possible. In its early pages,
however, it also contains insightful comments on classical cinema's use of
music. The book's first music-related passage lays a foundation for
Gorbman's point about how a score might lend unity to a film by recycling
distinctive themes that within the THE GOLDEN AGE OF FILM MUSIC, 1933-49
143
sentences:
- >-
What is the possible reason, as suggested by David Thomson, for why
David Lean's filmmaking style may have declined after the movie "Summer
Madness" (US, 1955)?
- >-
What shift in the portrayal of hard body male characters in film, as
exemplified by the actors who played these roles in the 1980s and 1990s,
suggests that societal expectations and norms may be changing?
- >-
What is the significance of the authors' formal background in literary
criticism rather than musicology, as mentioned in the context of
Gorbman's approach to film music?
- source_sentence: >-
(1931); Georg Wilhelm Pabst's Kameradschaft (1931); Fritz Lang's M (1931)
and Das Testament der Dr. Mabuse (1932); and Carl Theodor Dreyer's Vampyr
(1932). These films’ subtle mix of actual silence with accompanying music
and more or less realistic sound effects has drawn and doubtless will
continue to draw serious analytical attention from film scholars.45 And
even in their own time they drew due attention aplenty from critics of
avant-garde persuasion.46 The mere fact that these films differed from the
sonic norm attracted the notice, if not always the praise, of movie
reviewers for the popular press. Writing from London, a special
correspondent for the New York Times observed that Hitchcock's Blackmail
goes some way to showing how the cinematograph and the microphone can be
mated without their union being forced upon the attention of a punctilious
world as VITAPHONE AND MOVIETONE, 1926-8 101
sentences:
- >-
What was the primary limitation that led to the failure of Edison's
first Kinetophone, which was an early attempt at sound film featuring
musical accompaniment?
- >-
What was the specific sonic approach employed by the mentioned films of
Georg Wilhelm Pabst, Fritz Lang, and Carl Theodor Dreyer that drew
serious analytical attention from film scholars?
- >-
What limitation in Martin Scorsese's background, as mentioned in the
text, restricted his choice of subjects at this stage in his career?
- source_sentence: "39\tdivided into small, three-dimensional cubes known as volumetric pixels, or voxels. When viewers are watching certain images, the voxel demonstrates how these images in the movie are mapped into brain activity. Clips of the movie are reconstructed through brain imaging and computer stimulation by associating visual patterns in the movie with the corresponding brain activity. However, these reconstructions are blurry and are hard to make because researchers say, blood flow signals measured using fMRI change much more slowly than the neural signals that encode dynamic information in movies. Psychology and neuroscience professor, Jack Gallant explains in an interview that primary visual cortex responds to the local features of the movie such as edges, colors, motion, and texture but this part of the brain cannot understand the objects in the movie. In addition, movies that show people are reconstructed with better accuracy than abstract images. Using Neuroimaging For Entertainment Success Can brain scans predict movie success in the box office? Two marketing researchers from the Rotterdam School of Management devised an experiment by using EEG on participants. EEG demonstrated that individual choice and box office success correlate with different types of brain activity. From article, How Neuroimaging Can Save The Entertainment Industry Millions of Dollars, it states, individual choice is predicted best by high frontocentral beta activity, the choice of the general population is predicted by frontal gamma activity. Perhaps, with quickly advanced technology, predicting movie genre and plots that can hit the box office could be successful. Neurocinema in Hollywood One strategy that helps filmmakers, producers, and distributors to achieve global market success is by using fMRI and EEG to make a better storyline, characters, sound effects, and other"
sentences:
- >-
What significant change in the portrayal of Rocky's character is evident
in the 2015 movie Creed, as compared to the original 1976 film Rocky?
- >-
What factors led to the selection of the films "Spider-man" (2002),
"Cars" (2006), and "Avatar" (2009) for the research project examining
the relationship between film and society in the early 2000s?
- >-
What is the main reason why researchers find it challenging to
reconstruct abstract images from movie clips using brain imaging and
computer stimulation?
- source_sentence: "11\tdocumentary film so unpleasant when most had sat through horror pictures that were appreciably more violent and bloody. The answer that McCauley came up with was that the fictional nature of horror films affords viewers a sense of control by placing psychological distance between them and the violent acts they have witnessed. Most people who view horror movies understand that the filmed events are unreal, which furnishes them with psychological distance from the horror portrayed in the film. In fact, there is evidence that young viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal (Hoekstra, Harris, & Helmick, 1999). Four Viewing Motivations for Graphic Horror According to Dr. Deirdre Johnston (1995) study Adolescents’ Motivations for Viewing Graphic Horror of Human Communication Research there are four different main reasons for viewing graphic horror. From the study of a small sample of 220 American adolescents who like watching horror movies, Dr. Johnston reported that: The four viewing motivations are found to be related to viewers’ cognitive and affective responses to horror films, as well as viewers’ tendency to identify with either the killers or victims in these films.\" Dr. Johnson notes that: 1) gore watchers typically had low empathy, high sensation seeking, and (among males only) a strong identification with the killer, 2) thrill watchers typically had both high empathy and sensation seeking, identified themselves more with the victims, and liked the suspense of the film, 3) independent watchers typically had a high empathy for the victim along with a high positive effect for overcoming fear, and 4) problem watchers typically had high empathy for the victim but were"
sentences:
- >-
What was the name of the series published by Oliver Ditson from 1918-25
that contained ensemble music for motion picture plays?
- >-
What shift in the cultural, political, and social contexts of the 1980s
and 1990s may have led to the deconstruction of the hard body characters
portrayed by actors such as Stallone and Schwarzenegger in more recent
movies?
- >-
What is the primary reason why viewers who perceive greater realism in
horror films are more negatively affected by their exposure to horror
films than viewers who perceive the film as unreal?
datasets:
- YxBxRyXJx/QAsimple_for_BGE_241019
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Movie Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.8205128205128205
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9743589743589743
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.8205128205128205
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.32478632478632485
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.20000000000000004
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.10000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.8205128205128205
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9743589743589743
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9207838928594967
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8940170940170941
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8940170940170938
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.8461538461538461
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9230769230769231
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.8461538461538461
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.30769230769230776
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.20000000000000004
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.10000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.8461538461538461
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9230769230769231
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9233350110390831
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8982905982905982
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8982905982905982
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.8461538461538461
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9230769230769231
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9487179487179487
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.8461538461538461
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.30769230769230776
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18974358974358976
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.10000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.8461538461538461
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9230769230769231
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9487179487179487
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9234104189545929
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.898962148962149
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.898962148962149
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.7692307692307693
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8974358974358975
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9487179487179487
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9487179487179487
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7692307692307693
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.29914529914529925
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18974358974358976
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09487179487179488
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7692307692307693
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8974358974358975
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9487179487179487
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9487179487179487
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8688480033444261
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8418803418803418
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8443986568986569
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.5641025641025641
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8717948717948718
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9230769230769231
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9487179487179487
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5641025641025641
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2905982905982907
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18461538461538465
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09487179487179488
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.5641025641025641
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8717948717948718
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9230769230769231
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9487179487179487
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.768187565996018
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.708119658119658
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7088711597999523
name: Cosine Map@100
BGE base Movie Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the q_asimple_for_bge_241019 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: q_asimple_for_bge_241019
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
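The pooling and normalization stages listed in the architecture can be illustrated with a minimal sketch. It assumes that CLS pooling takes the first token's vector as the sentence embedding and that Normalize rescales it to unit L2 norm; the token embeddings below are hypothetical 4-dimensional stand-ins for the model's 768-dimensional outputs, and the helper names are illustrative rather than library functions.

```python
import math


def cls_pool(token_embeddings):
    """Return the first ([CLS]) token vector as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical token embeddings for a 3-token input (assumption: 4 dims
# instead of the real 768, purely to keep the example readable).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.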
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("YxBxRyXJx/bge-base-movie-matryoshka")
# Run inference
sentences = [
'11\tdocumentary film so unpleasant when most had sat through horror pictures that were appreciably more violent and bloody. The answer that McCauley came up with was that the fictional nature of horror films affords viewers a sense of control by placing psychological distance between them and the violent acts they have witnessed. Most people who view horror movies understand that the filmed events are unreal, which furnishes them with psychological distance from the horror portrayed in the film. In fact, there is evidence that young viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal (Hoekstra, Harris, & Helmick, 1999). Four Viewing Motivations for Graphic Horror According to Dr. Deirdre Johnston (1995) study Adolescents’ Motivations for Viewing Graphic Horror of Human Communication Research there are four different main reasons for viewing graphic horror. From the study of a small sample of 220 American adolescents who like watching horror movies, Dr. Johnston reported that: The four viewing motivations are found to be related to viewers’ cognitive and affective responses to horror films, as well as viewers’ tendency to identify with either the killers or victims in these films." Dr. Johnson notes that: 1) gore watchers typically had low empathy, high sensation seeking, and (among males only) a strong identification with the killer, 2) thrill watchers typically had both high empathy and sensation seeking, identified themselves more with the victims, and liked the suspense of the film, 3) independent watchers typically had a high empathy for the victim along with a high positive effect for overcoming fear, and 4) problem watchers typically had high empathy for the victim but were',
'What is the primary reason why viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal?',
'What shift in the cultural, political, and social contexts of the 1980s and 1990s may have led to the deconstruction of the hard body characters portrayed by actors such as Stallone and Schwarzenegger in more recent movies?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
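Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, the embeddings can in principle be shortened to one of those sizes. The sketch below shows the idea in plain Python under the assumption that truncation means keeping the leading values and renormalizing; the vectors are hypothetical 4-dimensional stand-ins, and `truncate_and_normalize` is an illustrative helper, not a library function. Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model; check the documentation of the installed version before relying on it.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit L2 norm."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0.0 else head


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical full-size embeddings (assumption: 4 dims instead of 768).
query_embedding = [1.0, 2.0, 2.0, 4.0]
passage_embedding = [2.0, 1.0, 4.0, 2.0]

# Shorten both vectors to the same size before comparing them.
query_small = truncate_and_normalize(query_embedding, 2)
passage_small = truncate_and_normalize(passage_embedding, 2)

print(len(query_small), dot(query_small, passage_small))
```

The evaluation table in this card suggests how much retrieval quality to expect at each size: scores stay close to the full model down to 256 dimensions and drop more noticeably at 64.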
Evaluation
Metrics
Information Retrieval
- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
- Evaluated with InformationRetrievalEvaluator
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.8205 | 0.8462 | 0.8462 | 0.7692 | 0.5641 |
cosine_accuracy@3 | 0.9744 | 0.9231 | 0.9231 | 0.8974 | 0.8718 |
cosine_accuracy@5 | 1.0 | 1.0 | 0.9487 | 0.9487 | 0.9231 |
cosine_accuracy@10 | 1.0 | 1.0 | 1.0 | 0.9487 | 0.9487 |
cosine_precision@1 | 0.8205 | 0.8462 | 0.8462 | 0.7692 | 0.5641 |
cosine_precision@3 | 0.3248 | 0.3077 | 0.3077 | 0.2991 | 0.2906 |
cosine_precision@5 | 0.2 | 0.2 | 0.1897 | 0.1897 | 0.1846 |
cosine_precision@10 | 0.1 | 0.1 | 0.1 | 0.0949 | 0.0949 |
cosine_recall@1 | 0.8205 | 0.8462 | 0.8462 | 0.7692 | 0.5641 |
cosine_recall@3 | 0.9744 | 0.9231 | 0.9231 | 0.8974 | 0.8718 |
cosine_recall@5 | 1.0 | 1.0 | 0.9487 | 0.9487 | 0.9231 |
cosine_recall@10 | 1.0 | 1.0 | 1.0 | 0.9487 | 0.9487 |
cosine_ndcg@10 | 0.9208 | 0.9233 | 0.9234 | 0.8688 | 0.7682 |
cosine_mrr@10 | 0.894 | 0.8983 | 0.899 | 0.8419 | 0.7081 |
cosine_map@100 | 0.894 | 0.8983 | 0.899 | 0.8444 | 0.7089 |
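In the table, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9744 / 3 ≈ 0.3248). That pattern is what one would expect if every query has exactly one relevant passage. The sketch below computes the metrics under that assumption; the ranks are made-up values for illustration, not results from this model, and `retrieval_metrics` is a hypothetical helper rather than the evaluator's own code.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    `ranks` holds, for each query, the 1-based rank of its single relevant
    passage, or None if the passage was not retrieved at all.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    hits = len(hit_ranks)
    return {
        "accuracy": hits / n,          # share of queries with a hit in the top k
        "precision": hits / (n * k),   # at most 1 of the k results is relevant
        "recall": hits / n,            # the single relevant passage was found
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
    }


# Hypothetical ranks of the relevant passage for four queries.
example_ranks = [1, 1, 2, 4]

metrics_at_3 = retrieval_metrics(example_ranks, k=3)
print(metrics_at_3)
# accuracy 0.75, precision 0.25, recall 0.75, mrr 0.625
```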
Training Details
Training Dataset
q_asimple_for_bge_241019
- Dataset: q_asimple_for_bge_241019 at 66635cd
- Size: 183 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 183 samples:
 | positive | anchor |
---|---|---|
type | string | string |
details | min: 191 tokens, mean: 356.1 tokens, max: 512 tokens | min: 16 tokens, mean: 36.04 tokens, max: 66 tokens |
- Samples:
positive | anchor |
---|---|
1 Introduction Why do we watch horror films? What makes horror films so exciting to watch? Why do our bodies sweat and muscles tense when we are scared? How do filmmakers, producers, sound engineers, and cinematographers specifically design a horror film? Can horror movies cause negative, lasting effects on the audience? These are some of the questions that are answered by exploring the aesthetics of horror films and the psychology behind horror movies. Chapter 1, The Allure of Horror Film, illustrates why we are drawn to scary films by studying different psychological theories and factors. Ideas include: catharsis, subconscious mind, curiosity, thrill, escape from reality, relevance, unrealism, and imagination. Also, this chapter demonstrates why people would rather watch fiction films than documentaries and the motivations for viewing graphic horror. Chapter 2, Mise-en-scène in Horror Movies, includes purposeful arrangement of scenery and stage properties of horror movie. Also... | What is the name of the emerging field of scientists and filmmakers that uses fMRI and EEG to read people's brain activity while watching movie scenes? |
3 Chapter 1: The Allure of Horror Film Overview Although watching horror films can make us feel anxious and uneasy, we still continue to watch other horror films one after another. It is ironic how we hate the feeling of being scared, but we still enjoy the thrill. So why do we pay money to watch something to be scared? Eight Theories on why we watch Horror Films From research by philosophers, psychoanalysts, and psychologists there are theories that can explain why we are drawn to watching horror films. The first theory, psychoanalyst, Sigmund Freud portrays that horror comes from the “uncanny” emergence of images and thoughts of the primitive id. The purpose of horror films is to highlight unconscious fears, desire, urges, and primeval archetypes that are buried deep in our collective subconscious images of mothers and shadows play important roles because they are common to us all. For example, in Alfred Hitchcock's Psycho, a mother plays the role of evil in the main character... | What process, introduced by the Greek Philosopher Aristotle, involves the release of negative emotions through the observation of violent or scary events, resulting in a purging of aggressive emotions? |
5 principle unknowable (Jancovich, 2002, p. 35). This meaning, the audience already knows that the plot and the characters are already disgusting, but the surprises in the horror narrative through the discovery of curiosity should give satisfaction. Marvin Zuckerman (1979) proposed that people who scored high in sensation seeking scale often reported a greater interest in exciting things like rollercoasters, bungee jumping and horror films. He argued more individuals who are attracted to horror movies desire the sensation of experience. However, researchers did not find the correlation to thrill-seeking activities and enjoyment of watching horror films always significant. The Gender Socialization theory (1986) by Zillman, Weaver, Mundorf and Aust exposed 36 male and 36 female undergraduates to a horror movie with the same age, opposite-gender companion of low or high initial appeal who expressed mastery, affective indifference, or distress. They reported that young men enjoyed the fi... | What is the proposed theory by Marvin Zuckerman (1979) regarding the relationship between sensation seeking and interest in exciting activities, including horror films? |
- Loss: MatryoshkaLoss with these parameters:
  {
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
  }
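To make the loss configuration more concrete, here is a rough pure-Python sketch of how a Matryoshka wrapper around a multiple-negatives ranking loss can be computed. It is not the library implementation. It assumes the inner loss is a softmax cross-entropy over scaled cosine similarities in which the other positives in the batch act as negatives, that the scale is 20, and that the wrapper sums the inner loss over each truncated dimension with the given weights. All function names and vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0.0 else list(vector)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch cross-entropy: anchor i should rank positive i first."""
    a_norm = [l2_normalize(a) for a in anchors]
    p_norm = [l2_normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(a_norm):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p_norm]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(a_norm)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embedding sizes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total


# Hypothetical 4-dimensional batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

matched = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(matched, swapped)
```

With correctly paired inputs the loss is small, and it grows when the positives are swapped, which is the signal that pushes every prefix of the embedding to be useful on its own.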
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 5
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
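Two of these settings interact in ways that are easier to see with numbers. The sketch below is an illustration under stated assumptions, not the trainer's code. It computes the effective batch size on a single device, and it evaluates a cosine learning-rate schedule with linear warmup in the commonly used form (warmup steps taken as the ceiling of `warmup_ratio` times the total steps, then half-cosine decay to zero). The total of 100 steps is a hypothetical value chosen for readability.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size per optimizer step, assuming a single device.
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 512

# Hypothetical run of 100 optimizer steps (assumption for illustration only).
total_steps = 100
for step in (0, 5, 10, 55, 100):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

Note that 512 exceeds the 183 training samples, so on a single device one optimizer step would already cover the whole dataset.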
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|
1.0 | 1 | 0.8987 | 0.8983 | 0.8835 | 0.8419 | 0.7773 |
2.0 | 2 | 0.9218 | 0.9141 | 0.9075 | 0.8721 | 0.8124 |
1.0 | 1 | 0.9218 | 0.9141 | 0.9075 | 0.8721 | 0.8124 |
2.0 | 2 | 0.9356 | 0.9302 | 0.9118 | 0.8750 | 0.8057 |
3.0 | 4 | 0.9302 | 0.9233 | 0.9234 | 0.8783 | 0.7759 |
4.0 | 5 | 0.9208 | 0.9233 | 0.9234 | 0.8688 | 0.7682 |
- The saved checkpoint is the last row (epoch 4.0, step 5); its NDCG@10 values match those reported in the evaluation metrics table.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}