tenzin3 committed · Commit 17f9526 · 1 Parent(s): 37b0492

embedding model smaller files

1_Pooling/.ipynb_checkpoints/config-checkpoint.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
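Both copies of this pooling configuration enable only `pooling_mode_cls_token`, so the sentence embedding is simply the hidden state of the first (`[CLS]`) token of the 1024-dimensional token embeddings. A minimal sketch of that reduction (illustrative only, not part of this commit; a random tensor stands in for real transformer output):

```python
import torch

# Stand-in for backbone output of shape (batch, seq_len, word_embedding_dimension).
token_embeddings = torch.randn(2, 16, 1024)

# pooling_mode_cls_token=true: take the first token's hidden state as the
# sentence embedding; all other pooling modes in this config are disabled.
sentence_embeddings = token_embeddings[:, 0]
print(sentence_embeddings.shape)  # torch.Size([2, 1024])
```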
README.md ADDED
@@ -0,0 +1,527 @@
+ ---
+ base_model: Alibaba-NLP/gte-large-en-v1.5
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - dot_accuracy
+ - manhattan_accuracy
+ - euclidean_accuracy
+ - max_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:7075
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: What is the name of the monastery founded by Karma Rolpai Dorje?
+   sentences:
+   - Amid the splendor of this natural beauty stood the monastery called Karma Shar
+     Tsong Ridro, which is a famous place in the religious history of Tibet. It was
+     founded by Karma Rolpai Dorje, the fourth reincarnation of Karmapa, who himself
+     was the first incarnation recognized in Tibet; and it was at this monastery that
+     our great reformer Tsongkhapa was initiated as a monk in the fourteenth century
+     of the Christian era.
+   - In the Year of the Water Bird (1933), Thupten Gyatso, the Thirteenth Dalai Lama,
+     departed from this world. This event left the people of Tibet desolate, as he
+     had done much for the peace and welfare of Tibet. Following his death, the people
+     decided to build a golden mausoleum of special magnificence as a token of their
+     homage and respect, which was erected inside the Potala Palace in Lhasa.
+   - Mr. Nehru's personality had impressed me very much. Although the mantle of Mahatma
+     Gandhi had fallen on him, I could not catch any glimpse of spiritual fervor in
+     him; but I saw him as a brilliant practical statesman, with a masterly grasp of
+     international politics, and he showed me that he had a profound love for his country
+     and faith in his people. For their welfare and progress, he was firm in the pursuit
+     of peace.
+ - source_sentence: How did the Dalai Lama describe the period of darkness for Tibetan
+     refugees?
+   sentences:
+   - The Dalai Lama was appalled and filled with consternation upon learning the terms
+     of the agreement. He described the agreement as a mixture of 'Communist clichés,
+     vainglorious assertions which were completely false, and bold statements which
+     were only partly true.' The terms were far worse and more oppressive than anything
+     he had imagined, and he felt that Tibet was expected to 'hand ourselves and our
+     country over to China and cease to exist as a nation.' Despite their strong opposition,
+     they felt helpless and abandoned, with no choice but to acquiesce and submit to
+     the Chinese dictates, hoping that the Chinese would keep their side of the forced,
+     one-sided bargain.
+   - Thus, for almost fifteen years, the Tibetan refugees entered a period of darkness.
+     The prospect of returning to our homeland seemed further off then when we had
+     first come into exile. But of course night is the time for regeneration and during
+     these years the resettlement programme was brought to fruition. Gradually, more
+     and more people were taken off the roads and put into the new settlements around
+     India. Also, a few of the refugees left India to found small communities around
+     the world.
+   - The Dalai Lama felt a sense of loss and nostalgia regarding the Chinese road in
+     Tibet. Although he acknowledged that the road made travel faster and more convenient,
+     he preferred the traditional way of travel. He expressed this sentiment by stating,
+     'It was certainly ten times faster and more convenient, but like all Tibetans,
+     I preferred it as it had always been before.'
+ - source_sentence: What reforms did the Dalai Lama establish after the forced resignations
+     of his Prime Ministers?
+   sentences:
+   - The Chinese requisitioned houses, and bought or rented others; and beyond the
+     Ngabo, in the pleasant land beside the river which had always been the favorite
+     place for summer picnics, they took possession of an enormous area for a camp.
+     They demanded a loan of 2000 tons of barley. This huge amount could not be met
+     from the state granaries at that time because of heavy expenditure, and the government
+     had to borrow from monasteries and private owners. Other kinds of food were also
+     demanded, and the humble resources of the city began to be strained, and prices
+     began to rise.
+   - After the forced resignations of his Prime Ministers, the Dalai Lama established
+     the Reform Committee. One of his main ambitions was to establish an independent
+     judiciary. He also focused on education, instructing the Kashag to develop a good
+     educational program. Additionally, he aimed to improve communications by considering
+     the development of a system of roads and transportation. Furthermore, he abolished
+     the principle of hereditary debt and wrote off all government loans that could
+     not be repaid. These reforms were disseminated widely to ensure their implementation.
+   - The Dalai Lama's brother, Taktser Rinpoche, managed to escape to Lhasa by pretending
+     to go along with the Chinese authorities' demands. The Chinese had put him under
+     duress, restricted his activities, and tried to indoctrinate him. They proposed
+     that he would be set free to go to Lhasa if he agreed to persuade the Dalai Lama
+     to accept Chinese rule, and if the Dalai Lama resisted, he was to kill him. Taktser
+     Rinpoche pretended to agree to this plan in order to escape and warn the Dalai
+     Lama and the Tibetan Government of the impending danger from the Chinese. He eventually
+     decided to renounce his monastic vows, disrobe, and go abroad as an emissary for
+     Tibet to seek foreign support against the Chinese invasion.
+ - source_sentence: How did Tibet maintain its independence from 1912 to 1950?
+   sentences:
+   - Throughout this period Tibetans never took any active steps to prove their independence
+     to the outside world, because it never seemed to be necessary.
+   - For example, there were now factories where there had been none before, but all
+     that they produced went to China. And the factories themselves were sited with
+     no regard for anything other than utility, with predictably detrimental results
+     to the environment.
+   - In Tantric practices, the chakras and nadis hold significant importance as they
+     are central to the practitioner's ability to control and suppress the grosser
+     levels of consciousness, thereby allowing access to subtler levels. This process
+     is crucial for experiencing profound spiritual realizations, particularly those
+     that occur at the point of death. By meditating on these energy centers and channels,
+     practitioners can demonstrate remarkable physiological phenomena, such as raising
+     body temperatures and reducing oxygen intake, which have been observed and measured
+     in scientific studies.The chakras are described as energy centers, while the nadis
+     are energy channels. The practice of focusing on these elements enables the practitioner
+     to temporarily prevent the activity of grosser levels of consciousness, facilitating
+     the experience of subtler levels. This is aligned with the Buddhist understanding
+     that the most powerful spiritual realizations can occur when the grosser levels
+     of consciousness are suppressed, such as at the moment of death.
+ - source_sentence: Who gave the Dalai Lama a lecture before he left Lhasa, and what
+     was it about?
+   sentences:
+   - The settlement of Mangmang held significant importance in the Dalai Lama's journey
+     as it was the last settlement in Tibet before crossing into India. It was here
+     that the Dalai Lama received the crucial news that the Indian government was willing
+     to grant asylum, providing a sense of safety and relief. Despite the harsh weather
+     and his own illness, Mangmang served as a pivotal point where final decisions
+     were made about who would accompany him into India and who would stay behind to
+     continue the fight. The Dalai Lama's departure from Mangmang marked the end of
+     his journey within Tibet and the beginning of his exile.
+   - Before the Dalai Lama left Lhasa, he was given a long lecture by General Chang
+     Chin-wu, the permanent representative of China. The lecture covered several topics,
+     including recent events in Hungary and Poland, the solidarity of socialist powers,
+     the Dalai Lama's visit to India, and specific instructions on how to handle questions
+     about the Indo-Tibetan frontier and the situation in Tibet. General Chang Chin-wu
+     also suggested that the Dalai Lama prepare his speeches in advance.
+   - Everywhere I went, I was accompanied by a retinue of servants. I was surrounded
+     by government ministers and advisors clad in sumptuous silk robes, men drawn from
+     the most exalted and aristocratic families in the land.
+ model-index:
+ - name: SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: all nli dev
+       type: all-nli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9923664122137404
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.007633587786259542
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.9923664122137404
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.989821882951654
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.9923664122137404
+       name: Max Accuracy
+ ---
+
+ # SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) <!-- at revision a0d6174973604c8ef416d9f6ed0f4c17ab32d78d -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Who gave the Dalai Lama a lecture before he left Lhasa, and what was it about?',
+     "Before the Dalai Lama left Lhasa, he was given a long lecture by General Chang Chin-wu, the permanent representative of China. The lecture covered several topics, including recent events in Hungary and Poland, the solidarity of socialist powers, the Dalai Lama's visit to India, and specific instructions on how to handle questions about the Indo-Tibetan frontier and the situation in Tibet. General Chang Chin-wu also suggested that the Dalai Lama prepare his speeches in advance.",
+     'Everywhere I went, I was accompanied by a retinue of servants. I was surrounded by government ministers and advisors clad in sumptuous silk robes, men drawn from the most exalted and aristocratic families in the land.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
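Beyond pairwise similarity, the same two calls support a small semantic-search loop. A sketch (not part of the committed card), reusing the placeholder model id from above; `trust_remote_code=True` is likely needed because the base model ships custom modeling code (see the `auto_map` in `config.json` further down):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

query = "How did the Dalai Lama describe the period of darkness for Tibetan refugees?"
corpus = [
    "Thus, for almost fifteen years, the Tibetan refugees entered a period of darkness.",
    "Actually, Dalai is a Mongolian word meaning 'ocean'.",
]

# Rank corpus passages against the query by cosine similarity.
query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 2)
print(corpus[scores.argmax().item()])
```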
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+ * Dataset: `all-nli-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | cosine_accuracy    | 0.9924     |
+ | dot_accuracy       | 0.0076     |
+ | manhattan_accuracy | 0.9924     |
+ | euclidean_accuracy | 0.9898     |
+ | **max_accuracy**   | **0.9924** |
+
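These figures come from `TripletEvaluator`: for each (anchor, positive, negative) triplet it checks whether the anchor embedding is closer to the positive than to the negative under each similarity function. A hedged sketch of running the same evaluation on your own triplets (the triplet below is invented for illustration; the card's numbers come from its 393-sample held-out split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

# One made-up triplet; accuracy is the fraction of triplets where the
# anchor is closer to the positive than to the negative.
evaluator = TripletEvaluator(
    anchors=["What is the name of the monastery founded by Karma Rolpai Dorje?"],
    positives=["It was founded by Karma Rolpai Dorje, the fourth reincarnation of Karmapa."],
    negatives=["Mr. Nehru's personality had impressed me very much."],
    name="all-nli-dev",
)
print(evaluator(model))  # e.g. {'all-nli-dev_cosine_accuracy': ...}
```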
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 7,075 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 17.9 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 96.59 tokens</li><li>max: 810 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 90.43 tokens</li><li>max: 810 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>What was the Dalai Lama's plan for the senior members of the Government if the situation worsened?</code> | <code>Shortly afterwards, with the Chinese consolidating their forces in the east, we decided that I should move to southern Tibet with the most senior members of Government. That way, if the situation deteriorated, I could easily seek exile across the border with India. Meanwhile, Lobsang Tashi and Lukhangwa were to remain in Lhasa in an acting capacity: I would take the seals of state with me.</code> | <code>The Dalai Lama's press conference on 20 June had a significant impact on the international perception of the Tibetan issue. By formally repudiating the Seventeen-Point Agreement and detailing the atrocities committed against Tibetans, the Dalai Lama aimed to present a truthful account of the situation in Tibet. This press conference received wide coverage and helped to counter the Chinese government's narrative. However, despite the extensive media attention, the Dalai Lama acknowledged the challenges in overcoming the Chinese government's efficient public relations campaign and the general reluctance of the international community to face the truth about the situation in Tibet. The press conference marked an important step in raising global awareness about the Tibetan struggle and the injustices faced by its people.</code> |
+   | <code>What did the young Dalai Lama enjoy about the opera festival?</code> | <code>They gave their performances on a paved area situated on the far side of, but adjacent to, the Yellow Wall. I myself watched the proceedings from a makeshift enclosure erected on the top of one of the buildings that abutted the wall on the inside.</code> | <code>This man had become notorious in Lhasa because of his close association with the Chinese occupation forces. Earlier that morning he had attended a daily congregation of monastic officials called the Trungcha Ceremony, and for some unknown reason, about eleven o'clock, he rode towards the Norbulingka on a bicycle, wearing a semi-Chinese dress, dark glasses and a motorcyclist's dust mask, and carrying a pistol unconcealed in his belt. Some of the crowd took him for a Chinese in disguise; others thought he was bringing a message from the Chinese headquarters. Their anger and resentment against everything Chinese suddenly burst into fury, and murder was the tragic result.</code> |
+   | <code>What is the Tibetan term "Lama" equivalent to in Indian terminology?</code> | <code>Actually, Dalai is a Mongolian word meaning 'ocean' and Lama is a Tibetan term corresponding to the Indian word guru, which denotes a teacher.</code> | <code>The Chinese authorities handled the issue of Tibetan language and culture with a systematic and ruthless approach aimed at eradicating Tibetan identity. They implemented policies that severely suppressed Tibetan culture and language. For instance, the education provided to Tibetans was primarily conducted in Chinese, with a stated goal of eradicating the Tibetan language within fifteen years. Many schools were essentially labor camps for children, and only a select few Tibetan students received proper education, which was conducted in China to foster 'unity'. Additionally, the Chinese authorities brutally suppressed Tibetan culture by banning formal religion, desecrating thousands of monasteries and nunneries, and enforcing policies that controlled the Tibetan population through measures such as forced abortions and sterilizations. The Chinese also exploited Tibet's natural resources and transformed its economy in ways that primarily benefited China, leaving Tibetans in a state of abject poverty and environmental degradation.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
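`MultipleNegativesRankingLoss` scores each anchor against its own positive and, implicitly, against every other example in the batch as in-batch negatives, which is why the `no_duplicates` batch sampler listed under the training hyperparameters below matters: a duplicate positive elsewhere in the batch would be treated as a false negative. A minimal sketch (not from this commit) of constructing the loss with the parameters shown above:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# scale=20.0 with cosine similarity matches the JSON block above.
loss = losses.MultipleNegativesRankingLoss(model=model, scale=20.0, similarity_fct=util.cos_sim)
```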
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 393 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 18.13 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 99.75 tokens</li><li>max: 810 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 99.99 tokens</li><li>max: 810 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>What was the role of the Dalai Lama in the feudal system of Tibet?</code> | <code>The Dalai Lama held a unique and central role in the feudal system of Tibet, combining both lay and monastic authority. He had two prime ministers, one a monk and one a layman, and most other offices were duplicated to reflect this dual nature. The Dalai Lama was the ultimate source of justice and was regarded with the highest reverence by the people, who saw him as the incarnation of Chenresi. This reverence ensured that the Dalai Lama could not become an unjust tyrant, providing a final appeal to a source of justice that the people could absolutely trust.</code> | <code>The Dalai Lama and his companions faced numerous challenges while crossing the high mountains. They had to traverse slippery and muddy tracks, often leading to heights of over 19,000 feet where snow and ice were still present. The journey involved crossing particularly high and steep passes, such as the Yarto Tag-la, where some ponies could not climb the track, necessitating dismounting and leading them. They endured long hours of hard riding and climbing, often becoming very tired and saddle-sore. The weather posed significant difficulties, including snowstorms, snow glare, torrential rain, and strong winds that picked up snow and whirled it into their faces. The cold was intense, numbing their fingers and hands, and causing ice to form on their eyebrows and moustaches. Additionally, they had to deal with the threat of being spotted by Chinese aircraft, which added to their unease and forced them to divide into smaller parties. The journey was further complicated by a duststorm and the glare from the snow, which was particularly hard on those without goggles. Finally, the weather did its worst when they reached Mangmang, where they experienced heavy rain that leaked into their tents, causing discomfort and illness.</code> |
+   | <code>What was the Dalai Lama's impression of Prime Minister Shastri?</code> | <code>The Dalai Lama held Prime Minister Lal Bahadur Shastri in high regard, respecting him greatly. He appreciated Shastri's friendship and political support for the Tibetan refugees, noting that Shastri was even more of a political ally than Nehru. The Dalai Lama admired Shastri's powerful mind and spirit, describing him as a bold and decisive leader despite his frail appearance. Shastri's compassion and strict vegetarianism, stemming from a childhood incident, also left a lasting impression on the Dalai Lama. The Dalai Lama mourned Shastri's death deeply, recognizing the loss of a true and mighty friend, an enlightened leader, and a genuinely compassionate spirit.</code> | <code>The Dalai Lama's initial impression of the Chinese general's appearance was that he looked extremely drab and insignificant among the splendid figures of his own officials. The Dalai Lama observed the general and his aides in gray suits and peaked caps, which contrasted sharply with the red and golden robes of the Tibetan officials. This drabness, as the Dalai Lama later reflected, was indicative of the state to which China would reduce Tibet. However, the general turned out to be friendly and informal during their meeting.</code> |
+   | <code>What were the names of the two Lhasa Apso dogs?</code> | <code>The names of the two Lhasa Apso dogs were Sangye and Tashi.</code> | <code>The Dalai Lama's journey was marked by challenging weather conditions. During the journey, they faced an 'extraordinary sequence of snowstorms, snow glare, and torrential rain.' At one point, while crossing the Lagoe-la pass, they encountered a 'heavy storm' which made it 'very cold,' numbing their fingers and hands, and freezing their eyebrows. Additionally, they experienced a duststorm and intense snow glare. The weather did its worst when they reached Mangmang, where it 'began to pour with rain,' causing leaks in the tents and resulting in a sleepless night for many, including the Dalai Lama, who felt very ill the next morning.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
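These non-default values map directly onto `SentenceTransformerTrainingArguments`. A sketch of reconstructing them (the `output_dir` is a hypothetical placeholder, not taken from this repository):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-gte-large",  # hypothetical path
    eval_strategy="steps",
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids in-batch false negatives
)
```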
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | loss   | all-nli-dev_max_accuracy |
+ |:------:|:----:|:-------------:|:------:|:------------------------:|
+ | 0      | 0    | -             | -      | 0.8830                   |
+ | 0.0565 | 50   | 0.7484        | 0.2587 | 0.9873                   |
+ | 0.1130 | 100  | 0.2822        | 0.2313 | 0.9898                   |
+ | 0.1695 | 150  | 0.3023        | 0.2291 | 0.9873                   |
+ | 0.2260 | 200  | 0.2484        | 0.2155 | 0.9873                   |
+ | 0.2825 | 250  | 0.2909        | 0.1965 | 0.9847                   |
+ | 0.3390 | 300  | 0.2999        | 0.2008 | 0.9847                   |
+ | 0.3955 | 350  | 0.2586        | 0.1670 | 0.9924                   |
+ | 0.4520 | 400  | 0.2385        | 0.1467 | 0.9898                   |
+ | 0.5085 | 450  | 0.2353        | 0.1311 | 0.9898                   |
+ | 0.5650 | 500  | 0.2632        | 0.1340 | 0.9873                   |
+ | 0.6215 | 550  | 0.3793        | 0.1218 | 0.9898                   |
+ | 0.6780 | 600  | 0.1978        | 0.1174 | 0.9898                   |
+ | 0.7345 | 650  | 0.179         | 0.1254 | 0.9898                   |
+ | 0.7910 | 700  | 0.1326        | 0.1142 | 0.9924                   |
+ | 0.8475 | 750  | 0.1842        | 0.1153 | 0.9924                   |
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.2.1
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "_name_or_path": "Alibaba-NLP/gte-large-en-v1.5",
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
+     "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
+     "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+     "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+     "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+     "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+   },
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pack_qkv": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 2.0,
+     "type": "ntk"
+   },
+   "rope_theta": 160000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "unpad_inputs": false,
+   "use_memory_efficient_attention": false,
+   "vocab_size": 30528
+ }
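The `auto_map` block routes `AutoModel` and friends to custom modeling code hosted in the `Alibaba-NLP/new-impl` repository, so loading this checkpoint through plain `transformers` requires opting in to remote code execution. A sketch, reusing the placeholder model id from the README:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence_transformers_model_id")
# trust_remote_code=True lets transformers fetch NewModel from Alibaba-NLP/new-impl,
# as declared in the auto_map above.
model = AutoModel.from_pretrained("sentence_transformers_model_id", trust_remote_code=True)
```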
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.2.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
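`modules.json` is how sentence-transformers reassembles the pipeline: module 0 is the Transformer backbone loaded from the repository root, module 1 the CLS pooler configured in `1_Pooling/`. A sketch of the equivalent manual assembly, assuming the files have been downloaded to a local directory `./model` (exact kwargs may vary across library versions):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Transformer

# Mirrors modules.json: backbone first, then the 1_Pooling configuration.
backbone = Transformer("./model", max_seq_length=8192,
                       model_args={"trust_remote_code": True})
pooler = Pooling(word_embedding_dimension=1024, pooling_mode="cls")
model = SentenceTransformer(modules=[backbone, pooler])
```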
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70375cf86ce215a2102feb4b304ed36991ea82875c75c28b88f81631f1520b43
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a430b8cd017dc17602afe4c768fe0a796a958cb2e98f341d0153a152c77d1beb
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 8000,
+   "model_max_length": 8192,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
trainer_state.json ADDED
@@ -0,0 +1,333 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.847457627118644,
+   "eval_steps": 50,
+   "global_step": 750,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.05649717514124294,
+       "grad_norm": 12.016854286193848,
+       "learning_rate": 1.0786516853932584e-05,
+       "loss": 0.7484,
+       "step": 50
+     },
+     {
+       "epoch": 0.05649717514124294,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9821882951653944,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.25870630145072937,
+       "eval_runtime": 13.2055,
+       "eval_samples_per_second": 29.76,
+       "eval_steps_per_second": 3.786,
+       "step": 50
+     },
+     {
+       "epoch": 0.11299435028248588,
+       "grad_norm": 16.70142936706543,
+       "learning_rate": 1.977386934673367e-05,
+       "loss": 0.2822,
+       "step": 100
+     },
+     {
+       "epoch": 0.11299435028248588,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.23130479454994202,
+       "eval_runtime": 13.1695,
+       "eval_samples_per_second": 29.842,
+       "eval_steps_per_second": 3.797,
+       "step": 100
+     },
+     {
+       "epoch": 0.1694915254237288,
+       "grad_norm": 5.665423393249512,
+       "learning_rate": 1.8542713567839195e-05,
+       "loss": 0.3023,
+       "step": 150
+     },
+     {
+       "epoch": 0.1694915254237288,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.22914756834506989,
+       "eval_runtime": 13.1614,
+       "eval_samples_per_second": 29.86,
+       "eval_steps_per_second": 3.799,
+       "step": 150
+     },
+     {
+       "epoch": 0.22598870056497175,
+       "grad_norm": 0.9494842886924744,
+       "learning_rate": 1.7311557788944723e-05,
+       "loss": 0.2484,
+       "step": 200
+     },
+     {
+       "epoch": 0.22598870056497175,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.21549050509929657,
+       "eval_runtime": 13.4649,
+       "eval_samples_per_second": 29.187,
+       "eval_steps_per_second": 3.713,
+       "step": 200
+     },
+     {
+       "epoch": 0.2824858757062147,
+       "grad_norm": 5.140994071960449,
+       "learning_rate": 1.6055276381909547e-05,
+       "loss": 0.2909,
+       "step": 250
+     },
+     {
+       "epoch": 0.2824858757062147,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9821882951653944,
+       "eval_all-nli-dev_max_accuracy": 0.9847328244274809,
+       "eval_loss": 0.19652578234672546,
+       "eval_runtime": 13.5616,
+       "eval_samples_per_second": 28.979,
+       "eval_steps_per_second": 3.687,
+       "step": 250
+     },
+     {
+       "epoch": 0.3389830508474576,
+       "grad_norm": 18.279523849487305,
+       "learning_rate": 1.4824120603015077e-05,
+       "loss": 0.2999,
+       "step": 300
+     },
+     {
+       "epoch": 0.3389830508474576,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_max_accuracy": 0.9847328244274809,
+       "eval_loss": 0.20084014534950256,
+       "eval_runtime": 13.2009,
+       "eval_samples_per_second": 29.771,
+       "eval_steps_per_second": 3.788,
+       "step": 300
+     },
+     {
+       "epoch": 0.3954802259887006,
+       "grad_norm": 4.213184833526611,
+       "learning_rate": 1.3567839195979901e-05,
+       "loss": 0.2586,
+       "step": 350
+     },
+     {
+       "epoch": 0.3954802259887006,
+       "eval_all-nli-dev_cosine_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_dot_accuracy": 0.007633587786259542,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.16702787578105927,
+       "eval_runtime": 13.3509,
+       "eval_samples_per_second": 29.436,
+       "eval_steps_per_second": 3.745,
+       "step": 350
+     },
+     {
+       "epoch": 0.4519774011299435,
+       "grad_norm": 30.387929916381836,
+       "learning_rate": 1.2336683417085429e-05,
+       "loss": 0.2385,
+       "step": 400
+     },
+     {
+       "epoch": 0.4519774011299435,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.14671088755130768,
+       "eval_runtime": 13.2819,
+       "eval_samples_per_second": 29.589,
+       "eval_steps_per_second": 3.765,
+       "step": 400
+     },
+     {
+       "epoch": 0.5084745762711864,
+       "grad_norm": 3.245051860809326,
+       "learning_rate": 1.1080402010050253e-05,
+       "loss": 0.2353,
+       "step": 450
+     },
+     {
+       "epoch": 0.5084745762711864,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.13109469413757324,
+       "eval_runtime": 13.4569,
+       "eval_samples_per_second": 29.204,
+       "eval_steps_per_second": 3.716,
+       "step": 450
+     },
+     {
+       "epoch": 0.5649717514124294,
+       "grad_norm": 32.116214752197266,
+       "learning_rate": 9.824120603015075e-06,
+       "loss": 0.2632,
+       "step": 500
+     },
+     {
+       "epoch": 0.5649717514124294,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.13404284417629242,
+       "eval_runtime": 13.1995,
+       "eval_samples_per_second": 29.774,
+       "eval_steps_per_second": 3.788,
+       "step": 500
+     },
+     {
+       "epoch": 0.6214689265536724,
+       "grad_norm": 33.70884704589844,
+       "learning_rate": 8.5678391959799e-06,
+       "loss": 0.3793,
+       "step": 550
+     },
+     {
+       "epoch": 0.6214689265536724,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.12181754410266876,
+       "eval_runtime": 13.194,
+       "eval_samples_per_second": 29.786,
+       "eval_steps_per_second": 3.79,
+       "step": 550
+     },
+     {
+       "epoch": 0.6779661016949152,
+       "grad_norm": 3.5105509757995605,
+       "learning_rate": 7.311557788944724e-06,
+       "loss": 0.1978,
+       "step": 600
+     },
+     {
+       "epoch": 0.6779661016949152,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.11738275736570358,
+       "eval_runtime": 13.3865,
+       "eval_samples_per_second": 29.358,
+       "eval_steps_per_second": 3.735,
+       "step": 600
+     },
+     {
+       "epoch": 0.7344632768361582,
+       "grad_norm": 8.318052291870117,
+       "learning_rate": 6.055276381909548e-06,
+       "loss": 0.179,
+       "step": 650
+     },
+     {
+       "epoch": 0.7344632768361582,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.12535005807876587,
+       "eval_runtime": 13.465,
+       "eval_samples_per_second": 29.187,
+       "eval_steps_per_second": 3.713,
+       "step": 650
+     },
+     {
+       "epoch": 0.7909604519774012,
+       "grad_norm": 26.912717819213867,
+       "learning_rate": 4.7989949748743725e-06,
+       "loss": 0.1326,
+       "step": 700
+     },
+     {
+       "epoch": 0.7909604519774012,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.11420778185129166,
+       "eval_runtime": 13.2565,
+       "eval_samples_per_second": 29.646,
+       "eval_steps_per_second": 3.772,
+       "step": 700
+     },
+     {
+       "epoch": 0.847457627118644,
+       "grad_norm": 12.022055625915527,
+       "learning_rate": 3.542713567839196e-06,
+       "loss": 0.1842,
+       "step": 750
+     },
+     {
+       "epoch": 0.847457627118644,
+       "eval_all-nli-dev_cosine_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_dot_accuracy": 0.007633587786259542,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.11530788987874985,
+       "eval_runtime": 13.2795,
+       "eval_samples_per_second": 29.595,
+       "eval_steps_per_second": 3.765,
+       "step": 750
+     }
+   ],
+   "logging_steps": 50,
+   "max_steps": 885,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 1,
+   "save_steps": 50,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 0.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29c3c2e1664d49d6e43d6889c04d8ce3a9b29b4c30f604c9032e7a794d32831d
+ size 5304
vocab.txt ADDED
The diff for this file is too large to render. See raw diff