tenzin3 committed · Commit 17f9526 · 1 Parent(s): 37b0492

embedding model smaller files

1_Pooling/.ipynb_checkpoints/config-checkpoint.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
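Both copies of this pooling configuration enable only `pooling_mode_cls_token`, so the sentence embedding is simply the hidden state of the first (`[CLS]`) token of the 1024-dimensional token embeddings. A minimal sketch of that reduction (illustrative only, not part of this commit; a random tensor stands in for real transformer output):

```python
import torch

# Stand-in for backbone output of shape (batch, seq_len, word_embedding_dimension).
token_embeddings = torch.randn(2, 16, 1024)

# pooling_mode_cls_token=true: take the first token's hidden state as the
# sentence embedding; all other pooling modes in this config are disabled.
sentence_embeddings = token_embeddings[:, 0]
print(sentence_embeddings.shape)  # torch.Size([2, 1024])
```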
README.md ADDED
@@ -0,0 +1,527 @@
+ ---
+ base_model: Alibaba-NLP/gte-large-en-v1.5
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - dot_accuracy
+ - manhattan_accuracy
+ - euclidean_accuracy
+ - max_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:7075
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: What is the name of the monastery founded by Karma Rolpai Dorje?
+   sentences:
+   - Amid the splendor of this natural beauty stood the monastery called Karma Shar
+     Tsong Ridro, which is a famous place in the religious history of Tibet. It was
+     founded by Karma Rolpai Dorje, the fourth reincarnation of Karmapa, who himself
+     was the first incarnation recognized in Tibet; and it was at this monastery that
+     our great reformer Tsongkhapa was initiated as a monk in the fourteenth century
+     of the Christian era.
+   - In the Year of the Water Bird (1933), Thupten Gyatso, the Thirteenth Dalai Lama,
+     departed from this world. This event left the people of Tibet desolate, as he
+     had done much for the peace and welfare of Tibet. Following his death, the people
+     decided to build a golden mausoleum of special magnificence as a token of their
+     homage and respect, which was erected inside the Potala Palace in Lhasa.
+   - Mr. Nehru's personality had impressed me very much. Although the mantle of Mahatma
+     Gandhi had fallen on him, I could not catch any glimpse of spiritual fervor in
+     him; but I saw him as a brilliant practical statesman, with a masterly grasp of
+     international politics, and he showed me that he had a profound love for his country
+     and faith in his people. For their welfare and progress, he was firm in the pursuit
+     of peace.
+ - source_sentence: How did the Dalai Lama describe the period of darkness for Tibetan
+     refugees?
+   sentences:
+   - The Dalai Lama was appalled and filled with consternation upon learning the terms
+     of the agreement. He described the agreement as a mixture of 'Communist clichés,
+     vainglorious assertions which were completely false, and bold statements which
+     were only partly true.' The terms were far worse and more oppressive than anything
+     he had imagined, and he felt that Tibet was expected to 'hand ourselves and our
+     country over to China and cease to exist as a nation.' Despite their strong opposition,
+     they felt helpless and abandoned, with no choice but to acquiesce and submit to
+     the Chinese dictates, hoping that the Chinese would keep their side of the forced,
+     one-sided bargain.
+   - Thus, for almost fifteen years, the Tibetan refugees entered a period of darkness.
+     The prospect of returning to our homeland seemed further off then when we had
+     first come into exile. But of course night is the time for regeneration and during
+     these years the resettlement programme was brought to fruition. Gradually, more
+     and more people were taken off the roads and put into the new settlements around
+     India. Also, a few of the refugees left India to found small communities around
+     the world.
+   - The Dalai Lama felt a sense of loss and nostalgia regarding the Chinese road in
+     Tibet. Although he acknowledged that the road made travel faster and more convenient,
+     he preferred the traditional way of travel. He expressed this sentiment by stating,
+     'It was certainly ten times faster and more convenient, but like all Tibetans,
+     I preferred it as it had always been before.'
+ - source_sentence: What reforms did the Dalai Lama establish after the forced resignations
+     of his Prime Ministers?
+   sentences:
+   - The Chinese requisitioned houses, and bought or rented others; and beyond the
+     Ngabo, in the pleasant land beside the river which had always been the favorite
+     place for summer picnics, they took possession of an enormous area for a camp.
+     They demanded a loan of 2000 tons of barley. This huge amount could not be met
+     from the state granaries at that time because of heavy expenditure, and the government
+     had to borrow from monasteries and private owners. Other kinds of food were also
+     demanded, and the humble resources of the city began to be strained, and prices
+     began to rise.
+   - After the forced resignations of his Prime Ministers, the Dalai Lama established
+     the Reform Committee. One of his main ambitions was to establish an independent
+     judiciary. He also focused on education, instructing the Kashag to develop a good
+     educational program. Additionally, he aimed to improve communications by considering
+     the development of a system of roads and transportation. Furthermore, he abolished
+     the principle of hereditary debt and wrote off all government loans that could
+     not be repaid. These reforms were disseminated widely to ensure their implementation.
+   - The Dalai Lama's brother, Taktser Rinpoche, managed to escape to Lhasa by pretending
+     to go along with the Chinese authorities' demands. The Chinese had put him under
+     duress, restricted his activities, and tried to indoctrinate him. They proposed
+     that he would be set free to go to Lhasa if he agreed to persuade the Dalai Lama
+     to accept Chinese rule, and if the Dalai Lama resisted, he was to kill him. Taktser
+     Rinpoche pretended to agree to this plan in order to escape and warn the Dalai
+     Lama and the Tibetan Government of the impending danger from the Chinese. He eventually
+     decided to renounce his monastic vows, disrobe, and go abroad as an emissary for
+     Tibet to seek foreign support against the Chinese invasion.
+ - source_sentence: How did Tibet maintain its independence from 1912 to 1950?
+   sentences:
+   - Throughout this period Tibetans never took any active steps to prove their independence
+     to the outside world, because it never seemed to be necessary.
+   - For example, there were now factories where there had been none before, but all
+     that they produced went to China. And the factories themselves were sited with
+     no regard for anything other than utility, with predictably detrimental results
+     to the environment.
+   - In Tantric practices, the chakras and nadis hold significant importance as they
+     are central to the practitioner's ability to control and suppress the grosser
+     levels of consciousness, thereby allowing access to subtler levels. This process
+     is crucial for experiencing profound spiritual realizations, particularly those
+     that occur at the point of death. By meditating on these energy centers and channels,
+     practitioners can demonstrate remarkable physiological phenomena, such as raising
+     body temperatures and reducing oxygen intake, which have been observed and measured
+     in scientific studies.The chakras are described as energy centers, while the nadis
+     are energy channels. The practice of focusing on these elements enables the practitioner
+     to temporarily prevent the activity of grosser levels of consciousness, facilitating
+     the experience of subtler levels. This is aligned with the Buddhist understanding
+     that the most powerful spiritual realizations can occur when the grosser levels
+     of consciousness are suppressed, such as at the moment of death.
+ - source_sentence: Who gave the Dalai Lama a lecture before he left Lhasa, and what
+     was it about?
+   sentences:
+   - The settlement of Mangmang held significant importance in the Dalai Lama's journey
+     as it was the last settlement in Tibet before crossing into India. It was here
+     that the Dalai Lama received the crucial news that the Indian government was willing
+     to grant asylum, providing a sense of safety and relief. Despite the harsh weather
+     and his own illness, Mangmang served as a pivotal point where final decisions
+     were made about who would accompany him into India and who would stay behind to
+     continue the fight. The Dalai Lama's departure from Mangmang marked the end of
+     his journey within Tibet and the beginning of his exile.
+   - Before the Dalai Lama left Lhasa, he was given a long lecture by General Chang
+     Chin-wu, the permanent representative of China. The lecture covered several topics,
+     including recent events in Hungary and Poland, the solidarity of socialist powers,
+     the Dalai Lama's visit to India, and specific instructions on how to handle questions
+     about the Indo-Tibetan frontier and the situation in Tibet. General Chang Chin-wu
+     also suggested that the Dalai Lama prepare his speeches in advance.
+   - Everywhere I went, I was accompanied by a retinue of servants. I was surrounded
+     by government ministers and advisors clad in sumptuous silk robes, men drawn from
+     the most exalted and aristocratic families in the land.
+ model-index:
+ - name: SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: all nli dev
+       type: all-nli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9923664122137404
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.007633587786259542
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.9923664122137404
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.989821882951654
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.9923664122137404
+       name: Max Accuracy
+ ---
+
+ # SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) <!-- at revision a0d6174973604c8ef416d9f6ed0f4c17ab32d78d -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Who gave the Dalai Lama a lecture before he left Lhasa, and what was it about?',
+     "Before the Dalai Lama left Lhasa, he was given a long lecture by General Chang Chin-wu, the permanent representative of China. The lecture covered several topics, including recent events in Hungary and Poland, the solidarity of socialist powers, the Dalai Lama's visit to India, and specific instructions on how to handle questions about the Indo-Tibetan frontier and the situation in Tibet. General Chang Chin-wu also suggested that the Dalai Lama prepare his speeches in advance.",
+     'Everywhere I went, I was accompanied by a retinue of servants. I was surrounded by government ministers and advisors clad in sumptuous silk robes, men drawn from the most exalted and aristocratic families in the land.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
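Beyond pairwise similarity, the same two calls support a small semantic-search loop. A sketch (not part of the committed card), reusing the placeholder model id from above; `trust_remote_code=True` is likely needed because the base model ships custom modeling code (see the `auto_map` in `config.json` further down):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

query = "How did the Dalai Lama describe the period of darkness for Tibetan refugees?"
corpus = [
    "Thus, for almost fifteen years, the Tibetan refugees entered a period of darkness.",
    "Actually, Dalai is a Mongolian word meaning 'ocean'.",
]

# Rank corpus passages against the query by cosine similarity.
query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 2)
print(corpus[scores.argmax().item()])
```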
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+ * Dataset: `all-nli-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | cosine_accuracy    | 0.9924     |
+ | dot_accuracy       | 0.0076     |
+ | manhattan_accuracy | 0.9924     |
+ | euclidean_accuracy | 0.9898     |
+ | **max_accuracy**   | **0.9924** |
+
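These figures come from `TripletEvaluator`: for each (anchor, positive, negative) triplet it checks whether the anchor embedding is closer to the positive than to the negative under each similarity function. A hedged sketch of running the same evaluation on your own triplets (the triplet below is invented for illustration; the card's numbers come from its 393-sample held-out split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

# One made-up triplet; accuracy is the fraction of triplets where the
# anchor is closer to the positive than to the negative.
evaluator = TripletEvaluator(
    anchors=["What is the name of the monastery founded by Karma Rolpai Dorje?"],
    positives=["It was founded by Karma Rolpai Dorje, the fourth reincarnation of Karmapa."],
    negatives=["Mr. Nehru's personality had impressed me very much."],
    name="all-nli-dev",
)
print(evaluator(model))  # e.g. {'all-nli-dev_cosine_accuracy': ...}
```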
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 7,075 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 17.9 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 96.59 tokens</li><li>max: 810 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 90.43 tokens</li><li>max: 810 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>What was the Dalai Lama's plan for the senior members of the Government if the situation worsened?</code> | <code>Shortly afterwards, with the Chinese consolidating their forces in the east, we decided that I should move to southern Tibet with the most senior members of Government. That way, if the situation deteriorated, I could easily seek exile across the border with India. Meanwhile, Lobsang Tashi and Lukhangwa were to remain in Lhasa in an acting capacity: I would take the seals of state with me.</code> | <code>The Dalai Lama's press conference on 20 June had a significant impact on the international perception of the Tibetan issue. By formally repudiating the Seventeen-Point Agreement and detailing the atrocities committed against Tibetans, the Dalai Lama aimed to present a truthful account of the situation in Tibet. This press conference received wide coverage and helped to counter the Chinese government's narrative. However, despite the extensive media attention, the Dalai Lama acknowledged the challenges in overcoming the Chinese government's efficient public relations campaign and the general reluctance of the international community to face the truth about the situation in Tibet. The press conference marked an important step in raising global awareness about the Tibetan struggle and the injustices faced by its people.</code> |
+   | <code>What did the young Dalai Lama enjoy about the opera festival?</code> | <code>They gave their performances on a paved area situated on the far side of, but adjacent to, the Yellow Wall. I myself watched the proceedings from a makeshift enclosure erected on the top of one of the buildings that abutted the wall on the inside.</code> | <code>This man had become notorious in Lhasa because of his close association with the Chinese occupation forces. Earlier that morning he had attended a daily congregation of monastic officials called the Trungcha Ceremony, and for some unknown reason, about eleven o'clock, he rode towards the Norbulingka on a bicycle, wearing a semi-Chinese dress, dark glasses and a motorcyclist's dust mask, and carrying a pistol unconcealed in his belt. Some of the crowd took him for a Chinese in disguise; others thought he was bringing a message from the Chinese headquarters. Their anger and resentment against everything Chinese suddenly burst into fury, and murder was the tragic result.</code> |
+   | <code>What is the Tibetan term "Lama" equivalent to in Indian terminology?</code> | <code>Actually, Dalai is a Mongolian word meaning 'ocean' and Lama is a Tibetan term corresponding to the Indian word guru, which denotes a teacher.</code> | <code>The Chinese authorities handled the issue of Tibetan language and culture with a systematic and ruthless approach aimed at eradicating Tibetan identity. They implemented policies that severely suppressed Tibetan culture and language. For instance, the education provided to Tibetans was primarily conducted in Chinese, with a stated goal of eradicating the Tibetan language within fifteen years. Many schools were essentially labor camps for children, and only a select few Tibetan students received proper education, which was conducted in China to foster 'unity'. Additionally, the Chinese authorities brutally suppressed Tibetan culture by banning formal religion, desecrating thousands of monasteries and nunneries, and enforcing policies that controlled the Tibetan population through measures such as forced abortions and sterilizations. The Chinese also exploited Tibet's natural resources and transformed its economy in ways that primarily benefited China, leaving Tibetans in a state of abject poverty and environmental degradation.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
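`MultipleNegativesRankingLoss` scores each anchor against its own positive and, implicitly, against every other example in the batch as in-batch negatives, which is why the `no_duplicates` batch sampler listed under the training hyperparameters below matters: a duplicate positive elsewhere in the batch would be treated as a false negative. A minimal sketch (not from this commit) of constructing the loss with the parameters shown above:

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# scale=20.0 with cosine similarity matches the JSON block above.
loss = losses.MultipleNegativesRankingLoss(model=model, scale=20.0, similarity_fct=util.cos_sim)
```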
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 393 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 18.13 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 99.75 tokens</li><li>max: 810 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 99.99 tokens</li><li>max: 810 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>What was the role of the Dalai Lama in the feudal system of Tibet?</code> | <code>The Dalai Lama held a unique and central role in the feudal system of Tibet, combining both lay and monastic authority. He had two prime ministers, one a monk and one a layman, and most other offices were duplicated to reflect this dual nature. The Dalai Lama was the ultimate source of justice and was regarded with the highest reverence by the people, who saw him as the incarnation of Chenresi. This reverence ensured that the Dalai Lama could not become an unjust tyrant, providing a final appeal to a source of justice that the people could absolutely trust.</code> | <code>The Dalai Lama and his companions faced numerous challenges while crossing the high mountains. They had to traverse slippery and muddy tracks, often leading to heights of over 19,000 feet where snow and ice were still present. The journey involved crossing particularly high and steep passes, such as the Yarto Tag-la, where some ponies could not climb the track, necessitating dismounting and leading them. They endured long hours of hard riding and climbing, often becoming very tired and saddle-sore. The weather posed significant difficulties, including snowstorms, snow glare, torrential rain, and strong winds that picked up snow and whirled it into their faces. The cold was intense, numbing their fingers and hands, and causing ice to form on their eyebrows and moustaches. Additionally, they had to deal with the threat of being spotted by Chinese aircraft, which added to their unease and forced them to divide into smaller parties. The journey was further complicated by a duststorm and the glare from the snow, which was particularly hard on those without goggles. Finally, the weather did its worst when they reached Mangmang, where they experienced heavy rain that leaked into their tents, causing discomfort and illness.</code> |
+   | <code>What was the Dalai Lama's impression of Prime Minister Shastri?</code> | <code>The Dalai Lama held Prime Minister Lal Bahadur Shastri in high regard, respecting him greatly. He appreciated Shastri's friendship and political support for the Tibetan refugees, noting that Shastri was even more of a political ally than Nehru. The Dalai Lama admired Shastri's powerful mind and spirit, describing him as a bold and decisive leader despite his frail appearance. Shastri's compassion and strict vegetarianism, stemming from a childhood incident, also left a lasting impression on the Dalai Lama. The Dalai Lama mourned Shastri's death deeply, recognizing the loss of a true and mighty friend, an enlightened leader, and a genuinely compassionate spirit.</code> | <code>The Dalai Lama's initial impression of the Chinese general's appearance was that he looked extremely drab and insignificant among the splendid figures of his own officials. The Dalai Lama observed the general and his aides in gray suits and peaked caps, which contrasted sharply with the red and golden robes of the Tibetan officials. This drabness, as the Dalai Lama later reflected, was indicative of the state to which China would reduce Tibet. However, the general turned out to be friendly and informal during their meeting.</code> |
+   | <code>What were the names of the two Lhasa Apso dogs?</code> | <code>The names of the two Lhasa Apso dogs were Sangye and Tashi.</code> | <code>The Dalai Lama's journey was marked by challenging weather conditions. During the journey, they faced an 'extraordinary sequence of snowstorms, snow glare, and torrential rain.' At one point, while crossing the Lagoe-la pass, they encountered a 'heavy storm' which made it 'very cold,' numbing their fingers and hands, and freezing their eyebrows. Additionally, they experienced a duststorm and intense snow glare. The weather did its worst when they reached Mangmang, where it 'began to pour with rain,' causing leaks in the tents and resulting in a sleepless night for many, including the Dalai Lama, who felt very ill the next morning.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
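These non-default values map directly onto `SentenceTransformerTrainingArguments`. A sketch of reconstructing them (the `output_dir` is a hypothetical placeholder, not taken from this repository):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-gte-large",  # hypothetical path
    eval_strategy="steps",
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids in-batch false negatives
)
```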
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | loss   | all-nli-dev_max_accuracy |
+ |:------:|:----:|:-------------:|:------:|:------------------------:|
+ | 0      | 0    | -             | -      | 0.8830                   |
+ | 0.0565 | 50   | 0.7484        | 0.2587 | 0.9873                   |
+ | 0.1130 | 100  | 0.2822        | 0.2313 | 0.9898                   |
+ | 0.1695 | 150  | 0.3023        | 0.2291 | 0.9873                   |
+ | 0.2260 | 200  | 0.2484        | 0.2155 | 0.9873                   |
+ | 0.2825 | 250  | 0.2909        | 0.1965 | 0.9847                   |
+ | 0.3390 | 300  | 0.2999        | 0.2008 | 0.9847                   |
+ | 0.3955 | 350  | 0.2586        | 0.1670 | 0.9924                   |
+ | 0.4520 | 400  | 0.2385        | 0.1467 | 0.9898                   |
+ | 0.5085 | 450  | 0.2353        | 0.1311 | 0.9898                   |
+ | 0.5650 | 500  | 0.2632        | 0.1340 | 0.9873                   |
+ | 0.6215 | 550  | 0.3793        | 0.1218 | 0.9898                   |
+ | 0.6780 | 600  | 0.1978        | 0.1174 | 0.9898                   |
+ | 0.7345 | 650  | 0.179         | 0.1254 | 0.9898                   |
+ | 0.7910 | 700  | 0.1326        | 0.1142 | 0.9924                   |
+ | 0.8475 | 750  | 0.1842        | 0.1153 | 0.9924                   |
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.2.1
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "_name_or_path": "Alibaba-NLP/gte-large-en-v1.5",
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
+     "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
+     "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+     "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+     "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+     "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+   },
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pack_qkv": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 2.0,
+     "type": "ntk"
+   },
+   "rope_theta": 160000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "unpad_inputs": false,
+   "use_memory_efficient_attention": false,
+   "vocab_size": 30528
+ }
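The `auto_map` block routes `AutoModel` and friends to custom modeling code hosted in the `Alibaba-NLP/new-impl` repository, so loading this checkpoint through plain `transformers` requires opting in to remote code execution. A sketch, reusing the placeholder model id from the README:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence_transformers_model_id")
# trust_remote_code=True lets transformers fetch NewModel from Alibaba-NLP/new-impl,
# as declared in the auto_map above.
model = AutoModel.from_pretrained("sentence_transformers_model_id", trust_remote_code=True)
```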
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.2.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
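`modules.json` is how sentence-transformers reassembles the pipeline: module 0 is the Transformer backbone loaded from the repository root, module 1 the CLS pooler configured in `1_Pooling/`. A sketch of the equivalent manual assembly, assuming the files have been downloaded to a local directory `./model` (exact kwargs may vary across library versions):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Transformer

# Mirrors modules.json: backbone first, then the 1_Pooling configuration.
backbone = Transformer("./model", max_seq_length=8192,
                       model_args={"trust_remote_code": True})
pooler = Pooling(word_embedding_dimension=1024, pooling_mode="cls")
model = SentenceTransformer(modules=[backbone, pooler])
```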
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70375cf86ce215a2102feb4b304ed36991ea82875c75c28b88f81631f1520b43
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a430b8cd017dc17602afe4c768fe0a796a958cb2e98f341d0153a152c77d1beb
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 8000,
+   "model_max_length": 8192,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
trainer_state.json ADDED
@@ -0,0 +1,333 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.847457627118644,
+   "eval_steps": 50,
+   "global_step": 750,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.05649717514124294,
+       "grad_norm": 12.016854286193848,
+       "learning_rate": 1.0786516853932584e-05,
+       "loss": 0.7484,
+       "step": 50
+     },
+     {
+       "epoch": 0.05649717514124294,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9821882951653944,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.25870630145072937,
+       "eval_runtime": 13.2055,
+       "eval_samples_per_second": 29.76,
+       "eval_steps_per_second": 3.786,
+       "step": 50
+     },
+     {
+       "epoch": 0.11299435028248588,
+       "grad_norm": 16.70142936706543,
+       "learning_rate": 1.977386934673367e-05,
+       "loss": 0.2822,
+       "step": 100
+     },
+     {
+       "epoch": 0.11299435028248588,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.23130479454994202,
+       "eval_runtime": 13.1695,
+       "eval_samples_per_second": 29.842,
+       "eval_steps_per_second": 3.797,
+       "step": 100
+     },
+     {
+       "epoch": 0.1694915254237288,
+       "grad_norm": 5.665423393249512,
+       "learning_rate": 1.8542713567839195e-05,
+       "loss": 0.3023,
+       "step": 150
+     },
+     {
+       "epoch": 0.1694915254237288,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.22914756834506989,
+       "eval_runtime": 13.1614,
+       "eval_samples_per_second": 29.86,
+       "eval_steps_per_second": 3.799,
+       "step": 150
+     },
+     {
+       "epoch": 0.22598870056497175,
+       "grad_norm": 0.9494842886924744,
+       "learning_rate": 1.7311557788944723e-05,
+       "loss": 0.2484,
+       "step": 200
+     },
+     {
+       "epoch": 0.22598870056497175,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.21549050509929657,
+       "eval_runtime": 13.4649,
+       "eval_samples_per_second": 29.187,
+       "eval_steps_per_second": 3.713,
+       "step": 200
+     },
+     {
+       "epoch": 0.2824858757062147,
+       "grad_norm": 5.140994071960449,
+       "learning_rate": 1.6055276381909547e-05,
+       "loss": 0.2909,
+       "step": 250
+     },
+     {
+       "epoch": 0.2824858757062147,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9821882951653944,
+       "eval_all-nli-dev_max_accuracy": 0.9847328244274809,
+       "eval_loss": 0.19652578234672546,
+       "eval_runtime": 13.5616,
+       "eval_samples_per_second": 28.979,
+       "eval_steps_per_second": 3.687,
+       "step": 250
+     },
+     {
+       "epoch": 0.3389830508474576,
+       "grad_norm": 18.279523849487305,
+       "learning_rate": 1.4824120603015077e-05,
+       "loss": 0.2999,
+       "step": 300
+     },
+     {
+       "epoch": 0.3389830508474576,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_max_accuracy": 0.9847328244274809,
+       "eval_loss": 0.20084014534950256,
+       "eval_runtime": 13.2009,
+       "eval_samples_per_second": 29.771,
+       "eval_steps_per_second": 3.788,
+       "step": 300
+     },
+     {
+       "epoch": 0.3954802259887006,
+       "grad_norm": 4.213184833526611,
+       "learning_rate": 1.3567839195979901e-05,
+       "loss": 0.2586,
+       "step": 350
+     },
+     {
+       "epoch": 0.3954802259887006,
+       "eval_all-nli-dev_cosine_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_dot_accuracy": 0.007633587786259542,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.16702787578105927,
+       "eval_runtime": 13.3509,
+       "eval_samples_per_second": 29.436,
+       "eval_steps_per_second": 3.745,
+       "step": 350
+     },
+     {
+       "epoch": 0.4519774011299435,
+       "grad_norm": 30.387929916381836,
+       "learning_rate": 1.2336683417085429e-05,
+       "loss": 0.2385,
+       "step": 400
+     },
+     {
+       "epoch": 0.4519774011299435,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.14671088755130768,
+       "eval_runtime": 13.2819,
+       "eval_samples_per_second": 29.589,
+       "eval_steps_per_second": 3.765,
+       "step": 400
+     },
+     {
+       "epoch": 0.5084745762711864,
+       "grad_norm": 3.245051860809326,
+       "learning_rate": 1.1080402010050253e-05,
+       "loss": 0.2353,
+       "step": 450
+     },
+     {
+       "epoch": 0.5084745762711864,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.13109469413757324,
+       "eval_runtime": 13.4569,
+       "eval_samples_per_second": 29.204,
+       "eval_steps_per_second": 3.716,
+       "step": 450
+     },
+     {
+       "epoch": 0.5649717514124294,
+       "grad_norm": 32.116214752197266,
+       "learning_rate": 9.824120603015075e-06,
+       "loss": 0.2632,
+       "step": 500
+     },
+     {
+       "epoch": 0.5649717514124294,
+       "eval_all-nli-dev_cosine_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_dot_accuracy": 0.015267175572519083,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9847328244274809,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_max_accuracy": 0.9872773536895675,
+       "eval_loss": 0.13404284417629242,
+       "eval_runtime": 13.1995,
+       "eval_samples_per_second": 29.774,
+       "eval_steps_per_second": 3.788,
+       "step": 500
+     },
+     {
+       "epoch": 0.6214689265536724,
+       "grad_norm": 33.70884704589844,
+       "learning_rate": 8.5678391959799e-06,
+       "loss": 0.3793,
+       "step": 550
+     },
+     {
+       "epoch": 0.6214689265536724,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.12181754410266876,
+       "eval_runtime": 13.194,
+       "eval_samples_per_second": 29.786,
+       "eval_steps_per_second": 3.79,
+       "step": 550
+     },
+     {
+       "epoch": 0.6779661016949152,
+       "grad_norm": 3.5105509757995605,
+       "learning_rate": 7.311557788944724e-06,
+       "loss": 0.1978,
+       "step": 600
+     },
+     {
+       "epoch": 0.6779661016949152,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.11738275736570358,
+       "eval_runtime": 13.3865,
+       "eval_samples_per_second": 29.358,
+       "eval_steps_per_second": 3.735,
+       "step": 600
+     },
+     {
+       "epoch": 0.7344632768361582,
+       "grad_norm": 8.318052291870117,
+       "learning_rate": 6.055276381909548e-06,
+       "loss": 0.179,
+       "step": 650
+     },
+     {
+       "epoch": 0.7344632768361582,
+       "eval_all-nli-dev_cosine_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_dot_accuracy": 0.01272264631043257,
+       "eval_all-nli-dev_euclidean_accuracy": 0.9872773536895675,
+       "eval_all-nli-dev_manhattan_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_max_accuracy": 0.989821882951654,
+       "eval_loss": 0.12535005807876587,
+       "eval_runtime": 13.465,
+       "eval_samples_per_second": 29.187,
+       "eval_steps_per_second": 3.713,
+       "step": 650
+     },
+     {
+       "epoch": 0.7909604519774012,
+       "grad_norm": 26.912717819213867,
+       "learning_rate": 4.7989949748743725e-06,
+       "loss": 0.1326,
+       "step": 700
+     },
+     {
+       "epoch": 0.7909604519774012,
+       "eval_all-nli-dev_cosine_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_dot_accuracy": 0.010178117048346057,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.11420778185129166,
+       "eval_runtime": 13.2565,
+       "eval_samples_per_second": 29.646,
+       "eval_steps_per_second": 3.772,
+       "step": 700
+     },
+     {
+       "epoch": 0.847457627118644,
+       "grad_norm": 12.022055625915527,
+       "learning_rate": 3.542713567839196e-06,
+       "loss": 0.1842,
+       "step": 750
+     },
+     {
+       "epoch": 0.847457627118644,
+       "eval_all-nli-dev_cosine_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_dot_accuracy": 0.007633587786259542,
+       "eval_all-nli-dev_euclidean_accuracy": 0.989821882951654,
+       "eval_all-nli-dev_manhattan_accuracy": 0.9923664122137404,
+       "eval_all-nli-dev_max_accuracy": 0.9923664122137404,
+       "eval_loss": 0.11530788987874985,
+       "eval_runtime": 13.2795,
+       "eval_samples_per_second": 29.595,
+       "eval_steps_per_second": 3.765,
+       "step": 750
+     }
+   ],
+   "logging_steps": 50,
+   "max_steps": 885,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 1,
+   "save_steps": 50,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 0.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29c3c2e1664d49d6e43d6889c04d8ce3a9b29b4c30f604c9032e7a794d32831d
+ size 5304
vocab.txt ADDED
The diff for this file is too large to render. See raw diff