GenAIGirl committed (verified)
Commit 0558cb8 · 1 Parent(s): 08ede16

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
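The flags above select CLS-token pooling: the sentence embedding is simply the hidden state at the first token position, which the pipeline then L2-normalizes (see the Normalize module in modules.json). A minimal illustration of that operation on made-up tensors, not code from this commit:

```python
# Illustrative only: CLS-token pooling followed by L2 normalization,
# applied to a stand-in for a BertModel output of shape [batch, seq_len, 768].
import torch
import torch.nn.functional as F

token_embeddings = torch.randn(2, 12, 768)   # random stand-in for the Transformer output
cls_embeddings = token_embeddings[:, 0]      # pooling_mode_cls_token: keep the [CLS] position
sentence_embeddings = F.normalize(cls_embeddings, p=2, dim=1)
print(sentence_embeddings.shape)             # torch.Size([2, 768])
```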
README.md ADDED
@@ -0,0 +1,446 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1342
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: What is significant about New Delhi's history?
+   sentences:
+   - As of the 2011 India census, Arackal had a population of 16,739 with 7,963 males
+     and 8,776 females.
+   - Edappadi K. Palaniswami is an Indian politician. He is the current and 8th Chief
+     Minister of Tamil Nadu. He is the chief minister since 16 February 2017. Palaniswami
+     is a senior leader of All India Anna Dravida Munnetra Kazhagam.
+   - New Delhi () is the capital of India and a union territory of the megacity of
+     Delhi. It has a very old history and is home to several monuments where the city
+     is expensive to live in. In traditional Indian geography it falls under the North
+     Indian zone. The city has an area of about 42.7 km. New Delhi has a population
+     of about 9.4 Million people.
+ - source_sentence: What was the significance of the Pandyan kingdom in ancient Tamil
+     history?
+   sentences:
+   - Ektara (literally "one-string", also called iktar, ', yaktaro gopichand) is a
+     one-string instrument. It is most often used in traditional music from Bangladesh,
+     India , Egypt, and Pakistan.
+   - Polygar War or Palayakarar Wars refers to the wars fought between the Polygars
+     ("Palayakarrars") of former Madurai Kingdom in Tamil Nadu, India and the British
+     colonial forces between March 1799 to May 1802. The British finally won after
+     carrying out long and difficult protracted jungle campaigns against the Polygar
+     armies and finally defeated them. Many lives were lost on both sides and the victory
+     over Polygars made large part of territories of Tamil Nadu coming under British
+     control enabling them to get a strong hold in India.
+   - The Pandyan kingdom பாண்டியர் was an ancient Tamil state in South India of unknown
+     antiquity. Pandyas were one of the three ancient Tamil kingdoms (Chola and Chera
+     being the other two) who ruled the Tamil country from pre-historic times until
+     end of the 15th century. They ruled initially from Korkai, a sea port on the southern
+     most tip of the Indian peninsula, and in later times moved to Madurai.
+ - source_sentence: Can you tell me about Louis-Frédéric Nussbaum's contributions?
+   sentences:
+   - Shipkila is a mountain pass and border post on the Republic of India-People's
+     Republic of China border. It is through this pass which the river Sutlej enters
+     India (from the Tibet Autonomous Region).
+   - Suvra Mukherjee (September 17, 1940 – August 18, 2015) was the First Lady of India
+     from 2012 until her death in 2015. She was the wife of Indian President Pranab
+     Mukherjee from 1957 until her death in 2015.
+   - Louis-Frédéric Nussbaum (1923-1996 ), also known as Louis Frédéric or Louis-Frédéric,
+     was a French scholar, art historian, writer, translator and editor. He was a specialist
+     in the cultures of Asia, especially India and Japan.
+ - source_sentence: What were the original goals of Dravida Kazhagam?
+   sentences:
+   - Sir Patrick Geddes (2 October 1854 – 17 April 1932) was a Scottish biologist,
+     sociologist, geographer, philanthropist and pioneering town planner. He developed
+     a new urban theoris, including the second master plan of Jerusalem in 1919. He
+     also developed the first master plan of Tel Aviv in 1925 that included the Bauhaus
+     architecture in the White City of Tel Aviv. His other work was in India during
+     the period of British India. A small memorial board to Patrick Geddes is under
+     a bridge of the Heil HaShirion Street in Tel Aviv.
+   - Dravida Kazhagam (or Dravidar Kazhagam, "Dravidian Organization") was one of the
+     first Dravidian parties in India. The party was founded by E.V. Ramasamy, also
+     called Thanthai Periyar. Its original goals were to eradicate the ills of the
+     existing caste system including untouchability and to obtain a "Dravida Nadu"
+     (Dravidian nation) from the Madras Presidency i.e., a separate nation from India
+     for Dravidian people alone.
+   - Dheeran Chinnamalai ( born as Theerthagiri Sarkkarai Mandraadiyaar [Sarkkarai
+     Mandraadiyaar Refers Payiran Kulam] or Theerthagiri Gounder on April 17, 1756)
+     was a Kongu chieftain and Palayakkarar from Tamil Nadu who rose up in revolt against
+     the British East India Company in the Kongu Nadu, Southern India. He was born
+     in Melapalayam, near Erode in the South Indian state of Tamil Nadu.
+ - source_sentence: Can you tell me about the literary contributions of Chattopadhyay?
+   sentences:
+   - The Election Commission of India held indirect 2nd presidential elections of India
+     on May 6, 1957. Dr. Rajendra Prasad won his re-election with 459,698 votes over
+     his rivals Chowdhry Hari Ramwho got 2,672 votes and Nagendra Narayan Das who got
+     2,000 votes. Rajendra Prasad, has been the only person, to have won and served
+     two terms, as President of India.
+   - S. Rajendra Babu (born 1 June 1939) is an Indian judge. He was the 34th Chief
+     Justice of India from May to June 2004. He also served as the chairperson of National
+     Human Rights Commission of India.
+   - Rishi Bankim Chandra Chattopadhyay (27 June 1838 – 8 April 1894) was a Bengali
+     writer, poet and journalist. He was the composer of India's national song "Vande
+     Mataram". It was originally a Bengali and Sanskrit "stotra" (hymn) portraying
+     India as a mother goddess. The song inspired the activists during the Indian Independence
+     Movement. Chattopadhyay wrote 13 novels. He also wrote several 'serious, serio-comic,
+     satirical, scientific and critical articles in Bengali. His works were widely
+     translated into other regional languages of India.
+ ---
+
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
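+
+ Read top to bottom, this is a three-stage pipeline: a BERT encoder, CLS-token pooling, and L2 normalization of the resulting vector. As a rough sketch (an assumption based on the listing above, not code shipped in this repository), an equivalent pipeline could be assembled by hand like this:
+
+ ```python
+ # Sketch only: rebuild the Transformer -> CLS Pooling -> Normalize pipeline shown above.
+ from sentence_transformers import SentenceTransformer, models
+
+ word_embedding_model = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
+ pooling = models.Pooling(
+     word_embedding_model.get_word_embedding_dimension(),  # 768
+     pooling_mode_cls_token=True,
+     pooling_mode_mean_tokens=False,
+ )
+ normalize = models.Normalize()
+ model = SentenceTransformer(modules=[word_embedding_model, pooling, normalize])
+ ```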
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("GenAIGirl/bge-base-finetune-embedder")
+ # Run inference
+ sentences = [
+     'Can you tell me about the literary contributions of Chattopadhyay?',
+     'Rishi Bankim Chandra Chattopadhyay (27 June 1838 – 8 April 1894) was a Bengali writer, poet and journalist. He was the composer of India\'s national song "Vande Mataram". It was originally a Bengali and Sanskrit "stotra" (hymn) portraying India as a mother goddess. The song inspired the activists during the Indian Independence Movement. Chattopadhyay wrote 13 novels. He also wrote several \'serious, serio-comic, satirical, scientific and critical articles in Bengali. His works were widely translated into other regional languages of India.',
+     'S. Rajendra Babu (born 1 June 1939) is an Indian judge. He was the 34th Chief Justice of India from May to June 2004. He also served as the chairperson of National Human Rights Commission of India.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
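+
+ Because the embeddings are L2-normalized and compared with cosine similarity, the same API covers a simple retrieval setup. The following continuation is illustrative only (the query string is not part of the model card):
+
+ ```python
+ # Rank the previously encoded sentences against a free-text query.
+ query_embedding = model.encode("Who composed Vande Mataram?")
+ scores = model.similarity(query_embedding, embeddings)  # shape [1, 3]
+ best = scores.argmax().item()
+ print(sentences[best])  # expected: the Chattopadhyay passage
+ ```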
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 1,342 training samples
+ * Columns: <code>question</code> and <code>context</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | question | context |
+   |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+   | type | string | string |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 12.49 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 83.95 tokens</li><li>max: 510 tokens</li></ul> |
+ * Samples:
+   | question | context |
+   |:--------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>What is the origin of Basil?</code> | <code>Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60 cm tall. It has light green, silky leaves 3–5 cm long and 1–3 cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.</code> |
+   | <code>In which cuisines is Basil prominently featured?</code> | <code>Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60 cm tall. It has light green, silky leaves 3–5 cm long and 1–3 cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.</code> |
+   | <code>What is the significance of the Roerich Pact?</code> | <code>The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten nations: Brazil, Chile, Colombia, Cuba, the Dominican Republic, El Salvador, Guatemala, Mexico, the United States, and Venezuela. It went into effect on 26 August 1935. The Government of India approved the Treaty in 1948, but did not take any further formal action. The Roerich Pact is also known as "Pax Cultura" ("Cultural Peace" or "Peace through Culture"). The most important part of the Roerich Pact is the legal recognition that the protection of culture is always more important than any military necessity.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
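+ The `question` and `context` columns map directly onto this loss: each context is the positive for its question, and the other contexts in the same batch serve as in-batch negatives (which is why the `no_duplicates` batch sampler is used below). A minimal sketch of how such a loss is typically set up, with illustrative names rather than the exact training script:
+
+ ```python
+ # Sketch only: pairing an (anchor, positive) dataset with MultipleNegativesRankingLoss.
+ from sentence_transformers import SentenceTransformer, losses
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+ loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)  # cosine similarity by default
+ ```
+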
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 100 evaluation samples
+ * Columns: <code>question</code> and <code>context</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | question | context |
+   |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type | string | string |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 12.37 tokens</li><li>max: 18 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 72.93 tokens</li><li>max: 228 tokens</li></ul> |
+ * Samples:
+   | question | context |
+   |:-------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>What role did Suvra Mukherjee hold in India?</code> | <code>Suvra Mukherjee (September 17, 1940 – August 18, 2015) was the First Lady of India from 2012 until her death in 2015. She was the wife of Indian President Pranab Mukherjee from 1957 until her death in 2015.</code> |
+   | <code>What political party is Edappadi K. Palaniswami associated with?</code> | <code>Edappadi K. Palaniswami is an Indian politician. He is the current and 8th Chief Minister of Tamil Nadu. He is the chief minister since 16 February 2017. Palaniswami is a senior leader of All India Anna Dravida Munnetra Kazhagam.</code> |
+   | <code>Where are Tibetan antelopes primarily found?</code> | <code>Tibetan antelope, also known as Chiru is a medium sized antelope most closely related to wild goats and sheep of the subfamily Caprinae. Tibetan antelope are native to northwest India and Tibet. They live on the treeless Steppe above . They are an endangered species. They are a target for hunters for their fine underfur called chiru. It is used to make luxury shawls. It takes about four animals to make a single shawl. In order to collect the chiru, the animals must be killed. Because of this the Chiru are close to extinction.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 3e-06
+ - `max_steps`: 166
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
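+ Expressed as code, the non-default settings above correspond roughly to the following training arguments (a sketch assuming the sentence-transformers 3.x Trainer API; `output_dir` is a placeholder):
+
+ ```python
+ # Sketch only: the non-default hyperparameters written out as training arguments.
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-finetune-embedder",  # placeholder
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=3e-6,
+     max_steps=166,
+     warmup_ratio=0.1,
+     fp16=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+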
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 3e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3.0
+ - `max_steps`: 166
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.2381 | 20 | 0.1734 | 0.0589 |
+ | 0.4762 | 40 | 0.0827 | 0.0477 |
+ | 0.7143 | 60 | 0.0737 | 0.0474 |
+ | 0.9524 | 80 | 0.0451 | 0.0465 |
+ | 1.1905 | 100 | 0.0569 | 0.0416 |
+ | 1.4286 | 120 | 0.0431 | 0.0407 |
+ | 1.6667 | 140 | 0.03 | 0.0406 |
+ | 1.9048 | 160 | 0.0389 | 0.0405 |
+
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.45.1
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 2.20.0
+ - Tokenizers: 0.20.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.45.1",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78900bd07ff0ff0e69b2677f46b5250f13715739b1985e2f98e2c743e5a2fee2
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
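As a quick sanity check of the tokenizer settings above (lowercasing, a 512-token limit, and the standard BERT special tokens), the committed tokenizer can be loaded with `transformers`. This snippet is illustrative and not part of the commit:

```python
# Illustrative only: inspect the committed tokenizer configuration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("GenAIGirl/bge-base-finetune-embedder")
encoding = tokenizer("New Delhi is the capital of India.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # wrapped in [CLS] ... [SEP]
print(tokenizer.model_max_length)                              # 512
```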
vocab.txt ADDED
The diff for this file is too large to render. See raw diff