ric9176 committed · Commit c846e6d · verified · 1 Parent(s): 134f46f

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
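As a reading aid for the pooling config above: with `pooling_mode_cls_token` set and every other mode off, the sentence embedding is the hidden state of the first (`[CLS]`) token, which the `Normalize` module listed in `modules.json` then scales to unit length. Below is a minimal pure-Python sketch of that idea, using a toy hidden size of 4 instead of 1024. The function names `cls_pool` and `l2_normalize` are hypothetical, and this is not the library's implementation.

```python
import math


def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0.0 else [x / norm for x in vector]


# Toy input: 3 tokens with hidden size 4 (the real model uses 1024).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output is unit length, the dot product of two such embeddings equals their cosine similarity, which matches the `cosine` similarity function this commit configures.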
README.md ADDED
@@ -0,0 +1,591 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:154
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: Who will be introducing the first and second Joker movies at the
+     festival?
+   sentences:
+   - '13 Apr 2025Photo: Marshmallow Laser FeastSoil – it’s not something you really
+     think about, unless you’re doing the gardening. But this new exhibition at Somerset
+     House will change all that, shining a light on its important role in our world,
+     including the part it plays in our planet’s future. Top artists, writers and scientists
+     from across the globe are all involved in the thought-provoking exploration, which
+     aims to stop you thinking of soil as mere dirt and start considering it as something
+     far more powerful instead.Read moreBuy ticket24. Enjoy stunning views of the River
+     Thames with three courses at Sea ContainersNiall Clutton'
+   - favourite movies – the soundtracks. London Soundtrack Festival puts the scores
+     front and centre in March 2025, with a series of screenings, talks and performances
+     celebrating the musicians who make Hollywood sound so exciting, tense and emotional.
+     Highlights include Hildur Guðnadóttir introducing the first and second Joker movies
+     and, later in the programme, holding her own concert, David Cronenberg and Howard
+     Shore in conversation, screenings of Charlie Chaplin’s Modern Times, The Silence
+     of the Lambs and Eighth Grade with live scores, a day-long celebration of video
+     game music at The Roundhouse ‘Great Movie Songs with Anne Dudley & Friends’ featuring
+     guest appearances from the likes of the Pet Shop Boys’ Neil Tennant and Jake Shears
+     of
+   - Peter Walker Sculptor and David Harper ComposerSt Paul’s is about to get lit. In
+     February, the cathedral will be transformed via a stunning immersive light and
+     sound show. ‘Luminous’ by art collective Luxmuralis will animate the interior
+     of the building with illuminations and soundscapes inspired by its history, collections
+     and archives. Previously, Luxmuralis has created shows at Westminster Abbey, Durham
+     Cathedral and Oxford University. The company was also behind the ‘Poppy Fields’
+     display at the Tower of London in October.
+ - source_sentence: What is the significance of Haddadi in the given context?
+   sentences:
+   - It’s been almost a decade since Red Bull Culture Clash last took place in London,
+     but finally, it’s making its return in 2025, The epic music battle, inspired by
+     Jamaican sound clash culture, will see four crews armed with their finest dubplates
+     go head-to-head, delivering the best of the electronic, UK rap, Afro, and Caribbean
+     music scenes. Only one can be crowned the winner, though, and take home the Red
+     Bull Culture Clash trophy, with the victor. The likes of Boy Better Know, A$AP
+     Mob and Rebel Sound have previously competed at the legendary competition, as
+     well as special guests like J Hus, Stormzy, and Ice Kid, so crowds can expect
+     some pretty special things from its return, which takes place at Drumsheds in
+     March. Read moreBuy
+   - Haddadi
+   - The Irish really know how to celebrate, so when it comes to St Patrick’s Day in
+     London, the city’s Irish community has no problem showing us how it’s done. A
+     day to celebrate the patron saint of Ireland, the occasion is always one big welcoming
+     bash. Expect lots of dancing, hearty traditional dishes, a huge parade and as
+     many pints as you can handle. The Mayor of London’s annual St Patrick’s Day Festival
+     celebration will take place on Sunday March 16 – a day ahead of the official holiday
+     – and, as usual, thousands of revellers are expected to watch the parade wend
+     its way through central London, while there’ll also be plenty more St Patrick’s
+     Day parties and events to check out around the city. We’ll be rounding up the
+     best of them for you
+ - source_sentence: How does Renée Zellweger's portrayal of Bridget Jones evolve in
+     "Mad About the Boy" compared to her earlier performances?
+   sentences:
+   - "From St Paddy’s to Mothering Sunday, Pancake Day to International Women’s Day, the\
+     \ third month of the year packs in a whole host of big celebrations. \nAnd it’s\
+     \ also an especially great month for culture vultures. There are a host of film\
+     \ festivals happening around the city, from BFI Flare and the inaugural London\
+     \ Soundtrack Festival to Kinoteka, Cinema Made in Italy and the Banff Mountain\
+     \ Film Festival. \nAnd there’s also Deptford Literature Festival, the Young Barbican\
+     \ Takeover Festival, music conference series AVA London and the Other Art Fair. \n\
+     Find out about all of these, and much more, in our roundup of the best things\
+     \ to do in London over the month."
+   - ‘Fourquels’ are usually where film franchises start to flirt with rock bottom,
+     so it’s a joy to report that Mad About the Boy is comfortably the best Bridget
+     Jones outing since Bridget Jones’s Diary. For Renée Zellweger’s still klutzy but
+     now wiser Bridge, living in cosy Hampstead, the singleton Borough era is a distant
+     memory. Ciggies and Chardonnay have been dispensed with replaced with a big dose
+     of lingering grief for lawyer Mark Darcy (Colin Firth). It says everything for
+     the script (co-written by Helen Fielding, Dan Mazer and Abi Morgan) that even
+     Daniel Cleaver, now entering his own Jurassic era and a bit sad about it, gets
+     an affecting arc here. The plot will surprise no one, but it barely matters –
+     this is Bridget’s journey of
+   - The Six Nations rugby tournament is back for 2025, taking over boozers, beer gardens
+     and outdoor screens across London most weekends up until Saturday March 15. And
+     you could just watch on your telly at home. But as the annual competition reaches
+     its final stages, you might  prefer to catch every scrimmage, try and conversion
+     in a lively atmosphere with a nice freshly-poured Guinness in hand. So head to
+     one of the rugby pubs, bars, beer halls, markets and social clubs listed here,
+     where you’ll find free-flowing pints, special guest appearances and countless
+     renditions of ‘Swing Low, Sweet Chariot’.Read moreAdvertising11. Celebrate the
+     matriarchs in your life on Mother’s Day in LondonThings to doMums deserve high
+     praise all year round,
+ - source_sentence: Who is mentioned in relation to getting Guinnesses for the event?
+   sentences:
+   - 'you agree to our Terms of Use and Privacy Policy and consent to receive emails
+     from Time Out about news, events, offers and partner promotions.SubscribeSearchNewsThings
+     to DoFood & DrinkArtTheatreTravelHalf-TermOffersSeparatorKidsAttractionsMuseumsFilmMusicNightlifeHotelsLondonLondonNew
+     YorkParisChicagoLos AngelesLisbonHong KongSydneyMelbournePortoSingaporeBarcelonaMadridMontréalBostonMiamiWorldwideCloseNewsThings
+     to DoFood & DrinkArtTheatreTravelHalf-TermOffersMoreKidsAttractionsMuseumsFilmMusicNightlifeHotelsLondonLondonNew
+     YorkParisChicagoLos AngelesLisbonHong KongSydneyMelbournePortoSingaporeBarcelonaMadridMontréalBostonMiamiWorldwideSubscribeOffers
+     EnglishEnglishEspañolinstagramtiktokfacebooktwitteryoutubePhotograph: Steve Beech
+     /'
+   - Haddadi
+   - 'Shields returning.Read moreBuy ticket2. Get the Guinnesses in for St Patrick’s
+     Day in LondonThings to doPhotograph: Sandor Szmutko'
+ - source_sentence: What platforms are mentioned in the context for social media engagement?
+   sentences:
+   - out for your first newsletter in your inbox soon!instagramtiktokfacebooktwitteryoutubeAbout
+     usPress officeInvestor relationsOur awardsWork for Time OutEditorial guidelinesPrivacy
+     noticeDo not sell my informationCookie policyAccessibility statementTerms of useModern
+     slavery statementManage cookiesContact usGet ListedClaim your listingTime Out
+     Offers FAQAdvertisingTime Out MarketTime Out productsTime Out OffersTime Out WorldwideMoviesRestaurantsSite
+     Map© 2025 Time Out England Limited and affiliated companies owned by Time Out
+     Group Plc. All rights reserved. Time Out is a registered trademark of Time Out
+     Digital Limited.
+   - 'You’ve probably heard all about Versailles’ dazzling Hall of Mirrors and its
+     gorgeous, well-manicured gardens – maybe you’ve even seen them IRL. But do you
+     know about the role the French royal court played in not just spreading scientific
+     knowledge, but making it fashionable, too? The Science Museum’s latest exhibition,
+     ‘Versailles: Science And Splendour’, will uncover that lesser-talked-about side
+     of the palace’s history, diving into the royal family’s relationship with science,
+     women’s impact on medicine, philosophy and botany at the royal court, and showcasing
+     more than 100 items that reinforce those stories – many of which have never been
+     displayed in the UK before.'
+   - 'Steve Beech / ShutterstockPhotograph: Steve Beech / ShutterstockLondon events
+     in March 2025Our guide to the best events, festivals, workshops, exhibitions and
+     things to do throughout March 2025 in LondonWednesday 12 February 2025ShareCopy
+     LinkFacebookTwitterPinterestEmailWhatsAppWritten by Rosie HewitsonThings to Do
+     Editor, LondonAdvertisingThe days are getting gradually lighter, the snowdrops
+     and crocuses have arrived in London’s park, and London’s cultural scene has burst
+     into life after a mid-winter lull. It can only mean one thing; March is right
+     around the corner.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.8846153846153846
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.8846153846153846
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.33333333333333337
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.8846153846153846
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9574149715659375
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9423076923076923
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9423076923076923
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("ric9176/cjo-ft-v0")
+ # Run inference
+ sentences = [
+     'What platforms are mentioned in the context for social media engagement?',
+     'out for your first newsletter in your inbox soon!instagramtiktokfacebooktwitteryoutubeAbout usPress officeInvestor relationsOur awardsWork for Time OutEditorial guidelinesPrivacy noticeDo not sell my informationCookie policyAccessibility statementTerms of useModern slavery statementManage cookiesContact usGet ListedClaim your listingTime Out Offers FAQAdvertisingTime Out MarketTime Out productsTime Out OffersTime Out WorldwideMoviesRestaurantsSite Map© 2025 Time Out England Limited and affiliated companies owned by Time Out Group Plc. All rights reserved. Time Out is a registered trademark of Time Out Digital Limited.',
+     'Steve Beech / ShutterstockPhotograph: Steve Beech / ShutterstockLondon events in March 2025Our guide to the best events, festivals, workshops, exhibitions and things to do throughout March 2025 in LondonWednesday 12 February 2025ShareCopy LinkFacebookTwitterPinterestEmailWhatsAppWritten by Rosie HewitsonThings to Do Editor, LondonAdvertisingThe days are getting gradually lighter, the snowdrops and crocuses have arrived in London’s park, and London’s cultural scene has burst into life after a mid-winter lull. It can only mean one thing; March is right around the corner.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.8846 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.8846 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.8846 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | **cosine_ndcg@10** | **0.9574** |
+ | cosine_mrr@10 | 0.9423 |
+ | cosine_map@100 | 0.9423 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
336
+ ## Training Details
337
+
338
+ ### Training Dataset
339
+
340
+ #### Unnamed Dataset
341
+
342
+ * Size: 154 training samples
343
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
344
+ * Approximate statistics based on the first 154 samples:
345
+ | | sentence_0 | sentence_1 |
346
+ |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
347
+ | type | string | string |
348
+ | details | <ul><li>min: 8 tokens</li><li>mean: 18.04 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 129.57 tokens</li><li>max: 226 tokens</li></ul> |
349
+ * Samples:
350
+ | sentence_0 | sentence_1 |
351
+ |:-----------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
352
+ | <code>What types of events and activities are highlighted for the weekend in London?</code> | <code>30 Wonderful Things To Do This Weekend in London – weekend events and activities in LondonGo to the contentGo to the footerNo thanksSubscribe🙌Awesome, you're subscribed!Thanks for subscribing! Look out for your first newsletter in your inbox soon!Get us in your inboxSign up to our newsletter for the latest and greatest from your city and beyondEnter email addressDéjà vu! We already have this email. Try another?By entering your email address you agree to our Terms of Use and Privacy Policy and consent to receive emails from Time Out about news, events, offers and partner promotions.No thanks Awesome, you're subscribed!Thanks for subscribing! Look out for your first newsletter in your inbox soon!The best of London for free.Sign up for</code> |
353
+ | <code>How can individuals stay updated on the latest happenings in London according to the context?</code> | <code>30 Wonderful Things To Do This Weekend in London – weekend events and activities in LondonGo to the contentGo to the footerNo thanksSubscribe🙌Awesome, you're subscribed!Thanks for subscribing! Look out for your first newsletter in your inbox soon!Get us in your inboxSign up to our newsletter for the latest and greatest from your city and beyondEnter email addressDéjà vu! We already have this email. Try another?By entering your email address you agree to our Terms of Use and Privacy Policy and consent to receive emails from Time Out about news, events, offers and partner promotions.No thanks Awesome, you're subscribed!Thanks for subscribing! Look out for your first newsletter in your inbox soon!The best of London for free.Sign up for</code> |
354
+ | <code>What benefits do subscribers receive by signing up for the email newsletter?</code> | <code>free.Sign up for our email to enjoy London without spending a thing (as well as some options when you’re feeling flush).Enter email addressDéjà vu! We already have this email. Try another?No thanksBy entering your email address you agree to our Terms of Use and Privacy Policy and consent to receive emails from Time Out about news, events, offers and partner promotions.No thanks Awesome, you're subscribed!Thanks for subscribing! Look out for your first newsletter in your inbox soon!Love the mag?Our newsletter hand-delivers the best bits to your inbox. Sign up to unlock our digital magazines and also receive the latest news, events, offers and partner promotions.Enter email addressDéjà vu! We already have this email. Try another?No</code> |
355
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
356
+ ```json
357
+ {
358
+ "loss": "MultipleNegativesRankingLoss",
359
+ "matryoshka_dims": [
360
+ 768,
361
+ 512,
362
+ 256,
363
+ 128,
364
+ 64
365
+ ],
366
+ "matryoshka_weights": [
367
+ 1,
368
+ 1,
369
+ 1,
370
+ 1,
371
+ 1
372
+ ],
373
+ "n_dims_per_step": -1
374
+ }
375
+ ```
376
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0 | 16 | 0.9213 |
+ | 2.0 | 32 | 0.9355 |
+ | 3.0 | 48 | 0.9290 |
+ | 3.125 | 50 | 0.9432 |
+ | 4.0 | 64 | 0.9574 |
+ | 5.0 | 80 | 0.9574 |
+ | 6.0 | 96 | 0.9574 |
+ | 6.25 | 100 | 0.9574 |
+ | 7.0 | 112 | 0.9574 |
+ | 8.0 | 128 | 0.9574 |
+ | 9.0 | 144 | 0.9574 |
+ | 9.375 | 150 | 0.9574 |
+ | 10.0 | 160 | 0.9574 |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.3.2
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
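A quick consistency check on the evaluation numbers reported in the README. The card does not state how many evaluation queries were used. The figures are, however, reproduced by 26 queries with one relevant passage each, of which 23 rank first and 3 rank second. That split is an inference from the reported values, not something recorded in this commit, and the helper names below are hypothetical.

```python
import math

# ASSUMPTION: 1-based rank of the single relevant passage for each of 26 queries.
ranks = [1] * 23 + [2] * 3


def accuracy_at(k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def precision_at(k):
    """Mean share of the top k that is relevant (at most 1 hit per query here)."""
    return sum((1 if r <= k else 0) / k for r in ranks) / len(ranks)


def mrr_at(k):
    """Mean reciprocal rank, counting only hits inside the top k."""
    return sum(1 / r if r <= k else 0.0 for r in ranks) / len(ranks)


def ndcg_at(k):
    """NDCG with binary relevance; the ideal DCG is 1 when one passage is relevant."""
    return sum(1 / math.log2(r + 1) if r <= k else 0.0 for r in ranks) / len(ranks)


print(accuracy_at(1), precision_at(3), mrr_at(10), ndcg_at(10))
```

Under the same single-relevant-passage assumption, recall@k equals accuracy@k and precision@k cannot exceed 1/k, which would explain the 0.3333, 0.2 and 0.1 precision values sitting next to perfect accuracy at k = 3, 5 and 10.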
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
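The `prompts` entry above defines a single `query` prefix, and `default_prompt_name` is `null`, so no prefix is added unless one is requested explicitly. In Sentence Transformers this is normally selected by passing `prompt_name="query"` to `encode`. The sketch below only mimics that selection logic in plain Python; `apply_prompt` is a hypothetical helper written for illustration, not a library function.

```python
import json

# Values copied from config_sentence_transformers.json above.
CONFIG = json.loads("""
{
  "prompts": {"query": "Represent this sentence for searching relevant passages: "},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
""")


def apply_prompt(text, prompt_name=None, config=CONFIG):
    """Hypothetical helper: prepend the named prompt, else fall back to the default."""
    name = prompt_name if prompt_name is not None else config["default_prompt_name"]
    if name is None:
        return text  # no default prompt is configured, so the text is unchanged
    return config["prompts"][name] + text


query = apply_prompt("What is CLS pooling?", prompt_name="query")
passage = apply_prompt("CLS pooling uses the first token's embedding.")
print(query)
print(passage)
```

With this configuration, queries receive the prefix only when it is asked for by name, while passages are encoded as-is.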
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98653bb5fcd983c0202a058c9ba16aa8d92c1d68573de50d2cc6fc47292077c0
+ size 1336413848
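As a rough sanity check, the file size in the LFS pointer can be compared against a parameter count estimated from `config.json`. The sketch below assumes the standard BERT encoder layout (embeddings plus 24 Transformer layers), assumes no pooler head is stored in the checkpoint, and assumes every weight is stored as 4-byte float32. None of this is stated in the commit itself.

```python
# Values copied from config.json and the model.safetensors LFS pointer above.
hidden = 1024
intermediate = 4096
layers = 24
vocab = 30522
max_positions = 512
type_vocab = 2
file_size_bytes = 1336413848
bytes_per_param = 4  # torch_dtype is float32


def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

# Word, position and token-type embeddings, followed by one LayerNorm.
embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm

per_layer = (
    4 * linear(hidden, hidden)      # query, key, value, attention output
    + layer_norm
    + linear(hidden, intermediate)  # feed-forward up-projection
    + linear(intermediate, hidden)  # feed-forward down-projection
    + layer_norm
)

# Assumption: the checkpoint contains no pooler head.
total_params = embeddings + layers * per_layer
weight_bytes = total_params * bytes_per_param
overhead_bytes = file_size_bytes - weight_bytes

print(total_params, weight_bytes, overhead_bytes)
```

Under these assumptions the estimate comes to roughly 334M parameters, and the remaining few tens of kilobytes would be consistent with the safetensors header and tensor metadata.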
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
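`modules.json` chains three stages: a Transformer, a Pooling module (configured for `[CLS]` pooling in `1_Pooling/config.json`), and a Normalize module. The following is a minimal sketch of the last two stages on toy data, assuming the Transformer has already produced per-token vectors. The function names are illustrative and are not the library's internals.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def dot_product(a, b):
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for Transformer outputs: 3 tokens with 4 dimensions each
# (the real model produces 1024-dimensional vectors).
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
tokens_b = [[4.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))
similarity = dot_product(emb_a, emb_b)
print(emb_a, emb_b, similarity)
```

Because the final stage normalizes every embedding to unit length, the `cosine` similarity named in `config_sentence_transformers.json` reduces to a plain dot product.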
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
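The tokenizer settings above fix the special-token ids (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102), a 512-token limit, and right-side truncation and padding. The sketch below shows how a single sequence would be framed under those settings. `build_inputs` is a hypothetical helper for illustration, and the two input ids are arbitrary example values rather than real tokenizer output.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from added_tokens_decoder
MODEL_MAX_LENGTH = 512                # from model_max_length


def build_inputs(token_ids, pad_to=None, max_length=MODEL_MAX_LENGTH):
    """Hypothetical helper: wrap ids in [CLS]/[SEP], truncate and pad on the right."""
    # Leave room for the two special tokens; truncation_side is "right".
    body = list(token_ids[: max_length - 2])
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding  # padding_side is "right"
        attention_mask += [0] * padding  # padded positions are masked out
    return input_ids, attention_mask


ids, mask = build_inputs([7592, 2088], pad_to=6)
print(ids, mask)

# A 1000-token input is cut down to the 512-token limit.
long_ids, long_mask = build_inputs(list(range(1000, 2000)))
print(len(long_ids), long_ids[0], long_ids[-1])
```

The `[CLS]` token always ends up at position 0, which is the position the pooling stage reads when `pooling_mode_cls_token` is enabled.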
vocab.txt ADDED
The diff for this file is too large to render. See raw diff