Commit 4c73846 (verified), committed by elplaguister
1 Parent(s): 3335897

Add Model card

Files changed (1)
  1. README.md +47 -5
README.md CHANGED
@@ -1,8 +1,50 @@
  ---
- license: apache-2.0
  language:
  - ko
- base_model:
- - intfloat/multilingual-e5-large-instruct
- - FacebookAI/xlm-roberta-large
- ---
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - transformers
+ ---
+
+ ## PwC-Embedding-expr
+
+ We trained the **PwC-Embedding-expr** model on top of the [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) embedding model.
+ To enhance performance in Korean, we applied our curated augmentation to STS datasets and fine-tuned the E5 model using a carefully balanced ratio across datasets.
+
+
+ ### To-do
+ - [ ] MTEB Leaderboard
+ - [ ] Technical Report
+
+
+ ## MTEB
+ PwC-Embedding_expr was evaluated on the Korean subset of MTEB.
+ A leaderboard link will be added once it is published.
+
+ | Task | PwC-Embedding_expr | multilingual-e5-large | Max Result |
+ |------------------|--------------------|-----------------------|------------|
+ | KLUE-STS | 0.88 | 0.83 | 0.90 |
+ | KLUE-TC | 0.73 | 0.61 | 0.73 |
+ | Ko-StrategyQA | 0.80 | 0.80 | 0.83 |
+ | KorSTS | 0.84 | 0.81 | 0.98 |
+ | MIRACL-Reranking | 0.72 | 0.65 | 0.72 |
+ | MIRACL-Retrieval | 0.65 | 0.59 | 0.72 |
+ | **Average** | **0.77** | 0.71 | 0.81 |
+
+
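As a sanity check on the table above, the Average row can be recomputed from the per-task scores. This is a minimal sketch that assumes the average is an unweighted mean over the six tasks; the README does not say how it was computed, and the names `SCORES` and `mean_score` are illustrative. Scores are stored as integer hundredths to avoid floating-point noise.

```python
# Per-task scores copied from the table, in hundredths (0.88 -> 88).
# Assumption: the Average row is an unweighted mean over the six tasks.
SCORES = {
    "PwC-Embedding_expr": [88, 73, 80, 84, 72, 65],
    "multilingual-e5-large": [83, 61, 80, 81, 65, 59],
    "Max Result": [90, 73, 83, 98, 72, 72],
}


def mean_score(hundredths):
    """Return the unweighted mean of the scores on the original 0-1 scale."""
    return sum(hundredths) / len(hundredths) / 100


for name, values in SCORES.items():
    print(f"{name}: {mean_score(values):.3f}")
```

The recomputed means are 0.770, 0.715 and 0.813. These are consistent with the reported 0.77, 0.71 and 0.81, bearing in mind that the per-task entries in the table are themselves already rounded to two decimals.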
+ ## Model
+ - Base Model: [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
+ - Model Size: 0.56B
+ - Embedding Dimension: 1024
+ - Max Input Tokens: 514
+
+
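The model details above (a sentence-transformers model fine-tuned from multilingual-e5-large-instruct, with 1024-dimensional embeddings) suggest a usage sketch along the following lines. It is not taken from the README: `MODEL_ID` is a placeholder for the actual Hub repository ID, the helper names are illustrative, and the `Instruct: ... / Query: ...` prefix is the convention documented for the base model, which is assumed (not confirmed) to carry over to the fine-tuned model.

```python
import math

# Placeholder: replace with the actual Hugging Face repository ID of the model.
MODEL_ID = "<org>/<model-name>"


def get_detailed_instruct(task_description: str, query: str) -> str:
    """Format a query in the instruct style of multilingual-e5-large-instruct.

    Assumption: the fine-tuned model expects the same prefix as its base model.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)


def embed(sentences):
    """Encode sentences with sentence-transformers (downloads the model on first use)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(MODEL_ID)
    return model.encode(sentences, normalize_embeddings=True)


def demo():
    """Embed two Korean sentences and print their similarity."""
    task = "Retrieve semantically similar text."  # example instruction, an assumption
    sentences = [
        get_detailed_instruct(task, "오늘 날씨가 정말 좋네요."),
        get_detailed_instruct(task, "오늘은 날씨가 아주 맑습니다."),
    ]
    vectors = embed(sentences)
    print(vectors.shape)  # the README states an embedding dimension of 1024
    print(cosine_similarity(vectors[0], vectors[1]))
```

Calling `demo()` requires the `sentence-transformers` package and a real value for `MODEL_ID`. The import is kept inside `embed` so that the formatting and similarity helpers can be used without the library installed.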
+ ## Requirements
+ The model runs with the dependencies installed by the latest version of MTEB.
+
+
+ ## Citation
+
+ TBD (technical report expected September 2025)