frozenc committed (verified)
Commit 6bfce3d · Parent: ad55950

Update README.md

Files changed (1): README.md +10 -13
README.md CHANGED

@@ -10,32 +10,31 @@ tags:
  - colpali
  - multimodal-embedding
  ---
- ### Ops-MM-embedding-v1-2B
+ # Ops-MM-embedding-v1-2B

  **Ops-MM-embedding-v1-2B** is a dense, large-scale multimodal embedding model developed and open-sourced by the Alibaba Cloud OpenSearch-AI team, fine-tuned from Qwen2-VL.


- ### **Key Features**
+ ## **Key Features**

- #### Unified Multimodal Embeddings
+ ### Unified Multimodal Embeddings
  - Encodes text, images, text-image pairs, visual documents, and videos (by treating video frames as multiple image inputs) into a unified embedding space for cross-modal retrieval.

- #### High Performance on MMEB
+ ### High Performance on MMEB
  - Achieves **SOTA results** among models of similar scale on **MMEB-V2** and **MMEB-Image** benchmark (until 2025-07-03).

- #### Multilingual Capabilities
+ ### Multilingual Capabilities
  - The larger variant (**Ops-MM-embedding-v1-7B**) achieves SOTA performance among dense models on the ViDoRe-v2 benchmark, demonstrating strong cross-lingual generalization.


-
- ### Training data
+ ## Training data

  MMEB-train, CC-3M, colpali training set.


- ### Performance
+ ## Performance

- #### MMEB-V2
+ ### MMEB-V2

  | Model | Model Size (B) | Overall | Image-Overall | Video-Overall | Visdoc-Overall |
  | ------------------------ | -------------- | ------- | ------------- | ------------- | -------------- |
@@ -46,8 +45,7 @@ MMEB-train, CC-3M, colpali training set.
  | gme-Qwen2-VL-2B-Instruct | 2.21 | 54.37 | 51.89 | 33.86 | 73.47 |


-
- #### MMEB-Image
+ ### MMEB-Image

  The table below compares performance on MMEB-Image benchmark among models of similar size.

@@ -58,8 +56,7 @@ The table below compares performance on MMEB-Image benchmark among models of sim
  | LLaVE-2B | 1.95 | 65.2 | 62.1 | 60.2 | 65.2 | 84.9 |


-
- #### ViDoRe-v2
+ ### ViDoRe-v2

  | Model | Avg | ESG Restaurant Human | MIT Bio | Econ. Macro | ESG Restaurant Synth. | MIT Bio Multi. | Econ Macro Multi. | ESG Restaurant Synth. Multi. |
  | ---------------------- | -------- | -------------------- | ------- | ----------- | --------------------- | -------------- | ----------------- | ---------------------------- |
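For readers of the README being updated here, a minimal usage sketch follows. It assumes the model is published on the Hugging Face Hub as `OpenSearch-AI/Ops-MM-embedding-v1-2B` and exposes embedding helpers through remote code; the repo id and the `encode_text` / `encode_image` names are illustrative assumptions, not something this commit defines.

```python
# Minimal usage sketch. Assumptions (not defined by this commit): the Hub repo id
# and the encode_text / encode_image helpers exposed via remote code.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "OpenSearch-AI/Ops-MM-embedding-v1-2B",  # assumed repo id
    torch_dtype="auto",
    trust_remote_code=True,
)

# Hypothetical helper names; the actual API may differ.
text_emb = model.encode_text(["a macroeconomics chart with quarterly GDP figures"])
image_emb = model.encode_image(["report_page_12.png"])

# Cross-modal retrieval score, assuming L2-normalized embedding tensors.
score = text_emb @ image_emb.T
```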