Update README.md
The `GME` models support three types of input: **text**, **image**, and **image-text pair**.

**Key Enhancements of GME Models**:

- **Unified Multimodal Representation**: GME models can process both single-modal and combined-modal inputs, resulting in a unified vector representation. This enables versatile retrieval scenarios (Any2Any Search), supporting tasks such as text retrieval, image retrieval from text, and image-to-image search; a short embedding sketch follows this list.
- **High Performance**: Achieves state-of-the-art (SOTA) results on our Universal Multimodal Retrieval Benchmark (**UMRB**) and demonstrates strong evaluation scores on the Multimodal Textual Evaluation Benchmark (**MTEB**).
- **Dynamic Image Resolution**: Benefiting from `Qwen2-VL` and our training data, GME models support dynamic resolution image input.
- **Strong Visual Retrieval Performance**: Enhanced by the Qwen2-VL model series, our models excel in visual document retrieval tasks that require a nuanced understanding of document screenshots.
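
To make Any2Any search concrete, here is a minimal sketch of embedding each input type into the shared space. It assumes the `GmeQwen2VL` wrapper script distributed alongside the Hugging Face checkpoints, with `get_text_embeddings`, `get_image_embeddings`, and `get_fused_embeddings` helpers; the image URLs are placeholders.

```python
# Minimal Any2Any retrieval sketch. Assumes the GmeQwen2VL wrapper shipped
# with the checkpoint repo; the URLs below are illustrative placeholders.
from gme_inference import GmeQwen2VL  # wrapper script from the model repo

gme = GmeQwen2VL("Alibaba-NLP/gme-Qwen2-VL-2B-Instruct")

texts = ["What is the weather like today?", "A cat sitting on a windowsill."]
images = ["https://example.com/forecast.png", "https://example.com/cat.jpg"]

# Single-modal inputs each map to vectors in the same space.
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)

# A combined-modal input (text + image) yields one fused vector per pair.
e_fused = gme.get_fused_embeddings(texts=texts, images=images)

# Because all modalities share one space, any-to-any scoring is a dot
# product: rows are text queries, columns are image candidates.
print(e_text @ e_image.T)
```

Since text, image, and fused vectors live in a single space, one vector index can serve text-to-text, text-to-image, and image-to-image queries alike.
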

We will extend to multi-image input, image-text interleaved data as well as mult…

We encourage and value diverse applications of GME models and continuous enhancements to the models themselves.

- If you distribute or make GME models (or any derivative works) available, or if you create a product or service (including another AI model) that incorporates them, you must prominently display `Built with GME` on your website, user interface, blog post, About page, or product documentation.
- If you utilize GME models or their outputs to develop, train, fine-tune, or improve an AI model that is distributed or made available, you must prefix the name of any such AI model with `GME`.
## Cloud API Services
In addition to the open-source [GME](https://huggingface.co/collections/Alibaba-NLP/gme-models) series, GME models are also available as commercial API services on Alibaba Cloud.

- [MultiModal Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): The `multimodal-embedding-v1` model service is available.
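
For reference, a call to the commercial service might look like the sketch below. It assumes the DashScope Python SDK's `MultiModalEmbedding.call` entry point and a list of `{"text": …}` / `{"image": …}` input items; this request schema is an assumption, and the linked documentation is authoritative.

```python
# Hypothetical sketch of calling the commercial multimodal-embedding-v1
# service through the DashScope SDK. The input schema here is an assumption;
# see the linked Alibaba Cloud docs for the exact request format.
import dashscope  # reads the DASHSCOPE_API_KEY environment variable

resp = dashscope.MultiModalEmbedding.call(
    model="multimodal-embedding-v1",
    input=[
        {"text": "A cat sitting on a windowsill."},
        {"image": "https://example.com/cat.jpg"},  # placeholder URL
    ],
)
print(resp)  # embeddings are returned in the response body on success
```
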
Note that the models behind the commercial APIs are not entirely identical to the open-source models.
## Citation
If you find our paper or models helpful, please consider citing: