Introduction

This model is based on dunzhang/stella_en_1.5B_v5 and google/siglip-so400m-patch14-384.

It can encode both text and images.
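The Usage example in this card passes text-only documents as plain strings and image documents as lists of parts, each part a dict with "type" and "content" keys (the example uses the types "image_path" and "text"). The sketch below shows a hypothetical pair of helpers, make_doc and check_doc, that build and check entries in that shape. They are not part of the model's API; they cover only the two part types seen in the example, and the image-before-text ordering simply mirrors that example.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical types: a document is a plain string or a list of parts.
Part = Dict[str, str]
Doc = Union[str, List[Part]]

# Part types taken from the Usage example; other types may exist but are not handled here.
KNOWN_TYPES = {"text", "image_path"}


def make_doc(text: Optional[str] = None, image_paths: Tuple[str, ...] = ()) -> Doc:
    """Hypothetical helper: build one document entry in the shape used by the Usage example."""
    if text is not None and not image_paths:
        return text  # text-only documents are passed as plain strings
    # Images first, then text, mirroring the order in the Usage example.
    parts: List[Part] = [{"type": "image_path", "content": p} for p in image_paths]
    if text is not None:
        parts.append({"type": "text", "content": text})
    if not parts:
        raise ValueError("a document needs text, at least one image path, or both")
    return parts


def check_doc(doc: Doc) -> None:
    """Raise ValueError if doc does not match the structure used in the Usage example."""
    if isinstance(doc, str):
        return
    if not isinstance(doc, list) or not doc:
        raise ValueError("multimodal documents must be non-empty lists of parts")
    for part in doc:
        if not isinstance(part, dict) or set(part) != {"type", "content"}:
            raise ValueError(f"unexpected part: {part!r}")
        if part["type"] not in KNOWN_TYPES:
            raise ValueError(f"unknown part type: {part['type']!r}")


doc = make_doc("Hope this image helps!", ("./assets/img1.png",))
check_doc(doc)
print(doc)
```

Built this way, make_doc("Hope this image helps!", ("./assets/img1.png",)) yields the same entry as the second item of doc_list in the Usage example.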

Report: https://arxiv.org/abs/2412.19048

Code: https://github.com/NLPJCL/RAG-Retrieval

Data: https://huggingface.co/datasets/infgrad/jasper_text_distill_dataset

Training logs: https://api.wandb.ai/links/dunnzhang0/z8jqoqpb

The core idea of Jasper and Stella is distillation: the student model is trained to reproduce the teacher model's vectors.
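To make the idea concrete, here is a minimal sketch of one common way to score how well a student reproduces a teacher's vectors: the mean of (1 - cosine similarity) over paired vectors, which is 0 when every student vector points in the same direction as its teacher vector. This is an illustrative assumption, not the training objective of Jasper or Stella; the exact losses are described in the report linked above. The sketch also assumes the student and teacher vectors already have the same dimension.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distillation_loss(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean of (1 - cos) over paired student/teacher vectors; 0 when all directions match."""
    if len(student) != len(teacher) or not student:
        raise ValueError("need the same, non-zero number of student and teacher vectors")
    return sum(1.0 - cosine(s, t) for s, t in zip(student, teacher)) / len(student)


# Same direction -> 0.0; orthogonal -> 1.0; opposite -> 2.0 (averaged over the batch).
print(cosine_distillation_loss([[1.0, 0.0]], [[2.0, 0.0]]))
print(cosine_distillation_loss([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [-1.0, 0.0]]))
```

In an actual training loop the same quantity would be computed on framework tensors with autograd (for example with torch.nn.functional.cosine_similarity) rather than on Python lists.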

Usage

import torch
from sentence_transformers import SentenceTransformer


DOC1 = """
Blue light is scattered in all directions by the tiny molecules of air in Earth's atmosphere. 
Blue is scattered more than other colors because it travels as shorter, smaller waves. This is why we see a blue sky most of the time. 
Closer to the horizon, the sky fades to a lighter blue or white.
"""
DOC2 = """
When choosing colors, you can consider the following factors:
Color theory: Understand how colors work together and how they can evoke different reactions. 
Color psychology: Consider how colors affect emotions, behaviors, and responses. 
Brand identity: Colors can convey meaning and information about a brand. 
Mood: Consider the mood you want to create. For example, brighter colors can feel cheerful, while cooler colors can be calming.
Space: Consider the size of the space and the amount of natural light it receives. Dark colors can make a room feel smaller, while light colors can make it feel larger.
Color wheel: Use the color wheel to identify primary, secondary, and tertiary colors. 
Color combinations: Decide how to best complement your preferred color with others. 
Color palette: Limit your color palette to a main color and one or two additional colors. 
60-30-10 rule: Use a primary color 60% of the time, a secondary color 30% of the time, and an accent color 10% of the time
"""
if __name__ == "__main__":
    # load model
    use_gpu = False
    model_name = "infgrad/jasper_en_vision_language_v1"
    model = SentenceTransformer(
        model_name,
        trust_remote_code=True,
        device="cpu" if not use_gpu else "cuda",
        model_kwargs={
            "torch_dtype": torch.bfloat16 if use_gpu else torch.float32,
            "attn_implementation": "sdpa"
        },
        # vector_dim must be one of 12288, 1024, 512, 256;
        # 1024 is recommended.
        # Set is_text_encoder=True if you do not encode images.
        config_kwargs={"is_text_encoder": False, "vector_dim": 1024},
    )
    # We can reduce the max_seq_length from the default of 2048 for faster encoding
    model.max_seq_length = 1024

    # data
    q_list = [
        "Why the sky is blue?",
        "how to choose suitable color",
    ]
    doc_list = [
        DOC1,
        [{"type": "image_path", "content": "./assets/img1.png"}, {"type": "text", "content": "Hope this image helps!"}],
        DOC2,
        [{"type": "image_path", "content": "./assets/img2.png"}],
    ]
    q_vecs = model.encode(q_list, prompt_name="s2p_query")
    doc_vecs = model.encode(doc_list)

    # calculate similarity
    similarities = model.similarity(q_vecs, doc_vecs)
    print(similarities)
    # the output is:
    # tensor([[0.7775, 0.7594, 0.2429, 0.2187],
    #         [0.3226, 0.3054, 0.7421, 0.5484]])
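For retrieval, the similarity matrix is usually turned into a per-query ranking of documents. The sketch below does this on the example output above, copied as a nested list (in a real run, similarities.tolist() converts the returned tensor to the same shape). The function rank_documents and the DOC_LABELS names are hypothetical, and exact scores may differ slightly across hardware and dtypes.

```python
from typing import List

# Similarity scores copied from the example output above (rows: queries, columns: documents).
SIMILARITIES = [
    [0.7775, 0.7594, 0.2429, 0.2187],
    [0.3226, 0.3054, 0.7421, 0.5484],
]
QUERIES = ["Why the sky is blue?", "how to choose suitable color"]
# Hypothetical short labels for the four entries of doc_list.
DOC_LABELS = ["DOC1", "img1 + text", "DOC2", "img2"]


def rank_documents(row: List[float], top_k: int) -> List[int]:
    """Return the indices of the top_k highest-scoring documents, best first."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:top_k]


for query, row in zip(QUERIES, SIMILARITIES):
    best = rank_documents(row, top_k=2)
    print(query, "->", [DOC_LABELS[i] for i in best])
```

With these scores, the first query ranks DOC1 and the first image document highest, and the second query ranks DOC2 and the second image document highest.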

Evaluation on MTEB

Evaluation script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py

License

This model should not be used for any commercial purpose!

Citation


@misc{zhang2025jasperstelladistillationsota,
      title={Jasper and Stella: distillation of SOTA embedding models}, 
      author={Dun Zhang and Jiacheng Li and Ziyang Zeng and Fulong Wang},
      year={2025},
      eprint={2412.19048},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2412.19048}, 
}

Model size: 1.99B params (Safetensors, tensor type BF16)