Nomic Embeddings API vs Transformers output
I have this code:
import torch.nn.functional as F
from transformers import AutoModel, AutoImageProcessor
from PIL import Image
img = Image.open("./image.jpg")
processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1")
vision_model = AutoModel.from_pretrained(
    "nomic-ai/nomic-embed-vision-v1", trust_remote_code=True
)
inputs = processor([img], return_tensors="pt")
img_emb = vision_model(**inputs).last_hidden_state
img_embeddings = F.normalize(img_emb[:, 0], p=2, dim=1)
print(img_embeddings[0][:5])
This prints: tensor([-0.0672, -0.0483, -0.0122, -0.0547, -0.0542], grad_fn=<SliceBackward0>)
import nomic
from PIL import Image
from nomic import embed
nomic.cli.login(NOMIC_API_KEY)
output = embed.image(
    images=[img],
    model='nomic-embed-vision-v1',
)
print(output["embeddings"][0][:5])
This prints: [-0.06616211, -0.072265625, 0.002506256, -0.05718994, -0.04675293]
Both should produce vectors with similar values, right? What am I missing? And, like the outputs from the nomic case, how do we get values with more precision for the transformers case?
hmm thanks for raising this. when i deployed the models they had equivalent outputs but it seems like something's gone wrong. i will investigate and get back to you asap
The only immediate thing I would check is whether you get similar values when running the transformers model in fp16. The model we have running in production runs at that same precision.
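Something like this (a minimal sketch reusing the same image and model id as above; the only changes are torch_dtype and casting the pixel values to half precision):
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

img = Image.open("./image.jpg")
processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1")
vision_model = AutoModel.from_pretrained(
    "nomic-ai/nomic-embed-vision-v1",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # load the weights in half precision
)

with torch.no_grad():
    inputs = processor([img], return_tensors="pt")
    # cast the float inputs (pixel values) to match the fp16 weights
    inputs = {k: v.half() if v.is_floating_point() else v for k, v in inputs.items()}
    img_emb = vision_model(**inputs).last_hidden_state
    img_embeddings = F.normalize(img_emb[:, 0], p=2, dim=1)

print(img_embeddings[0][:5])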
Ok, will dig in! apologies for this
@rajaiswal ok i believe i've identified an issue. it seems like something is going on when we upload the image via bytes to our API. trying to debug a bit more. a workaround (if possible) is to instead pass urls to the API and you should see similar results. The error should be around
np.abs(emb - nom_emb).min()=0.0
np.abs(emb - nom_emb).mean()=6.73e-05
np.abs(emb - nom_emb).max()=0.0003052
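(those numbers come from a comparison roughly like this sketch, where emb is the local transformers embedding and nom_emb is the API embedding, both as numpy arrays)
import numpy as np

# emb: embedding from the local transformers run; nom_emb: embedding from the API
emb = img_embeddings[0].detach().cpu().numpy()
nom_emb = np.array(output["embeddings"][0])

print(f"{np.abs(emb - nom_emb).min()=}")
print(f"{np.abs(emb - nom_emb).mean()=}")
print(f"{np.abs(emb - nom_emb).max()=}")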
do you mind sharing the image so I can test? it appears there may be a bug in our api, we’re working to fix it asap
@zpn
What I meant was - yes, if I use the image URL then both the transformers and the API produce the same results.
transformers output: tensor([-0.0201, 0.0056, -0.0255, -0.0168, -0.0528], dtype=torch.float16, grad_fn=<SliceBackward0>)
API output: [-0.020095825, 0.0056610107, -0.025756836, -0.016479492, -0.052612305]
Here is the image I am testing with url: https://m.media-amazon.com/images/M/MV5BZTc0ZjNkYTktMmJmOS00OTJlLTg1NWUtMzQ5ZGMxM2NhY2M0L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_.jpg
But does this mean that the bug lies within Nomic's hosted API and not with the transformers implementation?
Oh! I realize it is a client-side "bug". If you upload the image directly, we resize it, since otherwise it would be too big for the request: https://github.com/nomic-ai/nomic/blob/main/nomic/embed.py#L342
The API and the transformers should be equivalent. When I've tested for a fixed input, they return nearly identical values.
I think the optimal solution is to use the URLs if possible
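If you want to see the effect of the resize locally, here is a hypothetical illustration of a client-side downscale before embedding. MAX_SIDE is a placeholder assumption, not the actual limit used in embed.py (see the link above for the real logic); processor, vision_model, and F are the same objects as in the snippet at the top of the thread.
from PIL import Image

MAX_SIDE = 512  # placeholder assumption, not the value the nomic client actually uses

img = Image.open("./image.jpg")
if max(img.size) > MAX_SIDE:
    img.thumbnail((MAX_SIDE, MAX_SIDE))  # in-place downscale, preserves aspect ratio

inputs = processor([img], return_tensors="pt")
img_emb = vision_model(**inputs).last_hidden_state
resized_emb = F.normalize(img_emb[:, 0], p=2, dim=1)
print(resized_emb[0][:5])  # will differ slightly from the full-resolution embedding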
@zpn
Okay, that explains the difference. Thanks for investigating! Also, is there any way we can get more precision using transformers, like we get from the API? For the text model I get less precision using transformers but more precision using sentence_transformers. Is there something like that for the vision model?