Nemat

Amin24

AI & ML interests

Applied Linguistics, Language Learning

Recent Activity

liked a model 6 days ago
sayakpaul/FLUX.1-dev-edit-v0
reacted to merve's post with 👍 20 days ago
liked a model 5 months ago
XLabs-AI/flux-ip-adapter

Organizations

None yet

Amin24's activity

reacted to merve's post with 👍 20 days ago
Google's SigLIP is an alternative to OpenAI's CLIP. It was just merged into 🤗 transformers, and it's super easy to use!
To celebrate, I have created a repository of notebooks and a bunch of Spaces built on SigLIP 🥳
Search for art 👉 https://huggingface.co/spaces/merve/draw_to_search_art
Compare SigLIP with CLIP 👉 https://huggingface.co/spaces/merve/compare_clip_siglip

How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder so that the dot product of their embeddings is highest for matching text-image pairs.
The image below is taken from CLIP, where this contrastive pre-training uses a softmax loss; SigLIP replaces the softmax with a sigmoid. 📎
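
To make the softmax-versus-sigmoid difference concrete, here is a minimal pure-Python sketch of both losses on toy embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names and toy vectors are made up for this example, and the temperature of 10 and bias of -10 are fixed stand-ins for what are learnable parameters in actual training.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_pairwise_loss(img, txt, temperature=10.0, bias=-10.0):
    """Sigmoid-style loss: every (image, text) pair is an independent
    binary decision, +1 for matching pairs and -1 for all others.
    temperature and bias are fixed here (assumption); they are learned
    in real training."""
    n = len(img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            logit = temperature * dot(img[i], txt[j]) + bias
            total += log_sigmoid(label * logit)
    return -total / n


def softmax_contrastive_loss(img, txt, temperature=10.0):
    """CLIP-style loss: each row and each column of the similarity
    matrix is normalized with a softmax over the whole batch."""
    n = len(img)
    logits = [[temperature * dot(img[i], txt[j]) for j in range(n)]
              for i in range(n)]

    def diag_log_softmax(rows):
        return sum(rows[i][i] - math.log(sum(math.exp(x) for x in rows[i]))
                   for i in range(n))

    cols = [list(c) for c in zip(*logits)]
    return -(diag_log_softmax(logits) + diag_log_softmax(cols)) / (2 * n)


# Toy unit-norm embeddings (hypothetical): text i matches image i.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [texts[1], texts[0]]  # deliberately mismatched pairs

print("sigmoid, matched: ", sigmoid_pairwise_loss(images, texts))
print("sigmoid, shuffled:", sigmoid_pairwise_loss(images, shuffled))
print("softmax, matched: ", softmax_contrastive_loss(images, texts))
print("softmax, shuffled:", softmax_contrastive_loss(images, shuffled))
```

Both losses are lower when the pairs line up. The practical difference is that the softmax version needs the full row and column of similarities for its normalization, while the sigmoid version is a plain sum over independent pairs.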

Highlights from the paper on why you should use it ✨
🖼️📝 The authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 It outperforms CLIP on zero-shot tasks
🗣️ The authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss scales up to 1M items per batch, but the authors chose 32k because performance saturates beyond that

It's super easy to use thanks to transformers 👇
```python
from transformers import pipeline
from PIL import Image
import requests

# load the zero-shot image classification pipeline
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-i18n")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference: score the image against each candidate label
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)
```

For all the SigLIP notebooks on similarity search and indexing, check out this [repository](https://github.com/merveenoyan/siglip). 🤗
New activity in nvidia/parakeet-tdt-1.1b 12 months ago

Word timing
#2 opened 12 months ago by Amin24