Upload README.md with huggingface_hub
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 
 Contrastive Language-Image Pre-Training (CLIP) uses a ViT like transformer to get visual features and a causal language model to get the text features. Both the text and visual features can then be used for a variety of zero-shot learning tasks.
 
-This model is an implementation of
+This model is an implementation of OpenAI-Clip found [here](https://github.com/openai/CLIP/).
 
 
 This repository provides scripts to run OpenAI-Clip on Qualcomm® devices.
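For context on the zero-shot usage the changed text describes, here is a minimal sketch using the `clip` package from the linked upstream repository (https://github.com/openai/CLIP/). The checkpoint name, image path, and candidate labels are illustrative assumptions, not part of this repository.

```python
# Minimal zero-shot classification sketch with the upstream OpenAI CLIP package.
# "ViT-B/32", "example.jpg", and the label strings below are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate labels into the shared embedding space.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Similarity logits between the image and each label, softmaxed
    # into probabilities over the candidate labels.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probabilities:", probs)
```

The same image and text features can be reused for other zero-shot tasks (retrieval, ranking) by comparing embeddings directly instead of softmaxing the logits.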