app.py
CHANGED
@@ -1286,71 +1286,11 @@ with gr.Blocks(title="Babel-ImageNet Quiz") as demo:
     # Title Area
     gr.Markdown(
         """
-        # Are you smarter🤓 than CLIP🤖?
-
-        <small>by Gregor Geigle, WüNLP & Computer Vision Lab, University of Würzburg</small>
-
-        In this quiz, you play against a CLIP model (specifically: [mSigLIP](https://huggingface.co/timm/ViT-B-16-SigLIP-i18n-256), a multilingual [SigLIP](https://arxiv.org/abs/2303.15343) model) and try to correctly classify the images over the 1000 ImageNet classes (in English) or over our (partial) Babel-ImageNet translations of those classes.
-        Select your language, click 'Start' and start guessing! We'll keep track of your score and of your opponent's.
-        > **Disclaimer:** Translations and images are derived automatically and can be wrong, unusual, or mismatch! This is supposed to be a fun game to explore the dataset and see how a CLIP model would answer the questions and not a product.
-        > We do *not* use the official ImageNet images. Instead, we use images linked in BabelNet for each class, which are often from Wikipedia and have not been checked for suitability.
-
-        > **Content Warning:** There are spiders, insects, and various animals under the images. Please take caution if those might scare you.
-
-        <details>
-        <summary> <b> FAQ</b> (click me to read)</summary>
-        <p><b>'Over 1000 classes? I just see 4.'</b> True, you have it easier and you only have to chose between 4 classes. These are the top-4 picks of your opponent (+ the correct class if they are wrong). Your opponent has it harder: they have to deal with all classes.</p>
-        <p><b>'Who is my opponent?'</b> Your opponent CLIP model is [mSigLIP](https://huggingface.co/timm/ViT-B-16-SigLIP-i18n-256), a powerful but small multilingual model with only 370M parameters.</p>
-        <p><b>'My game crashed/ I got an error!'</b> This usually happens because of problems with the image URLs. You can try the button to reroll the image or start a new round by clicking the 'Start' button again.</p>
-        </details>
+        # Are you smarter🤓 than CLIP🤖?
+        <small>adapted from the original code by Gregor Geigle, WüNLP & Computer Vision Lab, University of Würzburg</small>
         """
     )
-
-    with gr.Column(scale=1):
-        gr.Markdown(
-            """
-            <details>
-            <summary> <b>What is CLIP? </b> (click me to read)</summary>
-            <p>
-            <a href='https://arxiv.org/abs/2103.00020'>CLIP</a> are vision-language models that learn to encode images and text in a joint semantic embedding space, where related concepts are close together.
-            With CLIP, you can search through, filter, or group large image datasets. The image encoder in CLIP also powers many of the large vision language models like Llava 1.5!
-            </p>
-            <p>
-            Your opponent CLIP model [mSigLIP](https://arxiv.org/abs/2303.15343) in this quiz does 'zero-shot image classification': We encode all possible class labels and the image and we check which class is most similar; this is then the class chosen by CLIP.
-            </p>
-            </details>
-            """
-        )
-    with gr.Column(scale=1):
-        gr.Markdown(
-            """
-            <details>
-            <summary> <b>What is ImageNet? </b> (click me to read)</summary>
-            <p>
-            ImageNet is a challenging image classification dataset with 1000 diverse classes covering animals, plants, human-made objects and more.
-            It is a very popular dataset used to benchmark CLIP models because strong results here usually indicates that the image model is overall usefull for many tasks.
-            </p>
-            </details>
-            """
-        )
-    with gr.Column(scale=1):
-        gr.Markdown(
-            """
-            <details>
-            <summary> <b>What is Babel-ImageNet? </b> (click me to read)</summary>
-            <p>
-            ImageNet class labels are only in English but we want to use CLIP models also in other languages. How can we know how good a CLIP model is outside of English?
-            This is the goal of Babel-ImageNet: to translate the English labels to other languages. However, automatic translation can give bad results for many languages and human translation is expensive.
-            </p>
-            <p>
-            Instead, we use the fact that ImageNet was constructed using WordNet and WordNet in turn can be linked to the multilingual resource BabelNet.
-            Using this link, we can get reliable (partial) translations of the English labels.
-            For more details, please read our <a href='https://arxiv.org/abs/2306.08658'>paper.</a>
-            </p>
-            </details>
-            """
-        )
-    # language select dropdown
+
     with gr.Row():
         # language_select = gr.Dropdown(
         #     choices=main_language_values,
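The "What is CLIP?" text in the hunk above describes zero-shot image classification: encode every class label and the image into one embedding space, then pick the label most similar to the image. A minimal sketch of that decision rule, using made-up toy embeddings in place of real CLIP encoder outputs (the vectors and labels below are illustrative, not from the app):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def zero_shot_classify(image_emb, label_embs, labels):
    # CLIP-style zero-shot classification: embed every class label,
    # embed the image, and predict the label whose embedding is most
    # similar to the image embedding.
    sims = [cosine(image_emb, emb) for emb in label_embs]
    return labels[max(range(len(labels)), key=lambda i: sims[i])]

# Hypothetical embeddings standing in for real CLIP encoder outputs
labels = ["tarantula", "golden retriever", "acoustic guitar"]
label_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
image_emb = [0.05, 0.9, 0.3]  # nearest to the second label's embedding
prediction = zero_shot_classify(image_emb, label_embs, labels)
```

In the real app, the label and image embeddings would come from the mSigLIP text and image encoders; only the argmax-over-similarities step is shown here.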