kokuma committed on
Commit 213ba4a · verified · 1 Parent(s): bddca11

Remove info

Files changed (1)
  1. app.py +3 -63
app.py CHANGED
@@ -1286,71 +1286,11 @@ with gr.Blocks(title="Babel-ImageNet Quiz") as demo:
  # Title Area
  gr.Markdown(
  """
- # Are you smarter🤓 than CLIP🤖? Take the [ Babel-ImageNet ](https://arxiv.org/abs/2306.08658) Quiz!
-
- <small>by Gregor Geigle, WüNLP & Computer Vision Lab, University of Würzburg</small>
-
- In this quiz, you play against a CLIP model (specifically: [mSigLIP](https://huggingface.co/timm/ViT-B-16-SigLIP-i18n-256), a multilingual [SigLIP](https://arxiv.org/abs/2303.15343) model) and try to correctly classify the images over the 1000 ImageNet classes (in English) or over our (partial) Babel-ImageNet translations of those classes.
- Select your language, click 'Start' and start guessing! We'll keep track of your score and of your opponent's.
- > **Disclaimer:** Translations and images are derived automatically and can be wrong, unusual, or mismatch! This is supposed to be a fun game to explore the dataset and see how a CLIP model would answer the questions and not a product.
- > We do *not* use the official ImageNet images. Instead, we use images linked in BabelNet for each class, which are often from Wikipedia and have not been checked for suitability.
-
- > **Content Warning:** There are spiders, insects, and various animals under the images. Please take caution if those might scare you.
-
- <details>
- <summary> <b> FAQ</b> (click me to read)</summary>
- <p><b>'Over 1000 classes? I just see 4.'</b> True, you have it easier and you only have to chose between 4 classes. These are the top-4 picks of your opponent (+ the correct class if they are wrong). Your opponent has it harder: they have to deal with all classes.</p>
- <p><b>'Who is my opponent?'</b> Your opponent CLIP model is [mSigLIP](https://huggingface.co/timm/ViT-B-16-SigLIP-i18n-256), a powerful but small multilingual model with only 370M parameters.</p>
- <p><b>'My game crashed/ I got an error!'</b> This usually happens because of problems with the image URLs. You can try the button to reroll the image or start a new round by clicking the 'Start' button again.</p>
- </details>
+ # Are you smarter🤓 than CLIP🤖?
+ <small>adapted from the original code by Gregor Geigle, WüNLP & Computer Vision Lab, University of Würzburg</small>
  """
  )
- with gr.Row():
- with gr.Column(scale=1):
- gr.Markdown(
- """
- <details>
- <summary> <b>What is CLIP? </b> (click me to read)</summary>
- <p>
- <a href='https://arxiv.org/abs/2103.00020'>CLIP</a> are vision-language models that learn to encode images and text in a joint semantic embedding space, where related concepts are close together.
- With CLIP, you can search through, filter, or group large image datasets. The image encoder in CLIP also powers many of the large vision language models like Llava 1.5!
- </p>
- <p>
- Your opponent CLIP model [mSigLIP](https://arxiv.org/abs/2303.15343) in this quiz does 'zero-shot image classification': We encode all possible class labels and the image and we check which class is most similar; this is then the class chosen by CLIP.
- </p>
- </details>
- """
- )
- with gr.Column(scale=1):
- gr.Markdown(
- """
- <details>
- <summary> <b>What is ImageNet? </b> (click me to read)</summary>
- <p>
- ImageNet is a challenging image classification dataset with 1000 diverse classes covering animals, plants, human-made objects and more.
- It is a very popular dataset used to benchmark CLIP models because strong results here usually indicates that the image model is overall usefull for many tasks.
- </p>
- </details>
- """
- )
- with gr.Column(scale=1):
- gr.Markdown(
- """
- <details>
- <summary> <b>What is Babel-ImageNet? </b> (click me to read)</summary>
- <p>
- ImageNet class labels are only in English but we want to use CLIP models also in other languages. How can we know how good a CLIP model is outside of English?
- This is the goal of Babel-ImageNet: to translate the English labels to other languages. However, automatic translation can give bad results for many languages and human translation is expensive.
- </p>
- <p>
- Instead, we use the fact that ImageNet was constructed using WordNet and WordNet in turn can be linked to the multilingual resource BabelNet.
- Using this link, we can get reliable (partial) translations of the English labels.
- For more details, please read our <a href='https://arxiv.org/abs/2306.08658'>paper.</a>
- </p>
- </details>
- """
- )
- # language select dropdown
+
  with gr.Row():
  # language_select = gr.Dropdown(
  # choices=main_language_values,
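
The help text removed in this commit describes the opponent's "zero-shot image classification" (encode every class label and the image, then pick the most similar label) and says the player chooses among the opponent's top-4 picks plus the correct class. A minimal sketch of that logic follows, under stated assumptions: the embeddings are toy 2-D vectors rather than mSigLIP outputs, the helper names `cosine`, `rank_classes`, and `quiz_options` are hypothetical and not taken from app.py, and swapping the correct class in for the fourth pick is a guess, since the text does not say how the correct class is added.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_classes(image_emb, label_embs):
    """Return class indices ordered from most to least similar to the image."""
    scores = [cosine(image_emb, emb) for emb in label_embs]
    return sorted(range(len(label_embs)), key=lambda i: scores[i], reverse=True)


def quiz_options(ranking, correct, k=4):
    """Take the model's top-k picks as the answer options.

    Assumption: if the correct class is not among them, it replaces the
    last pick so the player still sees exactly k options.
    """
    options = ranking[:k]
    if correct not in options:
        options = options[:k - 1] + [correct]
    return options


# Toy stand-ins for encoded class labels and one encoded image.
label_embs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
image_emb = [0.9, 0.1]

ranking = rank_classes(image_emb, label_embs)
model_choice = ranking[0]  # the class the model would answer with
```

Here `model_choice` is the label whose embedding lies closest to the image embedding, and `quiz_options(ranking, correct=3)` returns the four options shown to the player with the correct class guaranteed to be present.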