<!doctype html> |
|
<html lang="en">
|
<head> |
|
<meta charset="utf-8"> |
|
<title>Lit Demo App</title> |
|
|
|
<script src="./index.js"></script> |
|
|
|
</head> |
|
|
|
<body> |
|
|
|
<h1>LiT: Zero-Shot Transfer with Locked-image Tuning</h1> |
|
|
|
<p> |
|
This page is an interactive demo of the Google AI blog post |
|
<a target="_blank" href="http://ai.googleblog.com/2022/04/locked-image-tuning-adding-language.html" |
|
>LiT: adding language understanding to image models</a> |
|
– please refer to that page for a detailed explanation of how a LiT model works. |
|
If you're interested in how this demo makes a JAX model run on device in your |
|
browser, check out our other blog post |
|
<a target="_blank" href="https://blog.tensorflow.org/2022/08/jax-on-web-with-tensorflowjs.html" |
|
>JAX on the Web with TensorFlow.js</a>. |
|
</p> |
|
|
|
<p> |
|
Below you can choose an image from a selection and then write free-form |
|
text prompts that are matched to the image. Once you hit return on your |
|
keyboard or press the "compute" button, a text encoder implemented in |
|
<a target="_blank" href="https://www.tensorflow.org/js/">TensorFlow.js</a> |
|
will compute embeddings for the provided text on your local device, and the |
|
similarity of these text embeddings to the image embedding will be displayed. |
|
</p> |
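<p>
For illustration, the matching step can be sketched in plain JavaScript.
This is a hypothetical sketch, not the demo's actual <code>index.js</code>:
it assumes the text and image embeddings are already available as plain
number arrays, and the temperature value is made up for the example.
</p>

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the L2 norms.
function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// A temperature-scaled softmax turns raw similarity scores into the
// kind of normalized percentages a demo could display per prompt.
function softmax(scores, temperature) {
  const exps = scores.map((s) => Math.exp(s * temperature));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Example: one image embedding and two text-prompt embeddings
// (2-dimensional toy vectors; real embeddings are much larger).
const imageEmb = [0.6, 0.8];
const promptEmbs = [[0.6, 0.8], [1.0, 0.0]];
const sims = promptEmbs.map((t) => cosineSimilarity(imageEmb, t));
const probs = softmax(sims, 10);
// probs[0] dominates, since the first prompt points in exactly the
// same direction as the image embedding.
```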
|
|
|
<p> |
|
The prompts can be used to classify an image into multiple categories by
listing each category with its own prompt, such as "an image of a X". But
you can also probe the model interactively with more detailed prompts,
comparing how the results change when small details in the text change.
|
</p> |
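<p>
As a hypothetical example of the pattern above (the class names here are
made up purely for illustration):
</p>

```javascript
// Build one prompt per category, following the "an image of a X"
// pattern described above.
const classNames = ['cat', 'dog', 'bicycle'];
const prompts = classNames.map((name) => `an image of a ${name}`);
// Each prompt is then embedded and compared against the image embedding.
```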
|
|
|
<p> |
|
Please use this demo responsibly. The model can only compare the image to
the prompts you provide, so it is trivial to construct situations where it
is forced to pick the best of a set of bad options.
|
</p> |
|
|
|
<p class="note warning"> |
|
<b>Note:</b> |
|
The models available in this interactive demo are <b>not</b> those from the |
|
<a target="_blank" href="https://arxiv.org/abs/2111.07991" |
|
>paper</a>. |
|
We had to train much smaller text towers and tokenizers to avoid |
|
overloading your browser. Please see |
|
<a target="_blank" href="https://github.com/google-research/vision_transformer" |
|
>our GitHub repository</a> |
|
for the models from the paper pre-trained on public datasets. |
|
Multilingual models are coming soon.
|
</p> |
|
|
|
<lit-demo-app></lit-demo-app> |
|
</body> |
|
</html> |