<!--
@license
Copyright Big Vision Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>LiT Demo App</title>
<script src="./index.js"></script>
</head>
<body>
<h1>LiT: Zero-Shot Transfer with Locked-image text Tuning</h1>
<p>
This page is an interactive demo accompanying the Google AI blog post
<a target="_blank" href="http://ai.googleblog.com/2022/04/locked-image-tuning-adding-language.html"
>LiT: adding language understanding to image models</a>;
please refer to that post for a detailed explanation of how a LiT model works.
If you're interested in how this demo makes a JAX model run on-device in your
browser, check out our other blog post,
<a target="_blank" href="https://blog.tensorflow.org/2022/08/jax-on-web-with-tensorflowjs.html"
>JAX on the Web with TensorFlow.js</a>.
</p>
<p>
Below you can choose an image from a selection and then write free-form
text prompts that are matched to the image. Once you hit return on your
keyboard or press the "compute" button, a text encoder implemented in
<a target="_blank" href="https://www.tensorflow.org/js/">TensorFlow.js</a>
will compute embeddings for the provided text on your local device, and the
similarity of these text embeddings to the image embedding will be displayed.
</p>
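<p>
For the curious, the core of that computation is a cosine similarity between
the text and image embeddings. The sketch below is a minimal illustration in
plain TensorFlow.js, not the demo's actual code; it assumes a global
<code>tf</code> object (e.g. loaded from a script tag) and illustrative
tensor shapes:
</p>
<pre><code>
// Hypothetical sketch, not the demo's actual code: cosine similarity between
// one image embedding and a batch of text embeddings.
function similarities(imageEmb, textEmbs) {
  // L2-normalize both sides so the dot product equals cosine similarity.
  const img = tf.div(imageEmb, tf.norm(imageEmb));               // shape [d]
  const txt = tf.div(textEmbs, tf.norm(textEmbs, 2, -1, true));  // shape [n, d]
  return tf.matMul(txt, img.reshape([-1, 1])).squeeze();         // shape [n]
}
</code></pre>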
<p>
The prompts can be used to classify an image into multiple categories, by
listing each category as its own prompt, such as "an image of a X". You can
also probe the model interactively with more detailed prompts, comparing how
the results change when small details in the text are varied.
</p>
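<p>
Such a classification setup could look like the following sketch, where
<code>embedText</code>, <code>embedImage</code>, and <code>selectedImage</code>
are hypothetical stand-ins for the demo's encoders and inputs, and the
temperature is an illustrative assumption:
</p>
<pre><code>
// Hypothetical sketch of a zero-shot classification setup. embedText and
// embedImage are assumed stand-ins for the demo's encoders; the temperature
// of 100 is an illustrative value, not necessarily the demo's.
const prompts = ['an image of a cat', 'an image of a dog', 'an image of a bird'];
const textEmbs = await embedText(prompts);         // shape [3, d]
const imageEmb = await embedImage(selectedImage);  // shape [d]
const sims = similarities(imageEmb, textEmbs);     // cosine similarities, shape [3]
tf.softmax(tf.mul(sims, 100)).print();             // temperature-scaled probabilities
</code></pre>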
<p>
Please use this demo responsibly. The models will always compare the image to
the prompts you provide, and it is therefore trivial to construct situations
where the model must pick the best of several bad options.
</p>
<p class="note warning">
<b>Note:</b>
The models available in this interactive demo are <b>not</b> those from the
<a target="_blank" href="https://arxiv.org/abs/2111.07991"
>paper</a>.
We had to train much smaller text towers and tokenizers to avoid
overloading your browser. Please see
<a target="_blank" href="https://github.com/google-research/vision_transformer"
>our GitHub repository</a>
for the models from the paper pre-trained on public datasets.
Multilingual models are coming soon.
</p>
<lit-demo-app></lit-demo-app>
</body>
</html>