<!--
 @license
 Copyright Big Vision Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>LiT Demo App</title>

    <script src="./index.js"></script>

  </head>

  <body>

    <h1>LiT: Zero-Shot Transfer with Locked-image Tuning</h1>

    <p>
      This page is an interactive demo accompanying the Google AI blog post
      <a target="_blank" href="http://ai.googleblog.com/2022/04/locked-image-tuning-adding-language.html"
      >LiT: adding language understanding to image models</a>
      – please refer to that post for a detailed explanation of how a LiT model works.
      If you're interested in how this demo makes a JAX model run on device in your
      browser, check out our other blog post
      <a target="_blank" href="https://blog.tensorflow.org/2022/08/jax-on-web-with-tensorflowjs.html"
      >JAX on the Web with TensorFlow.js</a>.
    </p>

    <p>
      Below you can choose an image from a selection and then write free-form
      text prompts that are matched to the image. Once you hit return on your
      keyboard or press the "compute" button, a text encoder implemented in
      <a target="_blank" href="https://www.tensorflow.org/js/">TensorFlow.js</a>
      will compute embeddings for the provided text on your local device, and the
      similarity of these text embeddings to the image embedding will be displayed.
    </p>

    <p>
      The prompts can be used to classify an image into one of several categories
      by listing each category as a separate prompt, such as "an image of a X".
      You can also probe the model interactively with more detailed prompts and
      compare how the results change when small details in the text change.
    </p>

    <p>
      Please use this demo responsibly. The models will always compare the image to
      the prompts you provide, so it is trivial to construct situations in which
      every option the model can pick from is a bad one.
    </p>

    <p class="note warning">
      <b>Note:</b>
      The models available in this interactive demo are <b>not</b> those from the
      <a target="_blank" href="https://arxiv.org/abs/2111.07991"
      >paper</a>.
      We had to train much smaller text towers and tokenizers to avoid
      overloading your browser. Please see
      <a target="_blank" href="https://github.com/google-research/vision_transformer"
      >our GitHub repository</a>
      for the models from the paper pre-trained on public datasets.
      Multilingual models are coming soon.
    </p>

    <lit-demo-app></lit-demo-app>
  </body>
</html>