<!--
@license
Copyright Big Vision Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>LiT Demo App</title>
<script src="./index.js"></script>
</head>
<body>
<h1>LiT: Zero-Shot Transfer with Locked-image text Tuning</h1>
<p>
This page is an interactive demo accompanying the Google AI blog post
<a target="_blank" href="http://ai.googleblog.com/2022/04/locked-image-tuning-adding-language.html"
>LiT: adding language understanding to image models</a>;
please refer to that post for a detailed explanation of how a LiT model works.
If you're interested in how this demo makes a JAX model run on-device in your
browser, check out our other blog post,
<a target="_blank" href="https://blog.tensorflow.org/2022/08/jax-on-web-with-tensorflowjs.html"
>JAX on the Web with TensorFlow.js</a>.
</p>
<p>
Below you can choose an image from a selection and then write free-form
text prompts that are matched to the image. Once you hit return on your
keyboard or press the "compute" button, a text encoder implemented in
<a target="_blank" href="https://www.tensorflow.org/js/">TensorFlow.js</a>
will compute embeddings for the provided text on your local device, and the
similarity of these text embeddings to the image embedding will be displayed.
</p>
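<p>
For the curious, the core of that computation is a cosine similarity between
the text and image embeddings. The sketch below is a minimal illustration in
plain TensorFlow.js, not the demo's actual code; it assumes a global
<code>tf</code> object (e.g. loaded from a script tag) and illustrative
tensor shapes:
</p>
<pre><code>
// Hypothetical sketch, not the demo's actual code: cosine similarity between
// one image embedding and a batch of text embeddings.
function similarities(imageEmb, textEmbs) {
  // L2-normalize both sides so the dot product equals cosine similarity.
  const img = tf.div(imageEmb, tf.norm(imageEmb));               // shape [d]
  const txt = tf.div(textEmbs, tf.norm(textEmbs, 2, -1, true));  // shape [n, d]
  return tf.matMul(txt, img.reshape([-1, 1])).squeeze();         // shape [n]
}
</code></pre>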
<p>
The prompts can be used to classify an image into multiple categories, by
listing each category as its own prompt, such as "an image of a X". You can
also probe the model interactively with more detailed prompts, comparing how
the results change when small details in the text are varied.
</p>
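<p>
Such a classification setup could look like the following sketch, where
<code>embedText</code>, <code>embedImage</code>, and <code>selectedImage</code>
are hypothetical stand-ins for the demo's encoders and inputs, and the
temperature is an illustrative assumption:
</p>
<pre><code>
// Hypothetical sketch of a zero-shot classification setup. embedText and
// embedImage are assumed stand-ins for the demo's encoders; the temperature
// of 100 is an illustrative value, not necessarily the demo's.
const prompts = ['an image of a cat', 'an image of a dog', 'an image of a bird'];
const textEmbs = await embedText(prompts);         // shape [3, d]
const imageEmb = await embedImage(selectedImage);  // shape [d]
const sims = similarities(imageEmb, textEmbs);     // cosine similarities, shape [3]
tf.softmax(tf.mul(sims, 100)).print();             // temperature-scaled probabilities
</code></pre>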
<p>
Please use this demo responsibly. The models will always compare the image to
the prompts you provide, and it is therefore trivial to construct situations
where the model must pick the best of several bad options.
</p>
<p class="note warning">
<b>Note:</b>
The models available in this interactive demo are <b>not</b> those from the
<a target="_blank" href="https://arxiv.org/abs/2111.07991"
>paper</a>.
We had to train much smaller text towers and tokenizers to avoid
overloading your browser. Please see
<a target="_blank" href="https://github.com/google-research/vision_transformer"
>our GitHub repository</a>
for the models from the paper pre-trained on public datasets.
Multilingual models are coming soon.
</p>
<lit-demo-app></lit-demo-app>
</body>
</html>