MTTR committed
Commit fe59327 · 1 parent: 65c68b1

Update app.py

Files changed (1): app.py +2 -19
app.py CHANGED
@@ -1,21 +1,4 @@
  # -*- coding: utf-8 -*-
- """
- End-to-End Referring Video Object Segmentation with Multimodal Transformers
-
- This notebook provides a (limited) hands-on demonstration of MTTR.
-
- Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video.
-
-
- ### Disclaimer
- This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever.
- Hence, the model's performance may be limited, especially on instances from unseen categories.
-
- Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately).
-
- Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed.
- """
-
  import gradio as gr
  import torch
  import torchvision
@@ -153,9 +136,9 @@ def process(text_query, full_video_path):
 
  title = "End-to-End Referring Video Object Segmentation with Multimodal Transformers - Interactive Demo"
 
- description = "This notebook provides a (limited) hands-on demonstration of MTTR. Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video. To use it, upload an .mp4 video file and enter a text query which describes one of the object instances in that video."
+ description = "Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video. Select one of the examples below and click 'submit'. Alternatively, try your own input by uploading a short .mp4 video file and entering a text query that describes one of the object instances in that video. Due to HuggingFace's limited computational resources (no GPU acceleration unfortunately), processing may take several minutes, so please be patient. Check out our Colab notebook (link below) for much faster processing times (GPU acceleration available) and more options."
 
- article = "Check out [MTTR's GitHub page](https://github.com/mttr2021/MTTR) for more info about this project. <br> Also, check out our [Colab notebool](https://gradio.app/docs/) for much faster processing (GPU accelerated) and more options! <br> **Disclaimer:** <br> This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever. Hence, the model's performance may be limited, especially on instances from unseen categories. <br> Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately). <br> Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed. <br> <p style='text-align: center'><a href='https://github.com/mttr2021/MTTR'>Github Repo</a></p>"
+ article = "Check out [MTTR's GitHub page](https://github.com/mttr2021/MTTR) for more info about this project. <br> Also, check out our interactive [Colab notebook](https://gradio.app/docs/) for **much faster** processing (GPU accelerated) and more options! <br> **Disclaimer:** <br> This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever. Hence, the model's performance may be limited, especially on instances from unseen categories. <br> Additionally, slow processing times may be encountered due to HuggingFace's limited computational resources (no GPU acceleration unfortunately), and depending on the input clip length and/or resolution. <br> Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed."
 
  examples = [['guy in white shirt performing tricks on a bike', 'bike_tricks_2.mp4'],
              ['a man riding a surfboard', 'surfing.mp4'],
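
The strings edited by this hunk (`title`, `description`, `article`, `examples`) are the standard metadata parameters of Gradio's `Interface`. The wiring itself is outside this hunk, so the following is only a minimal sketch of how such a demo is typically assembled; the `process` body and the component choices here are illustrative assumptions, not MTTR's actual code.

import gradio as gr

# Shortened stand-ins for the module-level strings defined in app.py above.
title = "MTTR - Interactive Demo"
description = "Segment the object instance referred to by a text query throughout a short video."
article = "See [MTTR's GitHub page](https://github.com/mttr2021/MTTR) for more info."

def process(text_query, full_video_path):
    # Hypothetical stand-in: the real function (not shown in this hunk) runs
    # MTTR inference on the clip and returns a path to the annotated video.
    return full_video_path

demo = gr.Interface(
    fn=process,
    inputs=[gr.Textbox(label="text query"),
            gr.Video(label="input clip (.mp4)")],
    outputs=gr.Video(label="segmented clip"),
    title=title,              # rendered as the page heading
    description=description,  # rendered above the input components
    article=article,          # markdown rendered below the interface
    # examples=[['guy in white shirt performing tricks on a bike', 'bike_tricks_2.mp4']],
    # (uncomment once the example .mp4 files exist in the working directory)
)

demo.launch()

In the real app, `examples` points at .mp4 clips shipped alongside app.py, which Gradio renders as clickable sample inputs below the interface.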