improve gradio blocks interface
app.py CHANGED
@@ -12,12 +12,8 @@ import matplotlib.patches as patches
 from matplotlib.patches import Polygon
 import numpy as np
 import random
-import json
 
 
-with open("config.json", "r") as f:
-    config = json.load(f)
-
 d_model = config['text_config']['d_model']
 num_layers = config['text_config']['encoder_layers']
 attention_heads = config['text_config']['encoder_attention_heads']
@@ -32,10 +28,15 @@ temporal_embeddings = config['vision_config']['visual_temporal_embedding']['max_
 
 title = """# 🙋🏻♂️Welcome to Tonic's PLeIAs/📸📈✍🏻Florence-PDF"""
 description = """
----
-
 This application showcases the **PLeIAs/📸📈✍🏻Florence-PDF** model, a powerful AI system designed for both **text and image generation tasks**. The model is capable of handling complex tasks such as object detection, image captioning, OCR (Optical Character Recognition), and detailed region-based image analysis.
 
+### Model Usage and Flexibility
+
+- **No Repeat N-Grams**: To reduce repetition in text generation, the model is configured with a **no_repeat_ngram_size** of **{no_repeat_ngram_size}**, ensuring more diverse and meaningful outputs.
+- **Sampling Strategies**: 🙏🏻PLeIAs/📸📈✍🏻Florence-PDF offers flexible sampling strategies, including **top-k** and **top-p (nucleus) sampling**, allowing for both creative and constrained generation based on user needs.
+
+📸📈✍🏻Florence-PDF is a robust model capable of handling various **text and image** tasks with high precision and flexibility, making it a valuable tool for both academic research and practical applications.
+
 ### **How to Use**:
 1. **Upload an Image**: Select an image for processing.
 2. **Choose a Task**: Pick a task from the dropdown menu, such as "Caption", "Object Detection", "OCR", etc.
@@ -50,8 +51,6 @@ You can reset the interface anytime by clicking the **Reset** button.
 - **📸✍🏻OCR**: Extract text from the image.
 - **📸Region Proposal**: Detect key regions in the image for detailed captioning.
 
----
-
 ### Join us :
 🌟TeamTonic🌟 is always making cool demos! Join our active builder's 🛠️community 👻 [](https://discord.gg/qdfnvSPcqP) On 🤗Huggingface:[MultiTransformer](https://huggingface.co/MultiTransformer) On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Build Tonic](https://git.tonic-ai.com/contribute)🤗Big thanks to Yuvi Sharma and all the folks at huggingface for the community grant 🤗
 """
@@ -77,12 +76,6 @@ In addition to text tasks, 🙏🏻PLeIAs/📸📈✍🏻Florence-PDF also incor
 - **Patch-based Image Processing**: The vision component operates on image patches with a patch size of **{patch_size}x{patch_size}**.
 - **Temporal Embedding**: Visual tasks benefit from temporal embeddings with up to **{temporal_embeddings} steps**, making Florence-2 well-suited for video analysis.
 
-### Model Usage and Flexibility
-
-- **No Repeat N-Grams**: To reduce repetition in text generation, the model is configured with a **no_repeat_ngram_size** of **{no_repeat_ngram_size}**, ensuring more diverse and meaningful outputs.
-- **Sampling Strategies**: 🙏🏻PLeIAs/📸📈✍🏻Florence-PDF offers flexible sampling strategies, including **top-k** and **top-p (nucleus) sampling**, allowing for both creative and constrained generation based on user needs.
-
-📸📈✍🏻Florence-PDF is a robust model capable of handling various **text and image** tasks with high precision and flexibility, making it a valuable tool for both academic research and practical applications.
 """
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
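Review note on the first hunk: it drops `import json` and the `config.json` loading, while the unchanged context lines still index into `config`. Unless `config` is defined elsewhere in app.py (not visible in this diff), those lines would raise a `NameError` at import time. The sketch below shows one way the loading could be kept in a small helper. The function name `load_model_config` and the returned dictionary layout are assumptions for illustration; only the file name and the three keys come from the diff.

```python
import json
from pathlib import Path


def load_model_config(path="config.json"):
    """Read the model config and return the text-encoder fields used in app.py.

    Hypothetical helper: only the file name and the key names are taken
    from the diff; everything else is an illustrative assumption.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        config = json.load(f)

    text_cfg = config["text_config"]
    return {
        "d_model": text_cfg["d_model"],
        "num_layers": text_cfg["encoder_layers"],
        "attention_heads": text_cfg["encoder_attention_heads"],
    }
```

Calling `load_model_config()` once near the top of the module would restore the values that the context lines expect, without reintroducing module-level file handling.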
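Review note on the relocated "Model Usage and Flexibility" block: it carries the `{no_repeat_ngram_size}` placeholder into `description`, which the diff shows opening as a plain triple-quoted string rather than an f-string. If that holds in the full file, the braces would be displayed literally in the interface. A minimal sketch of explicit substitution follows; `DESCRIPTION_TEMPLATE`, `render_description`, and the value `3` are hypothetical and used only for illustration.

```python
# Hypothetical excerpt of the relocated text; the real string lives in app.py.
DESCRIPTION_TEMPLATE = (
    "- **No Repeat N-Grams**: the model is configured with a "
    "**no_repeat_ngram_size** of **{no_repeat_ngram_size}**."
)


def render_description(template: str, **values: object) -> str:
    """Substitute {name} placeholders in a plain (non-f) string."""
    return template.format(**values)


if __name__ == "__main__":
    # The value 3 is a placeholder for illustration, not the model's setting.
    print(render_description(DESCRIPTION_TEMPLATE, no_repeat_ngram_size=3))
```

Note that `str.format` raises `KeyError` for any placeholder without a matching keyword, and literal braces in the text must be doubled (`{{` and `}}`), so this approach only fits if the description contains no other brace characters.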