Spaces: Running on Zero

Update app.py

app.py CHANGED
@@ -30,61 +30,27 @@ def predict(title, abstract):
     with torch.no_grad():
         outputs = model(**inputs)
         probability = torch.sigmoid(outputs.logits).item()
-    # reason for +0.05: We observed that the predicted values in the web demo are generally around 0.05 lower than those in the local deployment (due to differences in software/hardware environments). Therefore, we applied the following compensation in the web demo. Please do not use this in the local deployment.
+    # reason for +0.05: We observed that the predicted values in the web demo are generally around 0.05 lower than those in the local deployment (we believe due to differences in software/hardware environments). Therefore, we applied the following compensation in the web demo. Please do not use this in the local deployment.
     if probability + 0.05 >= 1.0:
         return round(1, 4)
     return round(probability + 0.05, 4)


-
+
 examples = [
     [
-        "
-        ('''
-
-
-
-        this
-        leverages compact low-rank experts to facilitate efficient all-in-one image
-        restoration. Specifically, LoRA-IR consists of two training stages:
-        degradation-guided pre-training and parameter-efficient fine-tuning. In the
-        pre-training stage, we enhance the pre-trained CLIP model by introducing a
-        simple mechanism that scales it to higher resolutions, allowing us to extract
-        robust degradation representations that adaptively guide the IR network. In the
-        fine-tuning stage, we refine the pre-trained IR network using low-rank
-        adaptation (LoRA). Built upon a Mixture-of-Experts (MoE) architecture, LoRA-IR
-        dynamically integrates multiple low-rank restoration experts through a
-        degradation-guided router. This dynamic integration mechanism significantly
-        enhances our model's adaptability to diverse and unknown degradations in
-        complex real-world scenarios. Extensive experiments demonstrate that LoRA-IR
-        achieves state-of-the-art performance across 14 image restoration tasks and 29
-        benchmarks. Code and pre-trained models will be available at:
-        https://github.com/shallowdream204/LoRA-IR.''')
+        "Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection",
+        ('''One-stage detector basically formulates object detection as dense classification and localization. The classification is usually optimized by Focal Loss and the box location is commonly learned under Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices, including (1) the inconsistent usage of the quality estimation and classification between training and inference and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address the problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0\% AP using ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5\%) and ATSS (43.6\%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model can achieve a single-model single-scale AP of 48.2\%, at 10 FPS on a single 2080Ti GPU. Code and models are available at this https URL.''')
+    ],
+    [
+        "OminiControl: Minimal and Universal Control for Diffusion Transformer",
+        ('''In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone and process them with its flexible multi-modal attention processors. Unlike existing methods, which rely heavily on additional encoder modules with complex architectures, OminiControl (1) effectively and efficiently incorporates injected image conditions with only ~0.1% additional parameters, and (2) addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions such as edges, depth, and more. Remarkably, these capabilities are achieved by training on images generated by the DiT itself, which is particularly beneficial for subject-driven generation. Extensive evaluations demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted models in both subject-driven and spatially-aligned conditional generation. Additionally, we release our training dataset, Subjects200K, a diverse collection of over 200,000 identity-consistent images, along with an efficient data synthesis pipeline to advance research in subject-consistent generation.''')
     ],
     [
-        "
-
-        While plausible appearance and talking effect are achieved, these methods still
-        suffer from temporal, 3D or expression inconsistency due to the error
-        accumulation and inherent limitation of single-image generation ability. In
-        this paper, we propose ConsistentAvatar, a novel framework for fully consistent
-        and high-fidelity talking avatar generation. Instead of directly employing
-        multi-modal conditions to the diffusion process, our method learns to first
-        model the temporal representation for stability between adjacent frames.
-        Specifically, we propose a Temporally-Sensitive Detail (TSD) map containing
-        high-frequency feature and contours that vary significantly along the time
-        axis. Using a temporal consistent diffusion module, we learn to align TSD of
-        the initial result to that of the video frame ground truth. The final avatar is
-        generated by a fully consistent diffusion module, conditioned on the aligned
-        TSD, rough head normal, and emotion prompt embedding. We find that the aligned
-        TSD, which represents the temporal patterns, constrains the diffusion process
-        to generate temporally stable talking head. Further, its reliable guidance
-        complements the inaccuracy of other conditions, suppressing the accumulated
-        error while improving the consistency on various aspects. Extensive experiments
-        demonstrate that ConsistentAvatar outperforms the state-of-the-art methods on
-        the generated appearance, 3D, expression and temporal consistency. Project
-        page: https://njust-yang.github.io/ConsistentAvatar.github.io/''')
+        "Enhanced ZSSR for Super-resolution Reconstruction of the Historical Tibetan Document Images",
+        "Due to the poor preservation and imaging conditions, the image quality of historical Tibetan document images is relatively unsatisfactory. In this paper, we adopt super-resolution technology to reconstruct high quality images of historical Tibetan document. To address the problem of low quantity and poor quality of historical Tibetan document images, we propose the EZSSR network based on the Zero-Shot Super-resolution Network (ZSSR), which borrows the idea of feature pyramid in Deep Laplacian Pyramid Networks (LapSRN) to extract different levels of features while alleviating the ringing artifacts. EZSSR neither requires paired training datasets nor preprocessing stage. The computational complexity of EZSSR is low, and thus, EZSSR can also reconstruct image within the acceptable time frame. Experimental results show that EZSSR reconstructs images with better visual effects and higher PSNR and SSIM values."
     ]
+
 ]

 def validate_input(title, abstract):
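The compensated return value in this hunk is equivalent to a simple clamp. Below is a minimal sketch of that arithmetic, using a hypothetical calibrated_probability helper that is not part of app.py, assuming the same +0.05 web-demo offset described in the comment.

def calibrated_probability(probability: float, offset: float = 0.05) -> float:
    """Hypothetical helper mirroring the web-demo compensation shown above:
    add the offset, clamp at 1.0, and round to 4 decimal places."""
    return round(min(probability + offset, 1.0), 4)

# e.g. calibrated_probability(0.62) -> 0.67, calibrated_probability(0.98) -> 1.0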
@@ -111,21 +77,21 @@ def update_button_status(title, abstract):

 with gr.Blocks() as iface:
     gr.Markdown("""
-    #
-    ### Estimate the future academic impact
-    ###### [
-    ######
+    # 📈 Predict Academic Impact of Newly Published Paper!
+    ### Estimate the future academic impact from the title and abstract with an LLM.
+    ###### [Full Paper](https://arxiv.org/abs/2408.03934)
+    ###### Kindly note that ZeroGPU does not support preloading quantized models for now. Each time you click "Predict," the model will be reinitialized, which could take around 20 seconds.
     """)
     with gr.Row():
         with gr.Column():
             title_input = gr.Textbox(
                 lines=2,
-                placeholder="Enter Paper Title Here...",
+                placeholder='''Enter Paper Title Here... (Title will be processed with 'title.replace("\n", " ").strip()')''',
                 label="Paper Title"
             )
             abstract_input = gr.Textbox(
                 lines=5,
-                placeholder=
+                placeholder='''Enter Paper Abstract Here... (Abstract will be processed with 'abstract.replace("\n", " ").strip()')''',
                 label="Paper Abstract"
             )
             validation_status = gr.Textbox(label="Validation Status", interactive=False)
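The updated placeholders name the normalization applied to both fields. The following is a minimal standalone sketch of that preprocessing; normalize_field is a hypothetical helper and not the app's actual function.

def normalize_field(text: str) -> str:
    # Collapse hard line breaks (e.g. pasted from a PDF) into spaces, then trim,
    # mirroring the title.replace("\n", " ").strip() step named in the placeholders.
    return text.replace("\n", " ").strip()

title = "OminiControl: Minimal and Universal\nControl for Diffusion Transformer\n"
print(normalize_field(title))
# -> "OminiControl: Minimal and Universal Control for Diffusion Transformer"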
@@ -133,15 +99,17 @@ with gr.Blocks() as iface:
         with gr.Column():
             output = gr.Label(label="Predicted Impact")
             gr.Markdown("""
-
-            - It is intended as a tool for research and educational purposes only
-            -
-            -
-            -
-            -
+            ## Ethical Warnings and Important Notes
+            - It is intended as a tool **for research and educational purposes only**.
+            - Please refrain from deliberately embellishing the title and abstract to boost scores, and avoid making false claims.
+            - Our training data only includes samples from the fields of cs.CV, cs.CL (NLP), and cs.AI. Predictions outside these areas should not be relied on.
+            - The **predicted value** is a probability generated by the model and **does NOT reflect paper quality or novelty**.
+            - To identify potentially impactful papers, this study uses a sigmoid+MSE objective to optimize NDCG (rather than sigmoid+BCE), so predicted values are generally concentrated **between 0.1 and 0.9**.
+            - Empirically, a predicted influence score greater than **0.65** is considered to indicate an impactful paper.
+            - The **author takes NO responsibility** for the prediction results.
             """)

-
+
     title_input.change(
         update_button_status,
         inputs=[title_input, abstract_input],
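The note above about preferring sigmoid+MSE over sigmoid+BCE concerns how the scalar logit is trained. The snippet below is only an illustrative sketch of the difference between the two objectives on a single logit, with made-up tensors; it is not the training code behind this Space.

import torch
import torch.nn.functional as F

# Hypothetical example tensors: raw model logits and binary impact labels.
logits = torch.tensor([1.2, -0.4, 0.3])
labels = torch.tensor([1.0, 0.0, 1.0])

# sigmoid + BCE: the standard binary-classification objective.
bce_loss = F.binary_cross_entropy_with_logits(logits, labels)

# sigmoid + MSE: regress the sigmoid probability toward the label instead.
# Per the Space's notes, this choice (aimed at better NDCG) tends to keep
# predictions concentrated between 0.1 and 0.9.
mse_loss = F.mse_loss(torch.sigmoid(logits), labels)

print(bce_loss.item(), mse_loss.item())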
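For completeness, here is a hypothetical local call into the predict(title, abstract) function from the first hunk, using one of the new example papers. It assumes predict is importable from app.py with its model and tokenizer already initialized; note that, per the comment in the diff, the +0.05 compensation is meant for the web demo only.

# Hypothetical local usage sketch; not part of the committed app.py.
from app import predict

title = "OminiControl: Minimal and Universal Control for Diffusion Transformer"
abstract = (
    "In this paper, we introduce OminiControl, a highly versatile and "
    "parameter-efficient framework that integrates image conditions into "
    "pre-trained Diffusion Transformer (DiT) models."  # abstract truncated here for brevity
)

score = predict(title, abstract)  # a float; per the Space's notes, typically between 0.1 and 0.9
print(f"Predicted impact: {score}")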