ssocean committed
Commit 51eb0c1 · verified · 1 Parent(s): 012edbb

Update app.py

Files changed (1)
  1. app.py +26 -58
app.py CHANGED
@@ -30,61 +30,27 @@ def predict(title, abstract):
  with torch.no_grad():
  outputs = model(**inputs)
  probability = torch.sigmoid(outputs.logits).item()
- # reason for +0.05: We observed that the predicted values in the web demo are generally around 0.05 lower than those in the local deployment (due to differences in software/hardware environments). Therefore, we applied the following compensation in the web demo. Please do not use this in the local deployment.
+ # reason for +0.05: We observed that the predicted values in the web demo are generally around 0.05 lower than those in the local deployment (we believe due to differences in software/hardware environments). Therefore, we applied the following compensation in the web demo. Please do not use it in a local deployment.
  if probability + 0.05 >= 1.0:
  return round(1, 4)
  return round(probability + 0.05, 4)


- # Example data
+
  examples = [
  [
- "LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration",
- ('''Prompt-based all-in-one image restoration (IR) frameworks have achieved
- remarkable performance by incorporating degradation-specific information into
- prompt modules. Nevertheless, handling the complex and diverse degradations
- encountered in real-world scenarios remains a significant challenge. To address
- this challenge, we propose LoRA-IR, a flexible framework that dynamically
- leverages compact low-rank experts to facilitate efficient all-in-one image
- restoration. Specifically, LoRA-IR consists of two training stages:
- degradation-guided pre-training and parameter-efficient fine-tuning. In the
- pre-training stage, we enhance the pre-trained CLIP model by introducing a
- simple mechanism that scales it to higher resolutions, allowing us to extract
- robust degradation representations that adaptively guide the IR network. In the
- fine-tuning stage, we refine the pre-trained IR network using low-rank
- adaptation (LoRA). Built upon a Mixture-of-Experts (MoE) architecture, LoRA-IR
- dynamically integrates multiple low-rank restoration experts through a
- degradation-guided router. This dynamic integration mechanism significantly
- enhances our model's adaptability to diverse and unknown degradations in
- complex real-world scenarios. Extensive experiments demonstrate that LoRA-IR
- achieves state-of-the-art performance across 14 image restoration tasks and 29
- benchmarks. Code and pre-trained models will be available at:
- https://github.com/shallowdream204/LoRA-IR.''')
+ "Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection",
+ ('''One-stage detector basically formulates object detection as dense classification and localization. The classification is usually optimized by Focal Loss and the box location is commonly learned under Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices, including (1) the inconsistent usage of the quality estimation and classification between training and inference and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address the problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model can achieve a single-model single-scale AP of 48.2%, at 10 FPS on a single 2080Ti GPU. Code and models are available at this https URL.''')
+ ],
+ [
+ "OminiControl: Minimal and Universal Control for Diffusion Transformer",
+ ('''In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone and process them with its flexible multi-modal attention processors. Unlike existing methods, which rely heavily on additional encoder modules with complex architectures, OminiControl (1) effectively and efficiently incorporates injected image conditions with only ~0.1% additional parameters, and (2) addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions such as edges, depth, and more. Remarkably, these capabilities are achieved by training on images generated by the DiT itself, which is particularly beneficial for subject-driven generation. Extensive evaluations demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted models in both subject-driven and spatially-aligned conditional generation. Additionally, we release our training dataset, Subjects200K, a diverse collection of over 200,000 identity-consistent images, along with an efficient data synthesis pipeline to advance research in subject-consistent generation.''')
  ],
  [
- "ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance",
- ('''Diffusion models have shown impressive potential on talking head generation.
- While plausible appearance and talking effect are achieved, these methods still
- suffer from temporal, 3D or expression inconsistency due to the error
- accumulation and inherent limitation of single-image generation ability. In
- this paper, we propose ConsistentAvatar, a novel framework for fully consistent
- and high-fidelity talking avatar generation. Instead of directly employing
- multi-modal conditions to the diffusion process, our method learns to first
- model the temporal representation for stability between adjacent frames.
- Specifically, we propose a Temporally-Sensitive Detail (TSD) map containing
- high-frequency feature and contours that vary significantly along the time
- axis. Using a temporal consistent diffusion module, we learn to align TSD of
- the initial result to that of the video frame ground truth. The final avatar is
- generated by a fully consistent diffusion module, conditioned on the aligned
- TSD, rough head normal, and emotion prompt embedding. We find that the aligned
- TSD, which represents the temporal patterns, constrains the diffusion process
- to generate temporally stable talking head. Further, its reliable guidance
- complements the inaccuracy of other conditions, suppressing the accumulated
- error while improving the consistency on various aspects. Extensive experiments
- demonstrate that ConsistentAvatar outperforms the state-of-the-art methods on
- the generated appearance, 3D, expression and temporal consistency. Project
- page: https://njust-yang.github.io/ConsistentAvatar.github.io/''')
+ "Enhanced ZSSR for Super-resolution Reconstruction of the Historical Tibetan Document Images",
+ "Due to the poor preservation and imaging conditions, the image quality of historical Tibetan document images is relatively unsatisfactory. In this paper, we adopt super-resolution technology to reconstruct high quality images of historical Tibetan document. To address the problem of low quantity and poor quality of historical Tibetan document images, we propose the EZSSR network based on the Zero-Shot Super-resolution Network (ZSSR), which borrows the idea of feature pyramid in Deep Laplacian Pyramid Networks (LapSRN) to extract different levels of features while alleviating the ringing artifacts. EZSSR neither requires paired training datasets nor preprocessing stage. The computational complexity of EZSSR is low, and thus, EZSSR can also reconstruct image within the acceptable time frame. Experimental results show that EZSSR reconstructs images with better visual effects and higher PSNR and SSIM values."
  ]
+
  ]

  def validate_input(title, abstract):
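The demo-only compensation in the hunk above is just an offset on the sigmoid output, clamped at 1.0. A minimal sketch of that logic (the names `WEB_DEMO_OFFSET` and `demo_probability` are ours, not from app.py):

```python
import torch

# Web-demo-only compensation described in the comment above (hypothetical constant name).
WEB_DEMO_OFFSET = 0.05

def demo_probability(logits: torch.Tensor) -> float:
    """Sigmoid score plus the demo offset, clamped so it never exceeds 1.0."""
    probability = torch.sigmoid(logits).item()  # assumes a single-element logits tensor
    # min(...) collapses the `if probability + 0.05 >= 1.0` branch and the offset add.
    return round(min(probability + WEB_DEMO_OFFSET, 1.0), 4)
```

As the in-code comment stresses, the offset calibrates the hosted demo only and should be dropped in local deployments.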
@@ -111,21 +77,21 @@ def update_button_status(title, abstract):

  with gr.Blocks() as iface:
  gr.Markdown("""
- # 🧠 Predict Academic Impact of Newly Published Paper!
- ### Estimate the future academic impact of a paper using LLM
- ###### [Read the full paper](https://arxiv.org/abs/2408.03934)
- ###### Please note that due to the characteristics of ZeroGPU, quantized models cannot be preloaded. Each time you click "Predict," the model will need to be reinitialized, which may take additional time (usually less than 20s).
+ # 📈 Predict Academic Impact of Newly Published Paper!
+ ### Estimate the future academic impact from the title and abstract with an LLM.
+ ###### [Full Paper](https://arxiv.org/abs/2408.03934)
+ ###### Kindly note that ZeroGPU does not currently support preloading quantized models. Each time you click "Predict," the model is reinitialized, which can take around 20 seconds.
  """)
  with gr.Row():
  with gr.Column():
  title_input = gr.Textbox(
  lines=2,
- placeholder="Enter Paper Title Here...",
+ placeholder=r'''Enter Paper Title Here... (The title will be processed with title.replace("\n", " ").strip().)''',
  label="Paper Title"
  )
  abstract_input = gr.Textbox(
  lines=5,
- placeholder="Enter Paper Abstract Here... (Do not input line breaks. No more than 1024 tokens.)",
+ placeholder=r'''Enter Paper Abstract Here... (The abstract will be processed with abstract.replace("\n", " ").strip().)''',
  label="Paper Abstract"
  )
  validation_status = gr.Textbox(label="Validation Status", interactive=False)
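The new placeholders document how input is normalized before prediction. A minimal sketch of that normalization, assuming the app applies exactly the expression quoted in the placeholders (`normalize_field` is a hypothetical helper name):

```python
def normalize_field(text: str) -> str:
    # Exactly the expression quoted in the placeholders: collapse line breaks, trim ends.
    return text.replace("\n", " ").strip()

# A title pasted with a hard line break becomes a single clean line:
print(normalize_field("OminiControl: Minimal and Universal\nControl for Diffusion Transformer"))
# -> OminiControl: Minimal and Universal Control for Diffusion Transformer
```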
@@ -133,15 +99,17 @@ with gr.Blocks() as iface:

  with gr.Column():
  output = gr.Label(label="Predicted Impact")
  gr.Markdown("""
- **Important Notes**
- - It is intended as a tool for research and educational purposes only.
- - Predicted impact is a probabilistic value generated by the model and does not reflect paper quality or novelty.
- - The author takes no responsibility for the prediction results.
- - To identify potentially impactful papers, this study uses the sigmoid+MSE approach to optimize NDCG values (over sigmoid+BCE), resulting in predicted values concentrated between 0.1 and 0.9 due to the sigmoid gradient effect.
- - Generally, a predicted influence score greater than 0.65 is considered to indicate an impactful paper.
+ ## Ethical Warnings and Important Notes
+ - It is intended as a tool **for research and educational purposes only**.
+ - Please refrain from deliberately embellishing the title and abstract to boost scores, and avoid making false claims.
+ - Our training data only includes samples from cs.CV, cs.CL (NLP), and cs.AI. Predictions outside these areas should not be relied upon.
+ - The **predicted value** is a probability generated by the model and **does NOT reflect paper quality or novelty**.
+ - To identify potentially impactful papers, this study uses the sigmoid+MSE approach to optimize NDCG values (over sigmoid+BCE), resulting in predicted values generally concentrated **between 0.1 and 0.9**.
+ - Empirically, a predicted influence score greater than **0.65** is considered to indicate an impactful paper.
+ - The **author takes NO responsibility** for the prediction results.
  """)

- # Bind input events
+
  title_input.change(
  update_button_status,
  inputs=[title_input, abstract_input],
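The notes above contrast sigmoid+MSE (chosen to optimize NDCG) with sigmoid+BCE. A sketch of the two objectives as that bullet reads, with illustrative shapes and labels rather than the actual training code:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 1)  # raw model outputs for a batch of 8 papers (illustrative)
targets = torch.rand(8, 1)  # impact labels scaled to [0, 1] (illustrative)

# sigmoid+MSE: regress the sigmoid score toward the label; per the notes, this
# concentrates predictions roughly between 0.1 and 0.9.
mse_loss = F.mse_loss(torch.sigmoid(logits), targets)

# sigmoid+BCE: the alternative the notes compare against.
bce_loss = F.binary_cross_entropy_with_logits(logits, targets)
```

The 0.65 rule of thumb in the notes applies to the resulting sigmoid score, i.e. the value the demo rounds and displays.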
 
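The hunk ends mid-call, so the `outputs=` wiring is not shown. A hedged sketch of how such a Gradio change-binding typically completes; the body of `update_button_status` here is a stand-in, not the file's actual code:

```python
import gradio as gr

def update_button_status(title, abstract):
    # Hypothetical stand-in for the real validator in app.py.
    ok = bool(title.strip()) and bool(abstract.strip())
    return "Ready to predict." if ok else "Please fill in both the title and the abstract."

with gr.Blocks() as iface:
    title_input = gr.Textbox(lines=2, label="Paper Title")
    abstract_input = gr.Textbox(lines=5, label="Paper Abstract")
    validation_status = gr.Textbox(label="Validation Status", interactive=False)
    # Each textbox re-checks the status on change, mirroring the truncated call above.
    title_input.change(update_button_status, inputs=[title_input, abstract_input], outputs=validation_status)
    abstract_input.change(update_button_status, inputs=[title_input, abstract_input], outputs=validation_status)
```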