Spaces: lihuigu/SciPIP

lihuigu committed on Dec 1, 2024

Commit

88253fe

1 Parent(s): e7f10cc

update UI


Files changed (17)

README.md +172 -1
assets/pic/demo.png +0 -0
assets/pic/figure_idea_proposal.svg +0 -0
assets/pic/logo.jpg +0 -0
assets/pic/logo.svg +144 -0
assets/pic/sys.png +0 -0
src/ai_scientist_idea.py +1 -3
src/app_pages/button_interface.py +7 -12
src/app_pages/homepage.py +63 -6
src/app_pages/locale.json +24 -16
src/app_pages/locale.py +1 -1
src/app_pages/one_click_generation.py +9 -6
src/app_pages/sidebar_components.py +5 -3
src/app_pages/step_by_step_generation.py +15 -29
src/generator.py +3 -7
src/paper_manager.py +1 -1
src/utils/paper_retriever.py +1 -1

README.md CHANGED Viewed

@@ -10,5 +10,176 @@ pinned: false
 license: mit
 short_description: Quickly generating novel research ideas.
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 license: mit
 short_description: Quickly generating novel research ideas.
 ---
+<center><h1> 💡SciPIP: An LLM-based Scientific Paper Idea Proposer </h1></center>
+<div align="center">
+  <p>
+      <a href="https://github.com/cheerss/SciPIP/issues">
+          <img src="https://img.shields.io/github/issues/cheerss/SciPIP" alt="GitHub issues">
+      </a>
+      <a href="LICENSE">
+          <img src="https://img.shields.io/github/license/cheerss/SciPIP" alt="License">
+      </a>
+      <a href="https://arxiv.org/abs/2410.23166">
+          <img src="https://img.shields.io/badge/arXiv-2410.23166-b31b1b" alt="arXiv">
+      </a>
+      <img src="https://img.shields.io/github/stars/cheerss/SciPIP?color=green&style=social" alt="GitHub stars">
+      <img src="https://img.shields.io/badge/python->=3.10.3-blue" alt="Python version">
+  </p>
+</div>
+![SciPIP](./assets/pic/logo.jpg)
+## Introduction
+SciPIP is a scientific paper idea generation tool powered by a large language model (LLM) designed to **assist researchers in quickly generating novel research ideas**. Based on the background information provided by the user, SciPIP first conducts a literature review to identify relevant research, then generates fresh ideas for potential studies.
+![SciPIP](./assets/pic/demo.png)
+🤗 Try it on Hugging Face (coming soon; for now, you can deploy it on your own computer).
+## Updates
+- [x] Idea generation in a GUI environment (web app).
+- [x] Idea generation for the NLP and multimodal (partial) fields.
+- [ ] Idea generation for the CV field.
+- [ ] Idea generation for other fields.
+- [ ] Release the Hugging Face demo.
+## Prerequisites
+The following setup has been tested on Ubuntu 22.04 with Python >= 3.10.3.
+1. **Install essential packages.** Feel free to copy and paste the following commands into your terminal. After that, you can visit your Neo4j database in a browser.
+   ```bash
+   ## Install git-lfs
+   curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
+   sudo apt install git-lfs
+   ## Create new conda environment scipip
+   conda env create -f environment.yml
+   conda activate scipip
+   ## Install Neo4j database
+   sudo apt install -y openjdk-17-jre # Install Neo4j required JDK
+   # cd ~/Downloads # or /your/path/to/download/Neo4j
+   wget http://dist.neo4j.org/neo4j-community-5.20.0-unix.tar.gz
+   tar -xvf neo4j-community-5.20.0-unix.tar.gz
+   ## Start Neo4j
+   cd ./neo4j-community-5.20.0
+   # Uncomment server.default_listen_address=0.0.0.0 in conf/neo4j.conf to visit Neo4j through a browser
+   sed -i 's/# server.default_listen_address=0.0.0.0/server.default_listen_address=0.0.0.0/g' ./conf/neo4j.conf
+   ./bin/neo4j start
+   # Default URL for neo4j is "http://127.0.0.1:7474"
+   # Default URI for neo4j is "bolt://127.0.0.1:7687"
+   # Default username and password for neo4j database are both "neo4j"
+   # !![IMPORTANT] You must visit "http://127.0.0.1:7474" and change the default password before the next step, because Neo4j does not permit running with the default password.
+   ```
+2. **Clone this repository (SciPIP) and edit the configuration files.** Specifically, the LLM API token and the Neo4j username/password need to be configured; a template is provided.
+   ```bash
+   ## Clone our repository
+   git clone git@github.com:cheerss/SciPIP.git && cd SciPIP
+   ## Edit scripts/env.sh
+   # Must be set: NEO4J_USERNAME / NEO4J_PASSWD / MODEL_API_KEY / MODEL_URL
+   # Others are optional
+   ## source env
+   source scripts/env.sh
+   ```
+3. **Prepare the literature database**
+   1. Download the literature data from [this link](https://drive.google.com/file/d/1NZTDpxKo7bmxwXPI03dgikEemKGLkwne/view?usp=sharing) and save it to `assets/data/scipip_neo4j_clean_backup.json`.
+   2. Then, run the following command to load the literature into the Neo4j database (this may take 40-60 minutes):
+   ```
+   python src/utils/paper_client.py
+   ```
+4. **[Optional] Prepare the embedding model**. Our algorithm uses SentenceBERT and **will automatically download** it from Hugging Face the first time the program is run. However, if you are concerned about download failures due to network issues, you can download it in advance and place it in the directory shown below.
+   ```bash
+   cd /root/path/of/SciPIP && mkdir -p assets/model/sentence-transformers
+   git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 assets/model/sentence-transformers/all-MiniLM-L6-v2
+   ```
+## Run In a Browser (Recommended)
+```bash
+streamlit run app.py
+# OR
+python -m streamlit run app.py
+```
+Then visit `http://localhost:8501` in your browser to use the interactive environment.
+## Run In a Terminal
+**1. BackTracking of ACL 2024**
+```
+python src/generator.py backtracking --brainstorm-mode mode_c --use-cue-words True --use-inspiration True --num 1
+```
+Results are written to `assets/output_idea/output_backtracking_mode_c_cue_True_ins_True.json`.
+**2. Generate new idea**
+Enter your background and cue words in `assets/data/test_background.json`, then run:
+```
+python src/generator.py new-idea --brainstorm-mode mode_c --use-inspiration True --num 2
+```
+Results are written to `assets/output_idea/output_new_idea_mode_c_ins_True.json`.
+## Others
+### Retrieve Eval
+Generate the retrieval evaluation logs in `./log`.
+```
+bash scripts/retriever_eval.sh
+```
+### Database Construction
+SciPIP uses Neo4j as its database. You can directly import the provided data or add your own research papers.
+```
+wget https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
+pip install en_core_web_sm-3.7.1-py3-none-any.whl
+```
+The directory for storing papers can be modified in the `pdf_cached` field of `configs/datasets.yaml`.
+**1. Generate json list**
+```
+python src/paper_manager.py crawling --year all --venue-name nips
+```
+JSON files are saved in `./assets/paper/<$venue-name>/<$year>`.
+**2. Fetch Papers**
+```
+python src/paper_manager.py update --year all --venue-name nips
+```
+## Cite Us
+```
+@article{wang2024scipip,
+  title={SciPIP: An LLM-based Scientific Paper Idea Proposer},
+  author={Wenxiao Wang and Lihui Gu and Liye Zhang and Yunxiang Luo and Yi Dai and Chen Shen and Liang Xie and Binbin Lin and Xiaofei He and Jieping Ye},
+  journal={arXiv preprint arXiv:2410.23166},
+  url={https://arxiv.org/abs/2410.23166},
+  year={2024}
+}
+```
+## Help Us To Improve
+https://forms.gle/YpLUrhqs1ahyCAe99
+Thank you for using SciPIP! We hope it helps you generate research ideas! 🎉
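
Reviewer note: step 2 of the README lists four settings in `scripts/env.sh` that must be filled in before the app can run. A small pre-flight check along these lines can catch an unset value early. This is only a sketch: the variable names come from the README, while `missing_env_vars` is a hypothetical helper and not part of the repository.

```python
import os

# Names taken from the README's note on scripts/env.sh.
REQUIRED_VARS = ("NEO4J_USERNAME", "NEO4J_PASSWD", "MODEL_API_KEY", "MODEL_URL")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` after `source scripts/env.sh` should return an empty list when all four settings are present.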

assets/pic/demo.png ADDED Viewed

assets/pic/figure_idea_proposal.svg ADDED Viewed

assets/pic/logo.jpg ADDED Viewed

assets/pic/logo.svg ADDED Viewed

assets/pic/sys.png ADDED Viewed

src/ai_scientist_idea.py CHANGED Viewed

@@ -89,9 +89,7 @@ def generate(config_path, ids_path, retriever_name, **kwargs):
         logger.debug("Original entities from background: {}".format(entities))
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
-            config,
-            use_cocite=config.RETRIEVE.use_cocite,
-            use_cluster_to_filter=config.RETRIEVE.use_cluster_to_filter
         )
         result = rt.retrieve(bg, entities, need_evaluate=False, target_paper_id_list=[], top_k=5)
         related_paper = result["related_paper"]

         logger.debug("Original entities from background: {}".format(entities))
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
+            config
         )
         result = rt.retrieve(bg, entities, need_evaluate=False, target_paper_id_list=[], top_k=5)
         related_paper = result["related_paper"]

src/app_pages/button_interface.py CHANGED Viewed

@@ -66,21 +66,16 @@ class Backend(object):
     def entities2literature_callback(self, background, entities, json_strs=None):
         if json_strs is not None:
-            json_contents = json.loads(json_strs)
-            res = ""
-            for i, p in enumerate(json_contents["related_paper"]):
-                res += "%d. " % (i + 1) + str(p)
-                if i < len(json_contents["related_paper"]) - 1:
-                    res += "\n"
-            return res, res
         else:
             result = self.retriever_factory.retrieve(background, entities, need_evaluate=False, target_paper_id_list=[])
-            res = ""
             for i, p in enumerate(result["related_paper"]):
-                res += "%d. " % (i + 1) + str(p["title"])
-                if i < len(result["related_paper"]) - 1:
-                    res += "\n"
-            return res, result["related_paper"]
     def literature2initial_ideas_callback(self, background, brainstorms, retrieved_literature, json_strs=None):
         if json_strs is not None:

     def entities2literature_callback(self, background, entities, json_strs=None):
         if json_strs is not None:
+            result = json.loads(json_strs)
+            res = []
+            for i, p in enumerate(result["related_paper"]):
+                res.append(str(p))
         else:
             result = self.retriever_factory.retrieve(background, entities, need_evaluate=False, target_paper_id_list=[])
+            res = []
             for i, p in enumerate(result["related_paper"]):
+                res.append(f'{p["title"]}. {p["venue_name"].upper()} {p["year"]}.')
+        return res, result["related_paper"]
     def literature2initial_ideas_callback(self, background, brainstorms, retrieved_literature, json_strs=None):
         if json_strs is not None:
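
Reviewer note: the new retrieval branch returns a list of display strings instead of one numbered, newline-joined string. The formatting can be sketched in isolation as follows. The field names (`title`, `venue_name`, `year`) follow the diff; the function name and the sample records are made up for illustration.

```python
def format_related_papers(papers):
    """Build one display string per paper, mirroring the f-string in the diff."""
    return [f'{p["title"]}. {p["venue_name"].upper()} {p["year"]}.' for p in papers]


# Hypothetical records; only the field names come from the diff.
papers = [
    {"title": "A Sample Paper", "venue_name": "acl", "year": 2024},
    {"title": "Another Sample Paper", "venue_name": "nips", "year": 2023},
]
labels = format_related_papers(papers)
```

Returning a list leaves numbering and layout to the caller, which suits a widget that renders one item per paper.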

src/app_pages/homepage.py CHANGED Viewed

@@ -4,19 +4,76 @@ from .locale import _
 from .sidebar_components import get_sidebar_header, get_sidebar_supported_fields, get_help_us_improve, get_language_select
 def generate_sidebar():
     get_sidebar_header()
-    st.sidebar.markdown("Make AI research easy")
     get_sidebar_supported_fields()
     get_help_us_improve()
-    get_language_select()
 def generate_mainpage():
-    st.title("🏠️ 💡SciPIP: An LLM-based Scientific Paper Idea Proposer")
-    # st.image("./assets/pic/logo.pdf")
-    st.header("Introduction")
-    st.markdown("SciPIP is a scientific paper idea generation tool powered by a large language model (LLM) designed to **assist researchers in quickly generating novel research ideas**. Based on the background information provided by the user, SciPIP first conducts a literature review to identify relevant research, then generates fresh ideas for potential studies.")
 def home_page():
     generate_sidebar()

 from .sidebar_components import get_sidebar_header, get_sidebar_supported_fields, get_help_us_improve, get_language_select
 def generate_sidebar():
+    get_language_select()
     get_sidebar_header()
+    st.sidebar.markdown(_("Make AI research easy"))
     get_sidebar_supported_fields()
     get_help_us_improve()
 def generate_mainpage():
+    if st.session_state.get("language", "en") == "en":
+        st.title("🏠️ 💡SciPIP: An LLM-based Scientific Paper Idea Proposer")
+        _, logo_col, _ = st.columns(3)
+        logo_col.image("./assets/pic/logo.svg", width=None)
+        st.header("Introduction", divider="blue")
+        st.markdown("SciPIP is a scientific paper idea generation tool powered by a large language model (LLM) designed to **assist researchers in quickly generating novel research ideas**. Based on the background information provided by the user, SciPIP first conducts a literature review to identify relevant research, then generates fresh ideas for potential studies.")
+        st.header("Pipeline", divider="blue")
+        _, idea_proposal_col, _ = st.columns([1, 5, 1])
+        idea_proposal_col.image("./assets/pic/figure_idea_proposal.svg", width=None)
+        st.markdown("""This demo uses SciPIP-C, as described in the [paper](https://arxiv.org/abs/2410.23166), as the default idea generation method. The generation process is mainly divided into six steps:
+    1. **Input Background**: The user inputs the background of the research.
+    2. **Brainstorming**: The large model, without retrieving any literature, generates solutions to the problems in the user-inputted background based solely on its own knowledge.
+    3. **Extracting Entities**: Extract keywords from the user’s input background and the content generated during brainstorming.
+    4. **Retrieving Related Works**: Search for relevant literature in the database based on the extracted keywords and the user’s input background.
+    5. **Generating Initial Ideas**: Draw inspiration from the retrieved literature and, combined with the brainstorming content, propose initial ideas.
+    6. **Generating Final Ideas**: Filter, refine, and process the initial ideas to produce the final ideas.
+    """)
+        st.header("One-click Generation vs. Step-by-step Generation", divider="blue")
+        # st.markdown("一键生成与逐步生成均使用相同的算法（SciPIP-C），对于一键生成而言，用户无需关心所有的中间输出，可以直接得到最终的Ideas。而逐步生成会按照Pipeline的步骤逐步生成，每步生成结束后，用户都可以修订此步骤生成的内容，从而影响后续生成结果。")
+        st.markdown("Both one-click generation and step-by-step generation use the same algorithm (SciPIP-C). For one-click generation, the user does not need to concern themselves with the intermediate outputs and can directly obtain the final ideas. In contrast, step-by-step generation follows the pipeline process, where the content is generated step by step. After each step, the user can revise the content generated in that step, which will influence the results of subsequent steps.")
+        st.header("Resources")
+        st.markdown("Our paper: [https://arxiv.org/abs/2410.23166](https://arxiv.org/abs/2410.23166)")
+        st.markdown("Our github repository: [https://github.com/cheerss/SciPIP](https://github.com/cheerss/SciPIP)")
+        st.markdown("Our Huggingface demo: Coming soon...")
+        # st.page_link("https://arxiv.org/abs/2410.23166", label="Our paper: https://arxiv.org/abs/2410.23166", icon=None)
+        # st.page_link("https://github.com/cheerss/SciPIP", label="Our github repository: https://github.com/cheerss/SciPIP", icon=None)
+    else:
+        st.title("🏠️ 💡SciPIP: 基于大语言模型的科学论文创意生成器")
+        _, logo_col, _ = st.columns(3)
+        logo_col.image("./assets/pic/logo.svg", width=None)
+        st.header("简介", divider="blue")
+        st.markdown("SciPIP 是一个由大语言模型（LLM）驱动的科学论文创意生成工具，旨在**帮助研究人员快速生成新颖的研究思路**。基于用户提供的背景信息，SciPIP首先进行文献回顾以识别相关研究，然后为潜在的研究方向生成新的创意。")
+        st.header("流程", divider="blue")
+        _, idea_proposal_col, _ = st.columns([1, 5, 1])
+        idea_proposal_col.image("./assets/pic/figure_idea_proposal.svg", width=None)
+        st.markdown("""本演示采用论文中所述的SciPIP-C作为默认的创意生成方法，生成流程主要分为六个步骤：
+1. **输入背景**：用户输入研究的背景信息。
+2. **头脑风暴**：大模型在不检索任何文献的情况下，仅凭自身知识为用户输入的背景中的问题生成解决方案。
+3. **提取实体**：从用户输入的背景和头脑风暴生成的内容中提取关键词。
+4. **检索相关文献**：根据提取的关键词和用户输入的背景信息，在数据库中检索相关文献。
+5. **生成初始创意**：从检索到的文献中汲取灵感，并结合头脑风暴的内容提出初步创意。
+6. **生成最终创意**：对初始创意进行筛选、精炼和加工，最终生成创意。
+    """)
+        st.header("一键生成 与 逐步生成", divider="blue")
+        st.markdown("一键生成与逐步生成均使用相同的算法（SciPIP-C），对于一键生成而言，用户无需关心所有的中间输出，可以直接得到最终的Ideas。而逐步生成会按照Pipeline的步骤逐步生成，每步生成结束后，用户都可以修订此步骤生成的内容，从而影响后续生成结果。")
+        st.header("相关资源")
+        st.markdown("论文: [https://arxiv.org/abs/2410.23166](https://arxiv.org/abs/2410.23166)")
+        st.markdown("Github仓库: [https://github.com/cheerss/SciPIP](https://github.com/cheerss/SciPIP)")
+        st.markdown("Huggingface演示: 敬请期待...")
+        # st.page_link("https://arxiv.org/abs/2410.23166", label="Our paper: https://arxiv.org/abs/2410.23166", icon=None)
+        # st.page_link("https://github.com/cheerss/SciPIP", label="Our github repository: https://github.com/cheerss/SciPIP", icon=None)
 def home_page():
     generate_sidebar()

src/app_pages/locale.json CHANGED Viewed

@@ -1,7 +1,11 @@
 {
     "SciPIP will generate ideas in one click. The generation pipeline is the same as step-by-step generation, but you are free from caring about intermediate outputs.": {
         "en": "-",
-        "zh": "SciPIP将一键生成Ideas，用户无需关心中间输出，Ideas生成使用的算法与逐步生成相同。"
     },
     "1. Input Background": {
         "en": "-",
@@ -19,13 +23,13 @@
         "en": "-",
         "zh": "检索相关工作"
     },
-    "5. Generate Initial Ideas": {
         "en": "-",
-        "zh": "生成初始Ideas"
     },
-    "6. Generate Final Ideas": {
         "en": "-",
-        "zh": "生成最终Ideas"
     },
     "Pipeline": {
         "en": "-",
@@ -33,11 +37,11 @@
     },
     "Supported Fields": {
         "en": "-",
-        "zh": "支持领域"
     },
     "The supported fields are temporarily limited because we only collect literature from ICML, ICLR, NeurIPS, ACL, and EMNLP. Support for other fields are in progress.": {
         "en": "-",
-        "zh": "由于当前我们构建的文献库中仅包含过去10年来自ICML、ICLR、NeurIPS、ACL和EMNLP的论文，因此Ideas生成支持的领域暂时有限"
     },
     "Natural Language Processing (NLP)": {
         "en": "-",
@@ -61,7 +65,7 @@
     },
     "💧 One-click Generation": {
         "en": "-",
-        "zh": "💧 一键生成Idea"
     },
     "Check Brainstorms": {
         "en": "-",
@@ -82,11 +86,11 @@
     "SciPIP will generate ideas step by step. The generation pipeline is the same as one-click generation, while you can improve each part manually after SciPIP providing the manuscript.": {
         "en": "-",
-        "zh": "SciPIP将会逐步生成Ideas，生成使用的算法与一键生成相同，但是用户可以在SciPIP给出中间过程的初稿后修改其中内容。"
     },
     "💦 Step-by-step Generation": {
         "en": "-",
-        "zh": "💦 逐步生成Idea"
     },
     "🐳 Background": {
         "en": "-",
@@ -96,6 +100,10 @@
         "en": "-",
         "zh": "提交"
     },
     "👻 Brainstorms": {
         "en": "-",
         "zh": "👻 头脑风暴"
@@ -110,11 +118,11 @@
     },
     "😼 Generated Initial Ideas": {
         "en": "-",
-        "zh": "😼 生成初始Ideas"
     },
     "😸 Generated Final Ideas": {
         "en": "-",
-        "zh": "😸 生成最终Ideas"
     },
     "Brainstorming...": {
         "en": "-",
@@ -130,11 +138,11 @@
     },
     "Generating initial ideas...": {
         "en": "-",
-        "zh": "生成初步Ideas……"
     },
     "Generating final ideas...": {
         "en": "-",
-        "zh": "生成最终Ideas……"
     },
     "Please input the brainstorms on the left.": {
         "en": "-",
@@ -150,11 +158,11 @@
     },
     "Please input the initial ideas on the left.": {
         "en": "-",
-        "zh": "请在左侧修改初始Ideas"
     },
     "Please input the final ideas on the left.": {
         "en": "-",
-        "zh": "请在左侧修改最终Ideas"
     },
     "🏠️ Homepage": {
         "en": "-",

 {
+    "Make AI research easy": {
+        "en": "-",
+        "zh": "让AI研究变得简单"
+    },
     "SciPIP will generate ideas in one click. The generation pipeline is the same as step-by-step generation, but you are free from caring about intermediate outputs.": {
         "en": "-",
+        "zh": "SciPIP将一键生成创意，用户无需关心中间输出，创意生成使用的算法与逐步生成相同。"
     },
     "1. Input Background": {
         "en": "-",
         "en": "-",
         "zh": "检索相关工作"
     },
+    "5. Generating Initial Ideas": {
         "en": "-",
+        "zh": "生成初始创意"
     },
+    "6. Generating Final Ideas": {
         "en": "-",
+        "zh": "生成最终创意"
     },
     "Pipeline": {
         "en": "-",
     },
     "Supported Fields": {
         "en": "-",
+        "zh": "支持的领域"
     },
     "The supported fields are temporarily limited because we only collect literature from ICML, ICLR, NeurIPS, ACL, and EMNLP. Support for other fields are in progress.": {
         "en": "-",
+        "zh": "由于当前我们构建的文献库中仅包含过去10年来自ICML、ICLR、NeurIPS、ACL和EMNLP的论文，因此创意生成支持的领域暂时有限"
     },
     "Natural Language Processing (NLP)": {
         "en": "-",
     },
     "💧 One-click Generation": {
         "en": "-",
+        "zh": "💧 一键生成创意"
     },
     "Check Brainstorms": {
         "en": "-",
     "SciPIP will generate ideas step by step. The generation pipeline is the same as one-click generation, while you can improve each part manually after SciPIP providing the manuscript.": {
         "en": "-",
+        "zh": "SciPIP将会逐步生成创意，生成使用的算法与一键生成相同，但是用户可以在SciPIP给出中间过程的初稿后修改其中内容。"
     },
     "💦 Step-by-step Generation": {
         "en": "-",
+        "zh": "💦 逐步生成创意"
     },
     "🐳 Background": {
         "en": "-",
         "en": "-",
         "zh": "提交"
     },
+    "Example": {
+        "en": "-",
+        "zh": "例"
+    },
     "👻 Brainstorms": {
         "en": "-",
         "zh": "👻 头脑风暴"
     },
     "😼 Generated Initial Ideas": {
         "en": "-",
+        "zh": "😼 生成初始创意"
     },
     "😸 Generated Final Ideas": {
         "en": "-",
+        "zh": "😸 生成最终创意"
     },
     "Brainstorming...": {
         "en": "-",
     },
     "Generating initial ideas...": {
         "en": "-",
+        "zh": "生成初步创意……"
     },
     "Generating final ideas...": {
         "en": "-",
+        "zh": "生成最终创意……"
     },
     "Please input the brainstorms on the left.": {
         "en": "-",
     },
     "Please input the initial ideas on the left.": {
         "en": "-",
+        "zh": "请在左侧修改初始创意"
     },
     "Please input the final ideas on the left.": {
         "en": "-",
+        "zh": "请在左侧修改最终创意"
     },
     "🏠️ Homepage": {
         "en": "-",

src/app_pages/locale.py CHANGED Viewed

@@ -4,7 +4,7 @@ import streamlit as st
 json_contents = json.loads(open("./src/app_pages/locale.json", "r").read())
 def _(content: str):
-    if st.session_state["language"] == "en":
         return content
     a = json_contents.get(content, content)
     if isinstance(a, dict):

 json_contents = json.loads(open("./src/app_pages/locale.json", "r").read())
 def _(content: str):
+    if st.session_state.get("language", "en") == "en":
         return content
     a = json_contents.get(content, content)
     if isinstance(a, dict):
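
Reviewer note: switching to `st.session_state.get("language", "en")` means the lookup no longer raises `KeyError` when a page renders before the language has been stored. The sketch below reproduces that behavior with plain dictionaries standing in for `st.session_state` and `locale.json`. The body of the `isinstance` branch is not visible in this hunk, so the fallback to the English key is an assumption.

```python
# Hypothetical in-memory stand-in for locale.json (entry copied from this commit).
TRANSLATIONS = {
    "Make AI research easy": {"en": "-", "zh": "让AI研究变得简单"},
}


def translate(content, session_state, translations=TRANSLATIONS):
    """Return the localized string, defaulting to English when no language is set."""
    language = session_state.get("language", "en")
    if language == "en":
        return content
    entry = translations.get(content, content)
    if isinstance(entry, dict):
        # Assumption: fall back to the English key if the language is missing.
        return entry.get(language, content)
    return entry
```

With an empty state, `translate("Make AI research easy", {})` returns the English key unchanged.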

src/app_pages/one_click_generation.py CHANGED Viewed

@@ -20,6 +20,7 @@ if "global_state_one_click" not in st.session_state:
     st.session_state["global_state_one_click"] = 1.0
 def generate_sidebar():
     get_sidebar_header()
     st.sidebar.markdown(
         _("SciPIP will generate ideas in one click. The generation pipeline is the same as "
@@ -27,14 +28,13 @@ def generate_sidebar():
     )
     pipeline_list = [_("1. Input Background"), _("2. Brainstorming"), _("3. Extracting Entities"), _("4. Retrieving Related Works"),
-                     _("5. Generate Initial Ideas"), _("6. Generate Final Ideas")]
     st.sidebar.header(_("Pipeline"), divider="red")
     for i in range(6):
         st.sidebar.markdown(f"<font color='black'>{pipeline_list[i]}</font>", unsafe_allow_html=True)
     get_sidebar_supported_fields()
     get_help_us_improve()
-    get_language_select()
 def generate_mainpage(backend):
     st.title(_("💧 One-click Generation"))
@@ -67,10 +67,12 @@ def generate_mainpage(backend):
         st.session_state["demo_input"] = demo_input
     cols = st.columns([1, 1, 1, 1])
-    cols[0].button(_("Example 1"), on_click=get_demo_n, args=(0,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
-    cols[1].button(_("Example 2"), on_click=get_demo_n, args=(1,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
-    cols[2].button(_("Example 3"), on_click=get_demo_n, args=(2,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
-    cols[3].button(_("Example 4"), on_click=get_demo_n, args=(3,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
     def check_intermediate_outputs(id="brainstorms"):
         msg = st.session_state["intermediate_output"].get(id, None)
@@ -81,6 +83,7 @@ def generate_mainpage(backend):
     def reset():
         del(st.session_state["messages"])
         st.session_state["enable_submmit"] = True
         st.session_state["global_state_one_click"] = 1.0
         st.toast(f"The chat has been reset!")

     st.session_state["global_state_one_click"] = 1.0
 def generate_sidebar():
+    get_language_select()
     get_sidebar_header()
     st.sidebar.markdown(
         _("SciPIP will generate ideas in one click. The generation pipeline is the same as "
     )
     pipeline_list = [_("1. Input Background"), _("2. Brainstorming"), _("3. Extracting Entities"), _("4. Retrieving Related Works"),
+                     _("5. Generating Initial Ideas"), _("6. Generating Final Ideas")]
     st.sidebar.header(_("Pipeline"), divider="red")
     for i in range(6):
         st.sidebar.markdown(f"<font color='black'>{pipeline_list[i]}</font>", unsafe_allow_html=True)
     get_sidebar_supported_fields()
     get_help_us_improve()
 def generate_mainpage(backend):
     st.title(_("💧 One-click Generation"))
         st.session_state["demo_input"] = demo_input
     cols = st.columns([1, 1, 1, 1])
+    for i in range(4):
+        cols[i].button(_("Example") + f" {i+1}", on_click=get_demo_n, args=(i,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
+    # cols[0].button(_("Example 1"), on_click=get_demo_n, args=(0,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
+    # cols[1].button(_("Example 2"), on_click=get_demo_n, args=(1,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
+    # cols[2].button(_("Example 3"), on_click=get_demo_n, args=(2,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
+    # cols[3].button(_("Example 4"), on_click=get_demo_n, args=(3,), use_container_width=True, disabled=not st.session_state.get("enable_submmit", True))
     def check_intermediate_outputs(id="brainstorms"):
         msg = st.session_state["intermediate_output"].get(id, None)
     def reset():
         del(st.session_state["messages"])
+        del(st.session_state["intermediate_output"])
         st.session_state["enable_submmit"] = True
         st.session_state["global_state_one_click"] = 1.0
         st.toast(f"The chat has been reset!")
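
Reviewer note: the button loop builds each label from the translated word "Example" plus an index, so a single `"Example"` entry in `locale.json` covers all four buttons. A minimal sketch of the label construction, with a hypothetical `translate` callable standing in for `_`:

```python
def example_labels(translate, count=4):
    """Compose button labels the way the loop in the diff does."""
    return [translate("Example") + f" {i + 1}" for i in range(count)]


# Stand-in for the "Example" entry added to locale.json in this commit.
zh = {"Example": "例"}
labels_en = example_labels(lambda s: s)
labels_zh = example_labels(lambda s: zh.get(s, s))
```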

src/app_pages/sidebar_components.py CHANGED Viewed

@@ -21,15 +21,17 @@ def get_help_us_improve():
     st.sidebar.markdown("https://forms.gle/YpLUrhqs1ahyCAe99", unsafe_allow_html=True)
 def get_language_select():
-    st.sidebar.header("语言 / Language", divider="blue")
-    language_option = st.sidebar.selectbox(
         "选择语言 / Select Language",
         options=["中文", "English"],
     )
     if language_option == "中文":
         language = "zh"
     elif language_option == "English":
         language = "en"
-    if language != st.session_state["language"]:
         st.session_state["language"] = language
         st.rerun()

     st.sidebar.markdown("https://forms.gle/YpLUrhqs1ahyCAe99", unsafe_allow_html=True)
 def get_language_select():
+    language = st.session_state.get("language", "en")
+    language_option = st.sidebar.segmented_control(
         "选择语言 / Select Language",
         options=["中文", "English"],
+        selection_mode="single",
+        default=("中文" if language == "zh" else "English")
     )
     if language_option == "中文":
         language = "zh"
     elif language_option == "English":
         language = "en"
+    if language != st.session_state.get("language", "en"):
         st.session_state["language"] = language
         st.rerun()
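
Reviewer note: the selector now starts from the stored language and only triggers a rerun when the choice actually changes. That decision logic can be sketched without Streamlit, using a plain dictionary in place of `st.session_state`. The helper names are hypothetical, and the sketch assumes that an empty or unrecognized selection keeps the current language, as the `if`/`elif` chain in the diff implies.

```python
OPTION_TO_CODE = {"中文": "zh", "English": "en"}


def resolve_language(option, session_state):
    """Map the selected label to a language code, keeping the current one otherwise."""
    current = session_state.get("language", "en")
    return OPTION_TO_CODE.get(option, current)


def apply_selection(option, session_state):
    """Update the stored language; return True when a rerun would be needed."""
    language = resolve_language(option, session_state)
    if language != session_state.get("language", "en"):
        session_state["language"] = language
        return True
    return False
```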

src/app_pages/step_by_step_generation.py CHANGED Viewed

@@ -4,6 +4,7 @@ from .locale import _
 from .sidebar_components import get_sidebar_header, get_sidebar_supported_fields, get_help_us_improve, get_language_select
 def generate_sidebar():
     get_sidebar_header()
     st.sidebar.markdown(
         _("SciPIP will generate ideas step by step. The generation pipeline is the same as "
@@ -16,7 +17,7 @@ def generate_sidebar():
     INPROGRESS_COLOR = "black"
     color_list = []
     pipeline_list = [_("1. Input Background"), _("2. Brainstorming"), _("3. Extracting Entities"), _("4. Retrieving Related Works"),
-                     _("5. Generate Initial Ideas"), _("6. Generate Final Ideas")]
     for i in range(1, 8):
         if st.session_state["global_state_step"] < i:
             color_list.append(UNDONE_COLOR)
@@ -32,7 +33,6 @@ def generate_sidebar():
     get_sidebar_supported_fields()
     get_help_us_improve()
-    get_language_select()
 def get_textarea_height(text_content):
     if text_content is None:
@@ -44,7 +44,6 @@ def get_textarea_height(text_content):
     return max(count * 23 + 20, 100) # 23 is a magic number
 def generate_mainpage(backend):
-    # print("refresh mainpage")
     st.title(_("💦 Step-by-step Generation"))
     st.header(_("🐳 Background"))
     with st.form('background_form') as bg_form:
@@ -55,7 +54,7 @@ def generate_mainpage(backend):
         def click_demo_i(i):
             st.session_state["background"] = backend.get_demo_i(i)
         for i, col in enumerate(cols):
-            col.form_submit_button(f"Example {i + 1}", use_container_width=True, on_click=click_demo_i, args=(i,))
         col1, col2 = st.columns([2, 20])
         submitted = col1.form_submit_button(_("Submit"), type="primary")
@@ -94,16 +93,6 @@ def generate_mainpage(backend):
     ## Entities
     st.header(_("🐱 Extracted Entities"))
     with st.expander("", expanded=st.session_state.get("entities_expand", False)):
-        ## text area
-        # col1, col2 = st.columns(2, )
-        # entities_old = col1.text_area(label="entities", value=st.session_state.get("entities", "[]"), label_visibility="collapsed")
-        # entities_old = ast.literal_eval(entities_old)
-        # st.session_state["entities"] = entities_old
-        # if entities_old:
-        #     col2.markdown(f"{entities_old}")
-        # else:
-        #     col2.markdown(_("Please input the entities on the left."))
         ## pills
         def update_entities():
             return
@@ -112,36 +101,33 @@ def generate_mainpage(backend):
         entities_updated = st.pills(label="entities", options=ori_entities, selection_mode="multi",
                             default=ori_entities, label_visibility="collapsed", on_change=update_entities)
         st.session_state["entities_updated"] = entities_updated
-        print("=" * 10)
-        print(entities_updated)
-        print(st.session_state["entities_updated"])
-        print("=" * 10)
         submitted = st.button(_("Submit"), key="entities_button", type="primary")
         if submitted:
             st.session_state["global_state_step"] = 4.0
             with st.spinner(text="Retrieving related works..."):
                 st.session_state["related_works"], st.session_state["related_works_intact"] = backend.entities2literature_callback(background, entities_updated)
-            # st.session_state["related_works"] = "related works"
             st.session_state["global_state_step"] = 4.5
             st.session_state["related_works_expand"] = True
     ## Retrieved related works
     st.header(_("📖 Retrieved Related Works"))
     with st.expander("", expanded=st.session_state.get("related_works_expand", False)):
-        col1, col2 = st.columns(2, )
-        widget_height = get_textarea_height(st.session_state.get("related_works", ""))
-        related_works_title = col1.text_area(label="related_works", value=st.session_state.get("related_works", ""),
-                                             label_visibility="collapsed", height=widget_height)
-        if related_works_title:
-            col2.markdown(f"{related_works_title}")
-        else:
-            col2.markdown(_("Please input the related works on the left."))
-        submitted = col1.button(_("Submit"), key="related_works_button", type="primary")
         if submitted:
             st.session_state["global_state_step"] = 5.0
             with st.spinner(text="Generating initial ideas..."):
-                res = backend.literature2initial_ideas_callback(background, brainstorms, st.session_state["related_works_intact"])
                 st.session_state["initial_ideas"] = res[0]
                 st.session_state["final_ideas"] = res[1]
             # st.session_state["initial_ideas"] = "initial ideas"

 from .sidebar_components import get_sidebar_header, get_sidebar_supported_fields, get_help_us_improve, get_language_select
 def generate_sidebar():
+    get_language_select()
     get_sidebar_header()
     st.sidebar.markdown(
         _("SciPIP will generate ideas step by step. The generation pipeline is the same as "
     INPROGRESS_COLOR = "black"
     color_list = []
     pipeline_list = [_("1. Input Background"), _("2. Brainstorming"), _("3. Extracting Entities"), _("4. Retrieving Related Works"),
+                     _("5. Generating Initial Ideas"), _("6. Generating Final Ideas")]
     for i in range(1, 8):
         if st.session_state["global_state_step"] < i:
             color_list.append(UNDONE_COLOR)
     get_sidebar_supported_fields()
     get_help_us_improve()
 def get_textarea_height(text_content):
     if text_content is None:
     return max(count * 23 + 20, 100) # 23 is a magic number
 def generate_mainpage(backend):
     st.title(_("💦 Step-by-step Generation"))
     st.header(_("🐳 Background"))
     with st.form('background_form') as bg_form:
         def click_demo_i(i):
             st.session_state["background"] = backend.get_demo_i(i)
         for i, col in enumerate(cols):
+            col.form_submit_button(_("Example") + f" {i+1}", use_container_width=True, on_click=click_demo_i, args=(i,))
         col1, col2 = st.columns([2, 20])
         submitted = col1.form_submit_button(_("Submit"), type="primary")
     ## Entities
     st.header(_("🐱 Extracted Entities"))
     with st.expander("", expanded=st.session_state.get("entities_expand", False)):
         ## pills
         def update_entities():
             return
         entities_updated = st.pills(label="entities", options=ori_entities, selection_mode="multi",
                             default=ori_entities, label_visibility="collapsed", on_change=update_entities)
         st.session_state["entities_updated"] = entities_updated
         submitted = st.button(_("Submit"), key="entities_button", type="primary")
         if submitted:
             st.session_state["global_state_step"] = 4.0
             with st.spinner(text="Retrieving related works..."):
                 st.session_state["related_works"], st.session_state["related_works_intact"] = backend.entities2literature_callback(background, entities_updated)
+            st.session_state["related_works_use_state"] = [True] * len(st.session_state["related_works"])
             st.session_state["global_state_step"] = 4.5
             st.session_state["related_works_expand"] = True
     ## Retrieved related works
     st.header(_("📖 Retrieved Related Works"))
     with st.expander("", expanded=st.session_state.get("related_works_expand", False)):
+        related_works = st.session_state.get("related_works", [])
+        for i, rw in enumerate(related_works):
+            checked = st.checkbox(rw, value=st.session_state.get("related_works_use_state")[i])
+            st.session_state.get("related_works_use_state")[i] = checked
+        submitted = st.button(_("Submit"), key="related_works_button", type="primary")
         if submitted:
             st.session_state["global_state_step"] = 5.0
             with st.spinner(text="Generating initial ideas..."):
+                st.session_state["selected_related_works_intact"] = []
+                for s, p in zip(st.session_state.get("related_works_use_state"), st.session_state["related_works_intact"]):
+                    if s:
+                        st.session_state["selected_related_works_intact"].append(p)
+                res = backend.literature2initial_ideas_callback(background, brainstorms, st.session_state["selected_related_works_intact"])
                 st.session_state["initial_ideas"] = res[0]
                 st.session_state["final_ideas"] = res[1]
             # st.session_state["initial_ideas"] = "initial ideas"
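Review note: the new submit handler keeps only the related works whose checkbox is still ticked, by walking `related_works_use_state` and `related_works_intact` in parallel. A minimal sketch of that filtering step, outside Streamlit, is shown below. The helper name `select_checked` and the sample data are hypothetical and not part of this commit.

```python
def select_checked(use_state, items):
    """Return the items whose matching flag is truthy, preserving order.

    Hypothetical helper: mirrors the zip-and-append loop in the diff above.
    """
    return [item for flag, item in zip(use_state, items) if flag]


# Made-up stand-ins for the checkbox flags and the intact paper records.
flags = [True, False, True]
papers = [{"title": "A"}, {"title": "B"}, {"title": "C"}]
selected = select_checked(flags, papers)
print(selected)  # [{'title': 'A'}, {'title': 'C'}]
```

Note that `zip` stops at the shorter input, so if the flag list and the intact list ever differ in length, trailing entries are dropped silently.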

src/generator.py CHANGED

@@ -26,7 +26,7 @@ def extract_problem(problem, background):
 class IdeaGenerator:
     def __init__(
-        self, config, paper_list: list[dict], cue_words: list = None, brainstorm: str = None
     ) -> None:
         self.api_helper = APIHelper(config)
         self.paper_list = paper_list
@@ -405,9 +405,7 @@ def backtracking(config_path, ids_path, retriever_name, brainstorm_mode, use_cue
         # 3. Retrieve related papers
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
-            config,
-            use_cocite=True,
-            use_cluster_to_filter=True
         )
         result = rt.retrieve(
             bg, entities_all, need_evaluate=False, target_paper_id_list=[]
@@ -577,9 +575,7 @@ def new_idea(config_path, ids_path, retriever_name, brainstorm_mode, use_inspira
         # 2. Retrieve related papers
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
-            config,
-            use_cocite=config.RETRIEVE.use_cocite,
-            use_cluster_to_filter=config.RETRIEVE.use_cluster_to_filter,
         )
         result = rt.retrieve(bg, entities_all, need_evaluate=False, target_paper_id_list=[])
         related_paper = result["related_paper"]

 class IdeaGenerator:
     def __init__(
+        self, config, paper_list: list[dict] = [], cue_words: list = None, brainstorm: str = None
     ) -> None:
         self.api_helper = APIHelper(config)
         self.paper_list = paper_list
         # 3. Retrieve related papers
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
+            config
         )
         result = rt.retrieve(
             bg, entities_all, need_evaluate=False, target_paper_id_list=[]
         # 2. Retrieve related papers
         rt = RetrieverFactory.get_retriever_factory().create_retriever(
             retriever_name,
+            config
         )
         result = rt.retrieve(bg, entities_all, need_evaluate=False, target_paper_id_list=[])
         related_paper = result["related_paper"]
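Review note: the new `IdeaGenerator.__init__` signature gives `paper_list` a default of `[]`. In Python a default value is created once, when the function is defined, so every call that omits the argument receives the same list object. Whether this matters here depends on whether `IdeaGenerator` ever mutates `self.paper_list` in place, which this diff does not show. The sketch below illustrates the general behaviour with two hypothetical classes, `Holder` and `SaferHolder`, that are not part of the commit.

```python
from typing import Optional


class Holder:
    # Same pattern as the new signature: a mutable default argument.
    def __init__(self, paper_list: list = []):
        self.paper_list = paper_list


class SaferHolder:
    # Common alternative: default to None and build a fresh list per instance.
    def __init__(self, paper_list: Optional[list] = None):
        self.paper_list = [] if paper_list is None else paper_list


a = Holder()
a.paper_list.append({"id": 1})
b = Holder()
print(b.paper_list)  # [{'id': 1}]  (b shares the default list with a)

c = SaferHolder()
c.paper_list.append({"id": 1})
d = SaferHolder()
print(d.paper_list)  # []  (each instance gets its own list)
```

If the list is only ever read or reassigned, the `[]` default is harmless; the `None` sentinel is simply the more defensive choice.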

src/paper_manager.py CHANGED

@@ -163,7 +163,7 @@ class PaperManager:
         self.venue_name = venue_name
         self.year = year
         self.data_type = "train"
-        self.paper_client = PaperClient(config)
         self.paper_crawling = PaperCrawling(config, data_type=self.data_type)
         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         self.embedding_model = get_embedding_model(config)

         self.venue_name = venue_name
         self.year = year
         self.data_type = "train"
+        self.paper_client = PaperClient()
         self.paper_crawling = PaperCrawling(config, data_type=self.data_type)
         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         self.embedding_model = get_embedding_model(config)

src/utils/paper_retriever.py CHANGED

@@ -605,7 +605,7 @@ class KGRetriever(Retriever):
         }
         return result
-    def retrieve(self, bg, entities, need_evaluate=True, target_paper_id_list=[]):
         """
         Args:
             context: string

         }
         return result
+    def retrieve(self, bg, entities, need_evaluate=False, target_paper_id_list=[]):
         """
         Args:
             context: string