Spaces: kiyer/pathfinder

kiyer committed on Sep 15, 2023

Commit 374da48

1 Parent(s): 237026a

update readme bugfixes

Files changed (2)

app.py +18 -2
pages/2_arxiv_embedding.py +0 -5

app.py CHANGED

@@ -8,19 +8,35 @@ st.set_page_config(
 st.write("# Welcome to arXiv-GPT! 👋")
 st.sidebar.success("Select a function above.")
-st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding, or searching for similar papers to an input paper or prompt phrase.")
+st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding, searching for similar papers to an input paper or prompt phrase, or answering quick questions.")
 st.markdown(
     """
     arXiv+GPT is a framework for searching and visualizing papers on
     the [arXiv](https://arxiv.org/) using the context sensitivity from modern
-    large language models (LLMs) like GPT3 to better link paper contexts
+    large language models (LLMs) to better link paper contexts
     **👈 Select a tool from the sidebar** to see some examples
     of what this framework can do!
+    ### Page summary:
+    - `Paper search` looks for relevant papers given an arxiv id or a question.
+    - `Arxiv embedding` shows the landscape of current galaxy evolution papers (astro-ph.GA)
+    - `QA sources` brings it all together to give concise answers to questions with primary sources and relevant papers.
+    ### Coming soon:
+    - [AstroLLaMA](https://huggingface.co/spaces/universeTBD/astrollama) embeddings!
+    - export results
+    - daily updates to repo
+    - other fields apart from `astro-ph.GA`
     ### Want to learn more?
+    - Check out `AstroLLaMA` [paper](https://huggingface.co/papers/2309.06126)
     - Check out `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/)
     - Jump into our [documentation](https://docs.streamlit.io)
     - Contribute!
+    arXiv+GPT is developed and maintained by [UniverseTBD](https://universetbd.org/). Updates on [huggingface](https://huggingface.co/universeTBD) or [twitter](https://twitter.com/universe_tbd).
 """
 )

pages/2_arxiv_embedding.py CHANGED

@@ -11,11 +11,6 @@ import pickle
 from scipy import stats
 from urllib.request import urlopen
-st.title("ArXiv+GPT3 embedding explorer")
-st.markdown('[Includes papers up to: `'+dateval+'`]')
-st.markdown("This is an explorer for astro-ph.GA papers on the arXiv (up to Apt 18th, 2023). The papers have been preprocessed with `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/) after which the collected abstracts are run through `text-embedding-ada-002` with [langchain](https://python.langchain.com/en/latest/ecosystem/openai.html) to generate a unique vector correpsonding to each paper. These are then compressed using [umap](https://umap-learn.readthedocs.io/en/latest/) and shown here, and can be used for similarity searches with methods like [faiss](https://github.com/facebookresearch/faiss). The scatterplot here can be paired with a heatmap for more targeted searches looking at a specific topic or area (see sidebar). Upgrade to chaotic neural suggested by Jo Ciucă, thank you! More to come (hopefully) with GPT-4 and its applications!")
-st.markdown("Interpreting the UMAP plot: the algorithm creates a 2d embedding from the high-dim vector space that tries to conserve as much similarity information as possible. Nearby points in UMAP space are similar, and grow dissimiliar as you move farther away. The axes do not have any physical meaning.")
 @st.cache_data
 def get_feeds_data(url):
     # data = cp.load(urlopen(url))
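
The description removed from `pages/2_arxiv_embedding.py` says each abstract is mapped to a vector and that those vectors can be used for similarity searches. A minimal sketch of that idea, using only the Python standard library, might look like the following. It is not the app's code: the toy three-dimensional vectors, the paper ids, and the helper names `cosine_similarity` and `most_similar` are all hypothetical, and brute-force cosine similarity stands in here for a dedicated index such as faiss.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, k=2):
    """Return the k (paper_id, score) pairs with the highest similarity to query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embedding vectors have far more dimensions.
toy_embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}

results = most_similar([1.0, 0.0, 0.0], toy_embeddings, k=2)
for paper_id, score in results:
    print(paper_id, round(score, 3))
```

The search is done in the original vector space rather than on the 2D UMAP coordinates, since the removed text notes that the UMAP projection only tries to conserve similarity information and its axes carry no physical meaning.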