update readme bugfixes
- app.py +18 -2
- pages/2_arxiv_embedding.py +0 -5
app.py
CHANGED
@@ -8,19 +8,35 @@ st.set_page_config(
 st.write("# Welcome to arXiv-GPT! 👋")
 
 st.sidebar.success("Select a function above.")
-st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding,
+st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding, searching for similar papers to an input paper or prompt phrase, or answering quick questions.")
 
 st.markdown(
 """
 arXiv+GPT is a framework for searching and visualizing papers on
 the [arXiv](https://arxiv.org/) using the context sensitivity from modern
-large language models (LLMs)
+large language models (LLMs) to better link paper contexts
 
 **👈 Select a tool from the sidebar** to see some examples
 of what this framework can do!
+
+### Page summary:
+- `Paper search` looks for relevant papers given an arxiv id or a question.
+- `Arxiv embedding` shows the landscape of current galaxy evolution papers (astro-ph.GA)
+- `QA sources` brings it all together to give concise answers to questions with primary sources and relevant papers.
+
+### Coming soon:
+- [AstroLLaMA](https://huggingface.co/spaces/universeTBD/astrollama) embeddings!
+- export results
+- daily updates to repo
+- other fields apart from `astro-ph.GA`
+
 ### Want to learn more?
+- Check out `AstroLLaMA` [paper](https://huggingface.co/papers/2309.06126)
 - Check out `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/)
 - Jump into our [documentation](https://docs.streamlit.io)
 - Contribute!
+
+arXiv+GPT is developed and maintained by [UniverseTBD](https://universetbd.org/). Updates on [huggingface](https://huggingface.co/universeTBD) or [twitter](https://twitter.com/universe_tbd).
+
 """
 )
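The page summary in this hunk says `Paper search` accepts either an arxiv id or a question. The commit does not show how the app tells the two apart, so the following is only a minimal sketch of one way it could be done. The function name `classify_query` and both regular expressions are hypothetical and not taken from the repository; they follow the public arXiv identifier formats (new-style `YYMM.NNNNN` and old-style `archive/YYMMNNN`).

```python
import re

# New-style identifiers (since April 2007): YYMM.NNNN or YYMM.NNNNN,
# with an optional version suffix such as "v2".
_NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style identifiers: archive[.SC]/YYMMNNN, e.g. "astro-ph/0601001".
_OLD_STYLE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def classify_query(text: str) -> str:
    """Return "arxiv_id" if text looks like an arXiv identifier, else "question".

    Hypothetical helper for illustration; not part of the app's code.
    """
    candidate = text.strip()
    # Accept an optional "arXiv:" prefix, as printed on abstract pages.
    if candidate.lower().startswith("arxiv:"):
        candidate = candidate[len("arxiv:"):].strip()
    if _NEW_STYLE.match(candidate) or _OLD_STYLE.match(candidate):
        return "arxiv_id"
    return "question"
```

With a check like this, an input such as `2309.06126` could be routed to a lookup of that paper's stored embedding, while free text would be embedded first and then searched.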
|
pages/2_arxiv_embedding.py
CHANGED
@@ -11,11 +11,6 @@ import pickle
 from scipy import stats
 from urllib.request import urlopen
 
-st.title("ArXiv+GPT3 embedding explorer")
-st.markdown('[Includes papers up to: `'+dateval+'`]')
-st.markdown("This is an explorer for astro-ph.GA papers on the arXiv (up to Apr 18th, 2023). The papers have been preprocessed with `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/) after which the collected abstracts are run through `text-embedding-ada-002` with [langchain](https://python.langchain.com/en/latest/ecosystem/openai.html) to generate a unique vector corresponding to each paper. These are then compressed using [umap](https://umap-learn.readthedocs.io/en/latest/) and shown here, and can be used for similarity searches with methods like [faiss](https://github.com/facebookresearch/faiss). The scatterplot here can be paired with a heatmap for more targeted searches looking at a specific topic or area (see sidebar). Upgrade to chaotic neural suggested by Jo Ciucă, thank you! More to come (hopefully) with GPT-4 and its applications!")
-st.markdown("Interpreting the UMAP plot: the algorithm creates a 2d embedding from the high-dim vector space that tries to conserve as much similarity information as possible. Nearby points in UMAP space are similar, and grow dissimilar as you move farther away. The axes do not have any physical meaning.")
-
 @st.cache_data
 def get_feeds_data(url):
     # data = cp.load(urlopen(url))
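The description removed in this hunk says each paper gets one embedding vector and that these vectors can be used for similarity searches with methods like faiss. As a rough illustration of what such a search computes, here is a brute-force cosine-similarity ranking in plain Python. It is a stand-in for faiss, not the app's implementation. The names `cosine_similarity`, `most_similar`, and `toy_embeddings` are hypothetical, and the 3-dimensional vectors are invented toy data, far smaller than real embedding vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (paper_id, score) pairs with the highest cosine similarity."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Invented toy data: three "papers" with 3-d embeddings.
toy_embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
results = most_similar([1.0, 0.0, 0.0], toy_embeddings, k=2)
```

This scan is linear in the number of papers. An index library such as faiss serves the same purpose of returning the nearest vectors, but is built to do so efficiently on large collections.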