import time

import streamlit as st
from fns import *

st.set_page_config(
    page_title="Synthesist",
    page_icon="π",
)
# st.write("# Welcome to Pathfinder! π")
st.image('local_files/synth_logo.png')
st.sidebar.success("Select a function above.")
st.sidebar.markdown("Current functions include visualizing papers in the arXiv embedding, searching for papers similar to an input paper or prompt phrase, and answering quick questions.")
st.markdown("")
st.markdown(
    """
**Synthesist** (from Peter Watts's [Blindsight](https://scalar.usc.edu/works/network-ecologies/on-peter-watts-blindsight)) is a framework for searching and visualizing papers on the [arXiv](https://arxiv.org/), using the context
sensitivity of modern large language models (LLMs) to better parse patterns in paper contexts.
This tool was built during the [JSALT workshop](https://www.clsp.jhu.edu/2024-jelinek-summer-workshop-on-speech-and-language-technology/) to do awesome things.

**Select a tool from the sidebar** to see some examples
of what this framework can do!

### Tool summary:
- Please wait while the initial data loads and compiles; this takes about a minute.
- `Paper search` looks for relevant papers given an arXiv ID or a question.

This is not meant to be a replacement for existing tools like the
[ADS](https://ui.adsabs.harvard.edu/),
[arxivsorter](https://www.arxivsorter.org/), semantic search or Google Scholar, but rather a supplement to find papers
that might otherwise be missed during a literature survey.
It is trained on astro-ph.GA (astrophysics of galaxies) papers up to roughly last year, mined from arXiv and supplemented with ADS metadata.
If you are interested in extending it, please reach out!

Also add: more pages, actual generation, different toggles for retrieval/generation, feedback form, socials, literature, contact us, copyright, collaboration, etc.

The image below shows a representation of all the astro-ph.GA papers, which can be explored in more detail
using the `Arxiv embedding` page. The papers tend to cluster together by similarity, resulting in an
atlas that shows well-studied (forests) and currently uncharted (water) areas.
"""
)
# Time the one-off load of the retrieval system and the corpus.
s = time.time()
st.markdown('Loading data for the retrieval system; please wait before jumping to one of the pages...')
st.session_state.retrieval_system = EmbeddingRetrievalSystem()
st.session_state.dataset = load_dataset('arxiv_corpus/', split="train")
st.markdown(f'Loaded retrieval system, time taken: {time.time() - s:.1f} sec')
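
# --- Illustrative sketch, not part of the original app ---
# Streamlit reruns this script on every interaction, so the two loads above
# would repeat on each rerun. One possible guard, assuming the loaded objects
# can safely be reused across reruns, is to load only when the key is missing.
# `load_once` is a hypothetical helper name introduced here for illustration.
def load_once(state, key, loader):
    """Call `loader` only if `key` is not yet in `state`; return the stored value."""
    if key not in state:
        state[key] = loader()
    return state[key]

# Possible usage (left commented out so the behaviour above is unchanged):
# load_once(st.session_state, "retrieval_system", EmbeddingRetrievalSystem)
# load_once(st.session_state, "dataset",
#           lambda: load_dataset('arxiv_corpus/', split="train"))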