# synthesist / app.py
import time

import streamlit as st
from fns import *  # expected to provide EmbeddingRetrievalSystem and load_dataset
st.set_page_config(
page_title="Synthesist",
page_icon="👋",
)
# st.write("# Welcome to Pathfinder! 👋")
st.image('local_files/synth_logo.png')
st.sidebar.success("Select a function above.")
st.sidebar.markdown("Current functions include visualizing papers in the arXiv embedding, searching for papers similar to an input paper or prompt phrase, and answering quick questions.")
st.markdown("")
st.markdown(
"""
**Synthesist** (from Peter Watts's [Blindsight](https://scalar.usc.edu/works/network-ecologies/on-peter-watts-blindsight)) is a framework for searching and visualizing papers on the [arXiv](https://arxiv.org/) using the context
sensitivity of modern large language models (LLMs) to better parse patterns in paper contexts.
This tool was built during the [JSALT workshop](https://www.clsp.jhu.edu/2024-jelinek-summer-workshop-on-speech-and-language-technology/) to explore LLM-based search and visualization of the astrophysics literature.
**👈 Select a tool from the sidebar** to see some examples
of what this framework can do!
### Tool summary:
- Please wait while the initial data loads and compiles; this takes about a minute.
- `Paper search` looks for relevant papers given an arXiv ID or a question.
This is not meant to be a replacement for existing tools like the
[ADS](https://ui.adsabs.harvard.edu/),
[arxivsorter](https://www.arxivsorter.org/), semantic search, or Google Scholar, but rather a supplement for finding papers
that might otherwise be missed during a literature survey.
It is trained on astro-ph.GA (astrophysics of galaxies) papers mined from arXiv up to roughly a year ago and supplemented with ADS metadata;
if you are interested in extending it, please reach out!
Planned additions include more pages, actual answer generation, separate toggles for retrieval and generation, a feedback form, social links, a literature page, contact information, copyright details, and collaboration info.
The image below shows a representation of all the astro-ph.GA papers that can be explored in more detail
using the `Arxiv embedding` page. The papers tend to cluster together by similarity, forming an
atlas that highlights well-studied regions (forests) and currently uncharted areas (water).
"""
)
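# The markdown above refers to an "image below" of the astro-ph.GA embedding atlas,
# but no corresponding display call appears in this file. A minimal sketch of how it
# could be shown is left commented out; the file path is a hypothetical placeholder,
# not part of the original codebase.
# st.image('local_files/arxiv_embedding_atlas.png')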
s = time.time()
st.markdown('Loading data for the retrieval system, please wait before jumping to one of the pages...')
st.session_state.retrieval_system = EmbeddingRetrievalSystem()
st.session_state.dataset = load_dataset('arxiv_corpus/', split = "train")
st.markdown(f'Loaded retrieval system, time taken: {time.time() - s:.1f} sec')
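
# Optional sketch (not part of the original app): the loads above run on every
# Streamlit rerun. Wrapping them in a function decorated with st.cache_resource
# keeps a single copy per server process, so the ~1 minute setup only happens once.
# EmbeddingRetrievalSystem and load_dataset are assumed to come from fns.
@st.cache_resource
def load_retrieval_resources():
    """Build the retrieval system and load the arXiv corpus once per process."""
    retrieval_system = EmbeddingRetrievalSystem()
    dataset = load_dataset('arxiv_corpus/', split="train")
    return retrieval_system, dataset

# Hypothetical usage, replacing the direct session_state assignments above:
# st.session_state.retrieval_system, st.session_state.dataset = load_retrieval_resources()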