DeepResearchEvaluator

awacke1

Update README.md

c565db3 verified 2 months ago

	---
	title: 🧠Deep🐍Research🌐Evaluator
	emoji: 🧠🐍🌐
	colorFrom: red
	colorTo: purple
	sdk: streamlit
	sdk_version: 1.41.1
	app_file: app.py
	pinned: true
	license: mit
	short_description: Deep Research Evaluator for Long Horizon Learning Tasks
	---

	# 🎵 🎶 🎸 🎹 🎺 🎷 🥁 🎻

	Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.


	# Project Architecture

	- 📂 Root Folder
	  - app.py (🤖 Streamlit App)
	    - Main entry point for your Streamlit application.
	  - requirements.txt (📋 Dependencies)
	    - Lists all the Python packages needed to run the app.
	  - 📂 mycomponent (🔧 HTML Component)
	    - A subdirectory containing your custom Streamlit component code.
	    - \_\_init\_\_.py (🐍 Python Init)
	      - Tells Python this folder is a module/package.
	    - index.html (🌐 Custom HTML)
	      - Front-end HTML/JS/CSS for the custom component.

	```mermaid
	flowchart TB
	A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
	A --> C["requirements.txt 📋<br>(Dependencies)"]
	A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
	D --> E["__init__.py 🐍<br>(Python Init)"]
	D --> F["index.html 🌐<br>(Custom HTML)"]
	```

	---

	Usage Flow:

	1. You run `streamlit run app.py`.
	2. app.py imports mycomponent to load the HTML from index.html.
	3. requirements.txt lists the dependencies that must be installed before the app starts.
	4. The \_\_init\_\_.py file ensures the custom component folder is recognized as a Python package.
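
The flow above can be sketched from the Python side. This is a minimal illustration, not the project's actual code: the helper name `load_component_html` is hypothetical, and it assumes `mycomponent/__init__.py` only needs to read `index.html` from disk. In app.py the returned string could then be passed to `streamlit.components.v1.html`.

```python
from pathlib import Path


def load_component_html(component_dir: str = "mycomponent") -> str:
    """Return the text of index.html inside the component folder.

    Hypothetical helper: the real package may expose something different.
    """
    html_path = Path(component_dir) / "index.html"
    return html_path.read_text(encoding="utf-8")
```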

	Notes:
	- app.py hosts your Streamlit logic and references the mycomponent.
	- index.html supplies the interface for any front-end custom elements.
	- requirements.txt keeps the environment consistent.




	# Features
	🎯 Core Configuration & Setup
	Configures the Streamlit page with title “🚲TalkingAIResearcher🏆”, sets layout, sidebar states, and environment variables.

	🔑 API Setup & Clients
	Loads and initializes OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.

	📝 Session State Management
	Manages conversation history, transcripts, file editing states, and model selections.

	🧠 get_high_info_terms()
	Extracts top words/bigrams from a text by counting frequency and filtering out stop words.
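
As a rough sketch of frequency-based term extraction, assuming a small hand-written stop-word set and a `top_n` parameter (both are illustrative choices, not taken from app.py):

```python
import re
from collections import Counter

# Assumed stop-word list; the real app may use a larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent words and bigrams, ignoring stop words."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    # Bigrams are built after stop-word removal, so they may join words
    # that were separated by a stop word in the original text.
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```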

	🏷️ clean_text_for_filename()
	Sanitizes text for valid filenames by removing special characters, short/unhelpful words, and truncating length.
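
One way this sanitizing step might look. The word list, the three-character minimum, and the `max_len` default are assumptions made for the example:

```python
import re

# Assumed list of words that add little to a filename.
UNHELPFUL = {"the", "and", "for", "with", "this", "that"}


def clean_text_for_filename(text: str, max_len: int = 40) -> str:
    """Lowercase, drop special characters and short/unhelpful words, then truncate."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    words = [w for w in text.split() if len(w) > 2 and w not in UNHELPFUL]
    # rstrip avoids a dangling underscore when truncation cuts at a separator.
    return "_".join(words)[:max_len].rstrip("_")
```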

	📄 generate_filename()
	Creates an intelligent filename based on timestamps, high-info terms, and a snippet of the content (removing duplicates).
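
A hedged sketch of how such a filename could be assembled. The signature, the timestamp format, and the `now` parameter (added here so the result is reproducible) are all assumptions:

```python
from datetime import datetime


def generate_filename(terms, snippet_words, ext="md", now=None, max_len=80):
    """Build '<timestamp>_<terms and snippet words>.<ext>' without repeated parts."""
    now = now or datetime.now()
    prefix = now.strftime("%Y%m%d_%H%M")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    parts = list(dict.fromkeys(list(terms) + list(snippet_words)))
    stem = "_".join(parts)[:max_len].rstrip("_")
    return f"{prefix}_{stem}.{ext}"
```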

	💾 create_file()
	Saves prompt + response content to a file, using generate_filename().

	🔗 get_download_link()
	Generates base64-encoded download links for .md, audio, or zip files for inline downloading.
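
The base64 link technique can be illustrated as follows. This is a sketch under assumptions: the signature takes raw bytes plus a filename, and the MIME table is an illustrative guess rather than the app's own mapping.

```python
import base64
import html

# Assumed extension-to-MIME mapping for the file types mentioned above.
MIME_TYPES = {
    "md": "text/markdown",
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "zip": "application/zip",
}


def get_download_link(data: bytes, filename: str) -> str:
    """Return an HTML anchor that embeds the file as a base64 data URI."""
    ext = filename.rsplit(".", 1)[-1].lower()
    mime = MIME_TYPES.get(ext, "application/octet-stream")
    b64 = base64.b64encode(data).decode("ascii")
    safe_name = html.escape(filename, quote=True)
    return f'<a href="data:{mime};base64,{b64}" download="{safe_name}">Download {safe_name}</a>'
```

In Streamlit, a string like this would typically be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is embedded in the page, the approach suits small files best.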

	🎤 clean_for_speech()
	Strips out line breaks, URLs, and symbols to create more readable text for TTS.
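
A minimal sketch of this kind of cleanup. The exact set of symbols removed is an assumption; the real function may strip more or fewer characters.

```python
import re


def clean_for_speech(text: str) -> str:
    """Remove URLs, line breaks, and markdown-style symbols before TTS."""
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = text.replace("\n", " ")                # flatten line breaks
    text = re.sub(r"[#*_`<>\[\]()]", "", text)    # drop common markup symbols
    return re.sub(r"\s+", " ", text).strip()      # collapse leftover whitespace
```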

	🎙️ edge_tts_generate_audio()
	Asynchronously generates audio files (e.g., .mp3) using Edge TTS.

	🔊 speak_with_edge_tts()
	A wrapper function for the async TTS call, allowing direct usage in synchronous code.
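
The sync-over-async wrapper pattern might look like the sketch below. To keep the example self-contained, `fake_tts_generate_audio` is a hypothetical stand-in for the real Edge TTS coroutine and writes no audio.

```python
import asyncio


async def fake_tts_generate_audio(text: str, out_path: str) -> str:
    """Hypothetical stand-in for the async TTS call; it only returns the path."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return out_path


def speak_sync(text: str, out_path: str = "speech.mp3") -> str:
    """Run the coroutine to completion from ordinary synchronous code."""
    return asyncio.run(fake_tts_generate_audio(text, out_path))
```

Note that `asyncio.run` raises a `RuntimeError` if an event loop is already running in the current thread, so a wrapper like this only works from purely synchronous call sites.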

	🎵 play_and_download_audio()
	Embeds an audio player in Streamlit and provides a download link for that audio file.

	💿 save_qa_with_audio()
	Stores Q&A content in a markdown file and generates TTS audio for the question + answer.

	📰 parse_arxiv_refs()
	Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.

	🔗 create_paper_links_md()
	Builds a minimal markdown page with numbered links to each paper’s ArXiv URL.
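
For illustration, assuming each parsed paper is a dict with `title` and `url` keys (the real structure returned by parse_arxiv_refs() may differ), the page could be built like this:

```python
def create_paper_links_md(papers):
    """Return markdown with one numbered link per paper."""
    lines = ["# Paper Links", ""]
    for i, paper in enumerate(papers, start=1):
        # 'title' and 'url' are assumed key names for this sketch.
        lines.append(f"{i}. [{paper['title']}]({paper['url']})")
    return "\n".join(lines)
```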

	📑 create_paper_audio_files()
	Processes each parsed paper, generating TTS audio and embedding base64 download links.

	📚 display_papers()
	Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.

	🗂 display_papers_in_sidebar()
	Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.

	📂 display_file_history_in_sidebar()
	Enumerates all local .md, .mp3, .wav files in descending modification time, letting users preview and download them.
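
The listing step, without any Streamlit rendering, can be sketched as below. The helper name `list_recent_files` is hypothetical; only the file extensions and the newest-first ordering come from the description above.

```python
from pathlib import Path


def list_recent_files(directory=".", extensions=(".md", ".mp3", ".wav")):
    """Return matching files in the directory, newest modification time first."""
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```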

	📦 create_zip_of_files()
	Bundles multiple files (markdown + audio) into a zip with an automatically shortened filename.
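
A possible shape for the bundling step. The `out_dir` and `max_name_len` parameters and the rule for deriving the archive name from file stems are assumptions for this sketch.

```python
import zipfile
from pathlib import Path


def create_zip_of_files(paths, out_dir=".", max_name_len=40):
    """Zip the given files; the archive name is built from their stems and truncated."""
    paths = [Path(p) for p in paths]
    stem = "_".join(p.stem for p in paths)[:max_name_len].rstrip("_") or "archive"
    zip_path = Path(out_dir) / f"{stem}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # arcname keeps only the file name, not the full source path.
            zf.write(p, arcname=p.name)
    return zip_path
```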

	🔍 perform_ai_lookup()
	The main function to:

	- Query Anthropic (Claude)
	- Call an ArXiv RAG pipeline
	- Generate Q&A audio
	- Parse and render the resulting papers

	🎧 process_voice_input()
	Receives user text/voice input, then calls perform_ai_lookup() to produce an audio summary and final Q&A file.

	🎬 main()
	Orchestrates the entire application flow:

	- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
	- Shows file history in the sidebar
	- Manages marquee settings and final UI layout