title: 🧠 Deep Research Evaluator
emoji: 🧠
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.
Project Architecture
- Root Folder
  - app.py (Streamlit App)
    - Main entry point for your Streamlit application.
  - requirements.txt (Dependencies)
    - Lists all the Python packages needed to run the app.
  - mycomponent (HTML Component)
    - A subdirectory containing your custom Streamlit component code.
    - __init__.py (Python Init)
      - Tells Python this folder is a module/package.
    - index.html (Custom HTML)
      - Front-end HTML/JS/CSS for the custom component.
flowchart TB
A[Root Folder] --> B["app.py<br>(Streamlit App)"]
A --> C["requirements.txt<br>(Dependencies)"]
A --> D["mycomponent<br>(HTML Component)"]
D --> E["__init__.py<br>(Python Init)"]
D --> F["index.html<br>(Custom HTML)"]
Usage Flow:
- You start the app with: streamlit run app.py
- app.py imports mycomponent to load the HTML from index.html.
- requirements.txt ensures needed dependencies are installed.
- The __init__.py file ensures the custom component folder is recognized as a Python package.
Notes:
- app.py hosts your Streamlit logic and references the mycomponent package.
- index.html supplies the interface for any front-end custom elements.
- requirements.txt keeps the environment consistent.
Features
Core Configuration & Setup: Configures the Streamlit page with the title "TalkingAIResearcher" and sets the layout, sidebar state, and environment variables.
API Setup & Clients: Loads and initializes the OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.
Session State Management: Manages conversation history, transcripts, file-editing state, and model selections.
get_high_info_terms(): Extracts the top words and bigrams from a text by counting frequencies and filtering out stop words.
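Here is a minimal sketch of the frequency-based extraction that get_high_info_terms() describes. It assumes simple regex tokenization. The stop-word list, the length cutoff, and the top_n default are illustrative choices, not the app's documented values.

```python
import re
from collections import Counter

# Illustrative stop-word list (hypothetical); the app's real list is not documented.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}

def get_high_info_terms(text: str, top_n: int = 10) -> list[str]:
    """Return the top_n most frequent words and bigrams, ignoring stop words."""
    # Lowercase, then tokenize into alphanumeric words (internal hyphens kept).
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    # Drop stop words and tokens of 2 characters or fewer.
    keywords = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    # Pair each kept keyword with the next kept keyword to form bigrams.
    bigrams = [f"{a} {b}" for a, b in zip(keywords, keywords[1:])]
    counts = Counter(keywords + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```

Counting unigrams and bigrams in one Counter lets a repeated phrase such as "long-horizon planning" compete directly with single words.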
clean_text_for_filename(): Sanitizes text into a valid filename by removing special characters and short or unhelpful words, then truncating the result.
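This sketch shows one way clean_text_for_filename() could work. The exact rules are not documented, so the filler-word set, the 2-character cutoff, and the max_len default of 50 are all hypothetical.

```python
import re

def clean_text_for_filename(text: str, max_len: int = 50) -> str:
    """Turn free text into a filesystem-safe, underscore-separated name."""
    # Replace anything that is not a word character, whitespace, or hyphen.
    text = re.sub(r"[^\w\s-]", " ", text.lower())
    # Hypothetical filter: drop words of 2 characters or fewer and a few fillers.
    fillers = {"the", "and", "for", "with", "this", "that"}
    words = [w for w in text.split() if len(w) > 2 and w not in fillers]
    name = "_".join(words)
    # Truncate, then strip any separator left dangling by the cut.
    return name[:max_len].rstrip("_-")
```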
generate_filename(): Creates a descriptive filename from a timestamp, high-info terms, and a snippet of the content, removing duplicate words.
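Below is a sketch of the timestamp-plus-keywords naming scheme that generate_filename() describes. Several details are illustrative assumptions: the %y%m%d_%H%M timestamp format, the 100-character response snippet, the rule of keeping words longer than four characters, and the max_words limit. The now parameter exists only so the output is reproducible.

```python
import re
from datetime import datetime
from typing import Optional

def generate_filename(prompt: str, response: str, ext: str = "md",
                      now: Optional[datetime] = None, max_words: int = 6) -> str:
    """Build '<timestamp>_<keywords>.<ext>' from the prompt and a response snippet."""
    now = now or datetime.now()
    stamp = now.strftime("%y%m%d_%H%M")
    # Hypothetical choice: only the first 100 characters of the response count.
    source = f"{prompt} {response[:100]}".lower()
    words = re.findall(r"[a-z0-9]+", source)
    # Keep words longer than 4 characters; dict.fromkeys drops repeats in order.
    unique = list(dict.fromkeys(w for w in words if len(w) > 4))
    name = "_".join(unique[:max_words]) or "untitled"
    return f"{stamp}_{name}.{ext}"
```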
create_file(): Saves the prompt and response content to a file named by generate_filename().
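This is a minimal sketch of create_file(). It assumes a markdown layout with the prompt as a heading. The filename is passed in here to keep the example self-contained; in the app it would come from generate_filename(). The directory parameter is a hypothetical addition.

```python
from pathlib import Path

def create_file(filename: str, prompt: str, response: str, directory: str = ".") -> Path:
    """Save a prompt/response pair as markdown and return the written path."""
    path = Path(directory) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # Illustrative layout: the prompt as a heading, a blank line, then the response.
    path.write_text(f"# {prompt}\n\n{response}\n", encoding="utf-8")
    return path
```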
get_download_link(): Generates base64-encoded download links for .md, audio, or zip files so they can be downloaded inline.
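The following sketches the data-URI technique that get_download_link() describes. The file bytes are base64-encoded into an HTML anchor that carries a download attribute. The MIME mapping and the function signature are assumptions.

```python
import base64
import html
from pathlib import Path

# MIME types for the file kinds mentioned above (illustrative mapping).
MIME_TYPES = {
    ".md": "text/markdown",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".zip": "application/zip",
}

def get_download_link(data: bytes, filename: str, label: str = "Download") -> str:
    """Return an HTML anchor that embeds data as a base64 data URI."""
    mime = MIME_TYPES.get(Path(filename).suffix.lower(), "application/octet-stream")
    b64 = base64.b64encode(data).decode("ascii")
    # The download attribute tells the browser to save the file, not navigate to it.
    return (f'<a href="data:{mime};base64,{b64}" '
            f'download="{html.escape(filename)}">{html.escape(label)}</a>')
```

In Streamlit, raw HTML like this is typically rendered with st.markdown(link, unsafe_allow_html=True). Embedding the bytes keeps downloads working without a file server, but it inflates the page by about a third of the file size.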
clean_for_speech(): Strips line breaks, URLs, and symbols so the text reads more naturally through TTS.
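Here is a regex-based sketch of clean_for_speech(). Which symbols count as noise is an assumption; this version treats common markdown and bracket characters that way.

```python
import re

def clean_for_speech(text: str) -> str:
    """Flatten markdown-style text into plain sentences for text-to-speech."""
    text = text.replace("\n", " ")
    # Remove URLs entirely; they read badly aloud.
    text = re.sub(r"https?://\S+", "", text)
    # Hypothetical symbol set: markdown markers and brackets become spaces.
    text = re.sub(r"[#*_`>\[\]()|~]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```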
edge_tts_generate_audio(): Asynchronously generates audio files (e.g., .mp3) using Edge TTS.
speak_with_edge_tts(): A synchronous wrapper around the async TTS call, so it can be used directly from synchronous code.
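speak_with_edge_tts() bridges synchronous Streamlit code to an async TTS call. The sketch below shows that pattern with asyncio.run(). The coroutine body is a stand-in so the example runs offline, and the voice name and output-path defaults are assumptions. A comment notes the edge-tts call the stand-in replaces.

```python
import asyncio
from typing import Optional

async def edge_tts_generate_audio(text: str, voice: str = "en-US-AriaNeural",
                                  out_path: str = "speech.mp3") -> Optional[str]:
    """Async TTS step; stand-in body so the sketch runs without network access."""
    if not text.strip():
        return None  # nothing to synthesize
    # With the edge-tts package installed, the real call would be roughly:
    #     await edge_tts.Communicate(text, voice).save(out_path)
    await asyncio.sleep(0)  # placeholder for that network-bound await
    return out_path

def speak_with_edge_tts(text: str, **kwargs) -> Optional[str]:
    """Synchronous wrapper: run the coroutine to completion on a fresh event loop."""
    return asyncio.run(edge_tts_generate_audio(text, **kwargs))
```

asyncio.run() starts a new event loop for each call. It raises RuntimeError if a loop is already running in the current thread, so this pattern assumes the caller is ordinary synchronous script code, not a coroutine.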
play_and_download_audio(): Embeds an audio player in Streamlit and provides a download link for the audio file.
save_qa_with_audio(): Stores Q&A content in a markdown file and generates TTS audio for the question and answer.
parse_arxiv_refs(): Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.
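The RAG pipeline's output format is not documented here, so this parse_arxiv_refs() sketch assumes a hypothetical layout. Entries are separated by blank lines, and each one starts with a markdown link to an arxiv.org URL, followed by summary lines. The Paper dataclass is likewise illustrative.

```python
import re
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    url: str
    summary: str

# Matches "[Title](https://arxiv.org/...)" on an entry's first line (assumed format).
LINK_RE = re.compile(r"\[(?P<title>[^\]]+)\]\((?P<url>https?://arxiv\.org/[^)\s]+)\)")

def parse_arxiv_refs(markdown: str) -> list[Paper]:
    """Split blank-line-separated entries and turn each into a Paper."""
    papers = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = LINK_RE.search(lines[0])
        if match is None:
            continue  # skip blocks that do not start with an arXiv link
        summary = " ".join(line.strip() for line in lines[1:])
        papers.append(Paper(match["title"], match["url"], summary))
    return papers
```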
create_paper_links_md(): Builds a minimal markdown page with numbered links to each paper's ArXiv URL.
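This is a minimal sketch of create_paper_links_md(). It assumes papers arrive as (title, url) pairs, and the page heading is a placeholder.

```python
def create_paper_links_md(papers: list[tuple[str, str]]) -> str:
    """Render (title, url) pairs as a numbered markdown list of links."""
    lines = ["# Paper Links", ""]  # placeholder heading, then a blank line
    for i, (title, url) in enumerate(papers, start=1):
        lines.append(f"{i}. [{title}]({url})")
    return "\n".join(lines) + "\n"
```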
create_paper_audio_files(): Processes each parsed paper, generating TTS audio and embedding base64 download links.
display_papers(): Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.
display_papers_in_sidebar(): Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.
display_file_history_in_sidebar(): Lists all local .md, .mp3, and .wav files in descending order of modification time, letting users preview and download them.
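The file-discovery half of display_file_history_in_sidebar() can be sketched with the standard library alone. list_history_files() is a hypothetical helper name, and the Streamlit preview and download widgets are omitted.

```python
import os

def list_history_files(directory: str = ".",
                       exts: tuple = (".md", ".mp3", ".wav")) -> list[str]:
    """Return paths of matching files in directory, newest modification time first."""
    with os.scandir(directory) as entries:
        # Read each mtime while the scandir iterator is still open.
        files = [(e.stat().st_mtime, e.path) for e in entries
                 if e.is_file() and os.path.splitext(e.name)[1].lower() in exts]
    # Sort on modification time, descending.
    return [path for _, path in sorted(files, reverse=True)]
```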
create_zip_of_files(): Bundles multiple files (markdown and audio) into a zip archive with an automatically shortened filename.
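Here is a sketch of create_zip_of_files() using the standard zipfile module. The naming rule is a hypothetical stand-in for the app's shortening logic: the first file's stem, truncated to max_name_len, plus _bundle.zip.

```python
import zipfile
from pathlib import Path

def create_zip_of_files(paths: list[str], max_name_len: int = 40) -> str:
    """Bundle files into a compressed zip and return the archive name."""
    if not paths:
        raise ValueError("no files to zip")
    # Hypothetical naming rule: shortened stem of the first file.
    zip_name = f"{Path(paths[0]).stem[:max_name_len]}_bundle.zip"
    with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=Path(p).name)  # store flat, without directories
    return zip_name
```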
perform_ai_lookup(): The main function, which:
- Queries Anthropic (Claude)
- Calls an ArXiv RAG pipeline
- Generates Q&A audio
- Parses and renders the resulting papers
process_voice_input(): Receives user text or voice input, then calls perform_ai_lookup() to produce an audio summary and a final Q&A file.
main(): Orchestrates the entire application flow:
- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
- Shows file history in the sidebar
- Manages marquee settings and the final UI layout