---
title: 🧠Deep🐍Research🌐Evaluator
emoji: 🧠🐍🌐
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
---
# 🎡', '🎢', '🎸', '🎹', '🎺', '🎷', 'πŸ₯', '🎻
Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.
# Project Architecture
- 📂 **Root Folder**
  - **app.py** (🤖 *Streamlit App*)
    - Main entry point for your Streamlit application.
  - **requirements.txt** (📋 *Dependencies*)
    - Lists all the Python packages needed to run the app.
  - 📂 **mycomponent** (🔧 *HTML Component*)
    - A subdirectory containing your custom Streamlit component code.
    - **\_\_init\_\_.py** (🐍 *Python Init*)
      - Tells Python this folder is a module/package.
    - **index.html** (🌐 *Custom HTML*)
      - Front-end HTML/JS/CSS for the custom component.
```mermaid
flowchart TB
A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
A --> C["requirements.txt 📋<br>(Dependencies)"]
A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
D --> E["__init__.py 🐍<br>(Python Init)"]
D --> F["index.html 🌐<br>(Custom HTML)"]
```
---
**Usage Flow**:
1. You run `streamlit run app.py`.
2. **app.py** imports **mycomponent** to load the HTML from **index.html**.
3. **requirements.txt** lists the dependencies to install, for example with `pip install -r requirements.txt` for a local run.
4. The **\_\_init\_\_.py** file ensures the custom component folder is recognized as a Python package.
**Notes**:
- **app.py** hosts your Streamlit logic and imports the **mycomponent** package.
- **index.html** supplies the interface for any front-end custom elements.
- **requirements.txt** keeps the environment consistent.
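
The usage flow above can be sketched as a small loader that reads **index.html** from the component folder. This is a minimal sketch, not the app's actual code: the function name `load_component_html` is hypothetical, and it assumes the front end lives in a single `index.html` file.

```python
from pathlib import Path


def load_component_html(component_dir: Path) -> str:
    """Return the contents of index.html inside the component folder."""
    index_path = component_dir / "index.html"
    if not index_path.is_file():
        raise FileNotFoundError(f"Missing component front end: {index_path}")
    return index_path.read_text(encoding="utf-8")


# In app.py, the returned string could then be rendered with
# streamlit.components.v1.html(html, height=...), assuming Streamlit is installed.
```

Keeping the loader free of Streamlit imports makes it easy to test outside a running app.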
# Features
🎯 Core Configuration & Setup
Configures the Streamlit page with title “🚲TalkingAIResearcher🏆”, sets layout, sidebar states, and environment variables.
🔑 API Setup & Clients
Loads and initializes OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.
📝 Session State Management
Manages conversation history, transcripts, file editing states, and model selections.
🧠 get_high_info_terms()
Extracts top words/bigrams from a text by counting frequency and filtering out stop words.
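
A minimal sketch of this kind of term extraction, under stated assumptions: the stop-word list and the `top_n` parameter are illustrative, and bigrams are formed from neighbouring words after stop words are removed. The app's own implementation may differ.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the app's real list is not shown here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stop words and bigrams in text."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    # Bigrams pair each remaining word with the next remaining word.
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```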
🏷️ clean_text_for_filename()
Sanitizes text for valid filenames by removing special characters, short/unhelpful words, and truncating length.
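
One way this sanitizing step could look, as a sketch only: the filler-word list, the three-character minimum, and the `max_len` default are assumptions rather than the app's actual values.

```python
import re

# Hypothetical filler words to drop; the app's own list may differ.
UNHELPFUL_WORDS = {"the", "and", "for", "with", "this", "that", "what", "how"}


def clean_text_for_filename(text: str, max_len: int = 40) -> str:
    """Turn free text into a short, filesystem-safe fragment."""
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Drop very short words and common filler words.
    words = [w for w in text.split() if len(w) > 2 and w not in UNHELPFUL_WORDS]
    # Join with underscores, cap the length, and avoid a trailing underscore.
    return "_".join(words)[:max_len].rstrip("_")
```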
📄 generate_filename()
Creates an intelligent filename based on timestamps, high-info terms, and a snippet of the content (removing duplicates).
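
The naming scheme might be sketched as follows. The timestamp layout, the argument names, and the optional `now` parameter (added here so the result is reproducible) are all assumptions for illustration.

```python
from datetime import datetime


def generate_filename(terms, snippet, ext="md", now=None):
    """Build '<timestamp>_<unique words>.<ext>' from terms and a content snippet."""
    now = now or datetime.now()
    prefix = now.strftime("%y%m_%H%M")  # timestamp layout is an assumption
    seen, parts = set(), []
    # Terms come first, then snippet words; anything already used is skipped.
    for word in list(terms) + snippet.lower().split():
        if word not in seen:
            seen.add(word)
            parts.append(word)
    return f"{prefix}_{'_'.join(parts)}.{ext}"
```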
💾 create_file()
Saves prompt + response content to a file, using generate_filename().
🔗 get_download_link()
Generates base64-encoded download links for .md, audio, or zip files for inline downloading.
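
A hedged sketch of a base64 download link: the file is embedded in a `data:` URI inside an HTML anchor. The MIME-type table and the `label` parameter are assumptions, not the app's confirmed behavior.

```python
import base64
from pathlib import Path

# MIME types for the file kinds named above; extend as needed.
MIME_TYPES = {
    ".md": "text/markdown",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".zip": "application/zip",
}


def get_download_link(path: str, label: str = "Download") -> str:
    """Return an HTML anchor that embeds the file as a base64 data URI."""
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    mime = MIME_TYPES.get(file_path.suffix.lower(), "application/octet-stream")
    return f'<a href="data:{mime};base64,{encoded}" download="{file_path.name}">{label}</a>'
```

In Streamlit, such a string can be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is inlined into the page, this approach suits small files best.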
🎀 clean_for_speech()
Strips out line breaks, URLs, and symbols to create more readable text for TTS.
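
The cleanup could be sketched with three regular-expression passes. The exact set of symbols removed here is an assumption; the app may strip a different set.

```python
import re


def clean_for_speech(text: str) -> str:
    """Flatten text into a single line that a TTS engine can read cleanly."""
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = re.sub(r"[#*_`<>\[\]()|]", " ", text)  # drop markdown/HTML symbols
    return re.sub(r"\s+", " ", text).strip()      # collapse line breaks and spaces
```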
🎙️ edge_tts_generate_audio()
Asynchronously generates audio files (e.g., .mp3) using Edge TTS.
🔊 speak_with_edge_tts()
A wrapper function for the async TTS call, allowing direct usage in synchronous code.
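
The sync-over-async pattern can be sketched like this. The coroutine `fake_tts_generate_audio` is a hypothetical stand-in that writes no audio; with the `edge-tts` package its body would typically be something like `await edge_tts.Communicate(text, voice).save(out_path)`, which is not exercised here.

```python
import asyncio


async def fake_tts_generate_audio(text: str, out_path: str) -> str:
    """Hypothetical stand-in for the async TTS call; writes no real audio."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return out_path


def speak_sync(text: str, out_path: str = "speech.mp3") -> str:
    """Run the async TTS coroutine to completion from synchronous code."""
    return asyncio.run(fake_tts_generate_audio(text, out_path))
```

Note that `asyncio.run()` raises a `RuntimeError` when called from a thread that already has a running event loop, so a wrapper like this only fits plain synchronous callers.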
🎡 play_and_download_audio()
Embeds an audio player in Streamlit and provides a download link for that audio file.
πŸ’Ώ save_qa_with_audio()
Stores Q&A content in a markdown file and generates TTS audio for the question + answer.
πŸ“° parse_arxiv_refs()
Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.
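
The reference format returned by the RAG pipeline is not documented here, so the sketch below assumes a purely hypothetical layout: a `### title | url` header line followed by one or more summary lines. The dictionary keys are likewise illustrative.

```python
def parse_arxiv_refs(ref_text: str) -> list[dict]:
    """Parse references under an ASSUMED layout: a '### title | url' header
    line followed by one or more summary lines."""
    papers, current = [], None
    for line in ref_text.splitlines():
        line = line.strip()
        if line.startswith("### ") and "|" in line:
            # A header line starts a new paper record.
            title, url = (part.strip() for part in line[4:].split("|", 1))
            current = {"title": title, "url": url, "summary": ""}
            papers.append(current)
        elif line and current is not None:
            # Any other non-empty line extends the current paper's summary.
            current["summary"] = (current["summary"] + " " + line).strip()
    return papers
```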
🔗 create_paper_links_md()
Builds a minimal markdown page with numbered links to each paper's ArXiv URL.
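
A minimal sketch of such a links page, assuming each paper is a dictionary with hypothetical `title` and `url` keys and that the page opens with a "Paper Links" heading:

```python
def create_paper_links_md(papers: list[dict]) -> str:
    """Render a numbered markdown list of links, one per paper."""
    lines = ["# Paper Links", ""]
    for i, paper in enumerate(papers, start=1):
        lines.append(f"{i}. [{paper['title']}]({paper['url']})")
    return "\n".join(lines)
```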
📑 create_paper_audio_files()
Processes each parsed paper, generating TTS audio and embedding base64 download links.
📚 display_papers()
Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.
🗂 display_papers_in_sidebar()
Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.
📂 display_file_history_in_sidebar()
Enumerates all local .md, .mp3, .wav files in descending modification time, letting users preview and download them.
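
The file enumeration step might look like the sketch below. The helper name `list_recent_files` and its parameters are hypothetical, and the Streamlit sidebar rendering is left out.

```python
from pathlib import Path


def list_recent_files(folder: str = ".", extensions=(".md", ".mp3", ".wav")) -> list[Path]:
    """Return matching files in folder, most recently modified first."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    ]
    # Sort by modification time, newest first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```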
📦 create_zip_of_files()
Bundles multiple files (markdown + audio) into a zip with an automatically shortened filename.
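
Bundling can be sketched with the standard `zipfile` module. How the app shortens the archive name is not specified, so the `max_stem_len` truncation here is an assumption.

```python
import zipfile
from pathlib import Path


def create_zip_of_files(paths, zip_name: str = "bundle.zip", max_stem_len: int = 20) -> str:
    """Zip the given files and return the archive path, with a length-capped name."""
    target = Path(zip_name)
    # Shorten only the stem so the .zip extension is preserved.
    target = target.with_name(target.stem[:max_stem_len] + ".zip")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=Path(path).name)  # store without folders
    return str(target)
```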
🔍 perform_ai_lookup()
The main function to:
- Query Anthropic (Claude)
- Call an ArXiv RAG pipeline
- Generate Q&A audio
- Parse and render the resulting papers
🎧 process_voice_input()
Receives user text/voice input, then calls perform_ai_lookup() to produce an audio summary and final Q&A file.
🎬 main()
Orchestrates the entire application flow:
- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
- Shows file history in the sidebar
- Manages marquee settings and final UI layout