---
title: 🧠Deep🐍Research🌐Evaluator
emoji: 🧠🐍🌐
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: true
license: mit
short_description: 🧠Deep🐍Research🌐Evaluator for Long Horizon Learning Tasks
---

# 🧠Deep🐍Research🌐Evaluator

Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.

Claude AI is built in.

Mixtral 8x7B (MoE) is the central model, used alongside the ArXiv-embeddings RAG search:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/620630b603825909dcbeba35/742HW6RWYk35BK2g5Eq-T.png)



# Project Architecture

- 📂 **Root Folder**  
  - **app.py** (🤖 *Streamlit App*)  
    - Main entry point for your Streamlit application.  
  - **requirements.txt** (📋 *Dependencies*)  
    - Lists all the Python packages needed to run the app.  
  - 📂 **mycomponent** (🔧 *HTML Component*)  
    - A subdirectory containing your custom Streamlit component code.
    - **\_\_init\_\_.py** (🐍 *Python Init*)  
      - Tells Python this folder is a module/package.
    - **index.html** (🌐 *Custom HTML*)  
      - Front-end HTML/JS/CSS for the custom component.

```mermaid
flowchart TB
    A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
    A --> C["requirements.txt 📋<br>(Dependencies)"]
    A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
    D --> E["__init__.py 🐍<br>(Python Init)"]
    D --> F["index.html 🌐<br>(Custom HTML)"]
```

---

**Usage Flow**:

1. You run `streamlit run app.py`.  
2. **app.py** imports **mycomponent** to load the HTML from **index.html**.  
3. **requirements.txt** ensures needed dependencies are installed.  
4. The **\_\_init\_\_.py** file ensures the custom component folder is recognized as a Python package.  
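Step 2 of the flow can be sketched as follows. This is a hypothetical reconstruction of `mycomponent/__init__.py` (the actual file is not reproduced in this README), assuming the component is a static HTML snippet:

```python
import os

# Hypothetical sketch of mycomponent/__init__.py -- the real file may differ.
# A static Streamlit component typically just reads its index.html so that
# app.py can hand the markup to streamlit.components.v1.html.
def load_component_html(folder: str = "mycomponent") -> str:
    """Read the component's front-end markup from index.html."""
    with open(os.path.join(folder, "index.html"), encoding="utf-8") as f:
        return f.read()

# Assumed wiring in app.py:
#   import streamlit.components.v1 as components
#   components.html(load_component_html(), height=400)
```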

**Notes**:  
- **app.py** hosts your Streamlit logic and references the **mycomponent**.  
- **index.html** supplies the interface for any front-end custom elements.  
- **requirements.txt** keeps the environment consistent.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/620630b603825909dcbeba35/8MDX2gF29upLhxvLoc8yJ.png)


# Features

### 🎯 Core Configuration & Setup
Configures the Streamlit page with title “🚲TalkingAIResearcher🏆”, sets layout, sidebar states, and environment variables.

### 🔑 API Setup & Clients
Loads and initializes OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.

### 📝 Session State Management
Manages conversation history, transcripts, file editing states, and model selections.

### 🧠 `get_high_info_terms()`
Extracts top words/bigrams from a text by counting frequency and filtering out stop words.
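As a rough illustration of the approach described above (the actual implementation in app.py is not reproduced here, and the stop-word list is an assumption), a frequency-based term extractor might look like:

```python
import re
from collections import Counter

# Illustrative stop-word list; the app's real list is not shown in this README.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def get_high_info_terms(text: str, top_n: int = 10) -> list[str]:
    """Return the most frequent non-stop-word terms and bigrams in text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```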

### 🏷️ `clean_text_for_filename()`
Sanitizes text for valid filenames by removing special characters, short/unhelpful words, and truncating length.

### 📄 `generate_filename()`
Creates an intelligent filename based on timestamps, high-info terms, and a snippet of the content (removing duplicates).
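The two filename helpers above might be sketched like this; the exact cleaning rules, length cap, and timestamp format are assumptions, not the app's actual code:

```python
import re
from datetime import datetime

def clean_text_for_filename(text: str, max_len: int = 50) -> str:
    """Strip special characters, drop short filler words, squeeze to underscores."""
    text = re.sub(r"[^\w\s-]", "", text)             # remove punctuation/symbols
    words = [w for w in text.split() if len(w) > 2]  # drop short, unhelpful words
    return "_".join(words)[:max_len]

def generate_filename(prompt: str, ext: str = "md") -> str:
    """Timestamp plus a cleaned, high-information snippet of the prompt."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_{clean_text_for_filename(prompt)}.{ext}"
```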

### 💾 `create_file()`
Saves prompt + response content to a file, using generate_filename().

### 🔗 `get_download_link()`
Generates base64-encoded download links for .md, audio, or zip files for inline downloading.
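The data-URI pattern behind this helper can be sketched as follows; the MIME map and the function's signature are assumptions:

```python
import base64
import os

# Illustrative MIME map for the file types the app serves.
MIME = {".md": "text/markdown", ".mp3": "audio/mpeg", ".zip": "application/zip"}

def get_download_link(path: str) -> str:
    """Return an HTML anchor with the file embedded as a base64 data URI."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    mime = MIME.get(os.path.splitext(path)[1], "application/octet-stream")
    name = os.path.basename(path)
    return f'<a href="data:{mime};base64,{b64}" download="{name}">Download {name}</a>'
```

In Streamlit, the returned anchor would be rendered with `st.markdown(link, unsafe_allow_html=True)`.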

### 🎤 `clean_for_speech()`
Strips out line breaks, URLs, and symbols to create more readable text for TTS.
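A minimal sketch of this kind of TTS pre-cleaning (the regexes are illustrative, not the app's actual ones):

```python
import re

def clean_for_speech(text: str) -> str:
    """Make text read naturally aloud: no line breaks, URLs, or markdown symbols."""
    text = text.replace("\n", " ")              # join line breaks
    text = re.sub(r"https?://\S+", "", text)    # drop URLs
    text = re.sub(r"[#*_`>\[\]()]", "", text)   # drop markdown symbols
    return re.sub(r"\s+", " ", text).strip()    # squeeze whitespace
```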

### 🎙️ `edge_tts_generate_audio()`
Asynchronously generates audio files (e.g., .mp3) using Edge TTS.

### 🔊 `speak_with_edge_tts()`
A wrapper function for the async TTS call, allowing direct usage in synchronous code.
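The async-to-sync bridge behind these two functions can be sketched as follows. The actual `edge_tts` network call is stubbed out (shown only in a comment) so the sketch stays self-contained; the voice name and signatures are assumptions:

```python
import asyncio

async def edge_tts_generate_audio(text: str, out_path: str) -> str:
    """Generate an audio file for text. Real version (assumed):
    await edge_tts.Communicate(text, "en-US-AriaNeural").save(out_path)"""
    await asyncio.sleep(0)  # placeholder for the awaited TTS network call
    return out_path

def speak_with_edge_tts(text: str, out_path: str = "speech.mp3") -> str:
    """Run the async TTS coroutine from synchronous Streamlit code."""
    return asyncio.run(edge_tts_generate_audio(text, out_path))
```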

### 🎵 `play_and_download_audio()`
Embeds an audio player in Streamlit and provides a download link for that audio file.

### 💿 `save_qa_with_audio()`
Stores Q&A content in a markdown file and generates TTS audio for the question + answer.

### 📰 `parse_arxiv_refs()`
Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.
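Since the RAG pipeline's exact reference format is not shown in this README, the sketch below assumes one markdown link per line (e.g. `1) [Title](https://arxiv.org/abs/...) summary`):

```python
import re

def parse_arxiv_refs(md: str) -> list[dict]:
    """Extract {title, url} pairs from markdown reference lines (format assumed)."""
    papers = []
    for line in md.splitlines():
        m = re.search(r"\[(?P<title>[^\]]+)\]\((?P<url>https?://arxiv\.org/[^)\s]+)\)", line)
        if m:
            papers.append({"title": m.group("title"), "url": m.group("url")})
    return papers
```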

### 🔗 `create_paper_links_md()`
Builds a minimal markdown page with numbered links to each paper’s ArXiv URL.

### 📑 `create_paper_audio_files()`
Processes each parsed paper, generating TTS audio and embedding base64 download links.

### 📚 `display_papers()`
Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.

### 🗂 `display_papers_in_sidebar()`
Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.

### 📂 `display_file_history_in_sidebar()`
Enumerates all local .md, .mp3, .wav files in descending modification time, letting users preview and download them.
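The enumeration described above (all local `.md`, `.mp3`, and `.wav` files, newest first) can be sketched as:

```python
import glob
import os

def list_history_files(folder: str = ".") -> list[str]:
    """Gather .md/.mp3/.wav files, sorted by modification time, newest first."""
    paths = []
    for pattern in ("*.md", "*.mp3", "*.wav"):
        paths.extend(glob.glob(os.path.join(folder, pattern)))
    return sorted(paths, key=os.path.getmtime, reverse=True)
```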

### 📦 `create_zip_of_files()`
Bundles multiple files (markdown + audio) into a zip with an automatically shortened filename.
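A sketch of the bundling step; the exact filename-shortening rule in app.py is assumed, not shown in this README:

```python
import os
import zipfile

def create_zip_of_files(paths: list[str], zip_name: str = "bundle.zip") -> str:
    """Bundle the given files into one zip archive with a shortened name."""
    folder, base = os.path.split(zip_name)
    stem, ext = os.path.splitext(base)
    zip_name = os.path.join(folder, stem[:50] + ext)  # cap the archive name length
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))
    return zip_name
```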

### 🔍 `perform_ai_lookup()`
The main function to:

- Query Anthropic (Claude)
- Call an ArXiv RAG pipeline
- Generate Q&A audio
- Parse and render the resulting papers

### 🎧 `process_voice_input()`
Receives user text or voice input, then calls `perform_ai_lookup()` to produce an audio summary and a final Q&A file.

### 🎬 `main()`
Orchestrates the entire application flow:

- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
- Shows file history in the sidebar
- Manages marquee settings and the final UI layout