Spaces:
Running
on
CPU Upgrade
A newer version of the Streamlit SDK is available:
1.44.1
title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
TorchTransformers Diffusion CV SFT Titans 🚀
A Streamlit app blending torch
, transformers
, and diffusers
for vision and NLP fun! Snap PDFs 📄, turn them into double-page spreads 🖼️, extract text with GPT 🤖, and craft emoji-packed Markdown outlines 📝—all with a witty UI and CPU-friendly SFT.
Integration Details
- SFT Tiny Titans (First Listing):
- Features: Causal LM and Diffusion SFT, camera snap, RAG party.
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved
ModelBuilder
andDiffusionBuilder
with SFT functionality.
- SFT Tiny Titans (Second Listing):
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent).
- AI Vision Titans (Current):
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction.
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates.
- Sidebar, Session, and History:
- Unified gallery shows PNGs, PDFs, and MD files from all tabs.
- Session state (
captured_files
,builder
,model_loaded
,processing
,history
) tracks all operations. - History log in sidebar records key actions (snapshots, SFT, tests).
- Workflow:
- Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, summarize into emoji outlines—all saved in the gallery.
- Verification:
- Run:
streamlit run app.py
- Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery.
- Run:
- Notes:
- PDF URLs need direct links (e.g., arXiv’s
/pdf/
path). - CPU defaults with CUDA fallback for broad compatibility.
- PDF URLs need direct links (e.g., arXiv’s
Abstract
Fuse torch
, transformers
, and diffusers
with GPT vision for a wild AI ride! Dual st.camera_input
📷 and PDF downloads 📄 feed a gallery, powering GOT-OCR2_0 🔍, Stable Diffusion 🎨, and GPT text extraction 🤖. Key papers:
- 🌐 Streamlit Framework - Thiessen et al., 2023: UI magic.
- 🔥 PyTorch DL - Paszke et al., 2019: Torch core.
- 🧠 Attention is All You Need - Vaswani et al., 2017: NLP transformers.
- 🎨 Denoising Diffusion Probabilistic Models - Ho et al., 2020: Diffusion basics.
- 🔍 GOT: General OCR Theory - Li et al., 2024: Advanced OCR.
- 🎨 Latent Diffusion Models - Rombach et al., 2022: Image generation.
- ⚙️ LoRA: Low-Rank Adaptation - Hu et al., 2021: SFT efficiency.
- 🔍 RAG: Retrieval-Augmented Generation - Lewis et al., 2020: RAG foundations.
- 👁️ Vision Transformers - Dosovitskiy et al., 2020: Vision backbone.
- 📝 GPT-4 Technical Report - OpenAI, 2023: GPT power.
- 🖼️ CLIP: Learning Transferable Visual Models - Radford et al., 2021: Vision-language bridge.
- ⏰ Time Zone Handling in Python - Henshaw, 2023:
pytz
context.
Run: pip install -r requirements.txt
, streamlit run app.py
. Snap, process, summarize! ⚡
Usage 🎯
- 📷 Camera Snap: Capture pics with dual cams.
- 📥 Download PDFs: Fetch papers (e.g., arXiv links below).
- 📄 PDF Process: Snapshot to double-page spreads, extract text with GPT.
- 🖼️ Image Process: OCR images with GPT vision.
- 📚 MD Gallery: Summarize Markdown files into emoji outlines.
Tutorial: Single to Double Page Emoji Outlines
Single Page Outline: Key Functions in app.py
Function | Purpose 🎯 | How It Works 🛠️ | Emoji Insight 😎 |
---|---|---|---|
generate_filename |
Unique file names 📅 | Adds timestamp to sequence | 🕰️ Time’s your file buddy! |
pdf_url_to_filename |
Safe PDF names 🖋️ | Cleans URLs to underscores | 🚫 No URL mess! |
get_download_link |
Downloadable files ⬇️ | Base64-encodes for HTML links | 📦 Grab it, go! |
download_pdf |
Web PDF snatcher 🌐 | Fetches PDFs with requests |
📚 PDF pirate ahoy! |
process_pdf_snapshot |
PDF to images 🖼️ | Async snapshots (single/double/all) with fitz |
📸 Double-page dazzle! |
process_ocr |
Image text extractor 🔍 | Async GOT-OCR2_0 with transformers |
👀 Text ninja strikes! |
process_image_gen |
Prompt to image 🎨 | Async Stable Diffusion with diffusers |
🖌️ Art from words—bam! |
process_image_with_prompt |
GPT image analysis 🤖 | Base64 to GPT vision | 🧠 GPT sees all! |
process_text_with_prompt |
GPT text summarizer ✍️ | Text to GPT for outlining | 📝 Summarize like a pro! |
update_gallery |
File showcase 🖼️📖 | Sidebar display with delete options | 🌟 Your creations shine! |
Double Page Outline: Libraries in requirements.txt
Library | Single Page Purpose 🎯 | Double Page Usage 🛠️ | Emoji Insight 😎 |
---|---|---|---|
streamlit |
App UI 🌐 | Tabs like “PDF Process 📄” and “MD Gallery 📚” | 🎬 App star—lights, action! |
pandas |
Data crunching 📈 | Ready for OCR/metadata tables | 📊 Table tamer awaits! |
torch |
ML engine 🔥 | Powers transformers and diffusers |
🔥 AI’s fiery heart! |
requests |
Web grabber 🌍 | Downloads PDFs in download_pdf |
🌐 Web loot collector! |
aiofiles |
Fast file ops ⚡ | Async writes in process_ocr |
✈️ File speed demon! |
pillow |
Image magic 🖌️ | PDF to image in process_pdf_snapshot |
🖼️ Pixel Picasso! |
PyMuPDF |
PDF handler 📜 | Snapshots in process_pdf_snapshot |
📜 PDF scroll master! |
transformers |
AI models 🗣️ | GOT-OCR2_0 in process_ocr |
🤖 Brain in a box! |
diffusers |
Image gen 🎨 | Stable Diffusion in process_image_gen |
🎨 Art generator supreme! |
openai |
GPT vision/text 🤖 | Image/text processing in GPT functions | 🌌 All-seeing AI oracle! |
glob2 |
File finder 🔍 | Gallery files in update_gallery |
🕵️ File sleuth! |
pytz |
Time zones ⏰ | Timestamps in generate_filename |
⏳ Time wizard! |
Automation Instructions: Witty & Funny Steps 😂
Load PDFs 📚
- Drop URLs into “Download PDFs 📥” or upload files.
- Emoji Tip: 🦁 Unleash the PDF beast—roar through arXiv!
Double-Page Snap 📸
- Click “Snapshot Selected 📸” with “Two Pages (High-Res)”—landscape glory!
- Witty Note: Two pages > one, because who reads half a comic? 🦸
GPT Vision Zap ⚡
- In “PDF Process 📄”, pick a GPT model (e.g.,
gpt-4o-mini
) and zap text out. - Funny Bit: GPT’s like “I see text, mortals!” 👁️
- In “PDF Process 📄”, pick a GPT model (e.g.,
Markdown Mash 📝
- “MD Gallery 📚” takes Markdown files, smashes them into a 12-point emoji outline.
- Sassy Tip: 12 points—because 11’s weak and 13’s overkill! 😜
Innovative Features 🌟
- Double-Page Spreads: High-res, landscape images from PDFs—perfect for apps! 🖥️
- GPT Model Picker: Swap
gpt-4o
forgpt-4o-mini
—speed vs. smarts! ⚡🧠 - 12-Point Emoji Outline: Clusters facts into 12 witty sections—e.g., “1. Heroes 🦸”, “2. Tech 🔧”. 🎉
Mermaid Process Flow 🧜♀️
graph TD
A[📚 PDFs] -->|📥 Download| B[📄 PDF Process]
B -->|📸 Snapshot| C[🖼️ Double-Page Images]
C -->|🤖 GPT Vision| D[📝 Markdown Files]
D -->|📚 MD Gallery| E[✍️ 12-Point Emoji Outline]
A:::pdf
B:::process
C:::image
D:::markdown
E:::outline
classDef pdf fill:#f9f,stroke:#333,stroke-width:2px;
classDef process fill:#bbf,stroke:#333,stroke-width:2px;
classDef image fill:#bfb,stroke:#333,stroke-width:2px;
classDef markdown fill:#ffb,stroke:#333,stroke-width:2px;
classDef outline fill:#fbf,stroke:#333,stroke-width:2px;
Flow Explained:
- 📚 PDFs: Start with one or more PDFs on a topic.
- 📄 PDF Process: Download and snapshot into high-res double-page spreads.
- 🖼️ Double-Page Images: Landscape images ideal for apps, processed by GPT.
- 📝 Markdown Files: Text extracted per document, saved as Markdown.
- ✍️ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., “1. Context 📜”, “2. Methods 🔬”, ..., “12. Future 🚀”). Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, outline—AI magic! ⚡
Key Updates
- Tutorial Section: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights.
- Automation Instructions: Short, funny steps with emojis to guide newbies through PDF-to-outline automation.
- Innovative Features: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features.
- Mermaid Diagram: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes.
- Updated arXiv Links: Refreshed to match current functionality (vision, OCR, GPT, diffusion):
- Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers.
- Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance.
How to Use
- Save this as
README.md
in your project folder. - View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered.
- Follow the automation steps to process PDFs and generate outlines—perfect for learners exploring AI vision and text summarization!
This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! 🚀