title: TorchTransformers Diffusion CV SFT
emoji: โก
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
TorchTransformers Diffusion CV SFT Titans ๐
A Streamlit app blending torch
, transformers
, and diffusers
for vision and NLP fun! Snap PDFs ๐, turn them into double-page spreads ๐ผ๏ธ, extract text with GPT ๐ค, and craft emoji-packed Markdown outlines ๐โall with a witty UI and CPU-friendly SFT.
Integration Details
- SFT Tiny Titans (First Listing):
- Features: Causal LM and Diffusion SFT, camera snap, RAG party.
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved
ModelBuilder
andDiffusionBuilder
with SFT functionality.
- SFT Tiny Titans (Second Listing):
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent).
- AI Vision Titans (Current):
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction.
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates.
- Sidebar, Session, and History:
- Unified gallery shows PNGs, PDFs, and MD files from all tabs.
- Session state (
captured_files
,builder
,model_loaded
,processing
,history
) tracks all operations. - History log in sidebar records key actions (snapshots, SFT, tests).
- Workflow:
- Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, summarize into emoji outlinesโall saved in the gallery.
- Verification:
- Run:
streamlit run app.py
- Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery.
- Run:
- Notes:
- PDF URLs need direct links (e.g., arXivโs
/pdf/
path). - CPU defaults with CUDA fallback for broad compatibility.
- PDF URLs need direct links (e.g., arXivโs
Abstract
Fuse torch
, transformers
, and diffusers
with GPT vision for a wild AI ride! Dual st.camera_input
๐ท and PDF downloads ๐ feed a gallery, powering GOT-OCR2_0 ๐, Stable Diffusion ๐จ, and GPT text extraction ๐ค. Key papers:
- ๐ Streamlit Framework - Thiessen et al., 2023: UI magic.
- ๐ฅ PyTorch DL - Paszke et al., 2019: Torch core.
- ๐ง Attention is All You Need - Vaswani et al., 2017: NLP transformers.
- ๐จ Denoising Diffusion Probabilistic Models - Ho et al., 2020: Diffusion basics.
- ๐ GOT: General OCR Theory - Li et al., 2024: Advanced OCR.
- ๐จ Latent Diffusion Models - Rombach et al., 2022: Image generation.
- โ๏ธ LoRA: Low-Rank Adaptation - Hu et al., 2021: SFT efficiency.
- ๐ RAG: Retrieval-Augmented Generation - Lewis et al., 2020: RAG foundations.
- ๐๏ธ Vision Transformers - Dosovitskiy et al., 2020: Vision backbone.
- ๐ GPT-4 Technical Report - OpenAI, 2023: GPT power.
- ๐ผ๏ธ CLIP: Learning Transferable Visual Models - Radford et al., 2021: Vision-language bridge.
- โฐ Time Zone Handling in Python - Henshaw, 2023:
pytz
context.
Run: pip install -r requirements.txt
, streamlit run app.py
. Snap, process, summarize! โก
Usage ๐ฏ
- ๐ท Camera Snap: Capture pics with dual cams.
- ๐ฅ Download PDFs: Fetch papers (e.g., arXiv links below).
- ๐ PDF Process: Snapshot to double-page spreads, extract text with GPT.
- ๐ผ๏ธ Image Process: OCR images with GPT vision.
- ๐ MD Gallery: Summarize Markdown files into emoji outlines.
Tutorial: Single to Double Page Emoji Outlines
Single Page Outline: Key Functions in app.py
Function | Purpose ๐ฏ | How It Works ๐ ๏ธ | Emoji Insight ๐ |
---|---|---|---|
generate_filename |
Unique file names ๐ | Adds timestamp to sequence | ๐ฐ๏ธ Timeโs your file buddy! |
pdf_url_to_filename |
Safe PDF names ๐๏ธ | Cleans URLs to underscores | ๐ซ No URL mess! |
get_download_link |
Downloadable files โฌ๏ธ | Base64-encodes for HTML links | ๐ฆ Grab it, go! |
download_pdf |
Web PDF snatcher ๐ | Fetches PDFs with requests |
๐ PDF pirate ahoy! |
process_pdf_snapshot |
PDF to images ๐ผ๏ธ | Async snapshots (single/double/all) with fitz |
๐ธ Double-page dazzle! |
process_ocr |
Image text extractor ๐ | Async GOT-OCR2_0 with transformers |
๐ Text ninja strikes! |
process_image_gen |
Prompt to image ๐จ | Async Stable Diffusion with diffusers |
๐๏ธ Art from wordsโbam! |
process_image_with_prompt |
GPT image analysis ๐ค | Base64 to GPT vision | ๐ง GPT sees all! |
process_text_with_prompt |
GPT text summarizer โ๏ธ | Text to GPT for outlining | ๐ Summarize like a pro! |
update_gallery |
File showcase ๐ผ๏ธ๐ | Sidebar display with delete options | ๐ Your creations shine! |
Double Page Outline: Libraries in requirements.txt
Library | Single Page Purpose ๐ฏ | Double Page Usage ๐ ๏ธ | Emoji Insight ๐ |
---|---|---|---|
streamlit |
App UI ๐ | Tabs like โPDF Process ๐โ and โMD Gallery ๐โ | ๐ฌ App starโlights, action! |
pandas |
Data crunching ๐ | Ready for OCR/metadata tables | ๐ Table tamer awaits! |
torch |
ML engine ๐ฅ | Powers transformers and diffusers |
๐ฅ AIโs fiery heart! |
requests |
Web grabber ๐ | Downloads PDFs in download_pdf |
๐ Web loot collector! |
aiofiles |
Fast file ops โก | Async writes in process_ocr |
โ๏ธ File speed demon! |
pillow |
Image magic ๐๏ธ | PDF to image in process_pdf_snapshot |
๐ผ๏ธ Pixel Picasso! |
PyMuPDF |
PDF handler ๐ | Snapshots in process_pdf_snapshot |
๐ PDF scroll master! |
transformers |
AI models ๐ฃ๏ธ | GOT-OCR2_0 in process_ocr |
๐ค Brain in a box! |
diffusers |
Image gen ๐จ | Stable Diffusion in process_image_gen |
๐จ Art generator supreme! |
openai |
GPT vision/text ๐ค | Image/text processing in GPT functions | ๐ All-seeing AI oracle! |
glob2 |
File finder ๐ | Gallery files in update_gallery |
๐ต๏ธ File sleuth! |
pytz |
Time zones โฐ | Timestamps in generate_filename |
โณ Time wizard! |
Automation Instructions: Witty & Funny Steps ๐
Load PDFs ๐
- Drop URLs into โDownload PDFs ๐ฅโ or upload files.
- Emoji Tip: ๐ฆ Unleash the PDF beastโroar through arXiv!
Double-Page Snap ๐ธ
- Click โSnapshot Selected ๐ธโ with โTwo Pages (High-Res)โโlandscape glory!
- Witty Note: Two pages > one, because who reads half a comic? ๐ฆธ
GPT Vision Zap โก
- In โPDF Process ๐โ, pick a GPT model (e.g.,
gpt-4o-mini
) and zap text out. - Funny Bit: GPTโs like โI see text, mortals!โ ๐๏ธ
- In โPDF Process ๐โ, pick a GPT model (e.g.,
Markdown Mash ๐
- โMD Gallery ๐โ takes Markdown files, smashes them into a 12-point emoji outline.
- Sassy Tip: 12 pointsโbecause 11โs weak and 13โs overkill! ๐
Innovative Features ๐
- Double-Page Spreads: High-res, landscape images from PDFsโperfect for apps! ๐ฅ๏ธ
- GPT Model Picker: Swap
gpt-4o
forgpt-4o-mini
โspeed vs. smarts! โก๐ง - 12-Point Emoji Outline: Clusters facts into 12 witty sectionsโe.g., โ1. Heroes ๐ฆธโ, โ2. Tech ๐งโ. ๐
Mermaid Process Flow ๐งโโ๏ธ
graph TD
A[๐ PDFs] -->|๐ฅ Download| B[๐ PDF Process]
B -->|๐ธ Snapshot| C[๐ผ๏ธ Double-Page Images]
C -->|๐ค GPT Vision| D[๐ Markdown Files]
D -->|๐ MD Gallery| E[โ๏ธ 12-Point Emoji Outline]
A:::pdf
B:::process
C:::image
D:::markdown
E:::outline
classDef pdf fill:#f9f,stroke:#333,stroke-width:2px;
classDef process fill:#bbf,stroke:#333,stroke-width:2px;
classDef image fill:#bfb,stroke:#333,stroke-width:2px;
classDef markdown fill:#ffb,stroke:#333,stroke-width:2px;
classDef outline fill:#fbf,stroke:#333,stroke-width:2px;
Flow Explained:
- ๐ PDFs: Start with one or more PDFs on a topic.
- ๐ PDF Process: Download and snapshot into high-res double-page spreads.
- ๐ผ๏ธ Double-Page Images: Landscape images ideal for apps, processed by GPT.
- ๐ Markdown Files: Text extracted per document, saved as Markdown.
- โ๏ธ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., โ1. Context ๐โ, โ2. Methods ๐ฌโ, ..., โ12. Future ๐โ). Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, outlineโAI magic! โก
Key Updates
- Tutorial Section: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights.
- Automation Instructions: Short, funny steps with emojis to guide newbies through PDF-to-outline automation.
- Innovative Features: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features.
- Mermaid Diagram: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes.
- Updated arXiv Links: Refreshed to match current functionality (vision, OCR, GPT, diffusion):
- Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers.
- Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance.
How to Use
- Save this as
README.md
in your project folder. - View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered.
- Follow the automation steps to process PDFs and generate outlinesโperfect for learners exploring AI vision and text summarization!
This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! ๐