awacke1's picture
Update README.md
8bd86ec verified
|
raw
history blame
12.1 kB
metadata
title: TorchTransformers Diffusion CV SFT
emoji: โšก
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision

TorchTransformers Diffusion CV SFT Titans ๐Ÿš€

A Streamlit app blending torch, transformers, and diffusers for vision and NLP fun! Snap PDFs ๐Ÿ“„, turn them into double-page spreads ๐Ÿ–ผ๏ธ, extract text with GPT ๐Ÿค–, and craft emoji-packed Markdown outlines ๐Ÿ“โ€”all with a witty UI and CPU-friendly SFT.

Integration Details

  1. SFT Tiny Titans (First Listing):
    • Features: Causal LM and Diffusion SFT, camera snap, RAG party.
    • Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
  2. SFT Tiny Titans (Second Listing):
    • Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
    • Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent).
  3. AI Vision Titans (Current):
    • Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction.
    • Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates.
  4. Sidebar, Session, and History:
    • Unified gallery shows PNGs, PDFs, and MD files from all tabs.
    • Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
    • History log in sidebar records key actions (snapshots, SFT, tests).
  5. Workflow:
    • Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, summarize into emoji outlinesโ€”all saved in the gallery.
  6. Verification:
    • Run: streamlit run app.py
    • Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery.
  7. Notes:
    • PDF URLs need direct links (e.g., arXivโ€™s /pdf/ path).
    • CPU defaults with CUDA fallback for broad compatibility.

Abstract

Fuse torch, transformers, and diffusers with GPT vision for a wild AI ride! Dual st.camera_input ๐Ÿ“ท and PDF downloads ๐Ÿ“„ feed a gallery, powering GOT-OCR2_0 ๐Ÿ”, Stable Diffusion ๐ŸŽจ, and GPT text extraction ๐Ÿค–. Key papers:

Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, summarize! โšก

Usage ๐ŸŽฏ

  • ๐Ÿ“ท Camera Snap: Capture pics with dual cams.
  • ๐Ÿ“ฅ Download PDFs: Fetch papers (e.g., arXiv links below).
  • ๐Ÿ“„ PDF Process: Snapshot to double-page spreads, extract text with GPT.
  • ๐Ÿ–ผ๏ธ Image Process: OCR images with GPT vision.
  • ๐Ÿ“š MD Gallery: Summarize Markdown files into emoji outlines.

Tutorial: Single to Double Page Emoji Outlines

Single Page Outline: Key Functions in app.py

Function Purpose ๐ŸŽฏ How It Works ๐Ÿ› ๏ธ Emoji Insight ๐Ÿ˜Ž
generate_filename Unique file names ๐Ÿ“… Adds timestamp to sequence ๐Ÿ•ฐ๏ธ Timeโ€™s your file buddy!
pdf_url_to_filename Safe PDF names ๐Ÿ–‹๏ธ Cleans URLs to underscores ๐Ÿšซ No URL mess!
get_download_link Downloadable files โฌ‡๏ธ Base64-encodes for HTML links ๐Ÿ“ฆ Grab it, go!
download_pdf Web PDF snatcher ๐ŸŒ Fetches PDFs with requests ๐Ÿ“š PDF pirate ahoy!
process_pdf_snapshot PDF to images ๐Ÿ–ผ๏ธ Async snapshots (single/double/all) with fitz ๐Ÿ“ธ Double-page dazzle!
process_ocr Image text extractor ๐Ÿ” Async GOT-OCR2_0 with transformers ๐Ÿ‘€ Text ninja strikes!
process_image_gen Prompt to image ๐ŸŽจ Async Stable Diffusion with diffusers ๐Ÿ–Œ๏ธ Art from wordsโ€”bam!
process_image_with_prompt GPT image analysis ๐Ÿค– Base64 to GPT vision ๐Ÿง  GPT sees all!
process_text_with_prompt GPT text summarizer โœ๏ธ Text to GPT for outlining ๐Ÿ“ Summarize like a pro!
update_gallery File showcase ๐Ÿ–ผ๏ธ๐Ÿ“– Sidebar display with delete options ๐ŸŒŸ Your creations shine!

Double Page Outline: Libraries in requirements.txt

Library Single Page Purpose ๐ŸŽฏ Double Page Usage ๐Ÿ› ๏ธ Emoji Insight ๐Ÿ˜Ž
streamlit App UI ๐ŸŒ Tabs like โ€œPDF Process ๐Ÿ“„โ€ and โ€œMD Gallery ๐Ÿ“šโ€ ๐ŸŽฌ App starโ€”lights, action!
pandas Data crunching ๐Ÿ“ˆ Ready for OCR/metadata tables ๐Ÿ“Š Table tamer awaits!
torch ML engine ๐Ÿ”ฅ Powers transformers and diffusers ๐Ÿ”ฅ AIโ€™s fiery heart!
requests Web grabber ๐ŸŒ Downloads PDFs in download_pdf ๐ŸŒ Web loot collector!
aiofiles Fast file ops โšก Async writes in process_ocr โœˆ๏ธ File speed demon!
pillow Image magic ๐Ÿ–Œ๏ธ PDF to image in process_pdf_snapshot ๐Ÿ–ผ๏ธ Pixel Picasso!
PyMuPDF PDF handler ๐Ÿ“œ Snapshots in process_pdf_snapshot ๐Ÿ“œ PDF scroll master!
transformers AI models ๐Ÿ—ฃ๏ธ GOT-OCR2_0 in process_ocr ๐Ÿค– Brain in a box!
diffusers Image gen ๐ŸŽจ Stable Diffusion in process_image_gen ๐ŸŽจ Art generator supreme!
openai GPT vision/text ๐Ÿค– Image/text processing in GPT functions ๐ŸŒŒ All-seeing AI oracle!
glob2 File finder ๐Ÿ” Gallery files in update_gallery ๐Ÿ•ต๏ธ File sleuth!
pytz Time zones โฐ Timestamps in generate_filename โณ Time wizard!

Automation Instructions: Witty & Funny Steps ๐Ÿ˜‚

  1. Load PDFs ๐Ÿ“š

    • Drop URLs into โ€œDownload PDFs ๐Ÿ“ฅโ€ or upload files.
    • Emoji Tip: ๐Ÿฆ Unleash the PDF beastโ€”roar through arXiv!
  2. Double-Page Snap ๐Ÿ“ธ

    • Click โ€œSnapshot Selected ๐Ÿ“ธโ€ with โ€œTwo Pages (High-Res)โ€โ€”landscape glory!
    • Witty Note: Two pages > one, because who reads half a comic? ๐Ÿฆธ
  3. GPT Vision Zap โšก

    • In โ€œPDF Process ๐Ÿ“„โ€, pick a GPT model (e.g., gpt-4o-mini) and zap text out.
    • Funny Bit: GPTโ€™s like โ€œI see text, mortals!โ€ ๐Ÿ‘๏ธ
  4. Markdown Mash ๐Ÿ“

    • โ€œMD Gallery ๐Ÿ“šโ€ takes Markdown files, smashes them into a 12-point emoji outline.
    • Sassy Tip: 12 pointsโ€”because 11โ€™s weak and 13โ€™s overkill! ๐Ÿ˜œ

Innovative Features ๐ŸŒŸ

  • Double-Page Spreads: High-res, landscape images from PDFsโ€”perfect for apps! ๐Ÿ–ฅ๏ธ
  • GPT Model Picker: Swap gpt-4o for gpt-4o-miniโ€”speed vs. smarts! โšก๐Ÿง 
  • 12-Point Emoji Outline: Clusters facts into 12 witty sectionsโ€”e.g., โ€œ1. Heroes ๐Ÿฆธโ€, โ€œ2. Tech ๐Ÿ”งโ€. ๐ŸŽ‰

Mermaid Process Flow ๐Ÿงœโ€โ™€๏ธ

graph TD
    A[๐Ÿ“š PDFs] -->|๐Ÿ“ฅ Download| B[๐Ÿ“„ PDF Process]
    B -->|๐Ÿ“ธ Snapshot| C[๐Ÿ–ผ๏ธ Double-Page Images]
    C -->|๐Ÿค– GPT Vision| D[๐Ÿ“ Markdown Files]
    D -->|๐Ÿ“š MD Gallery| E[โœ๏ธ 12-Point Emoji Outline]

    A:::pdf
    B:::process
    C:::image
    D:::markdown
    E:::outline

    classDef pdf fill:#f9f,stroke:#333,stroke-width:2px;
    classDef process fill:#bbf,stroke:#333,stroke-width:2px;
    classDef image fill:#bfb,stroke:#333,stroke-width:2px;
    classDef markdown fill:#ffb,stroke:#333,stroke-width:2px;
    classDef outline fill:#fbf,stroke:#333,stroke-width:2px;

Flow Explained:

  1. ๐Ÿ“š PDFs: Start with one or more PDFs on a topic.
  2. ๐Ÿ“„ PDF Process: Download and snapshot into high-res double-page spreads.
  3. ๐Ÿ–ผ๏ธ Double-Page Images: Landscape images ideal for apps, processed by GPT.
  4. ๐Ÿ“ Markdown Files: Text extracted per document, saved as Markdown.
  5. โœ๏ธ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., โ€œ1. Context ๐Ÿ“œโ€, โ€œ2. Methods ๐Ÿ”ฌโ€, ..., โ€œ12. Future ๐Ÿš€โ€). Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, outlineโ€”AI magic! โšก

Key Updates

  1. Tutorial Section: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights.
  2. Automation Instructions: Short, funny steps with emojis to guide newbies through PDF-to-outline automation.
  3. Innovative Features: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features.
  4. Mermaid Diagram: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes.
  5. Updated arXiv Links: Refreshed to match current functionality (vision, OCR, GPT, diffusion):
    • Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers.
    • Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance.

How to Use

  • Save this as README.md in your project folder.
  • View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered.
  • Follow the automation steps to process PDFs and generate outlinesโ€”perfect for learners exploring AI vision and text summarization!

This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! ๐Ÿš€