title: TorchTransformers Diffusion CV SFT
emoji: 
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT for Computer Vision

Integration Details

  1. SFT Tiny Titans (First Listing):
  • Features: Causal LM and Diffusion SFT, camera snap, RAG party.
  • Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
  2. SFT Tiny Titans (Second Listing):
  • Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
  • Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). Used PartyPlannerAgent from this listing for its detailed RAG output.
  3. AI Vision Titans (Current):
  • Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, Line Drawings.
  • Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs. Retained async processing and gallery updates.
  4. Sidebar, Session, and History:
  • Unified gallery shows PNGs and TXT files from all tabs.
  • Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
  • History log in sidebar records key actions (snapshots, SFT, tests).
  5. Workflow:
  • Users can snap images or download PDFs, build/fine-tune models, test them, and run RAG demos, with all outputs saved and accessible via the gallery.
  6. Verification:
  • Run the App: streamlit run app.py
  7. Check:
  • Camera Snap: Capture images, verify in gallery.
  • Download PDFs: Test with a valid PDF URL (e.g., a direct link), check snapshots.
  • Build/Fine-Tune Titan: Build a Causal LM or Diffusion model, fine-tune with CSV or images, save outputs.
  • Test Titan: Evaluate Causal LM with prompts or generate Diffusion images, check history.
  • Agentic RAG Party: Run NLP or CV RAG demos, verify outputs.
  • Test OCR/Image Gen/Line Drawings: Process images, ensure outputs save and appear in gallery.
  8. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
  9. Notes:
  • PDF URLs: Each URL must point directly to a PDF file (e.g., via Archive.org’s /download/ path). Adjust as needed.
  • Compatibility: All features default to CPU for broad compatibility and use CUDA where available.
  • Session State: Persistent across tabs, ensuring workflow continuity.
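
The session-state keys and sidebar history log listed above could be set up roughly as follows. This is a minimal sketch, not the code in app.py: the default values, the helper names `init_session_state` and `log_action`, and the timestamp format are all assumptions. A plain dict stands in for `st.session_state`, which offers the same dict-style access.

```python
from collections.abc import MutableMapping
from datetime import datetime, timezone

# Keys come from the list above; the default values are assumptions.
DEFAULTS = {
    "captured_files": [],
    "builder": None,
    "model_loaded": False,
    "processing": {},
    "history": [],
}


def init_session_state(state: MutableMapping) -> None:
    """Set each missing key to its default without touching existing values."""
    for key, default in DEFAULTS.items():
        if key not in state:
            # Copy mutable defaults so two sessions never share a list or dict.
            is_mutable = isinstance(default, (list, dict))
            state[key] = default.copy() if is_mutable else default


def log_action(state: MutableMapping, message: str) -> str:
    """Append a timestamped entry to the history log and return it."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    entry = f"{stamp} UTC: {message}"
    state["history"].append(entry)
    return entry


# A plain dict stands in for st.session_state in this sketch.
state = {}
init_session_state(state)
log_action(state, "Saved snapshot cam0.png")
```

Inside the app, the same helpers would be called with `st.session_state` at the top of the script, so every tab sees the same keys after a rerun.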

Abstract

Explore AI vision with torch, transformers, and diffusers! Dual st.camera_input 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:

  • 🌐 Streamlit - Thiessen et al., 2023: UI.
  • 🔥 PyTorch - Paszke et al., 2019: Core.
  • 🔍 Qwen2-VL - Wang et al., 2024: Multimodal OCR.
  • 🔍 TrOCR - Li et al., 2021: Small OCR.
  • 🎨 LDM - Rombach et al., 2022: Image gen.
  • 👁️ OpenCV - Bradski, 2000: CV tools.

Run: pip install -r requirements.txt, then streamlit run app.py. Snap, test, innovate!

Usage 🎯

  • 📷 Camera Snap: Single or burst capture (auto 10 frames) with gallery.
  • 🔍 Test OCR: Qwen2-VL-OCR-2B or TrOCR-Small extracts text, saved async.
  • 🎨 Test Image Gen: OFA-Sys/small-stable-diffusion-v0 generates images, saved async.
  • ✏️ Test Line Drawings: OpenCV line art (Torch Space-inspired), saved async.
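
The "saved async" behaviour and the 10-frame burst mentioned above can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the app's implementation: the helper names, the filename format, and the use of `asyncio.to_thread` (Python 3.9+) are illustrative choices, and the frames are placeholder bytes rather than real camera captures.

```python
import asyncio
import tempfile
from datetime import datetime
from pathlib import Path


def unique_name(prefix: str, suffix: str, index: int) -> str:
    """Build a timestamped filename; the exact format is an assumption."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}_{index}{suffix}"


async def save_async(folder: Path, name: str, data: bytes) -> Path:
    """Write bytes in a worker thread so the event loop stays responsive."""
    path = folder / name
    await asyncio.to_thread(path.write_bytes, data)
    return path


async def save_burst(folder: Path, frames: list[bytes]) -> list[Path]:
    """Save every frame of a burst concurrently; paths keep frame order."""
    tasks = [
        save_async(folder, unique_name("snap", ".png", i), frame)
        for i, frame in enumerate(frames)
    ]
    return list(await asyncio.gather(*tasks))


# Placeholder data: ten fake "frames" written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_frames = [bytes([i]) * 4 for i in range(10)]
    saved = asyncio.run(save_burst(Path(tmp), fake_frames))
    print(len(saved), "files saved")
```

The frame index in the filename keeps names unique even when all ten frames are saved within the same second.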

Abstract

Fuse torch, transformers, and diffusers for SFT-powered NLP and CV! Dual st.camera_input 📷 captures feed a gallery, enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:

  • 🌐 Streamlit Framework - Thiessen et al., 2023: UI magic.
  • 🔥 PyTorch DL - Paszke et al., 2019: Torch core.
  • 🧠 Attention is All You Need - Vaswani et al., 2017: NLP transformers.
  • 🎨 DDPM - Ho et al., 2020: Denoising diffusion.
  • 📊 Pandas - McKinney, 2010: Data handling.
  • 🖼️ Pillow - Clark et al., 2023: Image processing.
  • pytz - Henshaw, 2023: Time zones.
  • 👁️ OpenCV - Bradski, 2000: CV tools.
  • 🎨 LDM - Rombach et al., 2022: Latent diffusion.
  • ⚙️ LoRA - Hu et al., 2021: SFT efficiency.
  • 🔍 RAG - Lewis et al., 2020: Retrieval-augmented generation.

Run: pip install -r requirements.txt, then streamlit run app.py. Build, snap, party!

Usage 🎯

  • 🌱📷 Build Titan & Camera Snap:
    • 🎨 Use Model: Run OFA-Sys/small-stable-diffusion-v0 (300 MB) or google/ddpm-ema-celebahq-256 (280 MB) online.
    • ⬇️ Download Model: Save <500 MB diffusion models locally.
    • 📷 Snap: Capture unique PNGs with dual cams.
  • 🔧 SFT: Tune Causal LM with CSV or Diffusion with image-text pairs.
  • 🧪 Test: Pair text with images, select pipeline, hit "Run Test 🚀".
  • 🌐 RAG Party: NLP plans or CV images for superhero bashes!
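
For the CSV-based Causal LM tuning mentioned above, a sample file could be generated and read back like this. It is a hedged sketch: the `prompt` and `response` column names, the sample rows, and both helper functions are assumptions, since this README does not show the CSV schema the app expects.

```python
import csv
import io

# Hypothetical sample rows; the app's real sample data is not shown here.
SAMPLE_ROWS = [
    {"prompt": "What is supervised fine-tuning?",
     "response": "Training a pretrained model further on labeled examples."},
    {"prompt": "Name one diffusion paper.",
     "response": "Denoising Diffusion Probabilistic Models (Ho et al., 2020)."},
]


def make_sample_csv(rows=SAMPLE_ROWS) -> str:
    """Serialize prompt/response pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["prompt", "response"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Parse CSV text into (prompt, response) tuples, skipping incomplete rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["prompt"], row["response"])
        for row in reader
        if row["prompt"] and row["response"]
    ]


pairs = load_pairs(make_sample_csv())
print(len(pairs), "training pairs")
```

In a Streamlit app, the string returned by `make_sample_csv()` could be offered through `st.download_button`, and `load_pairs` could run on an uploaded file's decoded text before fine-tuning starts.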

Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. pip install -r requirements.txt, streamlit run app.py. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed

SFT Tiny Titans 🚀 (Small Diffusion Delight!)

A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.

Features 🎉

  • Build Titan 🌱: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
  • Camera Snap 📷: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
  • Fine-Tune Titan (CV) 🔧: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports.
  • Test Titan (CV) 🧪: Generate images from prompts with your tuned diffusion titan.
  • Agentic RAG Party (CV) 🌐: Craft superhero party visuals from camera-inspired prompts.
  • Media Gallery 🎨: View, download, or zap captured images with flair.

Installation 🛠️

  1. Clone the repo:
    git clone <repository-url>
    cd sft-tiny-titans
    

Abstract

TorchTransformers Diffusion SFT Titans harnesses torch, transformers, and diffusers for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual st.camera_input captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with smolagents compatibility. Key papers illuminate the stack:

Run: pip install -r requirements.txt, then streamlit run app.py. Snap, tune, party!