Space: awacke1/TorchTransformers-CV-SFT

awacke1 committed on Mar 22

Commit d7c78a2 (verified) · 1 Parent(s): b659fbd

Update README.md

README.md +30 -0

@@ -12,6 +12,36 @@ short_description: Torch Transformers Diffusion SFT for Computer Vision
 ---
 ## Abstract
 Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:

 ---
+## Integration Details
+1. SFT Tiny Titans (First Listing):
+  - Features: Causal LM and Diffusion SFT, camera snap, RAG party.
+  - Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
+2. SFT Tiny Titans (Second Listing):
+  - Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
+  - Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). Used PartyPlannerAgent from this listing for its detailed RAG output.
+3. AI Vision Titans (Current):
+  - Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, Line Drawings.
+  - Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs. Retained async processing and gallery updates.
+4. Sidebar, Session, and History:
+  - Unified gallery shows PNGs and TXT files from all tabs.
+  - Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
+  - History log in sidebar records key actions (snapshots, SFT, tests).
+5. Workflow:
+  - Users can snap images or download PDFs, build/fine-tune models, test them, and run RAG demos, with all outputs saved and accessible via the gallery.
+6. Verification:
+  - Run the app: `streamlit run app.py`
+7. Check:
+  - Camera Snap: Capture images and verify that they appear in the gallery.
+  - Download PDFs: Test with a direct PDF URL and check the resulting snapshots.
+  - Build/Fine-Tune Titan: Build a Causal LM or Diffusion model, fine-tune it with CSV or images, and save the outputs.
+  - Test Titan: Evaluate a Causal LM with prompts or generate Diffusion images, then check the history.
+  - Agentic RAG Party: Run the NLP or CV RAG demos and verify the outputs.
+  - Test OCR/Image Gen/Line Drawings: Process images and ensure the outputs are saved and appear in the gallery.
+8. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
+9. Notes:
+  - PDF URLs: Each URL must point directly to a PDF file (e.g., via Archive.org’s /download/ path). Adjust as needed.
+  - Compatibility: All features default to CPU for broad compatibility and use CUDA where available.
+  - Session State: Persists across tabs, ensuring workflow continuity.
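
The session keys named in item 4 (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) could be initialized along these lines. This is a minimal sketch and not code from the Space: the default values, the helper names `init_session_state` and `log_action`, and the timestamp format are all assumptions. The helpers take any dict-like object, so they can be exercised with a plain `dict` and, in an app, called with `st.session_state`.

```python
from collections.abc import MutableMapping
from datetime import datetime
from typing import Any


def default_state() -> dict[str, Any]:
    """Return fresh defaults. Keys come from the README; values are assumed."""
    return {
        "captured_files": [],   # paths of saved PNG/TXT outputs
        "builder": None,        # active model builder, if any
        "model_loaded": False,  # whether a model is ready for testing
        "processing": {},       # per-task "in progress" flags
        "history": [],          # log lines shown in the sidebar
    }


def init_session_state(state: MutableMapping) -> None:
    """Add any missing keys without overwriting values already present."""
    for key, value in default_state().items():
        if key not in state:
            state[key] = value


def log_action(state: MutableMapping, message: str) -> None:
    """Append a timestamped entry to the history log."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    state["history"].append(f"{stamp} {message}")
```

In a Streamlit script this would typically run once near the top, for example `init_session_state(st.session_state)`, so that every tab sees the same keys on each rerun.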
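
The note on PDF URLs can be made concrete with a small pre-check before downloading. The sketch below is a heuristic only and is not taken from the Space: the function name `looks_like_direct_pdf_url` is hypothetical, and it tests only the URL's scheme and file extension, so a server could still return something other than a PDF.

```python
from urllib.parse import urlparse


def looks_like_direct_pdf_url(url: str) -> bool:
    """Heuristic: True for an http(s) URL whose path ends in .pdf."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(".pdf")


# Made-up example URLs, for illustration only.
direct = "https://archive.org/download/example-item/example.pdf"
landing = "https://archive.org/details/example-item"
print(looks_like_direct_pdf_url(direct))
print(looks_like_direct_pdf_url(landing))
```

An item's `/details/` page on Archive.org is an HTML landing page, which is why the note points to the `/download/` path instead.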
 ## Abstract
 Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers: