awacke1 committed on
Commit d7c78a2 · verified · 1 Parent(s): b659fbd

Update README.md

Files changed (1)
  1. README.md +30 -0
README.md CHANGED
@@ -12,6 +12,36 @@ short_description: Torch Transformers Diffusion SFT for Computer Vision
12
  ---
13
 
14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  ## Abstract
17
  Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:
 
  ---


+ Integration Details
+ 1. SFT Tiny Titans (First Listing):
+ - Features: Causal LM and Diffusion SFT, camera snap, RAG party.
+ - Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
+ 2. SFT Tiny Titans (Second Listing):
+ - Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
+ - Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). Used PartyPlannerAgent from this listing for its detailed RAG output.
+ 3. AI Vision Titans (Current):
+ - Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, Line Drawings.
+ - Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs. Retained async processing and gallery updates.
+ 4. Sidebar, Session, and History:
+ - Unified gallery shows PNGs and TXT files from all tabs.
+ - Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
+ - History log in sidebar records key actions (snapshots, SFT, tests).
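The session, history, and gallery behavior described in item 4 could look roughly like the sketch below. It is not the app's actual code: the helper names `init_state`, `log_action`, and `gallery_files` are hypothetical, the default values for each key are assumptions, and a plain `dict` stands in for `st.session_state` (which supports the same `key in state` and `state[key] = value` operations).

```python
from datetime import datetime
from pathlib import Path
from typing import Any, List, MutableMapping

# Key names come from the README; the default values are assumptions.
DEFAULTS = {
    "captured_files": [],
    "builder": None,
    "model_loaded": False,
    "processing": {},
    "history": [],
}


def init_state(state: MutableMapping[str, Any]) -> None:
    """Add any missing keys without overwriting values already set."""
    for key, default in DEFAULTS.items():
        if key not in state:
            # Copy mutable defaults so separate sessions never share one object.
            if isinstance(default, (list, dict)):
                state[key] = default.copy()
            else:
                state[key] = default


def log_action(state: MutableMapping[str, Any], action: str) -> None:
    """Append a timestamped entry to the history log."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    state["history"].append(f"{stamp} {action}")


def gallery_files(folder: str = ".") -> List[Path]:
    """Return the PNG and TXT files in a folder, sorted by path."""
    root = Path(folder)
    return sorted(p for p in root.iterdir() if p.suffix.lower() in {".png", ".txt"})
```

In a Streamlit app, passing `st.session_state` instead of a `dict` to `init_state` at the top of the script would give every tab the same keys on each rerun.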
29
+ 5. Workflow:
30
+ - Users can snap images or download PDFs, build/fine-tune models, test them, and run RAG demos, with all outputs saved and accessible via the gallery.
31
+ 7. Verification
32
+ - Run the App: streamlit run app.py
33
+ 8. Check:
34
+ - Camera Snap: Capture images, verify in gallery.
35
+ - Download PDFs: Test with a valid PDF URL (e.g., a direct link), check snapshots.
36
+ - Build/Fine-Tune Titan: Build a Causal LM or Diffusion model, fine-tune with CSV or images, save outputs.
37
+ - Test Titan: Evaluate Causal LM with prompts or generate Diffusion images, check history.
38
+ - Agentic RAG Party: Run NLP or CV RAG demos, verify outputs.
39
+ - Test OCR/Image Gen/Line Drawings: Process images, ensure outputs save and appear in gallery.
40
+ 9. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
41
+ 10. Notes
42
+ - PDF URLs: Your provided URLs need direct PDF links (e.g., via Archive.org’s /download/ path). Adjust as needed.
43
+ - Compatibility: All features use CPU defaults for broad compatibility, with CUDA fallback where available.
44
+ - Session State: Persistent across tabs, ensuring workflow continuity.
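Two of the notes above can be illustrated with a small sketch. Both helpers are hypothetical and not taken from the app: `looks_like_direct_pdf` is only a heuristic that assumes a direct link ends in `.pdf` (some servers deliver PDFs from other paths), and `pick_device` assumes the intended behavior is "use CUDA when PyTorch reports it, otherwise CPU". The example URLs use made-up Archive.org item names.

```python
from urllib.parse import urlparse


def looks_like_direct_pdf(url: str) -> bool:
    """Heuristic: an http(s) URL whose path ends in .pdf is treated as a direct link."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".pdf")


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and reports a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Hypothetical item names, shown only to contrast the two URL shapes.
    print(looks_like_direct_pdf("https://archive.org/download/example-item/example.pdf"))
    print(looks_like_direct_pdf("https://archive.org/details/example-item"))
    print(pick_device())
```

A check like this could run before a download is attempted, so that a landing-page URL is rejected early instead of producing an empty snapshot.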

  ## Abstract
  Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers: