short_description: Torch Transformers Diffusion SFT for Computer Vision
---

## Integration Details

### 1. SFT Tiny Titans (first listing)

- Features: causal LM and diffusion SFT, camera snap, and the RAG party.
- Integration: added as the "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. `ModelBuilder` and `DiffusionBuilder` are preserved along with their SFT functionality.
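Causal LM SFT usually computes the loss only on the response tokens. The sketch below shows that label masking, assuming the prompt and response are already tokenized into integer IDs. It uses `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `build_sft_example` and the default `pad_id` are hypothetical and not the app's `ModelBuilder` API.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(
    prompt_ids: List[int],
    response_ids: List[int],
    max_length: int,
    pad_id: int = 0,  # hypothetical pad token id; use the tokenizer's real one
) -> Dict[str, List[int]]:
    """Concatenate prompt + response and mask the prompt out of the loss."""
    input_ids = (prompt_ids + response_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad to a fixed length; padded positions are also ignored by the loss.
    pad = max_length - len(input_ids)
    input_ids += [pad_id] * pad
    labels += [IGNORE_INDEX] * pad
    attention_mask += [0] * pad
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}


example = build_sft_example([11, 12, 13], [21, 22], max_length=7)
print(example["labels"])  # [-100, -100, -100, 21, 22, -100, -100]
```

Hugging Face causal LM models shift the labels internally, so `labels` stays aligned position by position with `input_ids`.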

### 2. SFT Tiny Titans (second listing)

- Features: enhanced causal LM SFT with sample CSV generation, export functionality, and a RAG demo.
- Integration: merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). The `PartyPlannerAgent` from this listing is used for its more detailed RAG output.
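The sample CSV step can be approximated as follows. This sketch assumes a two-column `prompt,response` layout. The column names, the rows, and the `write_sample_csv`/`load_pairs` helpers are hypothetical and may not match the app's actual schema.

```python
import csv
from pathlib import Path
from typing import List, Tuple

# Hypothetical sample rows; the real app's sample content and columns may differ.
SAMPLE_ROWS = [
    ("What is a causal LM?", "A model that predicts the next token from the previous ones."),
    ("What does SFT stand for?", "Supervised fine-tuning."),
]


def write_sample_csv(path: Path) -> Path:
    """Write a small prompt/response CSV for a quick SFT run."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", "response"])
        writer.writerows(SAMPLE_ROWS)
    return path


def load_pairs(path: Path) -> List[Tuple[str, str]]:
    """Read the CSV back as (prompt, response) pairs, skipping incomplete rows."""
    with path.open(newline="", encoding="utf-8") as f:
        return [
            (row["prompt"], row["response"])
            for row in csv.DictReader(f)
            if row.get("prompt") and row.get("response")
        ]
```

Opening the file with `newline=""` is what the `csv` module expects, so quoted fields that contain commas or line breaks round-trip correctly.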

### 3. AI Vision Titans (current)

- Features: PDF snapshotting, OCR with GOT-OCR2_0, image generation, and line drawings.
- Integration: added as the "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs, keeping the async processing and gallery updates.
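Before snapshotting, it is worth confirming that a URL actually returned a PDF and not an HTML landing page. This is the failure mode behind the direct-link requirement in the Notes. The sketch below checks for the `%PDF-` file header and, like many readers, tolerates a few leading bytes. `looks_like_pdf` and `fetch_pdf` are hypothetical helpers. The page-to-PNG rendering step itself is not shown.

```python
import urllib.request
from typing import Optional

PDF_MAGIC = b"%PDF-"  # every PDF file begins with this header


def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """True if the `%PDF-` header appears within the first `window` bytes."""
    return PDF_MAGIC in data[:window]


def fetch_pdf(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Download `url` and return its bytes only if they look like a PDF.

    An HTML page (e.g. an Archive.org /details/ page) yields None, which is
    the cue to switch to a direct file link.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "pdf-snapshot-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    return data if looks_like_pdf(data) else None
```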

### 4. Sidebar, session, and history

- A unified gallery shows the PNG and TXT files produced by every tab.
- Session state (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) tracks all operations.
- A history log in the sidebar records key actions (snapshots, SFT runs, tests).
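The session-state keys listed above can be initialized once per session, and the history log can be appended to as each action completes. In this minimal sketch, only the key names come from this README. The default values, the `log_action` helper, and the timestamped entry format are assumptions. The functions accept any dict-like mapping, so they run against a plain `dict`. In the app you would pass `st.session_state`, which supports the same `in` checks and item assignment.

```python
from datetime import datetime
from typing import Any, Dict, MutableMapping

# Default values are assumptions; only the key names come from the README.
SESSION_DEFAULTS: Dict[str, Any] = {
    "captured_files": [],
    "builder": None,
    "model_loaded": False,
    "processing": {},
    "history": [],
}


def init_session_state(state: MutableMapping[str, Any]) -> None:
    """Set each key only if missing, so Streamlit reruns keep existing values."""
    for key, default in SESSION_DEFAULTS.items():
        if key not in state:
            # Copy mutable defaults so separate sessions never share one list/dict.
            state[key] = default.copy() if isinstance(default, (list, dict)) else default


def log_action(state: MutableMapping[str, Any], message: str) -> str:
    """Append a timestamped entry to the history log and return it."""
    entry = f"{datetime.now():%H:%M:%S} {message}"
    state["history"].append(entry)
    return entry


# In the app: init_session_state(st.session_state), then render the sidebar with
#     for entry in reversed(st.session_state["history"]):
#         st.sidebar.write(entry)
state: Dict[str, Any] = {}
init_session_state(state)
log_action(state, "Saved snapshot cam0.png")
print(state["history"])
```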

### 5. Workflow

- Users can snap images or download PDFs, build and fine-tune models, test them, and run RAG demos. All outputs are saved and accessible from the gallery.
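Every tab saves its outputs as files, so the gallery can be rebuilt by scanning for PNG and TXT files, newest first. This sketch assumes all outputs land in a single directory. The directory layout and the `gallery_files` helper are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

GALLERY_SUFFIXES = (".png", ".txt")  # the file types the README says the gallery shows


def gallery_files(folder: Path, suffixes: Iterable[str] = GALLERY_SUFFIXES) -> List[Path]:
    """Return the gallery files in `folder`, newest first by modification time."""
    wanted = {s.lower() for s in suffixes}
    files = [p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in wanted]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


# Rendering sketch inside the Streamlit app:
#     for path in gallery_files(Path(".")):
#         if path.suffix.lower() == ".png":
#             st.sidebar.image(str(path), caption=path.name)
#         else:
#             st.sidebar.text(path.read_text(encoding="utf-8"))
```

Sorting by modification time keeps the newest snapshot or OCR result at the top without extra bookkeeping in session state.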

## Verification

Run the app with `streamlit run app.py`.

Then check each feature:

- Camera Snap: capture images and confirm they appear in the gallery.
- Download PDFs: test with a valid, direct PDF URL and check the resulting snapshots.
- Build/Fine-Tune Titan: build a causal LM or diffusion model, fine-tune it with a CSV or images, and save the outputs.
- Test Titan: evaluate the causal LM with prompts or generate diffusion images, then check the history.
- Agentic RAG Party: run the NLP or CV RAG demos and verify the outputs.
- Test OCR / Image Gen / Line Drawings: process images and confirm the outputs are saved and appear in the gallery.

Expected log messages include "Saved snapshot...", "Model loaded...", and "SFT completed...".
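To make the log check repeatable, you can save the console output to a file and scan it for the expected message prefixes. This sketch assumes the app logs to the console. The prefixes are the ones listed above with the variable `...` part dropped. The `missing_markers` helper and the `check_logs.py` filename are hypothetical. Which messages appear depends on which tabs you exercise.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Prefixes of the expected log lines from the README; the "..." part varies per run.
EXPECTED_MARKERS = ("Saved snapshot", "Model loaded", "SFT completed")


def missing_markers(log_text: str, markers: Iterable[str] = EXPECTED_MARKERS) -> List[str]:
    """Return the markers that never appear in the log text."""
    return [m for m in markers if m not in log_text]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: streamlit run app.py 2>&1 | tee app.log   then   python check_logs.py app.log
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    missing = missing_markers(text)
    print("all expected messages found" if not missing else f"missing: {missing}")
```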

## Notes

- PDF URLs: the "Download PDFs" tab needs direct links to the PDF file (for example, via Archive.org's `/download/` path). Adjust the URLs as needed.
- Compatibility: all features default to CPU for broad compatibility and use CUDA where it is available.
- Session state: persists across tabs, so the workflow carries over from one tab to the next.
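Archive.org item pages live under `/details/<identifier>`, while the files themselves are served from `/download/<identifier>/<filename>`. The sketch below rewrites an item URL into a direct PDF link. `archive_download_url` and `is_direct_pdf_link` are hypothetical helpers. The PDF filename has to be looked up on the item page, because it cannot be derived from the identifier.

```python
from urllib.parse import quote, urlparse


def archive_download_url(item_url: str, filename: str) -> str:
    """Turn https://archive.org/details/<id> into a direct /download/ link for `filename`."""
    parsed = urlparse(item_url)
    if parsed.netloc not in ("archive.org", "www.archive.org"):
        raise ValueError(f"not an archive.org URL: {item_url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ("details", "download"):
        raise ValueError(f"expected /details/<identifier>: {item_url}")
    identifier = parts[1]
    # quote() percent-encodes spaces and other unsafe characters in the filename.
    return f"https://archive.org/download/{identifier}/{quote(filename)}"


def is_direct_pdf_link(url: str) -> bool:
    """Cheap pre-check before downloading: the URL path should end in .pdf."""
    return urlparse(url).path.lower().endswith(".pdf")
```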

## Abstract

Explore AI vision with `torch`, `transformers`, and `diffusers`! Images captured by two `st.camera_input` 📷 widgets feed asynchronous OCR (Qwen2-VL, TrOCR), image generation (Stable Diffusion), and line drawings (Torch Space-inspired), all on CPU. Key papers: