Update README.md
README.md
CHANGED
@@ -11,119 +11,162 @@ license: mit
|
|
11 |
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
|
12 |
---
|
13 |
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
9. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
|
42 |
-
10. Notes
|
43 |
-
- PDF URLs: Your provided URLs need direct PDF links (e.g., via Archive.org's /download/ path). Adjust as needed.
|
44 |
-
- Compatibility: All features use CPU defaults for broad compatibility, with CUDA fallback where available.
|
45 |
-
- Session State: Persistent across tabs, ensuring workflow continuity.
|
46 |
-
|
47 |
-
## Abstract
|
48 |
-
Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:
|
49 |
-
|
50 |
-
- **[Streamlit](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI.
|
51 |
-
- 🔥 **[PyTorch](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Core.
|
52 |
-
- **[Qwen2-VL](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Multimodal OCR.
|
53 |
-
- **[TrOCR](https://arxiv.org/abs/2109.10282)** - Li et al., 2021: Small OCR.
|
54 |
-
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image gen.
|
55 |
-
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
|
56 |
-
|
57 |
-
Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, test, innovate! ${emoji}
|
58 |
-
|
59 |
-
## Usage 🎯
|
60 |
-
- 📷 **Camera Snap**: Single or burst capture (auto 10 frames) with gallery.
|
61 |
-
- **Test OCR**: `Qwen2-VL-OCR-2B` or `TrOCR-Small` extracts text, saved async.
|
62 |
-
- 🎨 **Test Image Gen**: `OFA-Sys/small-stable-diffusion-v0` generates images, saved async.
|
63 |
-
- **Test Line Drawings**: OpenCV line art (Torch Space-inspired), saved async.
|
64 |
|
65 |
## Abstract
|
66 |
-
Fuse `torch`, `transformers`, and `diffusers` for
|
67 |
|
68 |
- **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
|
69 |
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
|
70 |
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
|
71 |
-
- 🎨 **[
|
72 |
-
-
|
73 |
-
-
|
74 |
-
-
|
75 |
-
-
|
76 |
-
-
|
77 |
-
-
|
78 |
-
-
|
79 |
-
|
80 |
-
|
|
|
81 |
|
82 |
## Usage 🎯
|
83 |
-
-
|
84 |
-
|
85 |
-
|
86 |
-
|
87 |
-
-
|
88 |
-
|
89 |
-
|
90 |
|
|
|
91 |
|
92 |
-
|
93 |
-
|
94 |
-
|
95 |
-
|
96 |
-
|
97 |
-
|
98 |
-
|
99 |
-
-
|
100 |
-
|
101 |
-
|
102 |
-
-
|
103 |
-
-
|
104 |
-
-
|
105 |
-
|
106 |
-
|
107 |
-
1. Clone the repo:
|
108 |
-
```bash
|
109 |
-
git clone <repository-url>
|
110 |
-
cd sft-tiny-titans
|
111 |
-
|
112 |
-
## Abstract
|
113 |
-
TorchTransformers Diffusion SFT Titans harnesses `torch`, `transformers`, and `diffusers` for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual `st.camera_input` captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with `smolagents` compatibility. Key papers illuminate the stack:
|
114 |
-
|
115 |
-
- **[Streamlit: A Declarative Framework for Data Apps](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: Streamlit's UI framework.
|
116 |
-
- **[PyTorch: An Imperative Style, High-Performance Deep Learning Library](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch foundation.
|
117 |
-
- **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: Transformers for NLP.
|
118 |
-
- **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion models in CV.
|
119 |
-
- **[Pandas: A Foundation for Data Analysis in Python](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling with Pandas.
|
120 |
-
- **[Pillow: The Python Imaging Library](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing (no direct arXiv, but cited as foundational).
|
121 |
-
- **[pytz: Time Zone Calculations in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time handling (no direct arXiv, but contextual).
|
122 |
-
- **[OpenCV: Open Source Computer Vision Library](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV processing (no direct arXiv, but seminal).
|
123 |
-
- **[Fine-Tuning Vision Transformers for Image Classification](https://arxiv.org/abs/2106.10504)** - Dosovitskiy et al., 2021: SFT for CV.
|
124 |
-
- **[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: Efficient SFT techniques.
|
125 |
-
- **[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
|
126 |
-
- **[Transfusion: Multi-Modal Model with Token Prediction and Diffusion](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Combined NLP/CV SFT.
|
127 |
-
|
128 |
-
Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, tune, party! ${emoji}
|
129 |
-
|
|
|
11 |
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
|
12 |
---
|
13 |
|
14 |
+
# TorchTransformers Diffusion CV SFT Titans
|
15 |
+
|
16 |
+
A Streamlit app blending `torch`, `transformers`, and `diffusers` for vision and NLP fun! Snap PDFs, turn them into double-page spreads 🖼️, extract text with GPT 🤖, and craft emoji-packed Markdown outlines, all with a witty UI and CPU-friendly SFT.
|
17 |
+
|
18 |
+
## Integration Details
|
19 |
+
|
20 |
+
1. **SFT Tiny Titans (First Listing)**:
|
21 |
+
- Features: Causal LM and Diffusion SFT, camera snap, RAG party.
|
22 |
+
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved `ModelBuilder` and `DiffusionBuilder` with SFT functionality.
|
23 |
+
2. **SFT Tiny Titans (Second Listing)**:
|
24 |
+
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
|
25 |
+
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent).
|
26 |
+
3. **AI Vision Titans (Current)**:
|
27 |
+
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction.
|
28 |
+
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates.
|
29 |
+
4. **Sidebar, Session, and History**:
|
30 |
+
- Unified gallery shows PNGs, PDFs, and MD files from all tabs.
|
31 |
+
- Session state (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) tracks all operations.
|
32 |
+
- History log in sidebar records key actions (snapshots, SFT, tests).
|
33 |
+
5. **Workflow**:
|
34 |
+
- Snap images or download PDFs, snapshot them to double-page spreads, extract text with GPT, and summarize into emoji outlines. Everything is saved in the gallery.
|
35 |
+
6. **Verification**:
|
36 |
+
- Run: `streamlit run app.py`
|
37 |
+
- Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery.
|
38 |
+
7. **Notes**:
|
39 |
+
- PDF URLs need direct links (e.g., arXiv's `/pdf/` path).
|
40 |
+
- CPU defaults with CUDA fallback for broad compatibility.
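
The note about direct links can be illustrated with a small helper. This is a minimal sketch, not the app's code: `to_direct_pdf_url` is a hypothetical name, and it only assumes that arXiv abstract pages live under `/abs/` and the matching PDFs under `/pdf/`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_pdf_url(url: str) -> str:
    """Rewrite an arXiv abstract URL to its /pdf/ form; return other URLs unchanged."""
    parts = urlsplit(url)
    if parts.netloc.endswith("arxiv.org") and parts.path.startswith("/abs/"):
        # Keep the paper identifier, swap only the leading path segment.
        pdf_path = "/pdf/" + parts.path[len("/abs/"):]
        return urlunsplit((parts.scheme, parts.netloc, pdf_path, parts.query, parts.fragment))
    return url


if __name__ == "__main__":
    print(to_direct_pdf_url("https://arxiv.org/abs/1706.03762"))
```

URLs that are not arXiv abstract pages pass through untouched, so the helper is safe to apply to every entry in a mixed list.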
|
41 |
|
42 |
## Abstract
|
43 |
+
Fuse `torch`, `transformers`, and `diffusers` with GPT vision for a wild AI ride! Dual `st.camera_input` 📷 and PDF downloads feed a gallery, powering GOT-OCR2_0, Stable Diffusion 🎨, and GPT text extraction 🤖. Key papers:
|
44 |
|
45 |
- **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
|
46 |
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
|
47 |
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
|
48 |
+
- 🎨 **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion basics.
|
49 |
+
- **[GOT: General OCR Theory](https://arxiv.org/abs/2409.01704)** - Wei et al., 2024: Advanced OCR.
|
50 |
+
- 🎨 **[Latent Diffusion Models](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image generation.
|
51 |
+
- **[LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
|
52 |
+
- **[RAG: Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
|
53 |
+
- 👁️ **[Vision Transformers](https://arxiv.org/abs/2010.11929)** - Dosovitskiy et al., 2020: Vision backbone.
|
54 |
+
- **[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)** - OpenAI, 2023: GPT power.
|
55 |
+
- 🖼️ **[CLIP: Learning Transferable Visual Models](https://arxiv.org/abs/2103.00020)** - Radford et al., 2021: Vision-language bridge.
|
56 |
+
- ⏰ **[Time Zone Handling in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: `pytz` context.
|
57 |
+
|
58 |
+
Run: `pip install -r requirements.txt`, `streamlit run app.py`. Snap, process, summarize! ⚡
|
59 |
|
60 |
## Usage 🎯
|
61 |
+
- 📷 **Camera Snap**: Capture pics with dual cams.
|
62 |
+
- 📥 **Download PDFs**: Fetch papers (e.g., the arXiv links above).
|
63 |
+
- **PDF Process**: Snapshot to double-page spreads, extract text with GPT.
|
64 |
+
- 🖼️ **Image Process**: OCR images with GPT vision.
|
65 |
+
- **MD Gallery**: Summarize Markdown files into emoji outlines.
|
66 |
+
|
67 |
+
## Tutorial: Single to Double Page Emoji Outlines
|
68 |
+
|
69 |
+
### Single Page Outline: Key Functions in `app.py`
|
70 |
+
|
71 |
+
| **Function** | **Purpose** 🎯 | **How It Works** 🛠️ | **Emoji Insight** |
|
72 |
+
|----------------------------|---------------------------------------------|--------------------------------------------------|-------------------------------|
|
73 |
+
| `generate_filename` | Unique file names 📅 | Adds timestamp to sequence | 🕰️ Time's your file buddy! |
|
74 |
+
| `pdf_url_to_filename` | Safe PDF names | Cleans URLs to underscores | 🚫 No URL mess! |
|
75 |
+
| `get_download_link` | Downloadable files ⬇️ | Base64-encodes for HTML links | 📦 Grab it, go! |
|
76 |
+
| `download_pdf` | Web PDF snatcher | Fetches PDFs with `requests` | PDF pirate ahoy! |
|
77 |
+
| `process_pdf_snapshot` | PDF to images 🖼️ | Async snapshots (single/double/all) with `fitz` | 📸 Double-page dazzle! |
|
78 |
+
| `process_ocr` | Image text extractor | Async GOT-OCR2_0 with `transformers` | Text ninja strikes! |
|
79 |
+
| `process_image_gen` | Prompt to image 🎨 | Async Stable Diffusion with `diffusers` | Art from words, bam! |
|
80 |
+
| `process_image_with_prompt` | GPT image analysis 🤖 | Base64 to GPT vision | GPT sees all! |
|
81 |
+
| `process_text_with_prompt` | GPT text summarizer | Text to GPT for outlining | Summarize like a pro! |
|
82 |
+
| `update_gallery` | File showcase 🖼️ | Sidebar display with delete options | Your creations shine! |
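
As a rough illustration of the first three helpers in the table, the sketch below shows one way they could work. The signatures, the timestamp format, and the link markup are assumptions, not the code in `app.py`. The libraries table says the app takes its timestamps via `pytz`; the standard library's `timezone.utc` is swapped in here to keep the example dependency-free.

```python
import base64
import re
from datetime import datetime, timezone
from typing import Optional


def generate_filename(sequence: str, ext: str = "png", now: Optional[datetime] = None) -> str:
    """Append a timestamp to a sequence label (format is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return f"{sequence}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"


def pdf_url_to_filename(url: str) -> str:
    """Collapse every run of non-alphanumeric characters into one underscore."""
    safe = re.sub(r"[^0-9A-Za-z]+", "_", url).strip("_")
    return f"{safe}.pdf"


def get_download_link(data: bytes, filename: str, mime: str = "application/octet-stream") -> str:
    """Embed the bytes as a Base64 data URI inside an HTML download link."""
    b64 = base64.b64encode(data).decode("ascii")
    return f'<a href="data:{mime};base64,{b64}" download="{filename}">Download {filename}</a>'


if __name__ == "__main__":
    print(generate_filename("cam0"))
    print(pdf_url_to_filename("https://arxiv.org/pdf/1706.03762"))
    print(get_download_link(b"hi", "a.txt", "text/plain"))
```

In a Streamlit app, a link like this would typically be rendered with `st.markdown(..., unsafe_allow_html=True)`. Data URIs hold the whole file in the page, so this approach suits small files best.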
|
83 |
+
|
84 |
+
### Double Page Outline: Libraries in `requirements.txt`
|
85 |
+
|
86 |
+
| **Library** | **Single Page Purpose** 🎯 | **Double Page Usage** 🛠️ | **Emoji Insight** |
|
87 |
+
|---------------|-------------------------------------------|----------------------------------------------------|-------------------------------|
|
88 |
+
| `streamlit` | App UI | Tabs like "PDF Process" and "MD Gallery" | 🎬 App star: lights, action! |
|
89 |
+
| `pandas` | Data crunching | Ready for OCR/metadata tables | Table tamer awaits! |
|
90 |
+
| `torch` | ML engine 🔥 | Powers `transformers` and `diffusers` | 🔥 AI's fiery heart! |
|
91 |
+
| `requests` | Web grabber | Downloads PDFs in `download_pdf` | Web loot collector! |
|
92 |
+
| `aiofiles` | Fast file ops ⚡ | Async writes in `process_ocr` | File speed demon! |
|
93 |
+
| `pillow` | Image magic | PDF to image in `process_pdf_snapshot` | 🖼️ Pixel Picasso! |
|
94 |
+
| `PyMuPDF` | PDF handler | Snapshots in `process_pdf_snapshot` | PDF scroll master! |
|
95 |
+
| `transformers` | AI models 🗣️ | GOT-OCR2_0 in `process_ocr` | 🤖 Brain in a box! |
|
96 |
+
| `diffusers` | Image gen 🎨 | Stable Diffusion in `process_image_gen` | 🎨 Art generator supreme! |
|
97 |
+
| `openai` | GPT vision/text 🤖 | Image/text processing in GPT functions | All-seeing AI oracle! |
|
98 |
+
| `glob2` | File finder | Gallery files in `update_gallery` | 🕵️ File sleuth! |
|
99 |
+
| `pytz` | Time zones ⏰ | Timestamps in `generate_filename` | ⏳ Time wizard! |
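
The `aiofiles` row describes async writes. A minimal sketch of the same idea using only the standard library is below. It swaps `aiofiles` for `asyncio.to_thread` (Python 3.9+), and `save_text_async` and `save_many` are hypothetical names, not functions from `app.py`.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Dict, List


async def save_text_async(path: Path, text: str) -> Path:
    """Write text without blocking the event loop."""
    # The blocking write runs in a worker thread; the coroutine just awaits it.
    await asyncio.to_thread(path.write_text, text, encoding="utf-8")
    return path


async def save_many(results: Dict[Path, str]) -> List[Path]:
    """Write several results concurrently and return their paths in input order."""
    return await asyncio.gather(*(save_text_async(p, t) for p, t in results.items()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        targets = {Path(tmp) / "page1.md": "first", Path(tmp) / "page2.md": "second"}
        written = asyncio.run(save_many(targets))
        print([p.name for p in written])
```

The point of either approach is the same: the UI stays responsive while OCR output or generated images are being saved.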
|
100 |
+
|
101 |
+
## Automation Instructions: Witty & Funny Steps
|
102 |
+
|
103 |
+
1. **Load PDFs**
|
104 |
+
- Drop URLs into "Download PDFs 📥" or upload files.
|
105 |
+
- *Emoji Tip*: 🦁 Unleash the PDF beast: roar through arXiv!
|
106 |
+
|
107 |
+
2. **Double-Page Snap** 📸
|
108 |
+
- Click "Snapshot Selected 📸" with "Two Pages (High-Res)" for landscape glory!
|
109 |
+
- *Witty Note*: Two pages > one, because who reads half a comic? 🦸
|
110 |
+
|
111 |
+
3. **GPT Vision Zap** ⚡
|
112 |
+
- In "PDF Process", pick a GPT model (e.g., `gpt-4o-mini`) and zap text out.
|
113 |
+
- *Funny Bit*: GPT's like "I see text, mortals!" 👁️
|
114 |
+
|
115 |
+
4. **Markdown Mash**
|
116 |
+
- "MD Gallery" takes Markdown files and smashes them into a 12-point emoji outline.
|
117 |
+
- *Sassy Tip*: 12 points, because 11's weak and 13's overkill!
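
Step 2 turns single pages into landscape spreads. How `app.py` does this is not shown here, so the following is only a sketch of the pairing arithmetic under simple assumptions: pages are paired in reading order, a trailing odd page stands alone, and a spread is as wide as both pages together. `pair_pages` and `spread_size` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def pair_pages(page_count: int) -> List[Tuple[int, Optional[int]]]:
    """Group zero-based page indices into (left, right) spreads."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    spreads = []
    for left in range(0, page_count, 2):
        # The last page has no partner when the page count is odd.
        right = left + 1 if left + 1 < page_count else None
        spreads.append((left, right))
    return spreads


def spread_size(left: Tuple[int, int], right: Optional[Tuple[int, int]]) -> Tuple[int, int]:
    """Canvas (width, height) for two pages placed side by side."""
    if right is None:
        return left
    return (left[0] + right[0], max(left[1], right[1]))


if __name__ == "__main__":
    print(pair_pages(5))
    print(spread_size((612, 792), (612, 792)))
```

With a rendering library, each page image would then be pasted onto a canvas of the computed size, the left page at x = 0 and the right page at x = the left page's width.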
|
118 |
+
|
119 |
+
## Innovative Features
|
120 |
+
|
121 |
+
- **Double-Page Spreads**: High-res, landscape images from PDFs, perfect for apps! 🖥️
|
122 |
+
- **GPT Model Picker**: Swap `gpt-4o` for `gpt-4o-mini`: speed vs. smarts! ⚡🧠
|
123 |
+
- **12-Point Emoji Outline**: Clusters facts into 12 witty sections, e.g., "1. Heroes 🦸", "2. Tech 🔧".
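
To make the 12-point outline concrete, here is a hedged sketch of how several Markdown files might be combined into a single summarization prompt. The prompt wording and the `build_outline_prompt` name are assumptions for illustration. The prompt the app actually sends is not shown in this README, and the call to the selected GPT model is left out.

```python
from typing import Dict

OUTLINE_POINTS = 12  # the "12-point" outline described above


def build_outline_prompt(markdown_docs: Dict[str, str], points: int = OUTLINE_POINTS) -> str:
    """Join named Markdown documents under one instruction block."""
    instructions = (
        f"Summarize the documents below into a Markdown outline with exactly "
        f"{points} numbered sections. Give each section a short title ending in one emoji."
    )
    # Label each document with its file name so the model can tell them apart.
    sections = [f"### {name}\n{text.strip()}" for name, text in markdown_docs.items()]
    return instructions + "\n\n" + "\n\n".join(sections)


if __name__ == "__main__":
    docs = {"paper_a.md": "Attention replaces recurrence.", "paper_b.md": "Diffusion denoises step by step."}
    print(build_outline_prompt(docs))
```

Keeping the section count as a parameter means the same builder works if a shorter or longer outline is wanted.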
|
124 |
+
|
125 |
+
## Mermaid Process Flow 🧙‍♂️
|
126 |
+
|
127 |
+
```mermaid
|
128 |
+
graph TD
|
129 |
+
A[PDFs] -->|📥 Download| B[PDF Process]
|
130 |
+
B -->|📸 Snapshot| C[🖼️ Double-Page Images]
|
131 |
+
C -->|🤖 GPT Vision| D[Markdown Files]
|
132 |
+
D -->|MD Gallery| E[12-Point Emoji Outline]
|
133 |
+
|
134 |
+
A:::pdf
|
135 |
+
B:::process
|
136 |
+
C:::image
|
137 |
+
D:::markdown
|
138 |
+
E:::outline
|
139 |
+
|
140 |
+
classDef pdf fill:#f9f,stroke:#333,stroke-width:2px;
|
141 |
+
classDef process fill:#bbf,stroke:#333,stroke-width:2px;
|
142 |
+
classDef image fill:#bfb,stroke:#333,stroke-width:2px;
|
143 |
+
classDef markdown fill:#ffb,stroke:#333,stroke-width:2px;
|
144 |
+
classDef outline fill:#fbf,stroke:#333,stroke-width:2px;
|
145 |
+
```
|
146 |
+
|
147 |
+
|
148 |
+
Flow Explained:
|
149 |
+
1. PDFs: Start with one or more PDFs on a topic.
|
150 |
+
2. PDF Process: Download and snapshot into high-res double-page spreads.
|
151 |
+
3. 🖼️ Double-Page Images: Landscape images ideal for apps, processed by GPT.
|
152 |
+
4. Markdown Files: Text extracted per document, saved as Markdown.
|
153 |
+
5. 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., "1. Context", "2. Methods 🔬", ..., "12. Future").
|
154 |
+
Run: `pip install -r requirements.txt`, `streamlit run app.py`. Snap, process, outline: AI magic! ⚡
|
155 |
|
156 |
+
---
|
157 |
|
158 |
+
### Key Updates
|
159 |
+
1. **Tutorial Section**: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights.
|
160 |
+
2. **Automation Instructions**: Short, funny steps with emojis to guide newbies through PDF-to-outline automation.
|
161 |
+
3. **Innovative Features**: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features.
|
162 |
+
4. **Mermaid Diagram**: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes.
|
163 |
+
5. **Updated arXiv Links**: Refreshed to match current functionality (vision, OCR, GPT, diffusion):
|
164 |
+
- Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers.
|
165 |
+
- Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance.
|
166 |
+
|
167 |
+
### How to Use
|
168 |
+
- Save this as `README.md` in your project folder.
|
169 |
+
- View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered.
|
170 |
+
- Follow the automation steps to process PDFs and generate outlines, perfect for learners exploring AI vision and text summarization!
|
171 |
+
|
172 |
+
This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all!
|