|
--- |
|
title: ๐ทTorch ๐Transformers ๐ผ๏ธCV ๐ง SFT |
|
emoji: ๐ท๐๐ง |
|
colorFrom: yellow |
|
colorTo: indigo |
|
sdk: streamlit |
|
sdk_version: 1.44.1 |
|
app_file: app.py |
|
pinned: false |
|
license: mit |
|
short_description: ๐ทTorch ๐Transformers ๐ผ๏ธCV ๐ง SFT |
|
--- |
|
|
|
# Features: |
|
1. Camera Snap ๐ท |
|
2. Test OCR ๐ |
|
3. MD Gallery ๐ |
|
4. Download PDFs ๐ฅ |
|
5. Build Titan ๐ฑ |
|
6. Test Image Gen ๐จ |
|
7. PDF Process ๐ |
|
8. Image Process ๐ผ๏ธ |
|
9. Character Editor ๐งโ๐จ |
|
10. Character Gallery ๐ผ๏ธ |
|
|
|
## Tutorial: Single to Double Page Emoji Outlines |
|
|
|
### Single Page Outline: Key Functions in `app.py` |
|
|
|
| **Function** | **Purpose** ๐ฏ | **How It Works** ๐ ๏ธ | **Emoji Insight** ๐ | |
|
|----------------------------|---------------------------------------------|--------------------------------------------------|-------------------------------| |
|
| `generate_filename` | Unique file names ๐
| Adds timestamp to sequence | ๐ฐ๏ธ Timeโs your file buddy! | |
|
| `pdf_url_to_filename` | Safe PDF names ๐๏ธ | Cleans URLs to underscores | ๐ซ No URL mess! | |
|
| `get_download_link` | Downloadable files โฌ๏ธ | Base64-encodes for HTML links | ๐ฆ Grab it, go! | |
|
| `download_pdf` | Web PDF snatcher ๐ | Fetches PDFs with `requests` | ๐ PDF pirate ahoy! | |
|
| `process_pdf_snapshot` | PDF to images ๐ผ๏ธ | Async snapshots (single/double/all) with `fitz` | ๐ธ Double-page dazzle! | |
|
| `process_ocr` | Image text extractor ๐ | Async GOT-OCR2_0 with `transformers` | ๐ Text ninja strikes! | |
|
| `process_image_gen` | Prompt to image ๐จ | Async Stable Diffusion with `diffusers` | ๐๏ธ Art from wordsโbam! | |
|
| `process_image_with_prompt`| GPT image analysis ๐ค | Base64 to GPT vision | ๐ง GPT sees all! | |
|
| `process_text_with_prompt` | GPT text summarizer โ๏ธ | Text to GPT for outlining | ๐ Summarize like a pro! | |
|
| `update_gallery` | File showcase ๐ผ๏ธ๐ | Sidebar display with delete options | ๐ Your creations shine! | |
|
|
|
### Double Page Outline: Libraries in `requirements.txt` |
|
|
|
| **Library** | **Single Page Purpose** ๐ฏ | **Double Page Usage** ๐ ๏ธ | **Emoji Insight** ๐ | |
|
|---------------|-------------------------------------------|----------------------------------------------------|-------------------------------| |
|
| `streamlit` | App UI ๐ | Tabs like โPDF Process ๐โ and โMD Gallery ๐โ | ๐ฌ App starโlights, action! | |
|
| `pandas` | Data crunching ๐ | Ready for OCR/metadata tables | ๐ Table tamer awaits! | |
|
| `torch` | ML engine ๐ฅ | Powers `transformers` and `diffusers` | ๐ฅ AIโs fiery heart! | |
|
| `requests` | Web grabber ๐ | Downloads PDFs in `download_pdf` | ๐ Web loot collector! | |
|
| `aiofiles` | Fast file ops โก | Async writes in `process_ocr` | โ๏ธ File speed demon! | |
|
| `pillow` | Image magic ๐๏ธ | PDF to image in `process_pdf_snapshot` | ๐ผ๏ธ Pixel Picasso! | |
|
| `PyMuPDF` | PDF handler ๐ | Snapshots in `process_pdf_snapshot` | ๐ PDF scroll master! | |
|
| `transformers`| AI models ๐ฃ๏ธ | GOT-OCR2_0 in `process_ocr` | ๐ค Brain in a box! | |
|
| `diffusers` | Image gen ๐จ | Stable Diffusion in `process_image_gen` | ๐จ Art generator supreme! | |
|
| `openai` | GPT vision/text ๐ค | Image/text processing in GPT functions | ๐ All-seeing AI oracle! | |
|
| `glob2` | File finder ๐ | Gallery files in `update_gallery` | ๐ต๏ธ File sleuth! | |
|
| `pytz` | Time zones โฐ | Timestamps in `generate_filename` | โณ Time wizard! | |
|
|
|
|
|
# TorchTransformers Diffusion CV SFT Titans ๐ |
|
|
|
A Streamlit app blending `torch`, `transformers`, and `diffusers` for vision and NLP fun! Snap PDFs ๐, turn them into double-page spreads ๐ผ๏ธ, extract text with GPT ๐ค, and craft emoji-packed Markdown outlines ๐โall with a witty UI and CPU-friendly SFT. |
|
|
|
## Integration Details |
|
|
|
1. **SFT Tiny Titans (First Listing)**: |
|
- Features: Causal LM and Diffusion SFT, camera snap, RAG party. |
|
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved `ModelBuilder` and `DiffusionBuilder` with SFT functionality. |
|
2. **SFT Tiny Titans (Second Listing)**: |
|
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo. |
|
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). |
|
3. **AI Vision Titans (Current)**: |
|
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction. |
|
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates. |
|
4. **Sidebar, Session, and History**: |
|
- Unified gallery shows PNGs, PDFs, and MD files from all tabs. |
|
- Session state (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) tracks all operations. |
|
- History log in sidebar records key actions (snapshots, SFT, tests). |
|
5. **Workflow**: |
|
- Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, summarize into emoji outlinesโall saved in the gallery. |
|
6. **Verification**: |
|
- Run: `streamlit run app.py` |
|
- Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery. |
|
7. **Notes**: |
|
- PDF URLs need direct links (e.g., arXivโs `/pdf/` path). |
|
- CPU defaults with CUDA fallback for broad compatibility. |
|
|
|
## Abstract |
|
Fuse `torch`, `transformers`, and `diffusers` with GPT vision for a wild AI ride! Dual `st.camera_input` ๐ท and PDF downloads ๐ feed a gallery, powering GOT-OCR2_0 ๐, Stable Diffusion ๐จ, and GPT text extraction ๐ค. Key papers: |
|
|
|
- ๐ **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic. |
|
- ๐ฅ **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core. |
|
- ๐ง **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers. |
|
- ๐จ **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion basics. |
|
- ๐ **[GOT: General OCR Theory](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Advanced OCR. |
|
- ๐จ **[Latent Diffusion Models](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image generation. |
|
- โ๏ธ **[LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency. |
|
- ๐ **[RAG: Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations. |
|
- ๐๏ธ **[Vision Transformers](https://arxiv.org/abs/2010.11929)** - Dosovitskiy et al., 2020: Vision backbone. |
|
- ๐ **[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)** - OpenAI, 2023: GPT power. |
|
- ๐ผ๏ธ **[CLIP: Learning Transferable Visual Models](https://arxiv.org/abs/2103.00020)** - Radford et al., 2021: Vision-language bridge. |
|
- โฐ **[Time Zone Handling in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: `pytz` context. |
|
|
|
Run: `pip install -r requirements.txt`, `streamlit run app.py`. Snap, process, summarize! โก |
|
|
|
## Usage ๐ฏ |
|
- ๐ท **Camera Snap**: Capture pics with dual cams. |
|
- ๐ฅ **Download PDFs**: Fetch papers (e.g., arXiv links below). |
|
- ๐ **PDF Process**: Snapshot to double-page spreads, extract text with GPT. |
|
- ๐ผ๏ธ **Image Process**: OCR images with GPT vision. |
|
- ๐ **MD Gallery**: Summarize Markdown files into emoji outlines. |
|
|
|
## Automation Instructions: Witty & Funny Steps ๐ |
|
|
|
1. **Load PDFs** ๐ |
|
- Drop URLs into โDownload PDFs ๐ฅโ or upload files. |
|
- *Emoji Tip*: ๐ฆ Unleash the PDF beastโroar through arXiv! |
|
|
|
2. **Double-Page Snap** ๐ธ |
|
- Click โSnapshot Selected ๐ธโ with โTwo Pages (High-Res)โโlandscape glory! |
|
- *Witty Note*: Two pages > one, because who reads half a comic? ๐ฆธ |
|
|
|
3. **GPT Vision Zap** โก |
|
- In โPDF Process ๐โ, pick a GPT model (e.g., `gpt-4o-mini`) and zap text out. |
|
- *Funny Bit*: GPTโs like โI see text, mortals!โ ๐๏ธ |
|
|
|
4. **Markdown Mash** ๐ |
|
- โMD Gallery ๐โ takes Markdown files, smashes them into a 12-point emoji outline. |
|
- *Sassy Tip*: 12 pointsโbecause 11โs weak and 13โs overkill! ๐ |
|
|
|
## Innovative Features ๐ |
|
|
|
- **Double-Page Spreads**: High-res, landscape images from PDFsโperfect for apps! ๐ฅ๏ธ |
|
- **GPT Model Picker**: Swap `gpt-4o` for `gpt-4o-mini`โspeed vs. smarts! โก๐ง |
|
- **12-Point Emoji Outline**: Clusters facts into 12 witty sectionsโe.g., โ1. Heroes ๐ฆธโ, โ2. Tech ๐งโ. ๐ |
|
|
|
## Mermaid Process Flow ๐งโโ๏ธ |
|
|
|
```mermaid |
|
graph TD |
|
A[๐ PDFs] -->|๐ฅ Download| B[๐ PDF Process] |
|
B -->|๐ธ Snapshot| C[๐ผ๏ธ Double-Page Images] |
|
C -->|๐ค GPT Vision| D[๐ Markdown Files] |
|
D -->|๐ MD Gallery| E[โ๏ธ 12-Point Emoji Outline] |
|
|
|
A:::pdf |
|
B:::process |
|
C:::image |
|
D:::markdown |
|
E:::outline |
|
|
|
classDef pdf fill:#f9f,stroke:#333,stroke-width:2px; |
|
classDef process fill:#bbf,stroke:#333,stroke-width:2px; |
|
classDef image fill:#bfb,stroke:#333,stroke-width:2px; |
|
classDef markdown fill:#ffb,stroke:#333,stroke-width:2px; |
|
classDef outline fill:#fbf,stroke:#333,stroke-width:2px; |
|
``` |
|
|
|
|
|
Flow Explained: |
|
1. ๐ PDFs: Start with one or more PDFs on a topic. |
|
2. ๐ PDF Process: Download and snapshot into high-res double-page spreads. |
|
3. ๐ผ๏ธ Double-Page Images: Landscape images ideal for apps, processed by GPT. |
|
4. ๐ Markdown Files: Text extracted per document, saved as Markdown. |
|
5. โ๏ธ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., โ1. Context ๐โ, โ2. Methods ๐ฌโ, ..., โ12. Future ๐โ). |
|
Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, outlineโAI magic! โก |
|
|
|
--- |
|
|
|
### Key Updates |
|
1. **Tutorial Section**: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights. |
|
2. **Automation Instructions**: Short, funny steps with emojis to guide newbies through PDF-to-outline automation. |
|
3. **Innovative Features**: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features. |
|
4. **Mermaid Diagram**: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes. |
|
5. **Updated arXiv Links**: Refreshed to match current functionality (vision, OCR, GPT, diffusion): |
|
- Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers. |
|
- Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance. |
|
|
|
### How to Use |
|
- Save this as `README.md` in your project folder. |
|
- View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered. |
|
- Follow the automation steps to process PDFs and generate outlinesโperfect for learners exploring AI vision and text summarization! |
|
|
|
This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! ๐ |