Update README.md
README.md
CHANGED
@@ -10,6 +10,27 @@ pinned: false
 license: mit
 short_description: Torch Transformers Diffusion SFT for Computer Vision
 ---
+
+
+
+## Abstract
+Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:
+
+- 🌐 **[Streamlit](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI.
+- 🔥 **[PyTorch](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Core.
+- 🔍 **[Qwen2-VL](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Multimodal OCR.
+- 🔍 **[TrOCR](https://arxiv.org/abs/2109.10282)** - Li et al., 2021: Small OCR.
+- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image gen.
+- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
+
+Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, test, innovate! ${emoji}
+
+## Usage 🎯
+- 📷 **Camera Snap**: Single or burst capture (auto 10 frames) with gallery.
+- 🔍 **Test OCR**: `Qwen2-VL-OCR-2B` or `TrOCR-Small` extracts text, saved async.
+- 🎨 **Test Image Gen**: `OFA-Sys/small-stable-diffusion-v0` generates images, saved async.
+- ✏️ **Test Line Drawings**: OpenCV line art (Torch Space-inspired), saved async.
+
 ## Abstract
 Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery, enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:
 
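
The new Usage section says a burst captures 10 frames and that results are "saved async", but the diff does not show how the app does this. The following is a minimal sketch of one way it could work, not the Space's actual code. It assumes standard-library `asyncio` with worker threads. The helper names `save_async` and `save_burst`, the `burst_<timestamp>_<index>.png` naming scheme, and the placeholder byte strings standing in for camera frames are all hypothetical.

```python
import asyncio
import tempfile
from datetime import datetime
from pathlib import Path


def _write_bytes(path: Path, data: bytes) -> Path:
    # Blocking disk write; it runs in a worker thread so the event loop stays free.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


async def save_async(data: bytes, out_dir: Path, prefix: str, index: int) -> Path:
    # Hypothetical naming scheme: <prefix>_<timestamp>_<index>.png
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"{prefix}_{stamp}_{index:02d}.png"
    return await asyncio.to_thread(_write_bytes, path, data)


async def save_burst(frames: list[bytes], out_dir: Path) -> list[Path]:
    # Save every frame of a burst concurrently; gather keeps results in frame order.
    tasks = [save_async(frame, out_dir, "burst", i) for i, frame in enumerate(frames)]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # Placeholder payloads standing in for 10 captured camera frames.
    fake_frames = [bytes([i]) * 4 for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        saved = asyncio.run(save_burst(fake_frames, Path(tmp)))
        print(f"saved {len(saved)} frames")
```

Offloading the write with `asyncio.to_thread` (Python 3.9+) keeps slow disk I/O from blocking other coroutines. The frame index in the filename prevents collisions when several frames share the same second-resolution timestamp.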