---
title: Text2svg Demo App
emoji: πŸš€
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: false
app_port: 8501
---

# Drawing with LLM 🎨

A Streamlit application that converts text descriptions into SVG graphics using multiple AI models.

Try the demo app at this [link](https://huggingface.co/spaces/Timxjl/text2svg-demo-app).

## Overview

This project allows users to create vector graphics (SVG) from text descriptions using three different approaches:
1. **ML Model** - Uses Stable Diffusion to generate images and vtracer to convert them to SVG
2. **DL Model** - Uses Stable Diffusion for initial image creation and StarVector for direct image-to-SVG conversion
3. **Naive Model** - Uses Phi-4 LLM to directly generate SVG code from text descriptions

## Features

- Text-to-SVG generation with three different model approaches
- Adjustable parameters for each model type
- Real-time SVG preview and code display
- SVG download functionality (both sketched in the snippet below)
- GPU acceleration for faster generation
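
The preview and download features map onto standard Streamlit primitives. Below is a minimal sketch of how they could be wired up; the widget labels and the sample SVG are illustrative assumptions, not necessarily what `app.py` does:

```python
# Sketch of the preview/download wiring; widget labels are assumptions.
import streamlit as st

svg_code = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="purple"/></svg>'
)

st.markdown(svg_code, unsafe_allow_html=True)  # real-time SVG preview
st.code(svg_code, language="xml")              # SVG code display
st.download_button(                            # SVG download
    "Download SVG", svg_code, file_name="drawing.svg", mime="image/svg+xml"
)
```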

## Requirements

- Python 3.11+
- CUDA-compatible GPU (recommended)
- Dependencies listed in `requirements.txt`

## Installation

### Using Miniconda (Recommended)

```bash
# Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p $HOME/miniconda
echo 'export PATH="$HOME/miniconda/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Create and activate environment
conda create -n svg-app python=3.11 -y
conda activate svg-app

# Install star-vector
cd star-vector 
pip install -e .
cd ..

# Install other dependencies
pip install -r requirements.txt
```

### Using Docker

```bash
# Build and run with Docker Compose
docker-compose up -d
```

## Usage

Start the Streamlit application:

```bash
streamlit run app.py
```

Or pipe `yes` into the command to automatically accept Streamlit's first-run prompts:

```bash
yes | streamlit run app.py
```

The application will be available at http://localhost:8501

## Models

### ML Model (vtracer)
Uses Stable Diffusion to generate an image from the text prompt, then applies vtracer to convert the raster image to SVG; a sketch follows the parameter list.

Configurable parameters:
- Simplify SVG
- Color Precision
- Filter Speckle
- Path Precision
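
A minimal sketch of this pipeline, assuming the `diffusers` and `vtracer` Python packages. The Stable Diffusion model ID and the mapping of "Simplify SVG" onto vtracer's `mode` parameter are assumptions, not necessarily what `ml.py` uses:

```python
# Sketch: text -> raster (Stable Diffusion) -> SVG (vtracer).
import torch
import vtracer
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # model ID is an assumption
).to("cuda")
pipe("a purple forest at dusk").images[0].save("raster.png")

# Each keyword mirrors a UI parameter listed above.
vtracer.convert_image_to_svg_py(
    "raster.png",
    "drawing.svg",
    colormode="color",
    color_precision=6,   # Color Precision
    filter_speckle=4,    # Filter Speckle
    path_precision=3,    # Path Precision
    mode="spline",       # "Simplify SVG" could toggle spline vs. polygon fitting (assumption)
)
```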

### DL Model (starvector)
Uses Stable Diffusion for initial image creation followed by StarVector, a specialized model designed to convert images directly to SVG.
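
A minimal sketch of the image-to-SVG step, adapted from the upstream StarVector README; the model ID, attribute paths, and entry points are assumptions and may differ from what `dl.py` does:

```python
# Sketch adapted from the StarVector README; entry points are assumptions.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "starvector/starvector-1b-im2svg",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda().eval()

# Preprocess the Stable Diffusion output and decode it into SVG code.
# The processor attribute path follows the upstream README (assumption).
image_pil = Image.open("raster.png")
pixel_values = model.model.processor(image_pil, return_tensors="pt")["pixel_values"].cuda()
svg_code = model.generate_im2svg({"image": pixel_values}, max_length=4000)[0]
```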

### Naive Model (phi-4)
Directly generates SVG code using the Phi-4 language model with specialized prompting; a sketch follows the parameter list.

Configurable parameters:
- Max New Tokens
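
A minimal sketch, assuming the standard `transformers` text-generation pipeline; the prompt template is illustrative, not the one in `naive.py`:

```python
# Sketch of the naive approach; the prompt template is illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-4", device_map="auto")
prompt = (
    "Generate a complete, valid SVG document for the following description. "
    "Return only the SVG markup.\n"
    "Description: a purple forest at dusk\nSVG:"
)
out = generator(prompt, max_new_tokens=512, return_full_text=False)  # Max New Tokens
svg_code = out[0]["generated_text"].strip()
```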

## Evaluation Data and Results

### Data
The `data` directory contains synthetic evaluation data created using custom scripts:
- `descriptions.csv` - Text descriptions for generating SVGs (the first 15 are taken from the Kaggle competition "Drawing with LLM")
- `eval.csv` - Evaluation metrics
- `gen_descriptions.py` - Script for generating synthetic descriptions
- `gen_vqa.py` - Script for generating visual question answering data
- Sample images (`gray_coat.png`, `purple_forest.png`) for reference

### Results
The `results` directory contains evaluation results comparing different models:
- Evaluation results for both Naive (Phi-4) and ML (vtracer) models
- The DL model (StarVector) was not evaluated because it typically fails to vectorize natural images, often returning blank SVGs
- Performance visualizations:
  - `category_radar.png` - Performance comparison across categories
  - `complexity_performance.png` - Performance relative to prompt complexity
  - `quality_vs_time.png` - Quality-time tradeoff analysis
  - `generation_time.png` - Comparison of generation times
  - `model_comparison.png` - Overall model performance comparison
- Generated SVGs and PNGs in respective subdirectories
- Detailed results in JSON and CSV formats

## Project Structure

```
drawing-with-llm/             # Root directory
β”‚
β”œβ”€β”€ app.py                    # Main Streamlit application
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ Dockerfile                # Docker container definition
β”œβ”€β”€ docker-compose.yml        # Docker Compose configuration
β”‚
β”œβ”€β”€ ml.py                     # ML model implementation (vtracer approach)
β”œβ”€β”€ dl.py                     # DL model implementation (StarVector approach)
β”œβ”€β”€ naive.py                  # Naive model implementation (Phi-4 approach)
β”œβ”€β”€ gen_image.py              # Common image generation using Stable Diffusion
β”‚
β”œβ”€β”€ eval.py                   # Evaluation script for model comparison
β”œβ”€β”€ eval_analysis.py          # Analysis script for evaluation results
β”œβ”€β”€ metric.py                 # Metrics implementation for evaluation
β”‚
β”œβ”€β”€ data/                     # Evaluation data directory
β”‚   β”œβ”€β”€ descriptions.csv      # Text descriptions for evaluation
β”‚   β”œβ”€β”€ eval.csv              # Evaluation metrics
β”‚   β”œβ”€β”€ gen_descriptions.py   # Script for generating synthetic descriptions
β”‚   β”œβ”€β”€ gen_vqa.py            # Script for generating VQA data
β”‚   β”œβ”€β”€ gray_coat.png         # Sample image by GPT-4o
β”‚   └── purple_forest.png     # Sample image by GPT-4o
β”‚
β”œβ”€β”€ results/                  # Evaluation results directory
β”‚   β”œβ”€β”€ category_radar.png    # Performance comparison across categories
β”‚   β”œβ”€β”€ complexity_performance.png # Performance by prompt complexity
β”‚   β”œβ”€β”€ quality_vs_time.png   # Quality-time tradeoff analysis
β”‚   β”œβ”€β”€ generation_time.png   # Comparison of generation times
β”‚   β”œβ”€β”€ model_comparison.png  # Overall model performance comparison
β”‚   β”œβ”€β”€ summary_*.csv         # Summary metrics in CSV format
β”‚   β”œβ”€β”€ results_*.json        # Detailed results in JSON format
β”‚   β”œβ”€β”€ svg/                  # Generated SVG outputs
β”‚   └── png/                  # Generated PNG outputs
β”‚
β”œβ”€β”€ star-vector/              # StarVector dependency (installed locally)
└── starvector/               # StarVector Python package
```

## Acknowledgments

This project utilizes several key technologies:
- [Stable Diffusion](https://github.com/CompVis/stable-diffusion) for image generation
- [StarVector](https://github.com/joanrod/star-vector) for image-to-SVG conversion
- [vtracer](https://github.com/visioncortex/vtracer) for raster-to-vector conversion
- [Phi-4](https://huggingface.co/microsoft/phi-4) for text-to-SVG generation
- [Streamlit](https://streamlit.io/) for the web interface