---
library_name: transformers
license: mit
datasets:
- grascii/gregg-preanniversary-words
pipeline_tag: image-to-text
tags:
- gregg
- shorthand
- stenography
---
# Gregg Vision v0.2.1
Gregg Vision v0.2.1 generates a [Grascii](https://github.com/grascii/grascii) representation of a Gregg Shorthand form.
- **Model type:** Vision Encoder Text Decoder
- **License:** MIT
- **Repository:** [GitHub](https://github.com/grascii/gregg-vision-v0.2.1)
- **Demo:** [Grascii Search Space](https://huggingface.co/spaces/grascii/search)
## Uses
Given a grayscale image of a single shorthand form, Gregg Vision can be used to
generate its Grascii representation. When combined with [Grascii Search](https://github.com/grascii/grascii),
one can obtain possible English interpretations of the shorthand form.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from PIL import Image
import numpy as np

model_id = "grascii/gregg-vision-v0.2.1"
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def generate_grascii(image: Image.Image) -> str:
    # convert the image to a single channel
    grayscale = image.convert("L")
    # wrap the image in a batch for the processor
    images = np.array([grayscale])
    # preprocess the image
    pixel_values = processor(images, return_tensors="pt").pixel_values
    # generate token ids
    ids = model.generate(pixel_values, max_new_tokens=12)[0]
    # decode the ids and return the grascii string
    return tokenizer.decode(ids, skip_special_tokens=True)
```
Note: As of `transformers` v4.47.0, the model is incompatible with `pipeline` because of its
single-channel image input.
## Technical Details
### Model Architecture and Objective
Gregg Vision v0.2.1 is a transformer model with a ViT encoder and a RoBERTa decoder.
For training, the model was warm-started using
[vit-small-patch16-224-single-channel](https://huggingface.co/grascii/vit-small-patch16-224-single-channel)
for the encoder and a randomly initialized RoBERTa network for the decoder.
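As an illustration of this encoder-decoder pairing, the sketch below assembles a `VisionEncoderDecoderModel` from a single-channel ViT encoder and a randomly initialized RoBERTa decoder with cross-attention. The tiny config sizes here are hypothetical stand-ins chosen for brevity; the released model instead loads its encoder weights from `grascii/vit-small-patch16-224-single-channel`.

```python
import torch
from transformers import (
    RobertaConfig,
    RobertaForCausalLM,
    ViTConfig,
    ViTModel,
    VisionEncoderDecoderModel,
)

# Hypothetical, scaled-down configs for illustration only
encoder = ViTModel(
    ViTConfig(
        num_channels=1,  # single-channel (grayscale) input
        image_size=64,
        patch_size=16,
        hidden_size=64,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=128,
    )
)

# Randomly initialized decoder with cross-attention to the encoder
decoder = RobertaForCausalLM(
    RobertaConfig(
        vocab_size=100,
        hidden_size=64,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=128,
        is_decoder=True,
        add_cross_attention=True,
    )
)

model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

# Forward pass over a dummy single-channel image
pixel_values = torch.zeros(1, 1, 64, 64)
decoder_input_ids = torch.zeros(1, 3, dtype=torch.long)
logits = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids).logits
print(logits.shape)  # (batch, sequence, vocab) = (1, 3, 100)
```

In the actual warm-start, only the decoder begins from random weights; the encoder brings pretrained single-channel ViT features.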
### Training Data
Gregg Vision v0.2.1 was trained on the [gregg-preanniversary-words](https://huggingface.co/datasets/grascii/gregg-preanniversary-words) dataset.
### Training Hardware
Gregg Vision v0.2.1 was trained on a single NVIDIA T4 GPU.