Flova/omr_transformer


	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: image-to-text
	---

	# Optical Music Recognition Transformer

	An image-to-text model for optical music recognition (OMR).
	The model is trained to predict simple notes in [LilyPond](https://en.wikipedia.org/wiki/LilyPond) format from a given image.
	The training data consists of artificial, handwritten, and whiteboard images.
	The model is based on [Donut](https://huggingface.co/docs/transformers/model_doc/donut).

	## Demo

	![Whiteboard sample](sample1.png)

	Prediction: `c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8`


	![Whiteboard sample](sample2.png)

	Prediction: `d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2`

	![Handwritten whiteboard sample](sample3.png)

	Prediction: `g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4`
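
To check a prediction visually, it can be wrapped in a minimal LilyPond source file and engraved. The sketch below is only an illustration: the function name `to_lilypond_source` and the default version string `2.24.0` are assumptions, and the note string is passed through unchanged apart from whitespace normalization.

```python
def to_lilypond_source(prediction, version="2.24.0"):
    """Wrap a predicted note string in a minimal LilyPond music expression.

    Assumption: the prediction uses absolute pitches, so no \\relative
    block is needed. The version string is an illustrative default.
    """
    notes = " ".join(prediction.split())  # collapse stray whitespace
    return f'\\version "{version}"\n{{\n  {notes}\n}}\n'


# The third prediction from the demo section above.
source = to_lilypond_source(
    "g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4 "
)
print(source)
```

Saving the returned string to a file, for example `prediction.ly` (a hypothetical file name), and running `lilypond prediction.ly` should produce a rendered score, provided LilyPond is installed.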


	Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision