Ingredient Scanner

Abstract

With the recent advancements in computer vision and optical character recognition and using a convolutional neural network to cut out the product from a picture, it has now become possible to reliably extract ingredient lists from the back of a product using the Anthropic API. Open-weight or even only on-device optical character recognition lacks the quality to be used in a production environment, although the progress in development is promising. The Anthropic API is also currently not feasible due to the high cost of 1 Swiss Franc per 100 pictures.

The training code and data is available on GitHub. This repository just contains an inference example and the report.

This is an entry for the 2024 Swiss AI competition.

Table of Contents

  1. Abstract
  2. Report
  3. Model Details
  4. Usage
  5. Citation

Report

Read the full report here.

Model Details

This repository consists of two models, one vision model and a large language model.

Vision Model

Custom convolutional neural network based on ResNet18. It detects the four corner points and the upper and lower limits of a product.

Language Model

Converts the text from the optical character recognition engine which lies in-between the two models to JSON. It is fine-tuned from unsloth/Qwen2-0.5B-Instruct-bnb-4bit.

Usage

Clone the repository and install the dependencies on any debian-based system:

git clone https://huggingface.co/lenamerkli/ingredient-scanner
cd ingredient-scanner
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Note: not all requirements are needed for inference, as both training and inference requirements are listed.

Select the OCR engine in main.py by uncommenting one of the lines 20 to 22:

# ENGINE: list[str] = ['easyocr']
# ENGINE: list[str] = ['anthropic', 'claude-3-5-sonnet-20240620']
# ENGINE: list[str] = ['llama_cpp/v2/vision', 'qwen-vl-next_b2583']

Note: Qwen-VL-Next is not an official qwen model. This is only to protect business secrets of a private model.

Run the inference script:

python3 main.py

You will be asked to enter the file path to a PNG image.

Anthropic API

If you want to use the Anthropic API, create a .env file with the following content:

ANTHROPIC_API_KEY=YOUR_API_KEY

Citation

Here is how to cite this paper in the bibtex format:

@misc{merkli2024ingriedient-scanner,
    title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
    author={Lena Merkli and Sonja Merkli},
    date={2024-07-16},
    url={https://huggingface.co/lenamerkli/ingredient-scanner},
}
Downloads last month
7
GGUF
Model size
494M params
Architecture
qwen2

4-bit

Inference Examples
Unable to determine this model's library. Check the docs .