Josephgflowers/Address-Parser-Tinyllama-v1

Special Thanks: Thanks to NationTech.io and Cherry Republic for sponsoring this work.

Supported Tasks and Leaderboards

Main Function:

Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text. Evaluation: Medium quality; the model struggles with complex extractions that involve misspellings and duplicates.

Sub Functions:

Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.

Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.

Domain Categorization: Extracting domain information and document types.

And more!

Languages:

The dataset primarily contains text in English but includes other languages due to the diversity of sources.

Dataset Structure

Data Instances

Each data instance consists of three main components:

System Message: Instructions provided to the assistant (model) for the task.

User Input: The textual content containing addresses or entities to be parsed.

Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.
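
As a rough illustration of the three-part structure described above, the sketch below builds one instance and checks that the assistant response parses as a JSON object. The field names (`system`, `user`, `assistant`), the helper functions, and the address keys are assumptions made for illustration only; the dataset's actual keys and schema may differ.

```python
import json


def make_instance(system_message, user_input, assistant_response):
    """Bundle the three components into one record (hypothetical field names)."""
    return {
        "system": system_message,
        "user": user_input,
        "assistant": assistant_response,
    }


def parse_response(instance):
    """Return the assistant output as a dict, or None if it is not a JSON object."""
    try:
        parsed = json.loads(instance["assistant"])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Illustrative values only; not taken from the dataset.
example = make_instance(
    "Extract the address components from the text and return them as JSON.",
    "Ship to 123 Main St, Springfield, 12345",
    json.dumps(
        {"street": "123 Main St", "city": "Springfield", "postal_code": "12345"}
    ),
)

components = parse_response(example)
```

Returning `None` for unparseable output, rather than raising, makes it easy to count malformed responses when spot-checking model generations.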

Example: