Special Thanks: Thanks to NationTech.io for its generous sponsorship and for providing the main address parsing dataset used in this compilation, and to Cherry Republic for sponsoring the training.

Supported Tasks and Leaderboards

Main Function:

Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text. Evaluation: Medium quality; the model struggles with complex extractions involving misspellings and duplicates.

Sub Functions:

Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.

Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.

Domain Categorization: Extracting domain information and document types.

And more!

Languages:

The dataset primarily contains text in English but includes other languages due to the diversity of sources.

Dataset Structure

Data Instances

Each data instance consists of three main components:

System Message: Instructions provided to the assistant (model) for the task.

User Input: The textual content containing addresses or entities to be parsed.

Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.
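As a rough illustration of the three components above, the sketch below builds one hypothetical instance and decodes the assistant's JSON output. The key names (`system`, `user`, `assistant`), the address fields, and the `parse_response` helper are assumptions made for this example; the actual schema of the dataset may differ.

```python
import json

# Hypothetical instance: key names and values are illustrative assumptions,
# not taken from the dataset itself.
instance = {
    "system": "Extract the address components from the user's text and reply in JSON.",
    "user": "Please ship the order to 123 Main St, Springfield, 12345.",
    "assistant": '{"street": "123 Main St", "city": "Springfield", "postal_code": "12345"}',
}


def parse_response(raw: str) -> dict:
    """Decode the assistant's JSON output; return an empty dict if it is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


components = parse_response(instance["assistant"])
print(components["city"])
```

Returning an empty dict on malformed output is one possible design choice; given the stated difficulty with misspellings and duplicates, downstream code may want to validate the extracted fields before trusting them.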

Example:

[Three example images; not recoverable here]

Model size: 1.1B params
Tensor type: FP16
Format: Safetensors