Special Thanks: A special thanks to NationTech.io for their generous sponsorship and for providing the main address parsing dataset used in this compilation, and to Cherry Republic for sponsoring the training.
Supported Tasks and Leaderboards
Main Function:
Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text. Evaluation: Medium quality; the model struggles with complex extractions that involve misspellings and duplicates.
Sub Functions:
Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.
Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.
Domain Categorization: Extracting domain information and document types.
And more!
Languages:
The dataset primarily contains text in English but includes other languages due to the diversity of sources.
Dataset Structure
Data Instances
Each data instance consists of three main components:
System Message: Instructions provided to the assistant (model) for the task.
User Input: The textual content containing addresses or entities to be parsed.
Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.
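As a rough illustration of the three-part structure described above, the following minimal sketch validates one instance and decodes the assistant's JSON output. The key names (`system`, `user`, `assistant`), the helper `parse_instance`, and the sample record are all assumptions made for illustration; the dataset's actual column names and contents may differ.

```python
import json

# Hypothetical field names; the dataset's real column names may differ.
REQUIRED_KEYS = ("system", "user", "assistant")


def parse_instance(instance: dict) -> dict:
    """Check that all three components exist and decode the assistant's JSON."""
    missing = [key for key in REQUIRED_KEYS if key not in instance]
    if missing:
        raise KeyError(f"instance is missing components: {missing}")
    try:
        return json.loads(instance["assistant"])
    except json.JSONDecodeError as exc:
        raise ValueError("assistant response is not valid JSON") from exc


# Made-up record for illustration only; it is not taken from the dataset.
example = {
    "system": "Extract the address components and return them as JSON.",
    "user": "Ship to 123 Main St, Springfield, 12345",
    "assistant": json.dumps(
        {"street": "123 Main St", "city": "Springfield", "postal_code": "12345"}
    ),
}

parsed = parse_instance(example)
print(parsed["city"])
```

Decoding the assistant response up front is one way to catch malformed records before they reach training or evaluation code.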
Example: