FemkeBakker/AmsterdamDocClassificationMistral200T1Epochs
Text Generation
•
Updated
•
23
Collection of fine-tuned LLMs and datasets used in a project of the Municipality of Amsterdam to classify Dutch documents.
Note Dataset used to fine-tune the models. The documents are already shortened and data is formatted into conversations, using the zero-shot prompt. It's ready to use for training.
Note The dataset includes the full text of the documents, labels, num_pages and data split (train, test, val, discard).