arxiv:1910.09313

Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research

Published on Oct 16, 2019

Abstract

Automated classification of research data metadata by discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata from the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data make it possible to reproducibly assess classification approaches such as tree-based models and neural networks. According to our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best, with an f1-macro score of 0.760, closely followed by Long Short-Term Memory models (f1-macro score of 0.755). Possible applications of the trained classification models include the quantitative analysis of trends towards interdisciplinarity in digital scholarly output and the characterization of growth patterns of research data, stratified by discipline of research. Both applications perform at scale with the proposed models, which are available for re-use.
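As a reading aid for the f1-macro figures quoted in the abstract, the following is a minimal sketch of how a macro-averaged F1 score is commonly computed for multi-label predictions: F1 is calculated per class and then averaged with equal weight per class. The function name `f1_macro`, the toy labels, and the zero-score convention for empty classes are assumptions made for illustration; this is not the authors' evaluation code.

```python
from typing import List, Sequence


def f1_macro(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 for 0/1 indicator matrices (rows = records, columns = classes)."""
    n_classes = len(y_true[0])
    scores: List[float] = []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t[c] == 1 and p[c] == 1)  # true positives
        fp = sum(1 for t, p in pairs if t[c] == 0 and p[c] == 1)  # false positives
        fn = sum(1 for t, p in pairs if t[c] == 1 and p[c] == 0)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true and no predicted labels scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    # Every class contributes equally, regardless of how many records carry it.
    return sum(scores) / n_classes


# Toy data: 4 records, 3 hypothetical discipline classes (a record may have several).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
score = f1_macro(y_true, y_pred)
print(round(score, 3))
```

Because each class is weighted equally, a macro average penalizes poor performance on small disciplines as heavily as on large ones, which matters when class sizes are imbalanced.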
