AI & ML interests

Historical Media Analysis and Enrichment

Impresso - Media Monitoring of the Past is an interdisciplinary research project that uses machine learning to pursue a paradigm shift in the processing, semantic enrichment, representation, exploration, and study of historical media across modalities, time, languages, and national borders. We develop the 🚀 Impresso Web App and the 🔬 Impresso Datalab (coming soon), which provide search, exploratory analysis, and programmatic access to an unprecedented corpus of multilingual historical newspaper and radio broadcast collections. Our work sits at the intersection of Natural Language Processing, Design, and History.

We share:

  • 🤖 Impresso models tailored for historical, multilingual documents, covering language identification, OCR quality assessment, topic inference, named entity recognition (NER), and named entity linking (NEL).
  • 📚 Impresso datasets curated from digitized historical media sources and designed to support ML development and evaluation. The datasets are currently in preparation and will be released soon; they include a NER and NEL benchmark developed as part of the HIPE evaluation campaign, an image type classification dataset, and more.

Impresso gratefully acknowledges the continued support of its cultural heritage 🏛️ partners, as well as funding from the SNSF (Grant Nos. CRSII5_173719 and CRSII5_213585) and the FNR (Grant No. 17498891).