
Media Monitoring of the Past

Media Monitoring of the Past - Beyond Borders: Connecting Historical Newspapers and Radio.


About

Hi there 👋!

Impresso - Media Monitoring of the Past is an interdisciplinary research project that uses machine learning to pursue a paradigm shift in the processing, semantic enrichment, representation, exploration and study of historical media across modalities and across temporal, linguistic, and national borders. The project has received two rounds of funding, for 2017-2020 and 2023-2027, so the repositories contain code from both periods.

We design and develop the Impresso Web App and the upcoming Impresso Datalab, while conducting research at the intersection of Natural Language Processing, Design, and History. Find more details on the project website.

Contents

This GitHub organization hosts numerous repositories dedicated to:

  • the code behind the Web App and the Datalab. A few of these repositories are public, but many are still private; we aim to document and release the code properly as it matures;
  • code supporting research efforts;
  • code from student projects.

More information and highlights will be shared as we continue to make progress! In addition to the public repositories listed below, you can also check out our models on the Impresso Hugging Face organisation.

Impresso 2 release history

(to come)

Popular repositories

  1. named-entity-tutorial-dh2019 Public

    Tutorial on named entity (NE) processing for Digital Humanities, DH 2019 Utrecht

    Jupyter Notebook, 24 stars, 4 forks

  2. CLEF-HIPE-2020 Public

    Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at CLEF 2020.

    SCSS, 21 stars, 5 forks

  3. NZZ-black-letter-ground-truth Public

    11 stars, 2 forks

  4. impresso-text-acquisition Public

    🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

    Jupyter Notebook, 9 stars, 3 forks

  5. impresso-datalab-notebooks Public

    🔬 Impresso Datalab Notebooks

    Jupyter Notebook, 9 stars, 3 forks

  6. llm-transcript-postcorrection Public

    Work on OCR/ASR/HTR post-correction.

    Jupyter Notebook, 8 stars, 1 fork

Repositories

Showing 10 of 68 repositories
  • impresso-text-acquisition Public

    🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

    Jupyter Notebook 9 AGPL-3.0 3 41 (1 issue needs help) 1 Updated Mar 16, 2026
  • impresso-middle-layer Public

    Middle layer API

    TypeScript 0 AGPL-3.0 1 24 11 Updated Mar 13, 2026
  • impresso-frontend Public

    🚀 The frontend application of the Impresso WebApp

    Vue 5 AGPL-3.0 1 101 1 Updated Mar 13, 2026
  • impresso-content-item-classification-cookbook Public

    Repository for reclassifying Impresso content items according to base types (e.g. journalistic content, advertisements)

    Python 0 0 0 0 Updated Mar 12, 2026
  • impresso-make-cookbook Public

    Repository for a make-based cookbook of offline (NLP) processing steps

    Python 0 AGPL-3.0 1 0 0 Updated Mar 12, 2026
  • impresso-essentials Public

    ⚙️ Python package of highly reusable modules and functions used across Impresso.

    Jupyter Notebook 0 GPL-3.0 1 7 2 Updated Mar 11, 2026
  • impresso-datalab Public

    Impresso Datalab static Astro website

    MDX 2 AGPL-3.0 0 19 0 Updated Mar 10, 2026
  • impresso-datalab-notebooks Public

    🔬 Impresso Datalab Notebooks

    Jupyter Notebook 9 AGPL-3.0 3 28 3 Updated Mar 10, 2026
  • impresso-docker-stack Public

    Docker stack for the Impresso app

    Ruby 2 AGPL-3.0 0 2 1 Updated Mar 10, 2026
  • ocr-robust-multilingual-embeddings Public

    This repository provides datasets, adapted models, and starter code for the ACL 2025 paper "Cheap Character Noise for OCR-Robust Multilingual Embeddings." It supports research on multilingual embeddings that are robust to OCR noise. All resources are publicly available and open-source.

    Python 4 AGPL-3.0 0 0 0 Updated Mar 9, 2026
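The ocr-robust-multilingual-embeddings description centres on training with cheap character noise. The exact noise model of the paper is not reproduced here. What follows is a minimal, hypothetical sketch of generic character-level noise injection (random deletions, insertions and substitutions), a common way to simulate OCR errors when building clean/noisy text pairs. The function `add_char_noise`, its `rate` and `seed` parameters, and the uniform choice among the three edit types are illustrative assumptions, not the repository's API.

```python
import random
from typing import Optional

# Hypothetical helper: a generic character-noise augmenter, NOT the paper's exact method.
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def add_char_noise(text: str, rate: float = 0.1, seed: Optional[int] = None) -> str:
    """Corrupt each non-whitespace character with probability `rate`.

    A selected character is either deleted, kept and followed by a random
    inserted letter, or replaced by a random letter (which may occasionally
    equal the original). Whitespace is never touched, so word boundaries survive.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # local RNG: the same seed gives the same output
    out = []
    for ch in text:
        if ch.isspace() or rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(("delete", "insert", "substitute"))
        if op == "delete":
            continue  # drop the character, like a missed glyph
        if op == "insert":
            out.append(ch)
            out.append(rng.choice(LETTERS))  # spurious extra glyph
        else:
            out.append(rng.choice(LETTERS))  # misrecognized glyph
    return "".join(out)


if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog"
    print(clean)
    print(add_char_noise(clean, rate=0.15, seed=42))
```

Seeding a local `random.Random` instance keeps noisy/clean pairs reproducible without touching global random state. A realistic OCR noise model would likely weight confusions by glyph similarity (e.g. "rn" read as "m"), which this sketch does not attempt.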
