EHRI Annotator
A web-based tool for multilingual named entity recognition and entity linking in Holocaust-related texts. Built to streamline how editors of EHRI Online Editions detect named entities — people, organisations, locations, camps, ghettos — in documents and link them to unique identifiers in EHRI's controlled vocabularies and to GeoNames.
Uses a dual linking architecture: dense semantic matching (LaBSE embeddings indexed in Qdrant) for the comparatively small EHRI vocabularies, and string matching with domain-specific relevance weighting for the much larger GeoNames gazetteer. Exports TEI P5 XML ready for the publication pipeline.
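The two linking paths can be illustrated with a minimal, standard-library sketch. It is not the tool's actual code: the toy vectors stand in for LaBSE embeddings, the in-memory dictionaries stand in for Qdrant and the GeoNames index, and the identifiers, per-country weights, thresholds, and the rule that routes "LOC" mentions to the gazetteer are all hypothetical choices made for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy data. Real vectors would be LaBSE embeddings stored in Qdrant,
# and real gazetteer rows would come from GeoNames.
VOCAB = {"camp-x": [1.0, 0.0], "ghetto-y": [0.0, 1.0]}
GAZETTEER = {"place-pl": ("Lublin", "PL"), "place-us": ("Lublin", "US")}
# Assumed form of "domain-specific relevance weighting": a per-country multiplier.
WEIGHTS = {"PL": 1.0, "US": 0.7}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_dense(query_vec, vocab, threshold=0.8):
    """Return the vocabulary id whose vector is closest to the query, or None."""
    best_id, best_score = None, threshold
    for entry_id, vec in vocab.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_id, best_score = entry_id, score
    return best_id


def link_gazetteer(mention, gazetteer, weights, threshold=0.6):
    """Return the gazetteer id with the best weighted string similarity, or None."""
    best_id, best_score = None, threshold
    for entry_id, (name, country) in gazetteer.items():
        similarity = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        score = similarity * weights.get(country, 0.5)  # unknown countries are down-weighted
        if score >= best_score:
            best_id, best_score = entry_id, score
    return best_id


def link(mention, label, query_vec):
    """Route a mention to one of the two linkers (the routing rule is an assumption)."""
    if label == "LOC":
        return link_gazetteer(mention, GAZETTEER, WEIGHTS)
    return link_dense(query_vec, VOCAB)
```

In this sketch two gazetteer entries with the same name are separated only by the weight, which is the role relevance weighting plays when a large gazetteer contains many homonymous places.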
Built on open-source models (a fine-tuned XLM-RoBERTa for NER, LaBSE for multilingual matching) and self-hostable infrastructure (Qdrant for the vector store) — designed to run inside the institution's own environment once deployment moves to EHRI's servers.
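For the TEI P5 export, one plausible shape of the output is inline name elements carrying a `ref` attribute. The sketch below uses the standard TEI elements `persName`, `orgName`, and `placeName`; the label set, the `annotate` helper, the entity tuple layout, and the example URI are assumptions for illustration, not the tool's actual export format.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from NER labels to TEI P5 name elements.
TAGS = {"PER": "persName", "ORG": "orgName", "LOC": "placeName"}


def annotate(text, entities):
    """Wrap non-overlapping (start, end, label, ref) spans of text in TEI elements."""
    p = ET.Element("p")
    cursor, last = 0, None
    for start, end, label, ref in sorted(entities):
        chunk = text[cursor:start]
        if last is None:
            p.text = (p.text or "") + chunk  # text before the first entity
        else:
            last.tail = chunk  # text between two entities
        last = ET.SubElement(p, TAGS[label], {"ref": ref})
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]
    if last is None:
        p.text = rest
    else:
        last.tail = rest
    return ET.tostring(p, encoding="unicode")


# Hypothetical linked entity: characters 12-18 are a place with a placeholder URI.
xml = annotate("Deported to Lublin.", [(12, 18, "LOC", "https://example.org/place/1")])
```

A real export would also need the surrounding TEI header and body; this fragment only shows how a linked mention might be encoded inside a paragraph.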
EHRI webinar — using AI for new Online Editions →