tools & ideas for the 21st century

AI and software R&D for the humanities and social sciences.

A small studio working with research teams, archives, cultural institutions, and newsrooms across Europe. We build pipelines end to end, run hands-on workshops, and advise on the right digital methods for the work in front of you.

Say hello →

a b o u t

A small studio that takes the material seriously.

toolbox21 is a small studio based in Greece. Most engagements are research and development: experimenting with what current AI and language-processing methods can do for specialist material, and building the bespoke tools that make them usable in an actual research workflow.

We also write the technical sections of funding proposals, join consortia as a research partner, and consult on the right digital methods for a given project. We take on a handful of engagements at a time.

Beyond the studio, we work with a trusted network of external collaborators assembled to fit each engagement.

Director Maria Dermentzi founded toolbox21 after several years developing digital tools and research software at King's College London and UVSQ Paris-Saclay. Earlier in her career she worked as a video producer at Mashable in London and ran coding workshops through non-profits she co-founded.

MSc Digital Humanities (KU Leuven) · MA Digital Culture & Society (King's College London) · LLB (Aristotle University of Thessaloniki) · Columbia Journalism Video Workshop

p r a c t i c e

What we work on.

Digital Humanities

Tools and pipelines for historical, literary, and cultural research.

Natural Language Processing

Information extraction and language models applied to domain-specific text.

Legal Tech

Tools for searching, comparing, and extracting structure from case law, statutes, and other legal texts.

AI Law

Practical advice on the rules shaping how AI is built and used.

AI Ethics

Ethics work for any system that uses AI.

Journalism tools

Search and transcription work for newsrooms and investigative teams.

w o r k s h o p s

Hands-on workshops for newcomers to digital methods.

One thing we particularly enjoy: hands-on workshops for researchers, archivists, librarians, and curators coming from the humanities and social sciences who want to bring digital methods into their work — without first becoming computer scientists.

Sessions are paced for people who are new to programming. The examples come from representative material, and you leave with notes and code you can use on your own corpus the next day.

Topics we cover

NLP for humanities text

Named entity recognition & entity linking

Working with oral testimonies (ASR + transcript analysis)

Corpus linguistics methods

Introduction to AI ethics

Introduction to the EU AI Act

Legal & ethical impact assessments

Fundamental-rights impact assessments

Gender-based impact assessments

Data protection (law)

w o r k

Selected work.

A selection of recent projects.

Research infrastructure node

EHRI Annotator

A web-based tool for multilingual named entity recognition and entity linking in Holocaust-related texts. Built to streamline how editors of EHRI Online Editions detect named entities — people, organisations, locations, camps, ghettos — in documents and link them to unique identifiers in EHRI's controlled vocabularies and to GeoNames.

Uses a dual linking architecture: dense semantic matching (LaBSE embeddings indexed in Qdrant) for the comparatively small EHRI vocabularies, and string matching with domain-specific relevance weighting for the much larger GeoNames gazetteer. Exports TEI P5 XML ready for the publication pipeline.

Built on open-source models (a fine-tuned XLM-RoBERTa for NER, LaBSE for multilingual matching) and self-hostable infrastructure (Qdrant for the vector store) — designed to run inside the institution's own environment once deployment moves to EHRI's servers.
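To make the dual linking idea concrete, here is a minimal, self-contained sketch. It is not the EHRI Annotator's code. A character-trigram count vector stands in for a LaBSE embedding, an in-memory dictionary stands in for the Qdrant index, and a population-based boost stands in for the domain-specific relevance weighting. All identifiers, labels, population figures, thresholds, and function names are invented for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical controlled-vocabulary entries; ids and labels are invented.
VOCAB = {
    "vocab:camp-001": "Auschwitz concentration camp",
    "vocab:ghetto-001": "Warsaw ghetto",
}

# Hypothetical gazetteer rows; ids and population figures are invented.
GAZETTEER = [
    {"id": "geo:1", "name": "Warsaw", "population": 1_700_000},
    {"id": "geo:2", "name": "Warsaw", "population": 13_000},
    {"id": "geo:3", "name": "Warsow", "population": 600},
]


def embed(text):
    """Stand-in for a sentence embedding: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_vocab(mention, min_score=0.3):
    """Dense-style path: nearest vocabulary label by vector similarity."""
    query = embed(mention)
    scored = [(vid, cosine(query, embed(label))) for vid, label in VOCAB.items()]
    best_id, best_score = max(scored, key=lambda pair: pair[1])
    return (best_id, best_score) if best_score >= min_score else None


def link_place(mention, min_score=0.8):
    """String path: surface similarity, boosted for larger places (assumed weighting)."""
    def score(row):
        similarity = SequenceMatcher(None, mention.lower(), row["name"].lower()).ratio()
        weight = 1 + 0.1 * math.log10(1 + row["population"])
        return similarity * weight

    best = max(GAZETTEER, key=score)
    best_score = score(best)
    return (best["id"], best_score) if best_score >= min_score else None


print(link_vocab("ghetto in Warsaw"))
print(link_place("Warsaw"))
```

The split reflects the trade-off described above: a small vocabulary can afford a semantic comparison against every entry, while a very large gazetteer is cheaper to search by surface form, with a weighting step to rank entries that share the same name.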

NER · entity linking · LaBSE · Qdrant · TEI P5

EHRI webinar — using AI for new Online Editions →

Federated metadata

OAI-PMH endpoint for the EHRI Portal

Built a standards-compliant OAI-PMH endpoint that exposes the EHRI Portal's data — metadata aggregated from Holocaust-era archives around the world, together with the controlled vocabularies and country-level context that go with it — to a national research data infrastructure, so the federated material can be harvested and discovered alongside other holdings.
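For readers unfamiliar with OAI-PMH, the sketch below shows the harvester's side of the exchange: building a ListRecords request and reading record identifiers plus the resumption token from one page of a response. It does not describe the endpoint built for the EHRI Portal. The base URL, the sample identifiers, and the function names are hypothetical, and the response is an inline sample so that nothing touches the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://example.org/oai"  # hypothetical endpoint


def list_records_url(metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is supplied it is sent as the only argument
    besides the verb, as the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response, trimmed to the elements this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:unit-1</identifier></header></record>
    <record><header><identifier>oai:example.org:unit-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_page(SAMPLE)
print(identifiers, token)
print(list_records_url(resumption_token=token))
```

A harvester repeats the request with each returned token until a page arrives with an empty or missing resumptionToken element.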

OAI-PMH · metadata harmonisation · federation

Training · research infrastructure node

NER & entity linking training for a national Holocaust-research node

Designed and delivered a hands-on training programme on named entity recognition and entity linking for analysts at a national node of a European Holocaust research infrastructure, whose members include a national documentation centre and a public victims' fund. The course focused on finding and linking names of people across historical records and victims' datasets, translating state-of-the-art NLP techniques into a workflow the participating institutions can apply themselves.

NER · entity linking · workshop curriculum

Research · oral history

Speech recognition and analysis on oral history archives and interviews

Extensive work with oral history archives and interview material — applying automatic speech recognition across multiple languages, then using corpus linguistics and NLP to analyse the transcripts. The pipeline has been the subject of hands-on workshops for several different audiences.
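As one small example of the corpus linguistics step, a keyword-in-context (KWIC) concordance lists every occurrence of a word in a transcript together with its neighbouring words. The sketch below is illustrative only and is not taken from the pipeline itself. The transcript sentence is invented, and the function name and window size are assumptions.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit in the text."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented transcript text, standing in for ASR output.
transcript = "We left the village in spring. The village was empty when we came back."

for left, word, right in kwic(transcript, "village", window=2):
    print(f"{left:>12} | {word} | {right}")
```

Reading the hits side by side makes recurring phrasings around a term easy to spot before moving on to heavier NLP analysis.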

ASR · corpus linguistics · oral history · workshops

Get in touch.

Drop a line. We'll get back to you.

info@toolbox21.com →