tools & ideas for the 21st century

AI and software R&D for the humanities and social sciences.

A small studio working with research teams, archives, cultural institutions, and newsrooms across Europe. We build pipelines end to end, run hands-on workshops, and advise on the right digital methods for the work in front of you.

a b o u t

A small studio that takes the material seriously.

toolbox21 is a small studio based in Greece. Most engagements are research and development: experimenting with what current AI and language-processing methods can do for specialist material, and building the bespoke tools that make them usable in an actual research workflow.

We also write the technical sections of funding proposals, join consortia as a research partner, and consult on the right digital methods for a given project. We take on a handful of engagements at a time.

Beyond the studio, we work with a trusted network of external collaborators assembled to fit each engagement.

Director Maria Dermentzi founded toolbox21 after several years developing digital tools and research software at King's College London and UVSQ Paris-Saclay. Earlier in her career she worked as a video producer at Mashable in London and ran coding workshops through non-profits she co-founded.
MSc Digital Humanities (KU Leuven) · MA Digital Culture & Society (King's College London) · LLB (Aristotle University of Thessaloniki) · Columbia Journalism Video Workshop

p r a c t i c e

What we work on.

Digital Humanities
Tools and pipelines for historical, literary, and cultural research.
Natural Language Processing
Information extraction and language models applied to domain-specific text.
Legal Tech
Tools for searching, comparing, and extracting structure from case law, statutes, and other legal texts.
AI Law
Practical advice on the rules shaping how AI is built and used.
AI Ethics
Ethics work for any system that uses AI.
Journalism tools
Search and transcription work for newsrooms and investigative teams.

w o r k s h o p s

Hands-on workshops for newcomers to digital methods.

One thing we particularly enjoy: hands-on workshops for researchers, archivists, librarians, and curators coming from the humanities and social sciences who want to bring digital methods into their work — without first becoming computer scientists.

Sessions are paced for people who are new to programming. The examples come from representative material, and you leave with notes and code you can use on your own corpus the next day.

Topics we cover

NLP for humanities text
Named entity recognition & entity linking
Working with oral testimonies (ASR + transcript analysis)
Corpus linguistics methods
Introduction to AI ethics
Introduction to the EU AI Act
Legal & ethical impact assessments
Fundamental-rights impact assessments
Gender-based impact assessments
Data protection (law)

w o r k

Selected work.

A selection of recent projects.

Research infrastructure node

EHRI Annotator

A web-based tool for multilingual named entity recognition and entity linking in Holocaust-related texts. Built to streamline how editors of EHRI Online Editions detect named entities — people, organisations, locations, camps, ghettos — in documents and link them to unique identifiers in EHRI's controlled vocabularies and to GeoNames.

Uses a dual linking architecture: dense semantic matching (LaBSE embeddings indexed in Qdrant) for the comparatively small EHRI vocabularies, and string-matching with domain-specific relevance weighting for the much larger GeoNames gazetteer. Exports TEI P5 XML ready for the publication pipeline.
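As a rough illustration of the dual routing idea, the sketch below pairs a nearest-vector lookup for a small vocabulary with weighted string matching for a large gazetteer. It is not the EHRI Annotator's code: the toy vectors stand in for LaBSE embeddings, a plain dictionary stands in for Qdrant, `difflib` stands in for the real string matcher, and every identifier, feature label, and weight is hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_dense(query_vec, vocab):
    """Return the id of the vocabulary term whose vector is closest to the query.

    vocab maps a (hypothetical) term id to its embedding vector.
    """
    return max(vocab, key=lambda term_id: cosine(query_vec, vocab[term_id]))


def link_gazetteer(mention, gazetteer, weights):
    """Return the id of the best gazetteer entry for a mention.

    The score is string similarity multiplied by a relevance weight
    for the entry's feature type (weights here are made up).
    """
    def score(entry):
        sim = SequenceMatcher(None, mention.lower(), entry["name"].lower()).ratio()
        return sim * weights.get(entry["feature"], 1.0)

    return max(gazetteer, key=score)["id"]


# Toy data: two vocabulary terms and two same-named places.
vocab = {"term-1": [1.0, 0.0], "term-2": [0.0, 1.0]}
gazetteer = [
    {"id": "place-a", "name": "Lodz", "feature": "populated_place"},
    {"id": "place-b", "name": "Lodz", "feature": "stream"},
]
weights = {"populated_place": 1.0, "stream": 0.5}

dense_match = link_dense([0.9, 0.1], vocab)
place_match = link_gazetteer("Lodz", gazetteer, weights)
```

In this toy setup the weighting is what separates two identically named places: the string similarity ties, and the feature-type weight decides.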

Built on open-source models (a fine-tuned XLM-RoBERTa for NER, LaBSE for multilingual matching) and self-hostable infrastructure (Qdrant for the vector store) — designed to run inside the institution's own environment once deployment moves to EHRI's servers.

NER · entity linking · LaBSE · Qdrant · TEI P5
EHRI webinar — using AI for new Online Editions →

Federated metadata

OAI-PMH endpoint for the EHRI Portal

Built a standards-compliant OAI-PMH endpoint that exposes the EHRI Portal's data — metadata aggregated from Holocaust-era archives around the world, together with the controlled vocabularies and country-level context that go with it — to a national research data infrastructure, so the federated material can be harvested and discovered alongside other holdings.
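For readers unfamiliar with OAI-PMH, the sketch below shows what harvesting from such an endpoint looks like on the client side: build a `ListRecords` request, read the record identifiers from the response, and follow the `resumptionToken` to the next page. The base URL, the sample response, and the function names are assumptions made for illustration; only the verb, the argument names, and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so when it is
    present it replaces metadataPrefix rather than accompanying it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# A made-up, heavily trimmed response used instead of a live request.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:record-1</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai")  # hypothetical endpoint
ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL over HTTP and loop until the response carries no resumption token.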

OAI-PMH · metadata harmonisation · federation

Training · research infrastructure node

NER & entity linking training for a national Holocaust-research node

Designed and delivered a hands-on training programme on named entity recognition and entity linking for analysts at a national node of a European Holocaust research infrastructure, whose members include a national documentation centre and a public victims' fund. The course focused on finding and linking names of people across historical records and victims' datasets, translating state-of-the-art NLP techniques into a workflow the participating institutions can apply themselves.

NER · entity linking · workshop curriculum

Research · oral history

Speech recognition and analysis on oral history archives and interviews

Extensive work with oral history archives and interview material — applying automatic speech recognition across multiple languages, then using corpus linguistics and NLP to analyse the transcripts. The pipeline has been the subject of hands-on workshops for several different audiences.
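One of the simplest corpus linguistics techniques applied to transcripts is a keyword-in-context (KWIC) concordance. The sketch below is a minimal version under stated assumptions: it takes a transcript that has already been produced by speech recognition, tokenises it naively on word characters, and is not the studio's actual pipeline. The function name and the sample sentence are invented for the example.

```python
import re


def kwic(transcript, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    Tokenisation is a naive lowercase split on word characters,
    which is enough for a demonstration but not for real corpora.
    """
    tokens = re.findall(r"\w+", transcript.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented sample sentence standing in for an ASR transcript.
sample = "We left the village in winter. The village was empty."
lines = kwic(sample, "village", window=2)
for left, word, right in lines:
    print(f"{left:>15} [{word}] {right}")
```

Reading the hits side by side shows how a term is used across an interview, which is often the first step before more involved analysis.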

ASR · corpus linguistics · oral history · workshops

Get in touch.

Drop a line. We'll get back to you.

info@toolbox21.com →