ActiveTigger@PyData2025
Collaborative Text Annotation Tool for Computational Social Sciences
Julien Boelaert - Paul Girard - Étienne Ollion - Émilien Schultz
CREST/GENES
2025-09-30
Text as data in social sciences
- Explosion of text content (social media, news, …)
- Adoption of NLP methods by CSS & DH
Example from IC2S2 2025
BERT, GPT and proliferation of methods
Rapid adoption of model-based text processing
- Encoders with BERT (2018)
- Decoders with GPT (2022)
- Proliferation of closed & open models
Consequences:
- a diversity of new, not yet stabilized methods
- a need for specific resources (coding skills, GPUs)
For what kinds of uses?
- What stances does public opinion take on global warming? 1
- How prevalent are gender-based analyses in the French social sciences? 2
- Under what circumstances is an emerging infectious disease transmitted, according to open-ended survey answers? 3
Annotate > 50000 abstracts on concepts
The origins of ActiveTigger 🐯
Training classifiers for social sciences
- A specific and frequent task: text classification
- Encoder transformer models (BERT) are powerful
- Active learning accelerates annotation
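The active learning idea can be sketched as uncertainty sampling: the texts the current classifier is least sure about are annotated first. This is a minimal illustration and not ActiveTigger's actual implementation; the function names, the document ids and the probability values are hypothetical, and in practice the probabilities would come from a trained classifier.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k=2):
    """Return the ids of the k unlabeled texts with the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: entropy(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical class probabilities (relevant, irrelevant) predicted
# for four texts that have not been annotated yet.
predictions = {
    "text_a": [0.98, 0.02],
    "text_b": [0.55, 0.45],
    "text_c": [0.10, 0.90],
    "text_d": [0.50, 0.50],
}

# The texts closest to a 50/50 prediction are proposed first.
to_annotate = select_most_uncertain(predictions, k=2)
print(to_annotate)
```

After each batch of new annotations, the classifier would be retrained and the selection repeated.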
A first prototype in R Shiny + Python in 2023
Since 2024: ➕ Collaboration ➕ Stability ➕ Features
Main goals: open source research software
- accelerate classifier training to scale annotations on a large corpus
- evaluate classifier performance
- limit dependencies on external services with small models
- offer a pedagogical tool for non-expert users and for training
- stimulate community discussion on needs & best practices
- promote open source & open research tools
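To make the evaluation goal concrete, here is a minimal sketch of scoring a classifier against manually annotated texts using precision, recall and F1. The labels and the helper name are hypothetical, and this is not necessarily how ActiveTigger computes its metrics.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one label, from paired label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical held-out annotations and the classifier's predictions.
gold = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

precision, recall, f1 = precision_recall_f1(gold, predicted, positive="yes")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring on annotated texts held out from training gives a more honest estimate of how the model will behave on the rest of the corpus.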
Architecture
- API (Python) / Frontend (React) / API Client (Python)
- Leverage Python ML/DL packages (sklearn, transformers, …)
- UX designed for non-expert users
  - e.g. annotation on a smartphone
- Available both on premises and as software as a service, to fit different needs
Demo time
- Many scientific publications mention Python
- Only some of them are PyData-related
- How can we annotate them as PyData / not PyData?
Current situation & next steps
- A community of early users
- Stable version by the end of 2025
  - Streamline UX
  - Complete dockerization
  - Finish documentation
- In the future
  - Build and sustain a community
  - Prioritize our roadmap (BERTopic)
Funding
https://www.css.cnrs.fr/active-tigger/
Funding: GENES / DRARI / Progedo
Main contributors: Julien Boelaert (UL); Étienne Ollion (CREST); Paul Girard (OuestWare); Emma Bonutti (CREST); Annina Claesson (CREST); Léo Mignot (CED); Jule Brion (PACTE); Arnault Chatelain (CREST); Axel Morin (CREST)