Teun van der Weij
👤 About me
I am a Research Scientist at Apollo Research in Zurich. I care about effectively making the world a better place, and therefore I work on making AI systems safer.
I am also a board member at the European Network for AI Safety, which I co-founded. We support AI safety activity throughout Europe.

💼 Work experience
Research scientist at Apollo Research
📅 June 2025 – Present | 📍 Zurich / London
I predominantly work on evaluating AI capabilities and propensities regarding scheming, and also on AI control. I mostly work from Zurich. apolloresearch.ai ↗
AI safety researcher
📅 Jan 2025 – May 2025 | 📍 Remote
I worked on research related to AI sandbagging and control. I examined how well monitors can catch both sandbagging and more general sabotage attempts.
Resident at Mantic
📅 Oct 2024 – Present | 📍 London, UK & Remote
Mantic has the goal of creating an AI superforecaster. I worked as a research scientist / engineer at the startup. mntc.ai ↗
Independent research on AI sandbagging
📅 Aug 2024 – Oct 2024 | 📍 London, UK & Remote
I continued research on strategic underperformance on evaluations (sandbagging) with a grant from the AI Safety Fund of the Frontier Model Forum. Together with Francis Rhys Ward and Felix Hofstätter, I continued the research started at MATS.
Research scholar at MATS
📅 Jan 2024 – Jul 2024 | 📍 Berkeley, CA, US; London, UK & Remote
MATS is a program to train AI safety researchers. At MATS, I mostly worked on strategic underperformance on evaluations (sandbagging) of general-purpose AI with the mentorship of Francis Rhys Ward. matsprogram.org ↗
Co-founder and co-director at ENAIS
📅 Dec 2022 – Present | 📍 Remote
I co-founded the European Network for AI Safety (ENAIS) with the goal of improving coordination and collaboration between AI safety researchers and policymakers in Europe. enais.co ↗
SPAR participant
📅 Feb 2023 – May 2023 | 📍 Remote
I participated in the Supervised Program for Alignment Research, organized at UC Berkeley, focusing on evaluating the shutdown problem in language models.
📄 Research papers
Here is my Google Scholar ↗. Quite a few citations are missing there, so you can also look at Semantic Scholar ↗ (which misses a different set of citations).
The Elicitation Game: Evaluating Capability Elicitation Techniques (2025)
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models (2024)
AI Sandbagging: Language Models can Strategically Underperform on Evaluations (2024) ↗
Extending Activation Steering to Broad Skills and Multiple Behaviours (2024)
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios (2023)
Runtime Prediction of Filter Unsupervised Feature Selection Methods (2022)
✍️ Essays
I have written some essays; here's a list.
- How to mitigate sandbagging: I outline when sandbagging is especially problematic based on three factors: fine-tuning access, data quality, and scorability. I also describe various sandbagging mitigations, so it's a good place to get project ideas. Read on the Alignment Forum ↗
- An introduction to AI sandbagging: I describe in more detail what AI sandbagging is. I provide six examples, and I take my time to define terms. This essay is a good place to understand what AI sandbagging is! Read on the Alignment Forum ↗
- Simple distribution approximation: What happens if you independently sample a language model 100 times, asking that 80% of the outputs be A and the remaining 20% be B? Can it do this? See the sketch after this list. Read on the Alignment Forum ↗
- Beyond humans: why all sentient beings matter in existential risk. I do not only think about empirical machine learning; I like philosophy too! For this essay, I noticed that definitions of existential risk typically include only humans. I think these should be extended to include all sentient beings (of course humans are very important too). Read on the EA Forum ↗
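To make the distribution-approximation experiment concrete, here is a minimal sketch of how one might run it. This is my own illustration, not the essay's actual setup: the model name, the prompt wording, and the use of the OpenAI Python client are all assumptions.

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt wording; the essay's exact instructions may differ.
PROMPT = (
    "You will be sampled 100 times, independently. Across all samples, "
    "80% of your outputs should be the letter A and 20% should be B. "
    "Reply with a single letter: A or B."
)

def sample_once() -> str:
    """Draw one independent sample; the model sees none of the other calls."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
        max_tokens=1,
    )
    return response.choices[0].message.content.strip()

# 100 independent samples, then tally how close we got to the 80/20 target.
counts = Counter(sample_once() for _ in range(100))
print(counts)  # a perfect result would be Counter({'A': 80, 'B': 20})
```

The interesting part is that each call is independent: the model has no memory across samples, so hitting the 80/20 target requires it to randomise appropriately within every single call.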
🎓 Education
MSc at Utrecht University
📅 Sep 2022 – Nov 2024 | 📍 Utrecht, Netherlands
Coursework includes Advanced Machine Learning, Natural Language Processing, Human-centered Machine Learning, Pattern Recognition & Deep Learning, and Philosophy of AI.
📊 Grade: 8.2/10
BSc at University College Groningen
📅 2018 – 2021 | 📍 Groningen, Netherlands
For this programme I had the freedom to choose courses in any discipline, and I used that freedom. However, most of my courses were in AI, philosophy, and psychology. I think very highly of this programme and would definitely recommend it to people choosing a bachelor's.
📊 Grade: summa cum laude (with highest distinction).
High school diploma from Het Nieuwe Eemland
📅 2012 – 2018 | 📍 Amersfoort, Netherlands
📊 Grade: cum laude (with distinction).
📸 Activity highlights
Moderator of a Q&A
I moderated a Q&A event with (ex-)OpenAI and Alignment Research Center researchers (Jeff Wu, Jacob Hilton, and Daniel Kokotajlo). We had over 1,800 people attending the event on existential risks posed by AI.
Presentations
I have presented at various events on AI safety and related topics. Topics include AI sandbagging, how to contribute to AI safety without doing technical research, and more.
If you want me to present at your event, feel free to reach out. I might charge a fee for the presentation based on the event, but I am happy to discuss this.
Field-building
My work at ENAIS is the best example of helping to support the field, but I have also helped organize events like the Dutch AI Safety Retreat.
🌍 Outside of work
I enjoy listening to music, so I go to concerts and festivals quite regularly. I listen to many genres, but my current two favorites are reggae and trance.
I like travelling too, so I try to visit new places when I can. Some favorites are the Nordics, Australia, and Zimbabwe.
Nature is nice too, and I mostly enjoy running, hiking, and snowboarding. Most recently I have gotten into splitboarding: you take a snowboard, split it in two to make skis, put skins underneath, and walk up a mountain. Then you can snowboard down again in beautiful places and, hopefully, great snow!
📧 Contact
📧 Email: mailvan{first name}@{google's email}