Teun van der Weij

AI safety/security/alignment researcher

Co-founder & Board Member at ENAIS


About me

I am an independent AI safety researcher, working out how to make AI safe and useful. (Feel free to substitute "safety" with "alignment" or "security" if you prefer.) Currently I mostly work on AI sandbagging and control.

Until January 2025, I was a Resident at Mantic. Mantic's goal is to create an AI superforecaster, and my role was essentially that of a research scientist.

Before Mantic, I focused on strategic underperformance on evaluations (sandbagging) by AI systems. This work began at MATS, and I continued it with a grant from the Frontier Model Forum's AI Safety Fund.

I'm also a co-founder and board member at ENAIS (European Network for AI Safety), working to improve AI safety activities and coordination in Europe.

Work experience

AI safety researcher

Jan 2025 – Present | Remote

I am working on research related to sandbagging and AI control. I examine which control mechanisms, if any, can catch sandbagging and more general sabotage attempts.

Resident at Mantic

Oct 2024 – Jan 2025 | London, UK & Remote

Mantic, founded by Toby Shevlane and Ben Day, aims to create an AI superforecaster. I worked as a research scientist at the startup. mntc.ai ↗

Independent Research on AI Sandbagging

Aug 2024 – Oct 2024 | London, UK & Remote

Together with Francis Rhys Ward and Felix Hofstätter, I continued the research on strategic underperformance on evaluations (sandbagging) that started at MATS, funded by a grant from the Frontier Model Forum's AI Safety Fund.

Research Scholar at MATS

Jan 2024 – Jul 2024 | Berkeley, CA, US; London, UK & Remote

MATS is a program to train AI safety researchers. At MATS, I mostly worked on strategic underperformance on evaluations (sandbagging) of general-purpose AI, mentored by Francis Rhys Ward. matsprogram.org ↗

Co-founder and Co-director at ENAIS

Dec 2022 – Present | Remote

I co-founded the European Network for AI Safety (ENAIS) with the goal of improving coordination and collaboration between AI safety researchers and policymakers in Europe. enais.co ↗

SPAR participant

Feb 2023 – May 2023 | Remote

Participated in the Supervised Program for Alignment Research organized at UC Berkeley, focusing on evaluating the shutdown problem in language models.

Research papers

For another overview, see my Google Scholar ↗. Because it misses quite a few citations, you can also look at Semantic Scholar ↗.

Interesting empirical work aiming to inform how evaluators can best elicit capabilities from AI systems that may be hiding them.

I supervised this paper. Adding noise is a very interesting idea, and further experiments are being conducted to see if this can be used to robustly and accurately detect sandbagging.

I am most proud of this paper, and I think it's my most impactful work so far. It's great to see our work being used in both technical and governance contexts, inspiring groups at prominent AI safety organizations.

This paper was very helpful in improving my technical skills, both in conducting experiments and in understanding transformers. The paper contains some interesting ideas, but its impact is limited.

My first project in AI safety. In some small experiments, we showed that GPT-4 can reason correctly about avoiding shutdown in certain scenarios, and in some cases actually does so.

- van der Weij, T., Soancatl Aguilar, V., & Solorio-Fernández, S. (2022). Runtime Prediction of Filter Unsupervised Feature Selection Methods. Research in Computing Science, 150(8), 138-150.

Education

MSc Student at Utrecht University

Sep 2022 – Nov 2024 | Utrecht, Netherlands

Coursework includes Advanced Machine Learning, Natural Language Processing, Human-centered Machine Learning, Pattern Recognition & Deep Learning, and Philosophy of AI.

Grade: 8.2/10

BSc Student at University College Groningen

2018 – 2021 | Groningen, Netherlands

Focus on AI, philosophy, and psychology.

Grade: summa cum laude (with highest distinction).

High School Student at Het Nieuwe Eemland

2012 – 2018 | Amersfoort, Netherlands

Grade: cum laude (with distinction).

Activity highlights

Moderator of a Q&A

I moderated a Q&A event with (ex-)OpenAI and Alignment Research Center researchers (Jeff Wu, Jacob Hilton, and Daniel Kokotajlo). More than 1,800 people attended the event, which focused on existential risks posed by AI.

Presentations

I have presented at various events on AI safety and related topics. Topics include AI sandbagging, how to contribute to AI safety without doing technical research, and more.

If you want me to present at your event, feel free to reach out. I might charge a fee depending on the event, but I am happy to discuss this.

Field-building

My work at ENAIS is the best example of my field-building, but I have also helped organize events such as the Dutch AI Safety Retreat.

Outside of work

I enjoy listening to music, so I go to concerts and festivals quite regularly. I listen to many genres, but my two current favorites are reggae and trance.

I like travelling too, so I try to visit new places when I can. Some favorites are the Nordics, Australia, and Zimbabwe.

I also enjoy being in nature, mostly running, hiking, and snowboarding. Most recently, I have gotten into splitboarding.

Contact

Email: mailvan{first name}@gmail.com