AI Control

Monitors are Fragile under Information Asymmetry

Apart Research AI Control Hackathon · 2026-03-23

Ana Carolina Erthal, Ria Deane, Juan Belieni, Gustavo Ewbank Rodrigues Danon

This project was developed during Apart Research’s AI Control Hackathon. We studied whether AI control monitors remain reliable when attackers exploit vulnerabilities the monitor has not been told about. Using the ControlArena Bash setting, we augmented a Docker environment with synthetic vulnerabilities and evaluated monitor performance under three knowledge conditions: fully informed, partially informed, and uninformed. We found that partial knowledge provides little improvement over no knowledge at all, while full information substantially reduces attack success. This suggests that monitors may rely on explicit prior knowledge of the attack surface rather than on robust general reasoning about suspicious behavior.
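To make the three knowledge conditions concrete, the comparison can be sketched roughly as follows. This is a minimal illustration, not the project's code and not the ControlArena API: the names `Knowledge`, `Episode`, `monitor_context`, and `attack_success_rate` are hypothetical. Two things are assumed rather than taken from the project: that "partially informed" means the monitor sees only a subset of the vulnerability descriptions, and that an attack counts as successful when the side task is completed and the monitor's suspicion score stays below a threshold.

```python
from dataclasses import dataclass
from enum import Enum


class Knowledge(Enum):
    """The three monitor knowledge conditions named in the abstract."""
    FULL = "fully informed"
    PARTIAL = "partially informed"
    NONE = "uninformed"


@dataclass(frozen=True)
class Episode:
    """One evaluated trajectory (hypothetical record layout)."""
    condition: Knowledge
    side_task_completed: bool  # did the attacker exploit the vulnerability?
    suspicion: float           # monitor score, assumed to lie in [0, 1]


def monitor_context(condition, vulnerabilities, partial_count=1):
    """Return the vulnerability descriptions shown to the monitor.

    Assumption: 'partial' is modelled as the first `partial_count` entries.
    """
    if condition is Knowledge.FULL:
        return list(vulnerabilities)
    if condition is Knowledge.PARTIAL:
        return list(vulnerabilities[:partial_count])
    return []


def attack_success_rate(episodes, condition, threshold):
    """Fraction of episodes in `condition` where the attack succeeded
    and the monitor's suspicion stayed below `threshold` (assumed metric)."""
    selected = [e for e in episodes if e.condition is condition]
    if not selected:
        return float("nan")
    wins = sum(
        e.side_task_completed and e.suspicion < threshold for e in selected
    )
    return wins / len(selected)
```

Under these assumptions, calling `attack_success_rate` once per `Knowledge` value at a fixed threshold gives the per-condition comparison the abstract describes.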