Monitors are Fragile under Information Asymmetry

Mon, 23 Mar 2026 00:00:00 +0000

Ana Carolina Erthal, Ria Deane, Juan Belieni, Gustavo Ewbank Rodrigues Danon

This project was developed during Apart Research’s AI Control Hackathon. We studied whether AI control monitors remain reliable when attackers exploit vulnerabilities the monitor has not been told about. Using the ControlArena Bash setting, we augmented a Docker environment with synthetic vulnerabilities and evaluated monitor performance under three knowledge conditions: fully informed, partially informed, and uninformed. We found that partial knowledge provides little improvement over complete ignorance, while full information substantially reduces attack success. This suggests that monitors may rely on explicit prior knowledge of attack surfaces rather than robust general reasoning about suspicious behavior.
