<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ai Control on Juan Belieni</title><link>https://belieni.me/tags/ai-control/</link><description>Recent content in Ai Control on Juan Belieni</description><generator>Hugo</generator><language>en</language><copyright>&amp;copy; Juan Belieni</copyright><lastBuildDate>Mon, 23 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://belieni.me/tags/ai-control/index.xml" rel="self" type="application/rss+xml"/><item><title>Monitors are Fragile under Information Asymmetry</title><link>https://belieni.me/projects/monitors-are-fragile-under-information-asymmetry/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://belieni.me/projects/monitors-are-fragile-under-information-asymmetry/</guid><description>&lt;p>&lt;em>Ana Carolina Erthal, Ria Deane, Juan Belieni, Gustavo Ewbank
Rodrigues Danon&lt;/em>&lt;/p>
&lt;p>This project was developed during Apart Research’s AI Control Hackathon. We
studied whether AI control monitors remain reliable when attackers
exploit unfamiliar vulnerabilities. Using the ControlArena Bash setting,
we augmented a Docker environment with synthetic vulnerabilities and
evaluated monitor performance under three knowledge conditions: fully
informed, partially informed, and uninformed. We found that partial knowledge
provides little improvement over complete ignorance, while full
information substantially reduces attack success. This suggests that
monitors may rely on explicit prior knowledge of attack surfaces rather
than robust general reasoning about suspicious behavior.&lt;/p></description></item></channel></rss>