Capstone project from the final week of the AI Security Bootcamp: a
small-scale replication of the paper “Watermark Stealing Attacks on
Large Language Models”, which demonstrates that statistical text
watermarking schemes can be extracted and circumvented by low-budget
adversaries.
Replication
2025
2024
Replication of the paper “Towards Automated Circuit Discovery for
Mechanistic Interpretability” by Arthur Conmy et al., carried out by
Juan Belieni and Ana Carolina Erthal as part of their upskilling in
Mechanistic Interpretability and funded by Condor Initiative.