Replication

2025

Replication of "Watermark Stealing"

2025-08-24

Capstone project from the final week of the AI Security Bootcamp: a small-scale replication of the paper “Watermark Stealing Attacks on Large Language Models”, which demonstrates that statistical text watermarking schemes can be extracted and circumvented by low-budget adversaries.

2024

Replication of "Towards Automated Circuit Discovery for Mechanistic Interpretability"

2024-06-21

Replication, AI Safety, Mechanistic Interpretability

Replication of the paper “Towards Automated Circuit Discovery for Mechanistic Interpretability” by Arthur Conmy et al., carried out by Juan Belieni and Ana Carolina Erthal as part of their upskilling in Mechanistic Interpretability and funded by the Condor Initiative.