Capstone project developed during the last week of the AI Security
Bootcamp: a small-scale replication of the paper “Watermark Stealing
Attacks on Large Language Models”, which demonstrates that statistical
text watermarking schemes can be extracted and circumvented by
low-budget adversaries.
AI Safety
2025
AI Security Bootcamp
AISB is a four-week intensive program that brings researchers and engineers up to speed on security fundamentals for AI systems.
ML4Good Colombia 2025
·18 words
ML4Good Colombia 2025 was a bootcamp focused on AI Safety upskilling, where I participated as a teaching assistant.
Exploratory analysis of multilingual SAE features
·2245 words
Recent research from Anthropic suggests that Sparse Autoencoder (SAE)
features can be multilingual, activating for the same concept across
multiple languages. However, if multilingual features are scarce and of
lower quality than monolingual ones, the robustness of SAEs could be
undermined, leaving them vulnerable to failures and adversarial attacks
in languages not well represented by the model. In this post, I present
findings from an exploratory analysis conducted to assess the degree of
multilingualism in SAE features.
2024
Short course of four sessions, given in Portuguese at FGV EMAp,
introducing the field of Mechanistic Interpretability for Large
Language Models (LLMs).
ML4Good Brazil 2024
·41 words
ML4Good is a bootcamp focused on AI Safety upskilling, including workshops on interpretability, alignment and governance of artificial intelligence.
Replication of the paper “Towards Automated Circuit Discovery for
Mechanistic Interpretability”, by Arthur Conmy et al., carried out by
Juan Belieni and Ana Carolina Erthal as part of their upskilling in
Mechanistic Interpretability, funded by Condor Initiative.
Condor Camp
·57 words
Condor Camp was an amazing event on AI safety held in Mexico City. There, I learned about and discussed topics in AI governance and technical AI safety, and I was introduced to the philosophy of effective altruism. It was also probably my best experience in terms of career planning.