A short course of four sessions, taught in Portuguese at FGV EMAp, introducing the field of Mechanistic Interpretability for Large Language Models (LLMs).
Mechanistic Interpretability
2024
Replication of the paper “Towards Automated Circuit Discovery for Mechanistic Interpretability” by Arthur Conmy et al., carried out by Juan Belieni and Ana Carolina Erthal as part of their upskilling in alignment for AI Safety, funded by the Condor Initiative.