Mechanistic Interpretability

2024

Mechanistic Interpretability Course

2024-08-23

A short course of four sessions, taught in Portuguese at FGV EMAp, introducing the field of Mechanistic Interpretability for Large Language Models (LLMs).

Replication of "Towards Automated Circuit Discovery for Mechanistic Interpretability"

2024-06-21

AI Safety, Mechanistic Interpretability

Replication of the paper “Towards Automated Circuit Discovery for Mechanistic Interpretability” by Arthur Conmy et al., carried out by Juan Belieni and Ana Carolina Erthal as part of their upskilling in Mechanistic Interpretability, funded by Condor Initiative.