Math-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
PROPOR 2026
The use of large language models (LLMs) for complex mathematical
reasoning is an emerging area of research, with rapid progress in
methods, models, and benchmark datasets. However, most mathematical
reasoning evaluations exhibit a significant linguistic bias, with the
vast majority of benchmark datasets being exclusively in English or (at
best) translated from English. We address this limitation by introducing
Math-PT, a novel dataset comprising 1,729 mathematical problems written
in European and Brazilian Portuguese. Math-PT is curated from a variety
of high-quality native sources, including mathematical Olympiads,
competitions, and exams from Portugal and Brazil. We present a
comprehensive benchmark of current state-of-the-art LLMs on Math-PT,
revealing that frontier reasoning models achieve strong performance on
multiple-choice questions compared with open-weight models, but that their
performance decreases on questions with figures and on open-ended
questions. To facilitate future research, our benchmark dataset will be
publicly released, along with the model outputs.