DeepMind Blog · 1 Jul

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Safety · Research

Google DeepMind has announced Gemma Scope 2, a comprehensive open suite of interpretability tools designed to help researchers understand the inner workings of language models.

The new release extends interpretability capabilities across the entire Gemma 3 model family, ranging from 270M to 27B parameters.

Large language models can perform impressive reasoning tasks, yet their internal decision-making processes remain largely opaque to researchers and developers.

When AI systems behave unexpectedly, this lack of visibility into their internal workings makes it difficult to pinpoint why.

The original Gemma Scope was released last year for Gemma 2, establishing DeepMind's commitment to advancing interpretability science.
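The original Gemma Scope was built around sparse autoencoders (SAEs), small networks trained to decompose a model's internal activations into interpretable features. As a rough illustration only (shapes, weights, and names below are invented for the sketch, not the actual Gemma Scope 2 artifacts), an SAE forward pass looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # illustrative: activation width, SAE dictionary size

# Randomly initialized stand-ins for trained SAE weights.
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

def encode(x):
    # Sparse feature activations: ReLU zeroes out non-firing features.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the original activation from the sparse feature vector.
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)   # stand-in for one model activation vector
f = encode(x)                  # sparse, interpretable feature activations
x_hat = decode(f)              # approximate reconstruction of x

sparsity = (f > 0).mean()      # fraction of dictionary features that fired
```

Researchers inspect which features in `f` fire on which inputs to attribute model behavior to human-interpretable concepts; a trained SAE keeps `sparsity` low while making `x_hat` a close reconstruction of `x`.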

These new tools enable researchers to trace potential risks across the entire "brain" of the model, supporting safer AI development practices.

DeepMind describes this release as the largest ever open-source release of interpretability tools, representing a significant contribution to the AI safety community.