Explaining Neurons

This page is a summary sheet for this research area: it covers the general goal, reference papers (both mine and external) that give an overview of the topic, and the domains explored so far. We are also interested in extending these techniques beyond their traditional domains. If you have expertise in other areas (e.g., neuroscience, gaming, or audio/speech modeling), we would be happy to explore potential extensions into those fields.

Goal: This research area aims to understand what deep neural networks learn during training. This line of work now falls under the umbrella term Mechanistic Interpretability. My research projects usually focus on analyzing the behavior of individual neurons and groups of neurons, identifying the concepts they learn to recognize, and understanding the relationships between those concepts. They typically combine tools from classical AI (e.g., heuristic search and clustering), statistical analysis, and recent advances in AI to explore this direction.
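
As a rough illustration (not code from any of the papers below), the sketch that follows shows the scoring step at the heart of dissection-style neuron explanations: threshold a neuron's activation map and measure its overlap with a human-labelled concept mask via Intersection over Union (IoU). Compositional explanations build on this by searching over logical combinations (AND/OR/NOT) of concepts that maximize the score. All names and parameters here are illustrative assumptions.

```python
# Minimal sketch of neuron-vs-concept scoring with IoU (illustrative only).
import numpy as np

def neuron_concept_iou(activations: np.ndarray,
                       concept_mask: np.ndarray,
                       quantile: float = 0.995) -> float:
    """IoU between a neuron's top-activating regions and a concept segmentation mask.

    activations  : float array (H, W), the neuron's activation map for one image
    concept_mask : bool array (H, W), True where the concept is present
    quantile     : per-map threshold selecting the most strongly activated pixels
    """
    threshold = np.quantile(activations, quantile)
    neuron_mask = activations >= threshold          # where the neuron "fires"
    intersection = np.logical_and(neuron_mask, concept_mask).sum()
    union = np.logical_or(neuron_mask, concept_mask).sum()
    return float(intersection / union) if union > 0 else 0.0

# Toy usage: a 7x7 activation map scored against a hypothetical "dog" concept mask.
rng = np.random.default_rng(0)
act = rng.random((7, 7))
dog_mask = np.zeros((7, 7), dtype=bool)
dog_mask[2:5, 2:5] = True
print(neuron_concept_iou(act, dog_mask, quantile=0.8))
```

In practice this score is aggregated over a probing dataset, and the explanation assigned to a neuron is the concept (or logical formula over concepts) with the highest aggregate IoU.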

Domains: NLP, Vision.

Reference Papers:

  1. Logical (Compositional) Explanations: (La Rosa et al., 2023), (Makinwa et al., 2022)
  2. Linear Explanations: [Link]
  3. Circuits (chain of neurons): [Link]

References

2023

  1. Biagio La Rosa, Leilani H. Gilpin, and Roberto Capobianco. Towards a fuller understanding of neurons with Clustered Compositional Explanations. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

2022

  1. Sayo M. Makinwa, Biagio La Rosa, and Roberto Capobianco. Detection Accuracy for Evaluating Compositional Explanations of Units. In AIxIA 2021 - Advances in Artificial Intelligence, 2022.