Explaining Reinforcement Learning

This page is a summary sheet covering the general goal of this research area, reference papers (both mine and external) that give an overview of the topic, and the domains explored so far. We are also interested in applying these techniques beyond their traditional domains. If you have expertise in other areas (e.g., neuroscience, gaming, or audio/speech modeling), we would be happy to explore potential extensions into those fields.

Goal: This research area explores explanation methods for deep reinforcement learning, for example for agents that play games. This is one of the least studied and most challenging settings in eXplainable Artificial Intelligence (XAI). Research topics include policy extraction, policy summarization, self-interpretable deep reinforcement learning, and self-interpretable deep learning guided by reinforcement learning.

Domains: Games (Atari).

Reference Papers:

  1. Survey: [LINK]
  2. Reward Decomposition: [LINK]
  3. Concept Learning: [LINK]