A Generalist Hanabi Agent - Recurrent Replay Relevance Distributed DQN (R3D2)
Overview
Recurrent Replay Relevance Distributed DQN (R3D2) is a generalist multi-agent reinforcement learning (MARL) agent designed to play Hanabi across all game settings while adapting to unfamiliar collaborators. Unlike traditional MARL agents that struggle with transferability and cooperation beyond their training setting, R3D2 utilizes language-based reformulation and a distributed learning approach to handle dynamic observation and action spaces. This allows it to generalize across different game configurations and effectively collaborate with diverse algorithmic agents.
Key Features
Generalized MARL agent: Plays Hanabi across different player settings (2 to 5 players) without changing its architecture or retraining from scratch.
Adaptive cooperation: Collaborates with unfamiliar partners, overcoming a key limitation of traditional MARL systems.
Language-based task reformulation: Represents observations and actions as text to improve transfer learning and generalization.
Distributed learning framework: Employs a scalable MARL algorithm to handle dynamic observation and action spaces effectively.
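To illustrate why a text-based task reformulation supports dynamic action spaces, here is a minimal, self-contained sketch. It is not R3D2's actual network: the toy bag-of-words "encoder" and all function names below are hypothetical stand-ins for the learned text encoder, but the structure shows the key idea, namely that scoring each candidate action's text against the observation text yields one Q-value per action regardless of how many actions the current player count produces.

```python
# Hypothetical sketch: text observations and text actions scored by a shared
# encoder, so the action set can grow or shrink (e.g. 2- vs 5-player Hanabi)
# without changing the model. A real agent would use a learned encoder.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; stands in for a learned text encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def q_values(observation: str, actions: list[str]) -> list[float]:
    # One score per candidate action, however many there are -- this is
    # what lets a single network cover every player setting.
    obs_vec = embed(observation)
    return [cosine(obs_vec, embed(a)) for a in actions]

obs = "partner hinted red about my first card"
actions = ["play first card", "discard first card", "hint blue to partner"]
scores = q_values(obs, actions)
```

Because the action list is an input rather than a fixed output head, adding or removing legal actions between game configurations requires no architectural change.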
R3D2 Architecture:
How to use it:
Follow the steps here: