By: Rohan Phanse and Areeb Gani
Link to repository: https://github.com/rohanphanse/CPSC474-Final
To run a short demo (a sample of matchups, which takes a few minutes), run the following commands:
```sh
# Install dependencies
pip install -r requirements.txt
# Run demo
make
./BlokusDemo
```
Full reproducibility instructions are at the bottom of the README.
Blokus Duo is a combinatorial game, i.e., it is finite, two-player, deterministic, turn-based, and has perfect information. Note that Blokus has a very large state space (exceeding $10^{100}$).
How much does integrating DQN-learned Q-values into MCTS improve agent performance compared to using MCTS or DQN alone? Does this improvement depend on the quality of the training regimen of the DQN and the rollout policy for MCTS (e.g., greedy vs. random)?
We first ran simulations with a random agent, a greedy agent, and an MCTS agent. We parallelize MCTS by building and merging multiple trees, use a greedy agent for stronger simulations with faster convergence, and accelerate action generation by precomputing piece orientations and focusing on anchor points.
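To give a rough sense of the root-parallelization idea, here is a minimal sketch in Python (the helper `run_one_tree` and the exact merging rule are illustrative assumptions, not the precise implementation in `mcts.py`):

```python
# Sketch of root-parallel MCTS: each worker builds its own tree from the same
# root state, and per-action statistics are merged before picking a move.
# run_one_tree is supplied by the caller and is assumed to return a dict
# mapping each root action to (visit_count, total_value).
from collections import defaultdict
from multiprocessing import Pool

def parallel_mcts_move(state, run_one_tree, n_trees=4, iters_per_tree=500):
    with Pool(n_trees) as pool:
        per_tree = pool.starmap(run_one_tree, [(state, iters_per_tree)] * n_trees)

    merged = defaultdict(lambda: [0, 0.0])  # action -> [visits, total_value]
    for stats in per_tree:
        for action, (visits, value) in stats.items():
            merged[action][0] += visits
            merged[action][1] += value

    # Robust-child rule: pick the action with the highest combined visit count.
    return max(merged, key=lambda a: merged[a][0])
```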
We then trained two different DQN agents, dubbed DQN1 (trained against a greedy agent) and DQN2 (trained against a random agent with reward shaping). Both were trained using an adaptation of `blokus.py` to a gym environment, shown in `blokus_env.py` (this treats the opponent, either greedy or random, as a fixed part of the environment).
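Conceptually, the wrapper behaves roughly like the sketch below, where the opponent's reply is folded into `step()`; the class and method names (`game_factory`, `apply`, `score_margin`, etc.) are assumptions for illustration and do not mirror `blokus_env.py` exactly.

```python
# Minimal sketch of a gym-style Blokus environment in which the opponent
# (greedy or random) is treated as part of the environment's transition.
# The game interface used here (apply, is_over, score_margin, board) is assumed.
import gym
import numpy as np

class BlokusEnvSketch(gym.Env):
    def __init__(self, game_factory, opponent_policy):
        self.game_factory = game_factory        # callable returning a fresh game
        self.opponent_policy = opponent_policy  # callable: game -> opponent move

    def reset(self):
        self.game = self.game_factory()
        return self._observation()

    def step(self, action):
        self.game.apply(action)                 # learning agent's move
        if not self.game.is_over():
            self.game.apply(self.opponent_policy(self.game))  # opponent replies
        done = self.game.is_over()
        reward = self.game.score_margin() if done else 0.0    # terminal reward only
        return self._observation(), reward, done, {}

    def _observation(self):
        # e.g., encode the board as a flat float array for the DQN
        return np.asarray(self.game.board, dtype=np.float32).flatten()
```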
Finally, we implemented an MCTS+DQN hybrid agent that uses the $Q$-values learned by DQN in the MCTS selection criterion. This final version uses MCTS with a greedy rollout policy, paired with DQN1.
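As an illustration of how learned $Q$-values can enter the selection step, the following hedged sketch blends the DQN estimate with a standard UCT score (the weighting scheme and names are illustrative, not necessarily the exact criterion in our code):

```python
import math

def hybrid_uct_score(child, parent_visits, dqn_q_value, c_uct=1.4, beta=0.5):
    """Selection score mixing the tree's empirical mean with a DQN Q-estimate.

    child.value / child.visits is the standard MCTS mean return, dqn_q_value is
    the network's Q(s, a) for this move, and beta weights the learned estimate.
    All names here are illustrative.
    """
    if child.visits == 0:
        return float("inf")  # expand unvisited children first
    mcts_mean = child.value / child.visits
    exploration = c_uct * math.sqrt(math.log(parent_visits) / child.visits)
    return (1 - beta) * mcts_mean + beta * dqn_q_value + exploration
```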
Matchup | Win Rate (Player 1) | Win Rate (Player 2) |
---|---|---|
Greedy vs. Random | 94% (Greedy) | 6% (Random) |
DQN1 vs. Greedy | 15% (DQN1) | 85% (Greedy) |
DQN1 vs. Random | 65% (DQN1) | 35% (Random) |
DQN2 vs. Greedy | 6% (DQN2) | 94% (Greedy) |
DQN2 vs. Random | 41% (DQN2) | 59% (Random) |
MCTS (with random rollout) vs. Greedy | 59% (MCTS) | 41% (Greedy) |
MCTS (with greedy rollout) vs. Greedy | 80% (MCTS) | 20% (Greedy) |
MCTS+DQN vs. MCTS | 62% (MCTS+DQN) | 38% (MCTS) |
Detailed evaluations can be found in the `/evals` folder for each matchup. These include the number of games run (which varies depending on the agent), point margin, etc.
File/Directory | Description |
---|---|
README.md | Project overview, instructions, and results. |
blokus.py | Core Blokus Duo game logic and state representation. |
blokus_env.py | Gym-style environment wrapper for Blokus, used for DQN training. |
greedy.py | Implementation of the greedy agent. |
mcts.py | Monte Carlo Tree Search agent and evaluation scripts. |
dqn_agent.py | Deep Q-Network agent implementation. |
train_dqn.py | Script to train DQN1 (against greedy agent). |
train_dqn_random.py | Script to train DQN2 (against random agent with reward shaping). |
demo.py | Script to run a short demo of agent matchups. |
parse_results.py | Script to parse and analyze evaluation results, generate plots/statistics. |
agent_runs/ | Folder containing raw results of agent matchups. |
evals/ | Folder containing parsed results, plots, and metrics for each matchup. |
dqn_models/ | Saved DQN model weights. |
dqn_reward_logs/ | Reward logs from DQN training runs. |
dqn_training_plots/ | Training plots from DQN training runs. |
test.sh | Shell script to run all agent matchups for full evaluation. |
Our complete results take hours to obtain, due to the branching factor of MCTS and the training scheme of the DQN. To reproduce our full results, as observed via `mcts.py`, run:
```sh
# Run full agent matchups
./test.sh
```
The DQN1 model was trained using `python3 train_dqn.py`, and the DQN2 model was trained using `python3 train_dqn_random.py`. Hyperparameters for the DQN architecture and training process are included in these files. All evals in the `/evals` folder were generated using `python3 parse_results.py --source [agent_runs/matchup]`. Information about the evaluation setup (i.e., the number of games run) and more detailed statistics and plots of the results are included in `/evals`.