CPSC 474 Final: Monte Carlo Tree Search Techniques and Deep Q-Learning for Blokus Duo

May 2025
Python
PyTorch

By: Rohan Phanse and Areeb Gani

Link to repository: https://github.com/rohanphanse/CPSC474-Final

Demo

To run the short demo (a sample of matchups, which takes a few minutes), run the following commands:

# Install dependencies
pip install -r requirements.txt
# Run demo
make
./BlokusDemo

Full reproducibility instructions are at the bottom of the README.

Blokus Duo

Blokus Duo is a combinatorial game, i.e., it is finite, two-player, deterministic, turn-based, and has perfect information.

  • Each player begins with 21 pieces.
  • Each turn, place a piece on empty squares of the 14 x 14 grid. The piece must touch one of your previously placed pieces at a corner, never along an edge (see the sketch below).
  • If you cannot play any more pieces, you must pass.
  • Once both players have passed, the winner is the player whose remaining pieces have the smaller total value.

Note that Blokus has a very large state space (exceeding $10^{100}$).
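To make the placement rule concrete, here is a minimal, hypothetical sketch of the corner-contact check (the function and board representation are our own illustration, not the actual blokus.py code), assuming the board is a dict mapping (row, col) to a player id:

# Hypothetical helper: a set of candidate cells is legal (w.r.t. adjacency) only if it
# touches one of your own pieces diagonally and never shares an edge with them.
def touches_corner_not_edge(board, cells, player):
    corner_contact = False
    for r, c in cells:
        # Edge-adjacent to one of your own pieces -> illegal placement.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if board.get((r + dr, c + dc)) == player:
                return False
        # Diagonal contact with one of your own pieces satisfies the rule.
        for dr, dc in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
            if board.get((r + dr, c + dc)) == player:
                corner_contact = True
    return corner_contact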

Research Question

How much does integrating DQN-learned Q-values into MCTS improve agent performance compared to using MCTS or DQN alone? Does this improvement depend on the quality of the DQN training regimen and the MCTS rollout policy (e.g., greedy vs. random)?

Agents

We first ran simulations with a random agent, a greedy agent, and an MCTS agent. We parallelize MCTS by building multiple trees and merging them, use greedy rollouts for stronger simulations and faster convergence, and accelerate action generation by precomputing piece orientations and focusing on anchor points.
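The tree-merging step is essentially root parallelization: each worker builds an independent tree from the same root, and the per-action statistics are summed before choosing a move. A rough sketch of that idea (the helper build_mcts_tree and the node attributes are assumptions, not the exact mcts.py code):

from collections import defaultdict
from multiprocessing import Pool

def run_tree(args):
    # Hypothetical worker: build one MCTS tree and return per-action root statistics.
    state, iterations, seed = args
    root_children = build_mcts_tree(state, iterations, seed)  # assumed helper
    return {action: (node.visits, node.total_value) for action, node in root_children.items()}

def parallel_mcts_move(state, iterations=1000, workers=4):
    with Pool(workers) as pool:
        results = pool.map(run_tree, [(state, iterations, seed) for seed in range(workers)])
    # Merge the trees by summing visit counts and values for each root action.
    visits, values = defaultdict(int), defaultdict(float)
    for tree in results:
        for action, (n, v) in tree.items():
            visits[action] += n
            values[action] += v
    # Pick the action with the most combined visits (robust-child rule).
    return max(visits, key=visits.get)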

We then trained two DQN agents, dubbed DQN1 (trained against a greedy agent) and DQN2 (trained against a random agent with reward shaping). Both were trained using an adaptation of blokus.py to a gym environment, implemented in blokus_env.py, which treats the opponent (greedy or random) as a fixed part of the environment.
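Treating the opponent as part of the environment means each step() applies the learner's move and then the fixed opponent's reply before returning an observation and reward. A minimal sketch of that pattern (hypothetical class and helpers, not the actual blokus_env.py):

import gymnasium as gym  # or the classic gym API, depending on the version installed

class BlokusEnvSketch(gym.Env):
    # Hypothetical single-agent wrapper: the opponent policy is baked into step().
    def __init__(self, opponent_policy):
        super().__init__()
        self.opponent_policy = opponent_policy  # e.g. greedy for DQN1, random for DQN2
        self.game = None

    def reset(self, seed=None, options=None):
        self.game = new_blokus_game()       # assumed constructor from the game logic
        return encode_state(self.game), {}  # assumed observation encoder

    def step(self, action):
        self.game.apply(action)                               # learner's move
        if not self.game.is_over():
            self.game.apply(self.opponent_policy(self.game))  # fixed opponent replies
        reward = compute_reward(self.game)  # e.g. terminal win/loss plus any shaping
        return encode_state(self.game), reward, self.game.is_over(), False, {}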

Finally, we built an MCTS+DQN hybrid agent that uses the $Q$-values learned by DQN in the MCTS selection criterion. This final version combines MCTS with greedy rollouts and DQN1.
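One common way to fold learned Q-values into MCTS selection is to add the network's estimate as an extra term alongside the usual UCT score. The sketch below illustrates that idea with hypothetical node attributes and weights (the exact criterion and constants in mcts.py may differ):

import math

def select_child(node, q_network, c_uct=1.4, c_q=0.5):
    # Hypothetical selection rule: UCT exploitation + exploration, plus a DQN-derived bonus.
    def score(child):
        exploit = child.total_value / (child.visits + 1e-9)
        explore = c_uct * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-9))
        q_bonus = c_q * q_network.q_value(node.state, child.action)  # assumed accessor
        return exploit + explore + q_bonus
    return max(node.children, key=score)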

Evaluation Results

Matchup                               | Win Rate (Player 1) | Win Rate (Player 2)
Greedy vs. Random                     | 94% (Greedy)        | 6% (Random)
DQN1 vs. Greedy                       | 15% (DQN1)          | 85% (Greedy)
DQN1 vs. Random                       | 65% (DQN1)          | 35% (Random)
DQN2 vs. Greedy                       | 6% (DQN2)           | 94% (Greedy)
DQN2 vs. Random                       | 41% (DQN2)          | 59% (Random)
MCTS (with random rollout) vs. Greedy | 59% (MCTS)          | 41% (Greedy)
MCTS (with greedy rollout) vs. Greedy | 80% (MCTS)          | 20% (Greedy)
MCTS+DQN vs. MCTS                     | 62% (MCTS+DQN)      | 38% (MCTS)

Detailed evaluations for each matchup can be found in the /evals folder. These include the number of games run (which varies depending on the agent), point margins, etc.

Observations

  • The greedy agent is a very strong baseline, easily defeating random play and both DQN agents.
  • MCTS agents, especially when simulations use the greedy policy, are much stronger than random and competitive with the greedy agent.
  • Combining MCTS with DQN (MCTS+DQN) yields an improvement over vanilla MCTS, even if the DQN agents on their own are not as powerful.
  • Due to the large state space and the squashing of $Q$-values, the DQN agents took a long time to train and exhibited high variance. In the future, smarter reward shaping, modifications to the architecture/hyperparameters, and a different training scheme (e.g., a masked loss function; see the sketch below) could improve performance over longer training runs.
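For reference, a masked loss in this setting would restrict the bootstrap max to legal moves only. The sketch below is our own illustration of that idea in PyTorch (the batch fields and shapes are assumptions, not the project's dqn_agent.py code):

import torch
import torch.nn.functional as F

def masked_dqn_loss(q_net, target_net, batch, gamma=0.99):
    # Assumed batch fields: states [B, ...], actions [B] (long), rewards [B],
    # next_states [B, ...], next_legal_mask [B, A] (bool), dones [B].
    states, actions, rewards, next_states, next_legal_mask, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states)
        # Push illegal actions to -inf so the bootstrap max only sees legal moves.
        next_q = next_q.masked_fill(~next_legal_mask, float("-inf"))
        bootstrap = rewards + gamma * next_q.max(dim=1).values
        # Terminal transitions keep only the immediate reward.
        target = torch.where(dones.bool(), rewards, bootstrap)
    return F.smooth_l1_loss(q_sa, target)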

Directory Structure

File/Directory       | Description
README.md            | Project overview, instructions, and results.
blokus.py            | Core Blokus Duo game logic and state representation.
blokus_env.py        | Gym-style environment wrapper for Blokus, used for DQN training.
greedy.py            | Implementation of the greedy agent.
mcts.py              | Monte Carlo Tree Search agent and evaluation scripts.
dqn_agent.py         | Deep Q-Network agent implementation.
train_dqn.py         | Script to train DQN1 (against greedy agent).
train_dqn_random.py  | Script to train DQN2 (against random agent with reward shaping).
demo.py              | Script to run a short demo of agent matchups.
parse_results.py     | Script to parse and analyze evaluation results, generate plots/statistics.
agent_runs/          | Folder containing raw results of agent matchups.
evals/               | Folder containing parsed results, plots, and metrics for each matchup.
dqn_models/          | Saved DQN model weights.
dqn_reward_logs/     | Reward logs from DQN training runs.
dqn_training_plots/  | Training plots from DQN training runs.
test.sh              | Shell script to run all agent matchups for full evaluation.

Reproducibility

Our complete results take hours to obtain due to the branching factor of MCTS and the DQN training scheme. To reproduce the full results reported by mcts.py, run:

# Run full agent matchups
./test.sh

The DQN1 model was trained using python3 train_dqn.py, and the DQN2 model was trained using python3 train_dqn_random.py. Hyperparameters for the DQN architecture and training process are included in these files. All evals in the /evals folder were generated using python3 parse_results.py --source [agent_runs/matchup]. Information about the evaluation setup (e.g., the number of games run) and more detailed statistics and plots of the results are included in /evals.