
Deep CodeCraft

Hacky research code that trains policies for the CodeCraft real-time strategy game with proximal policy optimization.

Blog post: Mastering Real-Time Strategy Games with Deep Reinforcement Learning: Mere Mortal Edition

Requirements

Setup

Install dependencies with

pip install -r requirements.txt
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu92, cu101 or cu102 depending on your PyTorch installation.
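If you are unsure which variant applies, one way to check the CUDA version your PyTorch build was compiled against (a quick sanity check, not part of this repository's tooling) is:

python -c "import torch; print(torch.__version__, torch.version.cuda)"

This prints the installed PyTorch version and its CUDA version (None for CPU-only builds); for example, 1.6.0 and 10.2 correspond to the cu102 wheels.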

If you want the training code to record metrics to Weights & Biases, run wandb login.

Usage

The first step is to set up and run the CodeCraft Server.

Training

To train a policy with the default set of hyperparameters, run:

EVAL_MODELS_PATH=/path/to/golden-models python main.py --hpset=standard --out-dir=${OUT_DIR}

Logs and model checkpoints will be written to the ${OUT_DIR} directory. If you want policies to be evaluated against a set of fixed opponents during training, download the required checkpoints available here into the corresponding subfolder of the folder specified by EVAL_MODELS_PATH. For evaluations with the standard config, you need standard/curious-galaxy-40M.pt and standard/graceful-frog-100M.pt. To disable evaluation of the policy during training, set --eval_envs=0. To see additional options, run python main.py --help and consult hyperparams.py.
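For example, a minimal training run without any evaluation opponents (so no golden-model checkpoints are needed; the output directory here is just a placeholder) might look like:

python main.py --hpset=standard --out-dir=/tmp/deepcodecraft-out --eval_envs=0

This assumes EVAL_MODELS_PATH is not required when evaluation is disabled; if that is not the case, set it as shown in the command above.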

Showmatch

To run games between two already trained policies, run:

python showmatch.py /path/to/policy1.pt /path/to/policy2.pt --task=STANDARD --num_envs=64

You can then watch the games at http://localhost:9000/observe?autorestart=true&autozoom=true.
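A plausible invocation (paths are illustrative: a checkpoint written to ${OUT_DIR} during training matched against one of the downloaded golden models) would be:

python showmatch.py ${OUT_DIR}/checkpoint.pt /path/to/golden-models/standard/graceful-frog-100M.pt --task=STANDARD --num_envs=64

The exact checkpoint filenames under ${OUT_DIR} depend on the training run.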

Job Runner

The job runner allows you to schedule and execute many runs in parallel. The command

python runner.py --jobfile-dir=${JOB_DIR} --out-dir=${OUT_DIR} --concurrency=${CONCURRENCY}

starts a job runner that watches the ${JOB_DIR} directory for new jobs, writes results to folders created in ${OUT_DIR}, and runs up to ${CONCURRENCY} experiments in parallel.
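For example, a runner executing at most 4 experiments at a time (both directories here are placeholders) could be started with:

python runner.py --jobfile-dir=/home/clemens/xprun/queue --out-dir=/data/deepcodecraft-runs --concurrency=4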

You can then schedule jobs with

python schedule.py --repo-path=https://github.com/cswinter/DeepCodeCraft.git --queue-dir=${JOB_DIR} --params-file=params.yaml

where params.yaml is a file that specifies the set of hyperparameters to use, for example:

- hpset: standard
  adr_variety: [0.5, 0.3]
  lr: [0.001, 0.0003]
- hpset: standard
  repeat: 4
  steps: 300e6

The repeat parameter tells the job runner to spawn multiple runs with the same hyperparameters. When a hyperparameter is set to a list of values, one experiment is spawned for each combination. The params.yaml above therefore spawns a total of 8 experiment runs: 4 that run for 300 million samples with the default set of hyperparameters, and one for each of the 4 combinations of the adr_variety and lr hyperparameters.

The ${JOB_DIR} may be on a remote machine that you can access via ssh/rsync, e.g. --queue-dir=192.168.0.101:/home/clemens/xprun/queue.

Citation

@misc{DeepCodeCraft2020,
  author = {Winter, Clemens},
  title = {Deep CodeCraft},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cswinter/DeepCodeCraft}}
}
