Homework 11
In this homework, you will use gradient-free reinforcement learning to improve the agent you trained in homework 10: start from your homework 10 model and policy setup and fine-tune the imitation agent with a gradient-free method.
This homework is very open-ended. You can do anything you want short of hand-coding a policy. The only requirement is that the policy is learned.
Possible methods:
- Random search
- Hill climbing
- Augmented Random Search (ARS)
- Cross Entropy Method
- Any other evolutionary algorithm
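As a concrete starting point, here is a minimal hill-climbing sketch of the kind of loop these methods share. The names model (your homework 10 policy network) and rollout_reward (a function that plays the game with a given policy and returns its average reward, e.g. built on the PolicyEvaluator interface described below) are placeholders, not part of the starter code.

import copy
import torch

def hill_climb(model, rollout_reward, iterations=100, noise_std=0.01):
    # Greedy hill climbing: perturb the parameters with Gaussian noise and
    # keep the perturbation only if the average episode reward improves.
    best_reward = rollout_reward(model)
    for _ in range(iterations):
        candidate = copy.deepcopy(model)
        with torch.no_grad():
            for p in candidate.parameters():
                p.add_(noise_std * torch.randn_like(p))
        reward = rollout_reward(candidate)
        if reward > best_reward:
            model, best_reward = candidate, reward
    return model, best_reward

Random search, the cross entropy method, and ARS differ mainly in how candidates are sampled and in how the accepted update is formed from the evaluated perturbations.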
Input example
Observation Image
Output example
Logits of the predicted actions:
-5.1 -1 0.6 0.2 -0.1 0.1
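How these six logits are turned into game actions depends on the policy you defined in homework 10. As one hedged example, if each logit corresponds to a single key press, a deterministic policy can threshold the logits at zero:

import torch

logits = torch.tensor([-5.1, -1.0, 0.6, 0.2, -0.1, 0.1])
action = (logits > 0).int()  # tensor([0, 0, 1, 1, 0, 1]): press the keys with positive logits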
Getting Started
We provide you with starter code that loads the dataset, split into a training and a validation set. We also provide an optional TensorBoard interface.
- Define your model in models.py and modify the training code in train.py.
- Train your model:
  python3 -m homework.train
- Test your model by measuring its performance:
  python3 -m homework.test
- To evaluate your code against the grader, execute:
  python3 -m grader homework
  Note that the grader can take a long time because it contains two parts and will train your agents for the first grading part. Make sure your training code is working before running the grader.
- Create the submission file:
  python3 -m homework.bundle
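If you need a reference for what models.py might contain, below is a minimal, purely illustrative policy network. The layer sizes are assumptions; the real interface (observation image in, six action logits out) is set by the starter code and by the homework 10 model you are fine-tuning.

import torch

class Policy(torch.nn.Module):
    # Illustrative sketch only: maps an observation image to 6 action logits.
    def __init__(self, n_actions=6):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 5, stride=4), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 5, stride=4), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, n_actions),
        )

    def forward(self, image):
        # image: (B, 3, H, W) float tensor -> (B, n_actions) logits
        return self.net(image)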
Parallel data collection
We provide you with a parallel data collection interface in policy_eval.py. To use the interface to collect data in parallel:
evaluators = [PolicyEvaluator.remote(level, iterations) for _ in range(n_workers)]
rewards = ray.get([
    evaluator.eval.remote(m, H) for m, evaluator in zip(models, evaluators)
])
Installing Ray
To use the parallel data collection interface, you need to install the Ray library:
pip3 install ray
Hint: Parallelize over N/2 evaluators at a time, where N is the number of CPU cores on your machine.
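Putting the pieces together, here is a hedged sketch of a parallel evaluation loop. PolicyEvaluator and its eval method come from policy_eval.py as in the snippet above; level, iterations, H, and the models list are placeholders for your own settings and candidate policies. ray.init() is called explicitly here; on recent Ray versions initialization can also happen automatically on first use.

import os
import ray

ray.init()  # start (or connect to) the local Ray runtime

n_workers = max(1, os.cpu_count() // 2)  # hint: N/2 evaluators for N CPU cores
evaluators = [PolicyEvaluator.remote(level, iterations) for _ in range(n_workers)]

# Evaluate a population of candidate models in chunks of n_workers.
rewards = []
for start in range(0, len(models), n_workers):
    chunk = models[start:start + n_workers]
    rewards += ray.get([
        evaluator.eval.remote(m, H)
        for m, evaluator in zip(chunk, evaluators)
    ])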
Setting up Supertux
- This homework requires you to set up Pytux to perform the online evaluation by playing the actual game. Instructions for setting up SuperTux can be found here.
- Once you have either downloaded the binary or compiled the SuperTux source, create the symlinks for the pytux and data folders using the following commands:
  cd path/to/homework_11
  ln -s path/to/pytux pytux
  ln -s path/to/data data
- Make sure the folder structure looks like this:
  - homework_11
    - grader
    - homework
    - pytux
    - data
Pro-tip: Fine-tune from Imitation Learning agent
To speed up training for this assignment, you can fine-tune the agent you trained in homework 10 instead of starting your search from a randomly initialized network.
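A minimal sketch of that warm start is shown below; the module path, class name, and checkpoint file name are placeholders for whatever your homework 10 code actually uses.

import torch
from homework.models import Policy  # placeholder import: your model class from models.py

# Load the imitation-learning weights from homework 10 (file name is a placeholder),
# then hand the warm-started model to your gradient-free optimizer.
model = Policy()
model.load_state_dict(torch.load('imitation_hw10.th', map_location='cpu'))
model.eval()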
Grading
The grading will depend on your gradient-free optimization implementation and your final policy performance. The grading schema is as follows:
- Average performance across 5 levels, graded linearly from 0.2 to 0.35, training from scratch: 100 points
We will manually check the implementation of each submission; outputting constant actions or hardcoding part of the predictions will result in zero points.
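One hedged reading of that scale, useful for sanity-checking your own runs: average performance at or below 0.2 earns 0 points, at or above 0.35 earns the full 100, and anything in between scales linearly.

def grade(avg_performance, low=0.2, high=0.35, max_points=100):
    # Linear interpolation between the lower and upper performance thresholds.
    frac = (avg_performance - low) / (high - low)
    return max_points * min(max(frac, 0.0), 1.0)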
Relevant operations
- Operations from prior assignments