Homework 10
In this homework we will use imitation learning to train an agent to play SuperTux. You’ll use a new tux dataset that contains human players’ trajectories and design a model to predict what actions to take given the current observation.
Note: The data is about 5GB once decompressed. It will likely not fit on lab machines. We’re working on a solution.
Action prediction
You will design a network, similar to the one in the earlier assignment, that predicts what action (keyboard input) to execute in a SuperTux game. The input to your network is a sequence of observations, where each observation is a 64x64 RGB image. Given the observations, you have to predict what action to take, where an action is a 6-dimensional binary vector of key states (0: up, 1: down), as in the previous homework. We recommend reusing ideas from the last homework to exploit temporal dependencies between observations.
In this homework we will measure your model's performance both against the expert dataset in an offline setting and by how it performs in the actual game.
Once you have specified the network, train it.
Input example
Observation Image
Output example
Logits of predicted actions:
-5.1 -1 0.6 0.2 -0.1 0.1
which is equivalent to the key states: 0 0 1 1 0 1
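The mapping from logits to key states in the example above can be sketched as follows. This is a minimal illustration, assuming a key counts as pressed when its logit is positive (equivalently, when its sigmoid probability exceeds 0.5). The function name logits_to_keys is hypothetical and not part of the starter code.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_keys(logits):
    """Map per-key logits to binary key states (0: up, 1: down).

    Assumption: a key is considered down when sigmoid(logit) > 0.5,
    which is the same condition as logit > 0.
    """
    return [1 if logit > 0 else 0 for logit in logits]


example_logits = [-5.1, -1, 0.6, 0.2, -0.1, 0.1]
key_states = logits_to_keys(example_logits)
probabilities = [sigmoid(x) for x in example_logits]
print(key_states)  # [0, 0, 1, 1, 0, 1]
```

Thresholding the logit at 0 avoids computing the sigmoid at all. The probabilities are only needed if you want to inspect how confident the model is about each key.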
Setting up Supertux and Dataset
- The action_img_trainval.tar.gz file contains two folders, train and val. Extract them in the same directory that contains the homework and the grader folders.
- This homework requires you to set up Pytux to perform the online evaluation by playing the actual game. Instructions to set up Supertux can be found here.
- Once you have either downloaded the binary or compiled the Supertux source, create the symlinks for the pytux and data folders using the following commands:
cd path/to/homework_10
ln -s path/to/pytux pytux
ln -s path/to/data data
- Make sure the folder structure looks like this:
- homework_10
  - grader
  - homework
  - train
  - val
  - pytux
  - dataset
Getting Started
We provide you with starter code that loads the dataset from a training and validation set. We also provide an optional tensorboard interface.
- Define your model in models.py and modify the training code in train.py.
- Train your model.
python3 -m homework.train
- Optionally, you can use tensorboard to visualize your training loss and accuracy.
python3 -m homework.train -l myRun
and in another terminal
tensorboard --logdir myRun
where myRun is the log directory. Pro-tip: You can run tensorboard on the parent directory of many logs to visualize them all.
- Test your model by measuring the log-likelihood
python3 -m homework.test
- Test your policy performance in a real Tux game
python3 -m homework.play
- To evaluate your code against grader, execute:
python3 -m grader homework
Note that the grader can take a long time because it contains two parts: offline and online evaluation. Make sure your model performs well before running the grader. You can use test.py to measure the offline performance and play.py to measure the online performance.
- Create the submission file
python3 -m homework.bundle
Grading
The grading will depend on the log-likelihood scores of your model as well as on how well the trained policy actually plays the Supertux game. The grading schema is as follows:
- Linear grading of log-likelihood scores between 0.5 and 0.1: 50 points.
- Grading based on the position reached by Tux on 5 levels of Supertux: 50 points.
  - For level 01 - Welcome to Antarctica.stl, position range 0.1-0.24 will be graded linearly for 10 points.
  - For level 02 - The Journey Begins.stl, position range 0.03-0.18 will be graded linearly for 10 points.
  - For level 03 - Via Nostalgica.stl, position range 0.01-0.16 will be graded linearly for 10 points.
  - For level 04 - Tobgle Road.stl, position range 0.04-0.14 will be graded linearly for 10 points.
  - For level 05 - The Somewhat Smaller Bath.stl, position range 0.05-0.1 will be graded linearly for 10 points.
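To estimate your own score from the ranges listed above, linear grading can be sketched as a clamped linear interpolation. This is not the grader's code. It assumes that the lower end of each position range earns no credit and the upper end earns full credit, and that a log-likelihood score of 0.5 earns no credit while 0.1 earns the full 50 points. All function names are hypothetical.

```python
def linear_points(value, zero_at, full_at, max_points):
    """Clamped linear interpolation between a zero-credit and a full-credit value.

    Assumption: credit is 0 at `zero_at`, `max_points` at `full_at`,
    linear in between, and clamped outside that range.
    """
    fraction = (value - zero_at) / (full_at - zero_at)
    fraction = min(1.0, max(0.0, fraction))
    return max_points * fraction


# Position ranges copied from the grading list (level number -> (low, high)).
LEVEL_RANGES = {
    1: (0.10, 0.24),
    2: (0.03, 0.18),
    3: (0.01, 0.16),
    4: (0.04, 0.14),
    5: (0.05, 0.10),
}


def level_points(level, position):
    """Points for one level, assuming more progress means more credit."""
    low, high = LEVEL_RANGES[level]
    return linear_points(position, zero_at=low, full_at=high, max_points=10)


def likelihood_points(score):
    """Points for the offline part, assuming 0.5 -> 0 points and 0.1 -> 50 points."""
    return linear_points(score, zero_at=0.5, full_at=0.1, max_points=50)


# Example: halfway through the level 01 range and halfway between 0.5 and 0.1.
print(level_points(1, 0.17))
print(likelihood_points(0.3))
```

Passing the zero-credit and full-credit endpoints explicitly lets the same helper handle both directions: positions, where larger is better, and the offline score, where the assumed full-credit value is the smaller number.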
Note
You may find that the default loss function makes the network too pessimistic about pressing keys, because most keys are not pressed in most frames. To address this class imbalance, you can try reweighting the positive and negative classes by their frequencies, a technique we used in Homework 7.
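One way to reweight by class frequency is sketched below in NumPy, so that it stays independent of the training framework. It is an illustration under stated assumptions, not the method from Homework 7: each class is weighted by 0.5 divided by its frequency, computed separately for each key, and the function names are hypothetical. If you train with PyTorch, the pos_weight argument of torch.nn.BCEWithLogitsLoss serves a similar purpose.

```python
import numpy as np


def frequency_weights(labels):
    """Per-key weights for the positive (pressed) and negative classes.

    labels: array of shape (N, 6) holding 0/1 key states.
    Assumption: each class is weighted by 0.5 / frequency, so the
    rarer class contributes as much to the loss as the common one.
    """
    # Clip so keys that are never (or always) pressed do not divide by zero.
    pos_freq = labels.mean(axis=0).clip(1e-6, 1 - 1e-6)
    pos_weight = 0.5 / pos_freq
    neg_weight = 0.5 / (1.0 - pos_freq)
    return pos_weight, neg_weight


def weighted_bce_with_logits(logits, labels, pos_weight, neg_weight):
    """Reweighted binary cross-entropy computed directly from logits."""
    # log(sigmoid(x)) = -log(1 + exp(-x)), evaluated stably with logaddexp.
    log_p = -np.logaddexp(0.0, -logits)
    # log(1 - sigmoid(x)) = -log(1 + exp(x)).
    log_not_p = -np.logaddexp(0.0, logits)
    loss = -(pos_weight * labels * log_p + neg_weight * (1 - labels) * log_not_p)
    return loss.mean()


# Toy data: key 1 is pressed in one of four frames, key 2 in half of them.
labels = np.array(
    [[0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1],
     [0, 0, 0, 1, 0, 0],
     [1, 0, 0, 1, 1, 1]],
    dtype=float,
)
logits = np.zeros_like(labels)
pos_w, neg_w = frequency_weights(labels)
loss = weighted_bce_with_logits(logits, labels, pos_w, neg_w)
print(pos_w, neg_w, loss)
```

With these weights, a missed press of a rarely used key costs more than a false press, which counteracts the tendency to predict "not pressed" everywhere.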
Important Tips
You can still do the training remotely, but the grader and the test modules won't run over ssh because pytux does not support playing Supertux over ssh. Thus, you need to use either your own machine or the lab machines to run these modules. The provided binary and source work best on Ubuntu systems. You can try compiling the source for Mac OS, but it definitely won't work on Windows. The binary might not work if different versions of its dependencies are installed on your system; in that case, compile from source following the instructions here.
For compiling Supertux on Ubuntu, use the following command to install all the dependencies required for building it.
sudo apt-get install build-essential cmake libcurl4-openssl-dev libglew-dev libsdl2-image-dev libsdl2-dev libboost-all-dev
Contact the TAs if you face any issues setting up Supertux on your system. We advise setting it up early to avoid last-minute problems.
Relevant operations
- Conv layers
- Recurrent layers
- Operations of prior assignments