Lab: Active Reinforcement Learning
CSC261 - Artificial Intelligence - Weinman
- Summary:
- We get smarter by learning utilities without a world
model and get active by exploring and learning policies to go with
them.
Preparation
- Move to your directory for materials from the prior lab.
-
$ cd somewhere/mdp
- Copy starter materials for this lab (note the dot at the end of the
command!). The -i flag asks to overwrite any files that may
exist. If you made any changes to your Makefile you may want
to preserve, rather than overwrite it; otherwise it is fine to overwrite
the prior version.
-
$ cp -i ~weinman/courses/CSC261/code/tdq/* .
Exercises
A: Temporal-Difference Learning
- Open td.c and locate the main function. Read through
it and make sure you understand what it does.
- Verify that you can build and link the default (follow policy) TD-learning
program
-
$ make td
- Verify that the policy iteration program runs using a policy file
you copied (each entry corresponds to an action for the state).
-
$ ./td 0 4x3.mdp 10 < 4x3.policy
It should print zero-valued utilities for the states.
B: Q-Learning
- Open qlearn.c and locate the main function. Read
through it and make sure you understand what it does.
- Verify that you can build and link the default (random action) active
reinforcement Q-learning agent.
-
$ make qlearn
You can ignore the warning about line 13; gamma(3) is a deprecated
function and we're using gamma as a variable name. gcc
wants to make sure we know about that.
- Verify you can run the Q-learning agent using a policy file you copied
(each entry corresponds to an action for the state).
-
$ ./qlearn 0 1 1 4x3.mdp 10
It should print zero-valued Q-values for all 12 state/action pairs
and a first-action policy (with Xs in unreachable/terminal
states).
Lab Assignment
Between the last lab and the information in the assignment document,
you now have nearly all the pieces necessary to implement a passive
TD-learning agent and an active Q-learning agent. You should begin
to work on the lab assignment.
Copyright © 2011, 2013, 2015, 2018, 2020 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 4.0 International License.