Lab: Passive Reinforcement Learning
CSC261 - Artificial Intelligence - Weinman
- Summary: We prepare to cut to the chase by learning how to compute
policies directly, as well as how to learn world models when they are unknown.
Preparation
- Move to your directory for materials from the prior lab.
    $ cd somewhere/mdp
- Copy the starter materials for this lab (note the dot at the end of the
command!). The -i flag makes cp ask before overwriting any files that
already exist. If you made any changes to your Makefile, you may want
to save them; otherwise it is fine to overwrite the prior version.
Object (.o) files will be copied as well. If you are unsure
whether yours are functional, you should overwrite your own.
    $ cp -i ~weinman/courses/CSC261/code/adp/* .
Exercises
A: Policy Iteration
- Open policy_iteration.c and locate the main function.
Read through it and make sure you understand what it does.
- Verify that you can build and link the default (do-nothing) policy
iteration program.
    $ make mdp policy
- Verify that the policy iteration program runs.
    $ ./policy_iteration 0 0 4x3.mdp
It should print a random policy for the 12 states, each entry being one of
the four movement actions numbered 0-3.
B: Adaptive Dynamic Programming
- Open adp.c and locate the main function. Read through
it and make sure you understand what it does.
- Verify that you can build and link the default (random action) passive
reinforcement learning agent.
    $ make adp
You can ignore the warning about line 14: gamma(3) is a deprecated
math library function, and because we use gamma as a variable name,
gcc wants to make sure we know about the clash.
- Verify that you can run the ADP agent using the policy file you copied
(each entry gives the action for the corresponding state).
    $ ./adp 0 0 4x3.mdp 10 < 4x3.policy
It should print zero-valued utilities for all 12 states.
Lab Assignment
Between the last lab and the information in the assignment document,
you now have nearly all the pieces necessary to implement policy evaluation,
policy iteration, and a passive ADP agent. You should begin to work
on the lab assignment.
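For reference, the relations below are the ones behind the sketches: the equation policy evaluation solves, the count-based model estimate a passive ADP agent plugs into it, and the greedy improvement step of policy iteration. They assume the state-reward convention R(s) used with Russell and Norvig's 4x3 world; the assignment document is the authority on the exact notation it expects. For terminal states, U^π(s) = R(s).

```latex
\begin{aligned}
U^{\pi}(s) &= R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U^{\pi}(s') \\
\hat{P}(s' \mid s, a) &= \frac{N_{s' \mid sa}}{N_{sa}} \\
\pi(s) &\leftarrow \arg\max_{a} \sum_{s'} P(s' \mid s, a)\, U^{\pi}(s')
\end{aligned}
```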
Copyright © 2011, 2013 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.