Lab: Active Reinforcement Learning

CSC261 - Artificial Intelligence - Weinman



Summary:
We get smarter by learning utilities without a world model (TD learning), and then get active by exploring and learning a policy to go with them (Q-learning).

Preparation

  1. Move to your directory for materials from the prior lab.
    cd somewhere/mdp
  2. Copy starter materials for this lab (note the dot at the end of the command!). The -i flag asks for confirmation before overwriting any file that already exists. If you made changes to your Makefile that you want to keep, preserve it rather than overwrite it; otherwise it is fine to overwrite the prior version.
    cp -i ~weinman/courses/CSC261/code/tdq/* .

Exercises

A: Temporal-Difference Learning

  1. Open td.c and locate the main function. Read through it and make sure you understand what it does.
  2. Verify that you can build and link the default (follow policy) TD-learning program.
    make td
  3. Verify that the TD-learning program runs using a policy file you copied (each entry corresponds to an action for the state).
    ./td 0 4x3.mdp 10 < 4x3.policy
    It should print zero-valued utilities for the states.
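
For orientation, here is a minimal sketch of the temporal-difference update that a passive agent applies after each observed transition. It is not the code in td.c: the function name td_update, its parameters, and the flat array of utilities are all assumptions made for illustration, and the assignment document defines the actual interfaces. The discount factor is named discount rather than gamma to sidestep the compiler warning about gamma(3).

```c
#include <stddef.h>

/* Hypothetical helper (not from td.c): apply one TD update,
 *   U[s] <- U[s] + alpha * (r + discount * U[s'] - U[s]),
 * after the agent moves from `state` to `next_state`.
 * `reward` is the reward credited to this step; which reward that is
 * (for the state left or the state entered) depends on the formulation
 * used in the assignment, so treat it as an assumption here. */
void td_update(double *utilities, size_t state, size_t next_state,
               double reward, double alpha, double discount)
{
    /* One-sample estimate of the utility of `state`. */
    double target = reward + discount * utilities[next_state];

    /* Move the current estimate a fraction alpha toward the target. */
    utilities[state] += alpha * (target - utilities[state]);
}
```

With a learning rate of zero the update leaves every utility unchanged, and until an update is applied the utilities simply keep their initial values.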

B: Q-Learning

  1. Open qlearn.c and locate the main function. Read through it and make sure you understand what it does.
  2. Verify that you can build and link the default (random action) active reinforcement Q-learning agent.
    make qlearn
    You can ignore the warning about line 13; gamma(3) is a deprecated function and we're using gamma as a variable name. gcc wants to make sure we know about that.
  3. Verify that you can run the Q-learning agent (no policy file is needed; the agent chooses its own actions).
    ./qlearn 0 1 1 4x3.mdp 10
    It should print zero-valued Q-values for all 12 state/action pairs and a first-action policy (with Xs in unreachable/terminal states).
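
The following is a minimal sketch of the Q-learning update and of greedy policy extraction, under stated assumptions. It is not the code in qlearn.c: NUM_ACTIONS, max_q, q_update, greedy_action, the two-dimensional table layout, and the next_is_terminal flag are all hypothetical names chosen for illustration. It also shows one plausible reason an all-zero table yields a first-action policy: an argmax that only replaces the best action on a strict improvement breaks ties toward the first action.

```c
#include <stddef.h>

#define NUM_ACTIONS 4 /* assumption: four moves, as in a grid world */

/* Largest Q-value available in `state`. */
double max_q(double q[][NUM_ACTIONS], size_t state)
{
    double best = q[state][0];
    for (size_t a = 1; a < NUM_ACTIONS; a++)
        if (q[state][a] > best)
            best = q[state][a];
    return best;
}

/* Hypothetical helper (not from qlearn.c): apply one Q-learning update,
 *   Q[s][a] <- Q[s][a] + alpha * (r + discount * max_a' Q[s'][a'] - Q[s][a]).
 * When the next state is terminal, no future value is added. */
void q_update(double q[][NUM_ACTIONS], size_t state, size_t action,
              size_t next_state, double reward, double alpha,
              double discount, int next_is_terminal)
{
    double future = next_is_terminal ? 0.0 : max_q(q, next_state);
    double target = reward + discount * future;
    q[state][action] += alpha * (target - q[state][action]);
}

/* Greedy action for `state`; ties go to the lowest action index, so an
 * all-zero table produces action 0 everywhere. */
size_t greedy_action(double q[][NUM_ACTIONS], size_t state)
{
    size_t best = 0;
    for (size_t a = 1; a < NUM_ACTIONS; a++)
        if (q[state][a] > q[state][best])
            best = a;
    return best;
}
```

Unlike the TD update, this update needs no policy as input: the max over next actions stands in for the policy, which is why the agent can learn while choosing random actions.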

Lab Assignment

Between the last lab and the information in the assignment document, you now have nearly all the pieces necessary to implement a passive TD-learning agent and an active Q-learning agent. You should begin to work on the lab assignment.
Copyright © 2011, 2013, 2015, 2018, 2020 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.