Lab: Markov Decision Processes

CSC261 - Artificial Intelligence - Weinman



Summary:
We explore models for sequential decision making in fully observable stochastic environments.

Preparation

  1. Create a directory and copy the starter files for the lab:
    cp -R ~weinman/courses/CSC261/code/mdp somewhere/
    cd somewhere/mdp
  2. Open the file start.c in Emacs or your favorite code editing environment.
    emacs start.c &
  3. Review the definition of the mdp struct in the lab assignment or mdp.h.

Exercises

A: Getting started with MDPs

The shell of the main procedure in start.c reads an mdp struct from a file and uses the p_mdp pointer to print some basic information about the corresponding MDP; a rough sketch of this shell appears at the end of this exercise.
  1. Build the start program and the mdp.o object file
    make mdp start
  2. The file 4x3.mdp contains the Markov decision process for the 4×3 grid world environment defined in Figure 17.1 of AIMA (p. 646). Run the start program to read and print some details of this world.
    ./start 4x3.mdp
  3. Note that 12 states are listed. This is because the grid world includes "states" (such as the blacked-out box) even though they may not be reachable. Starting from the top-left corner at zero, the states are numbered by walking down the rows (top to bottom) and then across the columns (left to right). Which state number is the black box?
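
As a rough orientation, the shell looks something like the sketch below. The reader function mdp_read, the field numStates, and the cleanup call mdp_free are assumed names used only for illustration; consult mdp.h and the actual start.c for the real declarations.

    /* Rough sketch only: mdp_read, numStates, and mdp_free are ASSUMED names;
       see mdp.h and the actual start.c for the real declarations. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "mdp.h"

    int main (int argc, char* argv[])
    {
      if (argc != 2)                        /* Expect exactly one argument: the .mdp file */
      {
        fprintf (stderr, "Usage: %s mdpfile\n", argv[0]);
        exit (EXIT_FAILURE);
      }

      mdp * p_mdp = mdp_read (argv[1]);     /* Read the MDP from the file (assumed reader) */

      if (p_mdp == NULL)                    /* Abort if the read failed */
        exit (EXIT_FAILURE);

      printf ("MDP Info for %s\n", argv[1]);
      printf ("Number of states: %u\n", p_mdp->numStates);   /* Assumed field name */
      printf ("Number of actions: %u\n", p_mdp->numActions); /* Field named in this lab */

      mdp_free (p_mdp);                     /* Release the MDP (assumed cleanup routine) */

      return 0;
    }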

B: Understanding actions

  1. Add a line inside the loop over states that prints the number of actions available in that state.
    Note: You will need to use the numActions field of the mdp struct to identify how many actions are available, as outlined in the lab assignment (see "The Markov Decision Process Structure") and in mdp.h.
    make start
    ./start 4x3.mdp
    MDP Info for 4x3.mdp 
    Number of states: 12 
    Number of actions: 4 
    Available actions 
    State 0: [4]  
    State 1: [4]  
    ...
  2. Add a line inside the loop to declare a new loop variable action (an unsigned int) that will index the available actions.
  3. Add an inner for loop over the available actions that prints the numeric value of each action available in the state. For instance, your output should look something like the following.
    Note: You should NOT print the loop variable itself; instead, use the loop variable as an index into the actions[state][action] array of p_mdp.
    ./start 4x3.mdp
    MDP Info for 4x3.mdp 
    Number of states: 12 
    Number of actions: 4 
    Available actions 
    State 0: [4] 0 1 2 3
    State 1: [4] 0 1 2 3
    ...
    Note that the actions are up=0, down=1, left=2, right=3.
    Should you choose to implement calc_meu, this exercise will prove quite helpful; a sketch combining steps 1-3 appears below.
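
Putting steps 1-3 together, the body of the state loop might look something like the following sketch. It uses the numActions field and the actions array named above; depending on what the starter code already prints for each state, you may only need the bracketed count and the inner loop, and the loop bound could instead be a per-state count if mdp.h defines one.

    /* Sketch of the state-loop body; adjust names and formatting to match
       the starter code and the sample output above. */
    unsigned int action;                      /* Loop variable over available actions */

    printf ("State %u: [%u] ", state, p_mdp->numActions);

    for (action = 0 ; action < p_mdp->numActions ; action++)
      printf ("%u ", p_mdp->actions[state][action]);  /* Numeric value of the action */

    printf ("\n");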

C: Probing probabilities

In this portion of the lab, we will use the command line to probe the transition probability distribution over destination states, given a particular state and action.
  1. Copy the file start.c to transition.c.
    cp start.c transition.c
  2. Add an entry to your Makefile to build the new program easily (remember that the recipe line must begin with a tab character):
    transition: mdp
         gcc ${FLAGS} -o transition transition.c mdp.o
  3. Alter the first part of the main procedure in transition.c so that the expected argument count is 4. The new usage will be
    transition mdpfile state action
    Alter the error message accordingly.
  4. Add declarations for the unsigned integers state and action, which will be command-line parameters.
  5. Convert the two additional command-line parameters to unsigned integers using strtol. For example
    (unsigned int)strtol(argv[2], NULL, 10)
    For the purposes of this exercise, you do not need to verify that the answers are valid (though see main in value_iteration.c for an example of how to do so).
  6. Modify the rest of the program so that, for every destination state s', it prints s' and the transition probability P(s' | s, a), where s and a are taken from the command line. One way to structure this loop is sketched after the example output below. For example:
    ./transition 4x3.mdp 2 3 
    s'    P(s' | s=2,a=3) 
    0     0.00 
    1     0.10 
    2     0.10 
    3     0.00 
    4     0.00 
    5     0.80 
    6     0.00 
    7     0.00
    8     0.00
    9     0.00
    10    0.00
    11    0.00
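
One way to structure the final loop is sketched below. The field name and index order transitionProb[sPrime][state][action], as well as numStates, are assumptions made for illustration; check mdp.h for the actual definitions before using them.

    /* Print P(s' | s, a) for every destination state s'.
       transitionProb and numStates are ASSUMED names; see mdp.h for the real fields. */
    unsigned int sPrime;                      /* Destination state */

    printf ("s'    P(s' | s=%u,a=%u)\n", state, action);

    for (sPrime = 0 ; sPrime < p_mdp->numStates ; sPrime++)
      printf ("%-5u %4.2f\n", sPrime, p_mdp->transitionProb[sPrime][state][action]);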

Lab Assignment

Between this lab and snippets in the assignment document, you now have nearly all the pieces necessary to implement value and policy iteration. You should begin to work on the lab assignment.
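
As a reminder of where these pieces fit, the core of value iteration in AIMA (Section 17.2) is the Bellman update, which repeatedly revises the utility estimate U(s) for every state s:

    U(s) <- R(s) + gamma * max_a sum_{s'} P(s'|s,a) * U(s')

Here R(s) is the state's reward, gamma is the discount factor, the max ranges over the actions available in s, and the sum ranges over destination states. The calc_meu routine mentioned in Exercise B presumably computes the maximum-expected-utility term, max_a sum_{s'} P(s'|s,a) * U(s'), which is exactly where the action loop from Exercise B and the transition probabilities from Exercise C come in.
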
Copyright © 2011, 2013, 2015 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.