Lab: Markov Decision Processes
CSC261 - Artificial Intelligence - Weinman
- Summary: We explore models for sequential decision making in
fully observable stochastic environments.
Preparation
- Create a directory and copy the starter files for the lab:
-
$ mkdir mdp
$ cp ~weinman/courses/CSC261/code/mdp/* mdp/
$ cd mdp
- Open the file start.c in Emacs or your favorite code editing
environment.
-
$ emacs start.c &
- Review the definition of the mdp struct in the lab assignment
or mdp.h.
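For orientation only, the sketch below shows the kinds of fields the
struct provides; the names and layout here are illustrative assumptions,
so rely on mdp.h itself for the authoritative definition.
-
/* Illustrative sketch only -- these field names are assumptions,
   not the actual contents of mdp.h. */
typedef struct
{
  unsigned int numStates;        /* number of states, reachable or not */
  unsigned int numActions;       /* number of distinct actions */
  unsigned int *numAvailActions; /* per-state count of available actions */
  unsigned int **actions;        /* actions[s][i]: i-th action available in state s */
  double ***transitionProb;      /* transition probabilities (indexing: see mdp.h) */
  double *rewards;               /* per-state rewards */
} mdp;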
Exercises
A: Getting started with MDPs
The shell of the main procedure in start.c reads
an mdp struct from a file and uses the p_mdp pointer
to print some basic information about the corresponding mdp.
- Build the start program (which also builds the mdp.o
object file)
-
$ make start
- The file 4x3.mdp contains the Markov decision process for
the 4×3 grid world environment defined in Figure 17.1 of AIMA.
Run the start program to read and print some details of this world.
-
$ ./start 4x3.mdp
- Note that 12 states are listed. This is because the grid world
includes every cell as a "state" (such as the blacked-out box), even
though it may not be reachable. Starting from zero at the top-left
corner, the states are numbered by walking down the rows (top to
bottom) and then across the columns (left to right). Which state
number is the black box?
B: Understanding actions
- Add a line inside the loop over states that prints the number of
actions available in that state. (A sketch follows the sample output
below.)
Note: You will need to use fields from the mdp struct,
outlined in the lab assignment and in mdp.h.
-
$ ./start 4x3.mdp
MDP Info for 4x3.mdp
Number of states: 12
Number of actions: 4
Available actions
State 0: [4]
State 1: [4]
...
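For illustration, the added line might look something like the sketch
below. The loop variable name state and the field name numAvailActions
are assumptions; use whatever names start.c and mdp.h actually declare.
-
printf("State %u: [%u]\n", state, p_mdp->numAvailActions[state]);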
- Add a line inside the loop to declare a new loop variable (an unsigned
int) for the available action.
- Add an inner for loop over the available actions, and
print the numeric value of each action available in the state. (A
sketch of this loop appears after the note below.) For
instance, your output should look something like the following.
Note: You should NOT print your loop variable itself; instead
use the loop variable as an index into the actions
array of p_mdp.
-
$ ./start 4x3.mdp
MDP Info for 4x3.mdp
Number of states: 12
Number of actions: 4
Available actions
State 0: [4] 0 1 2 3
State 1: [4] 0 1 2 3
...
Note the actions are up=0, down=1, left=2, right=3.
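A minimal sketch of that per-state printing, again using the assumed
names state, action, numAvailActions, and actions (the newline now
moves after the inner loop so the action numbers share the line):
-
/* Hypothetical field names; check mdp.h for the real ones. */
printf("State %u: [%u]", state, p_mdp->numAvailActions[state]);
for (action = 0; action < p_mdp->numAvailActions[state]; action++)
  printf(" %u", p_mdp->actions[state][action]); /* action number, not loop index */
printf("\n");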
C: Probing probabilities
In this portion of the lab, we will use the command line to probe
the transition probability distribution over successor states given a
particular state and action.
- Copy the file start.c to transition.c.
-
$ cp start.c transition.c
- Add an entry to your Makefile to easily build a new program
-
transition: mdp
gcc ${FLAGS} -o transition transition.c mdp.o
- Alter the first part of the main procedure so that the expected
argument count is 4. The new usage will be
-
transition mdpfile state action
Alter the error message accordingly.
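The altered check might look something like this sketch; the exact
structure and wording in your copy of start.c may differ (fprintf and
exit come from stdio.h and stdlib.h, which start.c should already
include):
-
if (argc != 4)
{
  fprintf(stderr, "Usage: %s mdpfile state action\n", argv[0]);
  exit(EXIT_FAILURE);
}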
- Add declarations for the unsigned integers state and action.
- Convert the two additional command-line parameters to unsigned integers
using strtol. For example
-
(unsigned int)strtol(argv[2], NULL, 10)
For the purposes of this exercise, you do not need to verify that
the converted values are valid.
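Putting that together, the two conversions might read as follows;
strtol is declared in stdlib.h:
-
state  = (unsigned int) strtol(argv[2], NULL, 10);
action = (unsigned int) strtol(argv[3], NULL, 10);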
- Modify the rest of the program to print, for every destination state t,
the transition probability P(t|s,a), where s and a are the state and
action given on the command line. (A sketch of this loop follows the
example output.) For example:
-
$ ./transition 4x3.mdp 2 3
s' P(s' | s=2,a=3)
0 0.00
1 0.10
2 0.10
3 0.00
4 0.00
5 0.80
6 0.00
7 0.00
8 0.00
9 0.00
10 0.00
11 0.00
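A sketch of that printing loop, once more with hypothetical names; in
particular, the indexing order of the transition-probability field is
an assumption, so check the comments in mdp.h for its actual layout.
-
unsigned int t;  /* destination state */
printf("s'  P(s' | s=%u,a=%u)\n", state, action);
for (t = 0; t < p_mdp->numStates; t++)
  printf("%2u  %.2f\n", t, p_mdp->transitionProb[t][state][action]);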
Lab Assignment
Between this lab and snippets in the assignment document, you now
have nearly all the pieces necessary to implement value iteration.
You should begin to work on the lab assignment.
Copyright © 2011 Jerod Weinman.
This work is licensed under a Creative Commons
Attribution-Noncommercial-Share Alike 3.0 United States License.