Lab: Markov Decision Processes
CSC261 - Artificial Intelligence - Weinman
- Summary: We explore models for sequential decision making in
fully observable stochastic environments.
Preparation
- Create a directory and copy the starter files for the lab:
-
$ mkdir mdp
$ cp ~weinman/courses/CSC261/code/mdp/* mdp/
$ cd mdp
- Open the file start.c in Emacs or your favorite code editing
environment.
-
$ emacs start.c &
- Review the definition of the mdp struct in the lab assignment
or mdp.h.
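For orientation only, the sketch below shows the kinds of fields the
struct provides; the names and layout here are illustrative assumptions,
so rely on mdp.h itself for the authoritative definition.
-
/* Illustrative sketch only -- these field names are assumptions,
   not the actual contents of mdp.h. */
typedef struct
{
  unsigned int numStates;        /* number of states, reachable or not */
  unsigned int numActions;       /* number of distinct actions */
  unsigned int *numAvailActions; /* per-state count of available actions */
  unsigned int **actions;        /* actions[s][i]: i-th action available in state s */
  double ***transitionProb;      /* transition probabilities (indexing: see mdp.h) */
  double *rewards;               /* per-state rewards */
} mdp;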
Exercises
A: Getting started with MDPs
The shell of the main procedure in start.c reads
an mdp struct from a file and uses the p_mdp pointer
to print some basic information about the corresponding mdp.
- Build the start program (which also builds the mdp.o
object file)
-
$ make start
- The file 4x3.mdp contains the Markov decision process for
the 4×3 grid world environment defined in Figure 17.1 of AIMA.
Run the start program to read and print some details of this world.
-
$ ./start 4x3.mdp
- Note that 12 states are listed. This is because the grid world
includes every cell as a "state" (such as the blacked-out box), even
though it may not be reachable. Starting from zero at the top-left
corner, the states are numbered by walking down the rows (top to
bottom) and then across the columns (left to right). Which state
number is the black box?
B: Understanding actions
- Add a line inside the loop over states that prints the number of
actions available in that state. (A sketch follows the sample output
below.)
Note: You will need to use fields from the mdp struct,
outlined in the lab assignment and in mdp.h.
-
$ ./start 4x3.mdp
MDP Info for 4x3.mdp
Number of states: 12
Number of actions: 4
Available actions
State 0: [4]
State 1: [4]
...
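For illustration, the added line might look something like the sketch
below. The loop variable name state and the field name numAvailActions
are assumptions; use whatever names start.c and mdp.h actually declare.
-
printf("State %u: [%u]\n", state, p_mdp->numAvailActions[state]);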
- Add a line inside the loop to declare a new loop variable (an unsigned
int) for the available action.
- Add an inner for loop over the available actions, and
print the numeric value of each action available in the state. (A
sketch of this loop appears after the note below.) For
instance, your output should look something like the following.
Note: You should NOT print your loop variable itself; instead
use the loop variable as an index into the actions
array of p_mdp.
-
$ ./start 4x3.mdp
MDP Info for 4x3.mdp
Number of states: 12
Number of actions: 4
Available actions
State 0: [4] 0 1 2 3
State 1: [4] 0 1 2 3
...
Note the actions are up=0, down=1, left=2, right=3.
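A minimal sketch of that per-state printing, again using the assumed
names state, action, numAvailActions, and actions (the newline now
moves after the inner loop so the action numbers share the line):
-
/* Hypothetical field names; check mdp.h for the real ones. */
printf("State %u: [%u]", state, p_mdp->numAvailActions[state]);
for (action = 0; action < p_mdp->numAvailActions[state]; action++)
  printf(" %u", p_mdp->actions[state][action]); /* action number, not loop index */
printf("\n");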
C: Probing probabilities
In this portion of the lab, we will use the command line to probe
the transition probability distribution over successor states given a
particular state and action.
- Copy the file start.c to transition.c.
-
$ cp start.c transition.c
- Add an entry to your Makefile to easily build a new program
-
transition: mdp
gcc ${FLAGS} -o transition transition.c mdp.o
- Alter the first part of the main procedure so that the expected
argument count is 4. The new usage will be
-
transition mdpfile state action
Alter the error message accordingly.
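The altered check might look something like this sketch; the exact
structure and wording in your copy of start.c may differ (fprintf and
exit come from stdio.h and stdlib.h, which start.c should already
include):
-
if (argc != 4)
{
  fprintf(stderr, "Usage: %s mdpfile state action\n", argv[0]);
  exit(EXIT_FAILURE);
}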
- Add declarations for the unsigned integers state and action.
- Convert the two additional command-line parameters to unsigned integers
using strtol. For example
-
(unsigned int)strtol(argv[2], NULL, 10)
For the purposes of this exercise, you do not need to verify that
the converted values are valid.
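Putting that together, the two conversions might read as follows;
strtol is declared in stdlib.h:
-
state  = (unsigned int) strtol(argv[2], NULL, 10);
action = (unsigned int) strtol(argv[3], NULL, 10);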
- Modify the rest of the program to print, for every destination state t,
the transition probability P(t|s,a), where s and a are the state and
action given on the command line. (A sketch of this loop follows the
example output.) For example:
-
$ ./transition 4x3.mdp 2 3
s' P(s' | s=2,a=3)
0 0.00
1 0.10
2 0.10
3 0.00
4 0.00
5 0.80
6 0.00
7 0.00
8 0.00
9 0.00
10 0.00
11 0.00
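A sketch of that printing loop, once more with hypothetical names; in
particular, the indexing order of the transition-probability field is
an assumption, so check the comments in mdp.h for its actual layout.
-
unsigned int t;  /* destination state */
printf("s'  P(s' | s=%u,a=%u)\n", state, action);
for (t = 0; t < p_mdp->numStates; t++)
  printf("%2u  %.2f\n", t, p_mdp->transitionProb[t][state][action]);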
Lab Assignment
Between this lab and snippets in the assignment document, you now
have nearly all the pieces necessary to implement value iteration.
You should begin to work on the lab assignment.
Copyright © 2011 Jerod Weinman.
This work is licensed under a Creative Commons
Attribution-Noncommercial-Share Alike 3.0 United States License.