Lab: Adversarial Search

CSC261 - Artificial Intelligence - Weinman



Summary:
We explore MINIMAX search in preparation for implementing a cutoff a-b search and experiment with state evaluation functions.

Preparation

  1. Make a directory for this lab.
    mkdir somewhere/adversarial
    cd somewhere/adversarial
  2. Copy the starter files to your new direcdtory
    cp ~weinman/courses/CSC261/code/adversarial/*.scm .
  3. Open your Scheme environment and create a new Scheme file called prep.scm in your new directory.

Exercises

A: Getting started with Barranca and MINIMAX search

When we play the game, anyone using MINIMAX-DECISION needs a UTILITY function that takes a state as a parameter and produces a number. The state of the game is represented the same way for both players. Therefore, the utility function will need to distinguish whether a state is advantageous for the player or its opponent. Thus, in creating a utility we will specify who we are advocating for (that is, whose definition of utility, player 1's or player 2's).
In barranca, the state contains the player getting the next turn, the list of numbers each player has, and the list of remaining numbers. Thus, when evaluating utility, the procedure needs to know which list of player-possessing numbers belongs to it. In all the games provided (Barranca, Tic-Tac-Toe, Mancala), player 1 is represented by #t and player 2 represented by #f.
  1. Add commands to load some relevant starter files to your environment.
    (load "minimax.scm")
    (load  "barranca.scm")
  2. Create a barranca game with n=4 number possibilities and a target k=7.
    (define barranca (make-barranca-game 4 7))
  3. Use barranca-utility-fun (provided in barranca.scm) to create a utility for player 1 with a target of 7:
    (define barranca-player1-utility (barranca-utility-fun #t 7))
    This procedure gives 1 for wins, 0 for draws, and -1 for losses.
  4. Ignoring complete game play for a moment, ask the search routine to find us the best opening move (and its value) for player 1.
    (max-value barranca 
                 (game-start-state barranca)    
                 barranca-player1-utility)
    This produces the pair (action . utility).
  5. Based on this information, what number should player 1 choose? Which player should win this game?
  6. Usually we'll just want the optimal move, so ask for that more directly.
    (minimax-search barranca 
                      (game-start-state barranca)    
                      barranca-player1-utility)
    Does this give you what you expected?
  7. What's actually going on inside minimax-search?
    1. Open the file minimax.scm.
    2. Set DO-DISPLAY to #t,
    This will display a complete trace of the search tree.
  8. A barranca state is represented as a list of the form
    (next-player player-1-numbers player-2-numbers available-numbers)
    Re-run minimax search and trace the first ply of output.
    (minimax-search barranca 
                      (game-start-state barranca)    
                      barranca-player1-utility)
    If you are unable to make sense of the trace, you may want to read the end note.1
  9. Restore the value of DO-DISPLAY in minimax.scm to #f.

B: Moving up to tic-tac-toe

  1. Add the tic-tac-toe routines to the top of your Scheme file.
    (load "tictactoe.scm")
  2. Create a tic-tac-toe game. There are no free parameters in this game, so the call is straightforward.
    (define ttt (make-tictactoe-game))
  3. Just like barranca, a tic-tac-toe utility function has to know on whose behalf it is evaluating a state's utility The procedure tictactoe-utility-fun simply takes an argument indicating whether it needs to produce a procedure for player 1 (#t) or player 2 (#f). Going one step further than before, we use the utility function to create a player for the game playing engine that will be used to select moves.
    (define ttt-smart-player1 (make-minimax-player ttt (tictactoe-utility-fun #t)))
    (define ttt-smart-player2 (make-minimax-player ttt (tictactoe-utility-fun #f)))
  4. Of course, optimal play in tic-tac-toe should always lead to a draw. How can we pit these fearless competitors against each other? We simply use the game-play engine, giving it the game (so it knows the rules) and the two decision-making procedures.
    (game-play ttt ttt-smart-player1 ttt-smart-player2)
    After showing the initial board, it may take several seconds for a move to appear. Does the game play out the way you expect?
  5. Of course, the actions being reported may not be meaningful to you, but they correspond to the rules of the game, as internally represented. The slots on the 3×3 grid are simply numbered from top-to-bottom, left-to-right, starting at 0.
    036
    147
    258
    Verify the states displayed correspond to these action numbers.
  6. What if we don't want to have an optimal player? Create a remarkably simple agent decision function that always chooses the first possible action.
    (define ttt-lazy-player 
       (lambda (state)
          (caar ((game-successors-fun ttt) state))))
    The successor function produces a list of pairs. Because the first item in the list is a pair (first-action . resulting-state), taking the car of the first item gives us the actual action choice. (Combining the two leads to the caar shown above.) The astute reader will notice that we haven't checked for the case when there are no successors. We need not, since the game play engine will not ask us for a play when there is none to make.
  7. What do you expect to happen when our lazy agent squares off against the optimal player? Verify your prediction:
    (game-play ttt ttt-smart-player1 ttt-lazy-player)
  8. How important is this utility function anyhow? What if we gave player 1 the utility function used by an optimal player 2? Even if our latter player is extremely lazy, player 1 will in fact be choosing moves that benefit player 2. What do you expect to happen?
    (game-play ttt ttt-smart-player2 ttt-lazy-player)
    Don't be misled by the variable names. Player 2 is actually ttt-lazy-player.

C: Mancala

In mancala, the state is represented as a pair (next-player . board). The board is represented as a list of 14 holes, as follows. Indices 0-5 are the holes for player 1 with index 6 as player 1's mancala. Indices 7-12 are player 2's holes and index 13 is player 2's mancala. The figure on page one illustrates the indices along with the contents of each slot. This corresponds to the list
(4 4 4 4 4 4 0 4 4 4 4 4 4 0)
Because mancala has a moderately large branching factor (six early in the game), we may not always be able to search a game tree all the way to terminal states. As mentioned in the lab assignment, we can instruct the minimax search to cutoff at a particular depth (number of plies) and then apply an evaluation function for estimating the utility of a given game board. Let's see how to do all of that.
  1. Add the mancala routines and a simple evaluation function to the top of your Scheme file.
    (load "mancala.scm")
    (load "mancala-player.scm")
  2. Add the minimax with cutoff routines to the top of your Scheme file.
    (load "cutoff-minimax.scm")
  3. Create a mancala game.
    (define mancala (make-mancala-game))
  4. Use the naive evaluation function provided in mancala-player.scm, which simply counts the stones in our player's mancala.
    (define mancala-player1-eval (simple-mancala-eval #t))
  5. Sanity check this function by asking it to evaluate the initial state of the game. What result do you expect? Verify this result.
    (mancala-player1-eval (game-start-state mancala))
  6. Use this evaluation function to create a player that only searches, 3 plies deep in the tree.
    (define mancala-player1 (make-cutoff-minimax-player mancala 3 mancala-player1-eval))
  7. Ask the player what opening move it chooses.
    (mancala-player1 (game-start-state mancala))
  8. For comparison, create an even simpler (lazier) player that simply chooses the first action available
    (define mancala-lazy--player
       (lambda (state)
          ...)
  9. Use game-play to pit these two players, mancala-player1 and mancala-lazy--player against one another. Who do you expect to win?
  10. Use simple-mancala-eval to create an evaluation function for player 2, and make-cutoff-minimax-player to create a player that is allowed to search four plies deep, rather than the three plies player 1 was allowed.
    (define mancala-player2  ...)
  11. What will happens when these play each other? (Aside from the fact that player 2 will take exponentially longer). Use game-play to verify your prediction. (Keep reading the rest of the lab while you wait for the game to finish.)
You should note that going any deeper will make the search even more painfully slow. Hence, we will need to use alpha-beta pruning to ignore irrelevant sub-trees.

Lab Assignment

Between this lab and snippets in the assignment document, you now have nearly all the pieces necessary to implement your own evaluation functions and alpha beta pruning for these games.
Open minimax.scm and study the implementation of max-value. When you understand how this implements MAX-VALUE of AIMA Figure 5.3 (p. 166), proceed to begin the lab's assignment.

Copyright © 2011 Jerod Weinman.
ccbyncsa.png
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Footnotes:

1We start by following the alternating MAX and MIN nodes down the left side of the tree, as 1 is chosen by player 1, then 2 is chosen by player 2, 3 is then chosen by player 1, and finally 4 is chosen by player 2. Here the two products are 3 and 8. Because the move belongs to MAX (player 1) and there are no remaining possibilities, the utility is reported as -1 (3 is much farther from the goal of 7 than player 2's value of 8).
The c node above this then reports that its only option is to choose 4, and the utility is -1. The process is repeated for the case when the MAX node has chosen 4 as its second value, instead of three. This leaves player 2 ( MIN) with only one possibility, and the final MAX node with no remaining moves. The utility is again calculated, turning out again to be in favor of MIN (player 2). The MIN node reports that its only action is to choose 3, with a utility of -1. Just above this in the tree, the MAX node now has a choice between 3 and 4, but both of these have a utility of -1.