Lab: Decision Trees

CSC261 - Artificial Intelligence - Weinman



Summary:
We investigate the building blocks for creating decision trees.

Preparation

  1. Create a directory and copy the starter files for the decision tree lab:
    cp -R ~weinman/courses/CSC261/code/dtree somewhere
    cd somewhere/dtree
  2. Open the starter file with DrRacket:
    drracket start.scm &

Reference

Here is a summary of the procedures you will use to investigate the data and (eventually) construct decision trees. (These descriptions are inferred from the exercises below; consult the documentation in the starter code for the exact signatures.)

    label-counts - tallies how many examples carry each class label
    plurality-value - reports the most common label among a set of examples
    filter-examples-by-attribute-value - selects the examples having a given value for an attribute
    information-gain - measures how much a candidate attribute tells you about the label

You will also use standard Scheme procedures such as assoc and map.

Exercises

A: Attributes

  1. Note that the starter file loads the mushroom examples and attributes for you. The attributes are an association list, which is a data structure of the form
    (list
       (cons key1 ____)
       ...
       (cons keyN ____))
    Write a Scheme expression to extract the list of attribute names (i.e., the keys) from mushroom-attributes.
  2. Recall the procedure (assoc key alist) extracts the entire record associated with key in alist. Write a Scheme expression to extract the list of values corresponding to the domain for an attribute you find interesting. (Make sure your result omits the attribute name.)
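The two extractions in A.1 and A.2 can be sketched on a toy association list. The attribute names and values below are invented for illustration; the real mushroom-attributes has the same shape but different contents.

```scheme
;; Toy association list in the same shape as mushroom-attributes.
;; (Keys and domain values here are made up.)
(define toy-attributes
  (list (cons 'cap-color '(red brown white))
        (cons 'odor      '(almond none foul))))

;; All attribute names (the keys):
(map car toy-attributes)            ; => (cap-color odor)

;; The domain of one attribute, omitting its name:
(cdr (assoc 'odor toy-attributes))  ; => (almond none foul)
```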

B: Labels

  1. Note that the starter file has loaded the mushroom-examples for you. Write an expression using label-counts that gives you the total number of poisonous and edible mushroom instances among the examples.
  2. If you had no attributes of the mushrooms to measure, what prediction would you make (ignoring the less-than-ideal utility of perhaps making the wrong decision)? Verify that plurality-value reports the same prediction.
  3. The attribute "stalk-shape" has two possible values. Write an expression (as in Exercise A.2) to determine what they are.
  4. Using the procedure filter-examples-by-attribute-value, write expressions to create lists of examples, one for each of these two values:
    (define examples-stalk-val1 ______)
    (define examples-stalk-val2 ______)
  5. Find the counts of each label (poisonous and edible) for each subset of examples you just defined.
  6. Qualitatively speaking, how much does knowing this attribute tell you about the classification of the mushroom?
  7. Repeat the process from the previous exercises (B.3-5) to find the counts of the labels for each of the two values of the "gill-size" attribute. (Note: If you are adventurous or clever, you could do this succinctly in a single expression by using map with an anonymous procedure.)
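The single-expression approach hinted at in B.7 might look like the following sketch. It assumes, as a guess rather than the lab's documented interface, that filter-examples-by-attribute-value takes the examples, an attribute name, and a value, in that order, and that attribute names are symbols; check the starter code's documentation before relying on either.

```scheme
;; Hypothetical sketch: label counts for every value of gill-size.
;; The argument order of filter-examples-by-attribute-value and the
;; representation of attribute names (symbol vs. string) are assumptions.
(map (lambda (value)
       (label-counts
        (filter-examples-by-attribute-value mushroom-examples
                                            'gill-size
                                            value)))
     (cdr (assoc 'gill-size mushroom-attributes)))
```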

C: Information

  1. Qualitatively, which attribute, "stalk-shape" or "gill-size", seems to give you more information about the classification of a mushroom?
  2. Verify your intuition by using the procedure information-gain with each of "stalk-shape" and "gill-size" as the candidate attribute. Which attribute in fact gives more information?
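For reference, the quantity information-gain measures is conventionally defined as the reduction in entropy obtained by splitting the examples on an attribute A (this is the standard textbook definition; the lab procedure's interface may differ):

    Gain(A) = H(E) - \sum_{v} (|E_v| / |E|) H(E_v)

where E is the full set of examples, E_v is the subset of E whose value for A is v, and H is the entropy of the label (poisonous/edible) distribution. An attribute with higher gain sorts the examples into purer subsets.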

D: Building decision trees

  1. Using the structural definition given in the lab assignment, construct a one-test decision tree using the best attribute (of the two you have measured). Use the character value #\p for the class poisonous and the character value #\e for edible.
    (define decision-stump ________)
  2. Write an expression that applies your decision tree definition above to an arbitrary instance, giving the classification for that instance. For example, given the definition of decision-stump and the following definition
    (define first-instance (cdar mushroom-examples))
    write a Scheme expression involving decision-stump and first-instance that produces a class label.
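The accessor cdar used above is shorthand for (cdr (car ...)): it takes the first element of a list and drops that element's head. A toy illustration follows; the list structure here is invented, and the real mushroom-examples may store its records differently.

```scheme
;; cdar = (cdr (car lst)): first element of the list, minus its head.
(define toy-examples
  '((edible    cap-red   gill-broad)
    (poisonous cap-brown gill-narrow)))

(car  toy-examples)   ; => (edible cap-red gill-broad)
(cdar toy-examples)   ; => (cap-red gill-broad)
```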

Lab Assignment

Between this lab and the snippets in the assignment document, you now have nearly all the pieces necessary to implement your decision tree learner.
Open learning.scm and study the documentation for choose-attribute and decision-tree-learning. When these are clear, proceed to implement them.
Copyright © 2011, 2013, 2015, 2018 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.