Lab: Decision Trees
CSC 261 - Weinman


Summary:
You will investigate the building blocks for creating decision trees.

Preparation

  1. If you have not already done so, copy the starter files for the decision tree assignment to your own MathLAN account:
    cp -R /home/weinman/courses/CSC261/code/dtree your-directory
  2. Change to the directory you copied the files to:
    cd your-directory/dtree
  3. Open the starter file with DrScheme:
    drscheme start.scm &

Reference

Here is a summary of the procedures you will use to investigate the data and (eventually) construct decision trees.
(filter-examples-by-attribute-value examples attribute value)
Takes a list of examples, an attribute, and a particular value of that attribute, returning a list of only the examples where the attribute takes on that value.
(information-gain examples candidate attributes)
Returns the information gain of the attribute candidate over the given list of examples, where attributes is an association list giving the domains (possible values) of all attributes in examples.
(label-counts examples)
Build an association list of counts for each class label in examples--a histogram.
(majority-value examples)
Returns the class label in examples that occurs with the greatest frequency.
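As a concrete (and hedged) illustration of the histogram that label-counts produces, here is one possible sketch of such a procedure. It assumes each example is a list whose first element is its class label, which may not match the representation used in start.scm; the real label-counts is already loaded for you, so this sketch only shows the association-list-of-counts idea.

```scheme
; Sketch only: assumes each example is a list whose car is its class
; label -- check start.scm for the real representation.
(define (sketch-label-counts examples)
  (let loop ((rest examples) (counts '()))
    (if (null? rest)
        counts
        (let* ((label (car (car rest)))
               (entry (assoc label counts)))
          (loop (cdr rest)
                (if entry
                    ; Label seen before: increment its count.
                    (map (lambda (pair)
                           (if (equal? (car pair) label)
                               (cons (car pair) (+ 1 (cdr pair)))
                               pair))
                         counts)
                    ; New label: start it at one.
                    (cons (cons label 1) counts)))))))

; With invented toy examples (label first, then dummy data):
(sketch-label-counts '((edible a) (poisonous b) (edible c)))
; => ((poisonous . 1) (edible . 2))
```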

Exercises

In doing these exercises, you are strongly encouraged to add your work, with comments, to the file start.scm in your definitions pane so that you have a record to refer to as you move from the lab to the assignment.
  1. Note that the starter file loads the mushroom examples and attributes for you. The attributes are an association list, which is a data structure of the form
    (list
       (cons key1 ____)
       ...
       (cons keyN ____))
    Write a Scheme expression to extract the list of attribute names (i.e., the keys) from mushroom-attributes.
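A common Scheme idiom for extracting the keys of any association list of the form shown above is to map car over it. The toy association list below is invented for illustration; it is not mushroom-attributes.

```scheme
; Toy association list (invented attribute names and values).
(define toy-attributes
  (list (cons 'color '(red brown white))
        (cons 'odor '(none foul almond))))

; The keys of an association list are the cars of its entries.
(map car toy-attributes) ; => (color odor)
```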
  2. Recall the procedure (assoc key alist) extracts the entire record associated with key in alist. Write a Scheme expression to extract the list of values corresponding to the domain for an attribute you find interesting.
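To see the difference between the entire record and just its values, here is assoc at work on an invented toy alist (again, not the mushroom data):

```scheme
; Toy association list (invented names, not the mushroom data).
(define toy-alist
  (list (cons 'odor '(none foul almond))
        (cons 'habitat '(woods urban))))

(assoc 'odor toy-alist)        ; => (odor none foul almond), the whole record
(cdr (assoc 'odor toy-alist))  ; => (none foul almond), just the domain
```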
  3. Note that the starter file has loaded the mushroom-examples for you. Write an expression that gives you the count of poisonous and edible mushrooms among the examples.
  4. If you had no attributes of the mushrooms to measure, what prediction would you make (ignoring the less-than-ideal utility of perhaps making the wrong decision)? Verify that majority-value makes the same prediction.
  5. The attribute "stalk-shape" has two possible values. Write an expression (as in Exercise 2) to determine what they are.
  6. Write expressions to create lists of examples containing only these two values (refer to the Reference section above).
    (define examples-stalk-val1 ______)
    (define examples-stalk-val2 ______)
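For reference, a call to filter-examples-by-attribute-value has the general shape sketched below (assuming start.scm has been loaded). Here 'some-value is a placeholder, not a real attribute value, and whether attributes and values are written as symbols or strings depends on the starter code.

```scheme
; Sketch of the call shape only; replace 'some-value with an actual
; value of the stalk-shape attribute found in Exercise 5.
(filter-examples-by-attribute-value mushroom-examples
                                    'stalk-shape
                                    'some-value)
```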
  7. Find the counts of each label (poisonous and edible) for each list of examples you just defined. Qualitatively, how much does knowing this attribute tell you about the classification of the mushroom?
  8. Repeat the process from the previous two exercises to find the counts of the labels for each of the two values of the "gill-size" attribute. (Note: If you are adventurous or clever, you could do this succinctly in a single expression by using map with an anonymous procedure.)
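If the map-with-an-anonymous-procedure idiom is unfamiliar, here is a toy illustration on plain numbers (unrelated to the mushroom data): map applies the lambda to each element and collects the results.

```scheme
; Apply the same anonymous procedure to every element of a list.
(map (lambda (v) (list v (* v v)))
     '(1 2 3))
; => ((1 1) (2 4) (3 9))
```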
  9. Qualitatively, which attribute, "stalk-shape" or "gill-size", seems to give you more information about the classification of a mushroom?
  10. Verify your intuition by using the procedure information-gain with both "stalk-shape" and "gill-size" as a candidate. Which attribute gives more information?
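Following the signature in the Reference section, such calls would presumably look like the sketch below (assuming start.scm has been loaded, and assuming attributes are written as symbols; check the starter code for the actual convention):

```scheme
; Compare the information gain of the two candidate attributes.
(information-gain mushroom-examples 'stalk-shape mushroom-attributes)
(information-gain mushroom-examples 'gill-size mushroom-attributes)
```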
  11. Using the structural definition given in Assignment 6, construct a one-test decision tree using the best attribute (of the two you have measured). Use #p for the class poisonous and #e for edible.
    (define dtree ________)
  12. Write an expression that uses your decision tree definition dtree and an arbitrary instance to produce the classification for that instance. For example, given the definition above and the following,
    (define instance (car mushroom-examples))
    write a Scheme expression involving dtree and instance that produces a class label from dtree.
  13. Start working on choose-attribute (then decision-tree-learning and decision-tree-classify) for Assignment 6.