Lab: Decision Trees
CSC 261 - Weinman
Summary: You will investigate the building blocks for creating decision
trees.
Preparation
- If you have not already done so, copy the starter files for the decision
  tree assignment to your own MathLAN account:

      cp -R /home/weinman/courses/CSC261/code/dtree your-directory

- Change to the directory you copied the files to:

      cd your-directory/dtree

- Open the starter file with DrScheme:

      drscheme start.scm &
Reference
Here is a summary of the procedures you will use to investigate the
data and (eventually) construct decision trees.
- (filter-examples-by-attribute-value examples attribute value)
  Takes a list of examples, an attribute, and a particular value of that
  attribute, returning a list of only the examples where the attribute
  takes on that value.
- (information-gain examples candidate attributes)
  Computes the information gain from splitting examples on candidate,
  where candidate is some attribute and attributes is an association
  list giving the domains (possible values) of all attributes in examples.
- (label-counts examples)
  Builds an association list of counts for each class label in
  examples (a histogram).
- (majority-value examples)
  Returns the class label in examples that occurs with the greatest
  frequency.
Exercises
In doing these exercises, you are strongly encouraged to add your work,
with comments, to the file start.scm in your definitions pane so that
you have a record to refer to as you move from the lab to the assignment.
- Note that the starter file loads the mushroom examples and attributes
  for you. The attributes are an association list, which is a data
  structure of the form

      (list
       (cons key1 ____)
       ...
       (cons keyN ____))

  Write a Scheme expression to extract the list of attribute names (i.e.,
  the keys) from mushroom-attributes.
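If the association-list structure is unfamiliar, here is a small sketch on a toy alist. The names and values below are an illustrative stand-in for mushroom-attributes (whose actual contents are loaded by the starter file), but the key-extraction idiom is the same:

```scheme
; A toy association list, standing in for mushroom-attributes.
(define toy-attributes
  (list (cons 'stalk-shape '(enlarging tapering))
        (cons 'gill-size '(broad narrow))))

; Each record is a pair whose car is the key, so mapping car
; over the alist collects all the keys.
(map car toy-attributes)
; => (stalk-shape gill-size)
```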
- Recall that the procedure (assoc key alist) extracts the entire record
  associated with key in alist. Write a Scheme expression to extract the
  list of values corresponding to the domain for an attribute you find
  interesting.
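For instance, on the same kind of toy alist (again, illustrative names rather than the real data), assoc returns the whole record, and cdr then strips off the key to leave just the domain:

```scheme
(define toy-attributes
  (list (cons 'stalk-shape '(enlarging tapering))
        (cons 'gill-size '(broad narrow))))

; assoc returns the entire record whose car matches the key...
(assoc 'gill-size toy-attributes)
; => (gill-size broad narrow)

; ...so taking the cdr gives just the domain values.
(cdr (assoc 'gill-size toy-attributes))
; => (broad narrow)
```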
- Note that the starter file has loaded the mushroom-examples
for you. Write an expression that gives you the count of poisonous
and edible mushrooms among the examples.
- If you had no attributes of the mushrooms to measure, what prediction
  would you make (ignoring the less-than-ideal utility of perhaps making
  the wrong decision)? Verify that majority-value makes the same
  prediction.
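One way to see the connection between label-counts and majority-value is the sketch below. The toy counts and the helper majority are illustrative assumptions, not the lab's actual implementation:

```scheme
; A toy histogram, shaped like the output label-counts is
; described as producing (an assumption, shown on made-up counts).
(define toy-counts '((edible . 3) (poisonous . 5)))

; One possible sketch of a majority rule: keep whichever
; (label . count) pair has the larger count, then return its label.
(define (majority counts)
  (car (foldr (lambda (pair best)
                (if (> (cdr pair) (cdr best)) pair best))
              (car counts)
              (cdr counts))))

(majority toy-counts)
; => poisonous
```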
- The attribute "stalk-shape" has two possible values. Write an
  expression (as in Exercise 2) to determine what they are.
- Write expressions to create a list of examples for each of these two
  values (refer to the Reference section above).

      (define examples-stalk-val1 ______)
      (define examples-stalk-val2 ______)
- Find the counts of each label (poisonous and edible) for each list of
  examples you just defined. Qualitatively, how much does knowing this
  attribute tell you about the classification of the mushroom?
- Repeat the process from the previous two exercises to find the counts
  of the labels for each of the two values of the "gill-size" attribute.
  (Note: If you are adventurous or clever, you could do this succinctly
  in a single expression by using map with an anonymous procedure.)
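The map-with-an-anonymous-procedure idea can be sketched on toy data. The example format below (a value paired with a class label) is a simplifying assumption, not the real mushroom-example structure, and the filtering is done with the standard filter rather than the lab's filter-examples-by-attribute-value:

```scheme
; Toy examples: each is (gill-size-value class-label).
; An illustrative stand-in for the real example format.
(define toy-examples
  '((broad edible) (narrow poisonous) (broad edible)))

; For each possible value, keep the matching examples and collect
; their labels -- all in one expression, via map with a lambda.
(map (lambda (val)
       (map cadr
            (filter (lambda (ex) (eq? (car ex) val))
                    toy-examples)))
     '(broad narrow))
; => ((edible edible) (poisonous))
```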
- Qualitatively, which attribute, "stalk-shape" or "gill-size", seems to
  give you more information about the classification of a mushroom?
- Verify your intuition by using the procedure information-gain with
  both "stalk-shape" and "gill-size" as candidates. Which attribute
  gives more information?
- Using the structural definition given in Assignment 6, construct a
  one-test decision tree using the best attribute (of the two you have
  measured). Use #p for the class poisonous and #e for edible.

      (define dtree ________)
- Write an expression that uses your decision tree definition dtree and
  an arbitrary instance and gives the classification for that instance.
  For example, given the definition above and the following,

      (define instance (cdr mushroom-examples))

  write a Scheme expression involving dtree and instance that produces a
  class label from dtree.
- Start working on choose-attribute (then decision-tree-learning and
  decision-tree-classify) for Assignment 6.