Lab: Decision Trees
CSC261 - Artificial Intelligence - Weinman
- Summary:
- We investigate the building blocks for creating decision
trees.
Preparation
- Create a directory and copy the starter files for the decision tree
lab:
-
$ cp -R ~weinman/courses/CSC261/code/dtree somewhere
$ cd somewhere/dtree
- Open the starter file with DrRacket:
-
$ drracket start.scm &
Reference
Here is a summary of the procedures you will use to investigate the
data and (eventually) construct decision trees.
- (filter-examples-by-attribute-value examples attribute value)
Takes a list of examples, an attribute,
and a particular value of that attribute, returning
a list of only the examples where the attribute takes on that value.
- (information-gain examples candidate attributes)
Given a list of examples, a candidate
attribute, and an association list attributes
giving the domain (possible values) of each attribute
appearing in examples, calculates the empirical numerical
"information" gained (i.e., the reduction in uncertainty) by querying
the particular candidate attribute.
- (label-counts examples)
Builds an association list of counts for each class label in examples;
a histogram.
- (plurality-value examples)
Returns the class label in examples that occurs with
the greatest frequency; the mode.
Exercises
A: Attributes
- Note that the starter file loads the mushroom examples and attributes
for you. The attributes are an association list, which is a data structure
of the form
-
(list
(cons key1 ____)
...
(cons keyN ____))
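For instance, a small association list of this shape (purely hypothetical
toy data, not the mushroom attributes) could be built and inspected like
this:
-
(define toy-attributes
  (list (cons 'color (list 'red 'green 'blue))
        (cons 'size  (list 'small 'large))))
(car toy-attributes)   ; => (color red green blue)
Each element is a pair whose car is the key and whose cdr is the
associated data.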
Write a Scheme expression to extract the list of attribute names (i.e.,
the keys) from mushroom-attributes.
- Recall the procedure (assoc key alist)
extracts the entire record associated with key in
alist. Write a Scheme expression to extract the list
of values corresponding to the domain for an attribute you find interesting.
(Make sure your result omits the attribute name.)
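As a reminder of how assoc behaves, here is a sketch on a toy
association list (hypothetical data, not mushroom-attributes):
-
(define toy-attributes
  (list (cons 'color (list 'red 'green 'blue))
        (cons 'size  (list 'small 'large))))
(assoc 'size toy-attributes)        ; => (size small large)
(cdr (assoc 'size toy-attributes))  ; => (small large)
Note that taking the cdr of the record drops the key, leaving only the
associated values.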
B: Labels
- Note that the starter file has loaded the mushroom-examples
for you. Write an expression using label-counts that gives
you the total number of poisonous and edible mushroom instances among
the examples.
- If you had no attributes of the mushrooms to measure, what prediction
would you make (ignoring the less-than-ideal utility of perhaps
making the wrong decision)? Verify that plurality-value reports
the same prediction.
- The attribute "stalk-shape" has two possible values.
Write an expression (as in Exercise A.2)
to determine what they are.
- Using the procedure filter-examples-by-attribute-value, write
expressions to create two lists of examples, one for each of these
two values
-
(define examples-stalk-val1 ______)
(define examples-stalk-val2 ______)
- Find the counts of each label (poisonous and edible) for each subset
of examples you just defined.
- Qualitatively speaking, how much does knowing this attribute tell
you about the classification of the mushroom?
- Repeat the process from the previous exercises (B.3-5)
to find the counts of the labels for each of the two values of the
"gill-size" attribute. (Note: If you are adventurous
or clever, you could do this succinctly in a single expression by
using map with an anonymous procedure.)
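As a reminder of the map-with-lambda idiom, here is a toy example
(unrelated to the mushroom data) that applies an anonymous procedure to
each element of a list:
-
(map (lambda (x) (* x x)) (list 1 2 3))  ; => (1 4 9)
You can use the same pattern to apply a filtering-and-counting procedure
to each attribute value in a list.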
C: Information
- Qualitatively, which attribute, "stalk-shape" or "gill-size",
seems to give you more information about the classification of a mushroom?
- Verify your intuition by using the procedure information-gain
with both "stalk-shape" and "gill-size" as
the candidate attribute. Which attribute in fact gives more information?
D: Building decision trees
- Using the structural definition given in the lab assignment, construct
a one-test decision tree using the best attribute (of the two you
have measured). Use the character value #\p
for the class poisonous and the character value #\e
for edible.
-
(define decision-stump ________)
- Write an expression that uses your decision tree definition above
and an arbitrary instance and gives the classification for that instance.
For example, given the definition of decision-stump and the
following definition
-
(define first-instance (cdar mushroom-examples))
write a Scheme expression involving decision-stump and first-instance
that produces a class label from decision-stump.
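To illustrate the idea only (this is NOT the representation prescribed
by the lab assignment; follow the structural definition given there), a
one-test stump can be thought of as a branch on a single attribute value:
-
;; Toy sketch: a stump as a procedure over a hypothetical
;; attribute value, returning a class-label character.
(define (toy-stump gill-size)
  (if (equal? gill-size 'narrow)
      #\p    ; poisonous
      #\e))  ; edible
Your decision-stump should instead be a data structure matching the
assignment's definition, with a separate expression that walks it to
classify an instance.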
Lab Assignment
Between this lab and snippets in the assignment document, you now
have nearly all the pieces necessary to implement your decision tree
learner.
Open learning.scm and study the documentation for choose-attribute
and decision-tree-learning. When these are clear, proceed
to implement them.
Copyright © 2011, 2013, 2015, 2018 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 4.0 International License.