Re: Figure C4 question
"Figure C4 shows the results of the splits of the original information. However, I noticed that ball #5 is listed under the size tree as Medium, when the table above lists it as 'Large'."
You are absolutely right. It is listed as large in the table and medium in the image. You might also notice that 3, 6, and 7 are listed as medium in the table and large in the image. I must have absentmindedly swapped the labels (I'll send a corrected version to Alex after I post this message). That doesn't change the tree, just the labeling. Looking again, I notice I made a similar mistake in a few other places. Again, the information is correct; it's just a slight error in how the image was created.
"Additionally, #5 is listed as being made of rubber, but with the way the tree is drawn it can't be listed under the rubber node."
We build the tree greedily, at each step choosing the test that leaves the least disorder. For each candidate test we create one small tree, with the root node representing the category and a child node for each possible value. Since we have four categories (size, color, weight, and rubber?), we create four individual trees, and of those we choose the one that best divides the data (size). From there, we look at the size tree. The medium and large branches, despite their labels having been accidentally switched, are already perfectly divided into homogeneous subsets. The small branch, however, is not: it has two of each outcome (two bounce and two do not). Therefore, we only need to divide that one branch further, using the test that creates the most homogeneous subsets (in this case, rubber?). It is true that 5 is rubber, but it is not listed under the rubber node because it doesn't need to be: 5 is already in a homogeneous subset made up of only itself.
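A minimal Python sketch of that greedy procedure, assuming disorder is measured as entropy in bits (a common choice for ID trees) and using a small hypothetical ball dataset that is NOT the table behind Figure C4. The names `disorder`, `average_disorder`, `build_tree`, `FIELDS`, and `BALLS` are illustrative only.

```python
from collections import Counter
from math import log2

# Hypothetical sample data (NOT the Figure C4 table): the four tests from the
# post plus the outcome we want to predict ("bounces").
FIELDS = ("size", "color", "weight", "rubber", "bounces")
BALLS = [dict(zip(FIELDS, row)) for row in [
    ("small", "red", "light", "yes", "yes"),
    ("small", "blue", "heavy", "yes", "yes"),
    ("small", "red", "heavy", "no", "no"),
    ("small", "blue", "light", "no", "no"),
    ("medium", "green", "light", "yes", "yes"),
    ("medium", "red", "heavy", "no", "yes"),
    ("medium", "green", "heavy", "yes", "yes"),
    ("large", "blue", "heavy", "yes", "no"),
]]
TESTS = ["size", "color", "weight", "rubber"]


def disorder(outcomes):
    """Entropy in bits: 0 for a homogeneous list, 1 for a 50/50 yes/no split."""
    total = len(outcomes)
    return -sum((n / total) * log2(n / total) for n in Counter(outcomes).values())


def average_disorder(rows, test, target):
    """Disorder left after splitting rows on `test`, weighted by branch size."""
    branches = {}
    for row in rows:
        branches.setdefault(row[test], []).append(row[target])
    return sum(len(b) / len(rows) * disorder(b) for b in branches.values())


def build_tree(rows, tests, target):
    """Return a leaf outcome, or a (test, {value: subtree}) pair."""
    outcomes = [row[target] for row in rows]
    if len(set(outcomes)) == 1 or not tests:
        # Homogeneous subset (or no tests left, an assumed fallback): stop
        # splitting and label the leaf with the most common outcome.
        return Counter(outcomes).most_common(1)[0][0]
    # Greedy step: try every remaining test and keep the least-disorder one.
    best = min(tests, key=lambda t: average_disorder(rows, t, target))
    remaining = [t for t in tests if t != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        children[value] = build_tree(subset, remaining, target)
    return (best, children)


tree = build_tree(BALLS, TESTS, "bounces")
print(tree)
# ('size', {'large': 'no', 'medium': 'yes', 'small': ('rubber', {'no': 'no', 'yes': 'yes'})})
```

With this toy data, size wins the first split (average disorder 0.5, versus roughly 0.69 for color, 0.80 for rubber, and 0.95 for weight), and only the mixed small branch is split again, on rubber. The medium and large branches become leaves right away, which is why a rubber ball outside the small branch never appears under the rubber node.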
"Does this mean that ID trees should be built with a specific 'target' in mind? (Obviously a targeted tree will be optimised for the specific data it was built for, but then it loses some of its generality.)"
ID trees aren't built with a target in mind, but they are built with the intent of dividing the data into distinct groups by outcome. The goal of an ID tree is to end up with some number of leaf nodes, each representing a group of samples from the data, and each group should contain only one type of outcome. Since the outcome in my example is whether or not the ball bounces, each group should contain only "yes" results or only "no" results.
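One common way to make "only one type of outcome" precise is the entropy-based disorder measure below. Treating it as the measure behind "least disorder" is an assumption: here n is the number of samples in a set S, n_c the number with outcome c, n_b the number in branch b of a test, and n_t the total number of samples being split, with 0 log2 0 taken as 0.

```latex
D(S) = -\sum_{c \in \{\text{yes},\,\text{no}\}} \frac{n_c}{n}\,\log_2\frac{n_c}{n}
\qquad
\bar{D}(\text{test}) = \sum_{b} \frac{n_b}{n_t}\, D(S_b)

\text{homogeneous leaf (all ``yes''):}\quad D = -1 \cdot \log_2 1 = 0

\text{small branch (2 bounce, 2 do not):}\quad
D = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1
```

A finished tree is one where every leaf has D = 0, so each leaf holds only "yes" or only "no" results.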
I hope this helps answer your questions; if you have any more, I'll do my best to answer them. I apologize if this response rambled a bit; I'm running on very little sleep right now ;)