Classification Decision Tree

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 3/22/2024

A classification decision tree is a kind of decision tree, as described in Decision Tree.

Details

Given:

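$n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of size $p$
The vector of class labels $y = (y_1, \ldots, y_n)$ that describes the class to which each feature vector belongs, where $y_i \in \{0, 1, \ldots, C-1\}$ and $C$ is the number of classes.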

The problem is to build a decision tree classifier.

Split Criteria

The library provides the decision tree classification algorithm based on two split criteria, the Gini index [Breiman84] and information gain [Quinlan86], [Mitchell97]:

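Gini index

$$I_{\mathrm{Gini}}(D) = 1 - \sum_{i=0}^{C-1} p_i^2$$

where
- $D$ is a set of observations that reach the node
- $p_i$ is the observed fraction of observations with class $i$ in $D$

To find the best test using Gini index, each possible test is examined using

$$\Delta I_{\mathrm{Gini}}(D, \tau) = I_{\mathrm{Gini}}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{\mathrm{Gini}}(D_v)$$

where
- $O(\tau)$ is the set of all possible outcomes of test $\tau$
- $D_v$ is the subset of $D$ for which the outcome of $\tau$ is $v$, that is, $D_v = \{d \in D \mid \tau(d) = v\}$

The test to be used in the node is selected as $\arg\max_{\tau} \Delta I_{\mathrm{Gini}}(D, \tau)$. For a binary decision tree with ‘true’ and ‘false’ branches,

$$\Delta I_{\mathrm{Gini}}(D, \tau) = I_{\mathrm{Gini}}(D) - \frac{|D_{true}|}{|D|} I_{\mathrm{Gini}}(D_{true}) - \frac{|D_{false}|}{|D|} I_{\mathrm{Gini}}(D_{false})$$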
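
The Gini index and the decrease used to rank tests can be illustrated with a short, library-independent sketch. The function names `gini_index` and `gini_decrease` and the toy data are assumptions made for illustration; they are not part of the library API.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of D: 1 minus the sum of squared class fractions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(labels, outcomes):
    """Impurity of D minus the size-weighted impurity of each subset D_v.

    outcomes[k] is the outcome v of the test on the k-th observation.
    """
    n = len(labels)
    subsets = {}
    for label, v in zip(labels, outcomes):
        subsets.setdefault(v, []).append(label)
    weighted = sum(len(d_v) / n * gini_index(d_v) for d_v in subsets.values())
    return gini_index(labels) - weighted

# Hypothetical node with four observations and two candidate binary tests.
labels = [0, 0, 1, 1]
perfect_test = [True, True, False, False]   # separates the classes completely
useless_test = [True, False, True, False]   # leaves both branches mixed

print(gini_index(labels))
print(gini_decrease(labels, perfect_test))
print(gini_decrease(labels, useless_test))
```

In this toy case the test that separates the classes removes all of the node's impurity, while the other test removes none, so the first one would be selected.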
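
Information gain

$$\mathrm{InfoGain}(D, \tau) = \mathrm{ImpurityInfo}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} \mathrm{ImpurityInfo}(D_v)$$

where
- $O(\tau)$, $D$, $D_v$ are defined above
- $\mathrm{ImpurityInfo}(D) = \sum_{i=0}^{C-1} -p_i \log p_i$, with $p_i$ defined above in Gini index

Similarly to Gini index, the test to be used in the node is selected as $\arg\max_{\tau} \mathrm{InfoGain}(D, \tau)$. For a binary decision tree with ‘true’ and ‘false’ branches,

$$\mathrm{InfoGain}(D, \tau) = \mathrm{ImpurityInfo}(D) - \frac{|D_{true}|}{|D|} \mathrm{ImpurityInfo}(D_{true}) - \frac{|D_{false}|}{|D|} \mathrm{ImpurityInfo}(D_{false})$$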
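
Information gain and the selection of the best test can be sketched in the same way. The names `impurity_info` and `info_gain`, the candidate tests, and the choice of a base-2 logarithm are assumptions for illustration; the base only rescales the values and does not change which test is selected.

```python
import math
from collections import Counter

def impurity_info(labels):
    """Entropy-style impurity: sum over classes of -p_i * log2(p_i)."""
    n = len(labels)
    # Counter only holds classes that occur, so log2 never receives 0.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, outcomes):
    """Impurity of D minus the size-weighted impurity of each subset D_v."""
    n = len(labels)
    subsets = {}
    for label, v in zip(labels, outcomes):
        subsets.setdefault(v, []).append(label)
    weighted = sum(len(d_v) / n * impurity_info(d_v) for d_v in subsets.values())
    return impurity_info(labels) - weighted

# Hypothetical candidate tests, each given by its outcome per observation.
labels = [0, 0, 1, 1]
tests = {
    "perfect": [True, True, False, False],
    "useless": [True, False, True, False],
}

# The test used in the node is the one that maximizes the information gain.
best = max(tests, key=lambda name: info_gain(labels, tests[name]))
print(best, info_gain(labels, tests[best]))
```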

Training Stage

The classification decision tree follows the algorithmic framework of decision tree training described in Decision Tree.

Prediction Stage

The classification decision tree follows the algorithmic framework of decision tree prediction described in Decision Tree.

Given a decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
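
As a minimal sketch of what calculating the responses involves, the code below walks a small binary tree from the root to a leaf for each vector. The `Node` structure and the test form "feature value < threshold" are assumptions for illustration and do not describe the library's internal model representation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # A leaf carries a class label; a split node carries a test and two children.
    label: Optional[int] = None
    feature: int = 0
    threshold: float = 0.0
    true_branch: Optional["Node"] = None
    false_branch: Optional["Node"] = None

def predict_one(root, x):
    """Follow the 'true'/'false' branches until a leaf is reached."""
    node = root
    while node.label is None:
        if x[node.feature] < node.threshold:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.label

def predict(root, vectors):
    """Calculate one response per input vector."""
    return [predict_one(root, x) for x in vectors]

# Hypothetical three-class tree over two features.
tree = Node(feature=0, threshold=2.5,
            true_branch=Node(label=0),
            false_branch=Node(feature=1, threshold=1.0,
                              true_branch=Node(label=1),
                              false_branch=Node(label=2)))

responses = predict(tree, [[1.0, 0.0], [3.0, 0.5], [3.0, 4.0]])
print(responses)  # [0, 1, 2]
```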

Batch Processing

Decision tree classification follows the general workflow described in Classification Usage Model.

Training

In addition to common input for a classifier, decision trees can accept the following inputs that are used for post-pruning:

Training Input for Decision Tree Classification (Batch Processing)
Input ID	Input
`dataForPruning`	Pointer to the numeric table with the pruning data set. This table can be an object of any class derived from NumericTable.
`labelsForPruning`	Pointer to the numeric table with class labels. This table can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.
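
To show how a separate pruning data set can be used, here is a generic textbook-style sketch of reduced error pruning. It is not the library's implementation: the `Node` structure, the `prune` function, and the rule that ties favor the smaller tree are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    majority: int                        # most frequent training class at this node
    feature: int = 0
    threshold: float = 0.0
    true_branch: Optional["Node"] = None
    false_branch: Optional["Node"] = None

    def is_leaf(self):
        return self.true_branch is None

def predict(node, x):
    while not node.is_leaf():
        node = node.true_branch if x[node.feature] < node.threshold else node.false_branch
    return node.majority

def prune(node, data, labels):
    """Prune bottom-up using only the pruning observations that reach `node`."""
    if node.is_leaf():
        return
    goes_true = [x[node.feature] < node.threshold for x in data]
    for child, side in ((node.true_branch, True), (node.false_branch, False)):
        prune(child,
              [x for x, t in zip(data, goes_true) if t == side],
              [y for y, t in zip(labels, goes_true) if t == side])
    subtree_errors = sum(predict(node, x) != y for x, y in zip(data, labels))
    leaf_errors = sum(node.majority != y for y in labels)
    # Replace the subtree with a leaf if that does not increase the error
    # on the pruning set (assumption: ties favor the smaller tree).
    if leaf_errors <= subtree_errors:
        node.true_branch = node.false_branch = None

# Hypothetical one-split tree and a two-row pruning set.
root = Node(majority=0, feature=0, threshold=0.5,
            true_branch=Node(majority=0), false_branch=Node(majority=1))
# Both pruning observations are class 0, so the split only adds an error.
prune(root, [[0.0], [1.0]], [0, 0])
print(root.is_leaf())
```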

At the training stage, the decision tree classifier has the following parameters:

Training Parameters for Decision Tree Classification (Batch Processing)
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	The computation method used by the decision tree classification. The only training method supported so far is the default dense method.
`nClasses`	Not applicable	The number of classes. A required parameter.
`splitCriterion`	`infoGain`	Split criterion used to choose the best test for split nodes. Available split criteria: `gini` (the Gini index) and `infoGain` (the information gain).
`pruning`	`reducedErrorPruning`	Method used to perform post-pruning. Available options: `reducedErrorPruning` (reduced error pruning; provide the `dataForPruning` and `labelsForPruning` inputs if you use it) and `none` (do not prune).
`maxTreeDepth`	0	Maximum tree depth. Zero value means unlimited depth. Can be any non-negative number.
`minObservationsInLeafNodes`	1	Minimum number of observations in the leaf node. Can be any positive number.
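
The defaults and constraints in the table above can be summarized in a small configuration check. This is a sketch only: `TrainingParameters` and `validate` are hypothetical names that mirror the table and are not the library's parameter class.

```python
from dataclasses import dataclass

@dataclass
class TrainingParameters:
    n_classes: int                                # nClasses: required, no default
    split_criterion: str = "infoGain"             # or "gini"
    pruning: str = "reducedErrorPruning"          # or "none"
    max_tree_depth: int = 0                       # 0 means unlimited depth
    min_observations_in_leaf_nodes: int = 1       # any positive number

    def validate(self, has_pruning_inputs):
        """Raise ValueError if a setting contradicts the parameter table."""
        if self.split_criterion not in ("gini", "infoGain"):
            raise ValueError(f"unknown splitCriterion: {self.split_criterion!r}")
        if self.pruning not in ("reducedErrorPruning", "none"):
            raise ValueError(f"unknown pruning method: {self.pruning!r}")
        if self.pruning == "reducedErrorPruning" and not has_pruning_inputs:
            raise ValueError("reducedErrorPruning needs dataForPruning and labelsForPruning")
        if self.max_tree_depth < 0:
            raise ValueError("maxTreeDepth must be non-negative")
        if self.min_observations_in_leaf_nodes < 1:
            raise ValueError("minObservationsInLeafNodes must be positive")

# Defaults are valid only when the pruning inputs are supplied.
params = TrainingParameters(n_classes=3)
params.validate(has_pruning_inputs=True)

# Without pruning data, pruning has to be switched off explicitly.
no_pruning = TrainingParameters(n_classes=3, pruning="none")
no_pruning.validate(has_pruning_inputs=False)
```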

Prediction

At the prediction stage, the decision tree classifier has the following parameters:

Prediction Parameters for Decision Tree Classification (Batch Processing)
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	The computation method used by the decision tree classification. The only prediction method supported so far is the default dense method.

Examples

C++ (CPU)

Batch Processing:

dt_cls_dense_batch.cpp

Python*

Batch Processing:
