Decision Forest Classification and Regression (DF)

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 3/22/2024

Decision Forest (DF) classification and regression algorithms are based on an ensemble of tree-structured classifiers known as decision trees. A decision forest is built using bagging (bootstrap aggregation) combined with a random choice of features. For more details, see [Breiman84] and [Breiman2001].
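
The two sources of randomness named above, bootstrap sampling of rows and a random choice of features, can be sketched with the C++ standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation; the helper names bootstrap_rows and random_features are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Draw row_count row indices with replacement: the bootstrap sample for one tree.
std::vector<std::size_t> bootstrap_rows(std::size_t row_count, std::mt19937_64& rng) {
    assert(row_count > 0);
    std::uniform_int_distribution<std::size_t> pick(0, row_count - 1);
    std::vector<std::size_t> rows(row_count);
    for (auto& r : rows) r = pick(rng);
    return rows;
}

// Choose k distinct feature indices out of p (sampling without replacement).
std::vector<std::size_t> random_features(std::size_t p, std::size_t k, std::mt19937_64& rng) {
    assert(k <= p);
    std::vector<std::size_t> all(p);
    std::iota(all.begin(), all.end(), std::size_t{ 0 });
    std::shuffle(all.begin(), all.end(), rng);
    all.resize(k);
    return all;
}
```

Rows that never appear in a tree's bootstrap sample are the out-of-bag observations for that tree.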

Operation	Computational methods	Programming Interface
Training	Dense, Hist	train(…), train_input, train_result
Inference	Dense, Hist	infer(…), infer_input, infer_result

Mathematical formulation

Refer to Developer Guide: Decision Forest Classification and Regression.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::decision_forest namespace and are available via inclusion of the oneapi/dal/algo/decision_forest.hpp header file.

Enum classes

error_metric_mode

error_metric_mode::none: Do not compute error metric.
error_metric_mode::out_of_bag_error: Training produces a table with the cumulative prediction error for out-of-bag observations.
error_metric_mode::out_of_bag_error_per_observation: Training produces a table with the prediction error for each out-of-bag observation.
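
The difference between the cumulative and the per-observation error can be sketched for the classification case as follows. This uses the textbook out-of-bag definition (each observation is judged only by trees that did not train on it). The oob_report type, the function name, and the use of -1.0 for rows that were never out of bag are assumptions made for this sketch and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct oob_report {
    std::vector<double> per_observation; // 1.0 = misclassified, 0.0 = correct, -1.0 = never out of bag
    double cumulative;                   // mean over observations that were out of bag at least once
};

// in_bag[t][i] is true when observation i was in the bootstrap sample of tree t.
// predicted[t][i] is the class that tree t predicts for observation i.
oob_report out_of_bag_error(const std::vector<std::vector<bool>>& in_bag,
                            const std::vector<std::vector<int>>& predicted,
                            const std::vector<int>& truth) {
    assert(in_bag.size() == predicted.size());
    const std::size_t n = truth.size();
    oob_report report{ std::vector<double>(n, -1.0), 0.0 };
    std::size_t counted = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::map<int, int> votes; // class -> number of out-of-bag trees voting for it
        for (std::size_t t = 0; t < in_bag.size(); ++t) {
            if (!in_bag[t][i]) ++votes[predicted[t][i]];
        }
        if (votes.empty()) continue; // observation was in every bootstrap sample
        int best_class = 0;
        int best_count = 0;
        for (const auto& kv : votes) {
            if (kv.second > best_count) { best_class = kv.first; best_count = kv.second; }
        }
        report.per_observation[i] = (best_class == truth[i]) ? 0.0 : 1.0;
        report.cumulative += report.per_observation[i];
        ++counted;
    }
    if (counted > 0) report.cumulative /= static_cast<double>(counted);
    return report;
}
```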

variable_importance_mode

variable_importance_mode::none: Do not compute variable importance.
variable_importance_mode::mdi: Mean Decrease Impurity. Computed as the sum of weighted impurity decreases for all nodes where the variable is used, averaged over all trees in the forest.
variable_importance_mode::mda_raw: Mean Decrease Accuracy (permutation importance). For each tree, the prediction error on the out-of-bag portion of the data is computed (error rate for classification, MSE for regression). The same is done after permuting each predictor variable. The differences between the two errors are then averaged over all trees.
variable_importance_mode::mda_scaled: Mean Decrease Accuracy (permutation importance). This is the mda_raw value scaled by its standard deviation.
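
The relation between the raw and scaled MDA values can be sketched for a single feature as below. The input is the per-tree difference in out-of-bag error before and after permuting that feature. Reading "scaled by its standard deviation" as the mean divided by the sample standard deviation is an assumption of this sketch; the library may normalize differently. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// diffs[t] = (out-of-bag error of tree t after permuting the feature) - (error before).
double mda_raw(const std::vector<double>& diffs) {
    assert(!diffs.empty());
    double sum = 0.0;
    for (double d : diffs) sum += d;
    return sum / static_cast<double>(diffs.size());
}

// Assumption: scaling means dividing the mean by the sample standard deviation.
double mda_scaled(const std::vector<double>& diffs) {
    assert(diffs.size() > 1);
    const double mean = mda_raw(diffs);
    double squares = 0.0;
    for (double d : diffs) squares += (d - mean) * (d - mean);
    const double stddev = std::sqrt(squares / static_cast<double>(diffs.size() - 1));
    return stddev > 0.0 ? mean / stddev : 0.0;
}
```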

infer_mode

infer_mode::class_labels: Inference produces an n × 1 table with the predicted labels.
infer_mode::class_responses: Deprecated.
infer_mode::class_probabilities: Inference produces a table with the predicted class probabilities for each observation.

voting_mode

voting_mode::weighted: The final prediction is obtained by weighted majority voting.
voting_mode::unweighted: The final prediction is obtained by simple majority voting.
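
The two voting modes can give different answers on the same trees, as the sketch below shows. It assumes that weighted voting sums each tree's class probabilities while unweighted voting counts one vote per tree. That interpretation, along with the names class_probs, vote_weighted and vote_unweighted, is an assumption for illustration rather than a statement about the library's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using class_probs = std::vector<double>; // one probability per class, from a single tree

static std::size_t argmax(const std::vector<double>& v) {
    return static_cast<std::size_t>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Unweighted: every tree casts one vote for its most probable class.
std::size_t vote_unweighted(const std::vector<class_probs>& trees) {
    assert(!trees.empty());
    std::vector<double> votes(trees.front().size(), 0.0);
    for (const auto& p : trees) votes[argmax(p)] += 1.0;
    return argmax(votes);
}

// Weighted (assumed meaning): every tree contributes its full probability vector.
std::size_t vote_weighted(const std::vector<class_probs>& trees) {
    assert(!trees.empty());
    std::vector<double> sums(trees.front().size(), 0.0);
    for (const auto& p : trees) {
        assert(p.size() == sums.size());
        for (std::size_t c = 0; c < p.size(); ++c) sums[c] += p[c];
    }
    return argmax(sums);
}
```

With one confident tree and two marginal ones, for example probabilities (0.9, 0.1), (0.4, 0.6) and (0.4, 0.6), the unweighted vote picks class 1 while the weighted vote picks class 0.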

splitter_mode

splitter_mode::best: The best splitting strategy chooses the best threshold for each feature while building trees in terms of impurity among all histogram bins and feature subsets.
splitter_mode::random: The random splitting strategy chooses a random threshold for each feature while building trees and selects the best feature in terms of impurity computed for that random split from the feature subsets.
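
The contrast between the two strategies can be sketched for a single feature and a two-class problem. The sketch scans midpoints between sorted values instead of histogram bins and uses Gini impurity; both simplifications, and all function names, are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Gini impurity of a two-class group given its class counts.
static double gini(double n0, double n1) {
    const double n = n0 + n1;
    if (n == 0.0) return 0.0;
    const double p0 = n0 / n, p1 = n1 / n;
    return 1.0 - p0 * p0 - p1 * p1;
}

// Weighted impurity of the two children produced by the test "value <= threshold".
double split_impurity(const std::vector<double>& x, const std::vector<int>& y, double threshold) {
    assert(!x.empty() && x.size() == y.size());
    double l0 = 0, l1 = 0, r0 = 0, r1 = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double& slot = (x[i] <= threshold) ? (y[i] == 0 ? l0 : l1) : (y[i] == 0 ? r0 : r1);
        slot += 1.0;
    }
    const double n = static_cast<double>(x.size());
    return ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / n;
}

// "best": try every midpoint between distinct sorted values, keep the purest split.
double best_threshold(const std::vector<double>& x, const std::vector<int>& y) {
    std::vector<double> sorted(x);
    std::sort(sorted.begin(), sorted.end());
    assert(sorted.size() >= 2);
    double best = sorted.front();
    double best_score = 2.0; // larger than any achievable impurity
    for (std::size_t i = 0; i + 1 < sorted.size(); ++i) {
        if (sorted[i] == sorted[i + 1]) continue;
        const double t = 0.5 * (sorted[i] + sorted[i + 1]);
        const double score = split_impurity(x, y, t);
        if (score < best_score) { best_score = score; best = t; }
    }
    return best;
}

// "random": draw one threshold uniformly between the feature's minimum and maximum.
double random_threshold(const std::vector<double>& x, std::mt19937_64& rng) {
    assert(!x.empty());
    const auto mm = std::minmax_element(x.begin(), x.end());
    std::uniform_real_distribution<double> dist(*mm.first, *mm.second);
    return dist(rng);
}
```

In either mode the feature finally chosen for the node is the one whose threshold yields the lowest impurity among the sampled feature subset.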

Descriptor

template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::dense or method::hist.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::regression.

Constructors

descriptor() = default

Creates a new instance of the class with the default property values.

Properties

std::int64_t min_observations_in_leaf_node

The minimal number of observations in a leaf node. Default value: 1 for classification, 5 for regression.

Getter & Setter: std::int64_t get_min_observations_in_leaf_node() const
auto & set_min_observations_in_leaf_node(std::int64_t value)
Invariants: min_observations_in_leaf_node > 0

std::int64_t class_count

The class count. Used with task::classification only. Default value: 2.

Getter & Setter: template <typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_class_count(std::int64_t value)

std::int64_t min_bin_size

The minimal number of observations in a bin. Used with method::hist split-finding method only. Default value: 5.

Getter & Setter: std::int64_t get_min_bin_size() const
auto & set_min_bin_size(std::int64_t value)
Invariants: min_bin_size > 0

double observations_per_tree_fraction

The fraction of observations per tree. Default value: 1.0.

Getter & Setter: double get_observations_per_tree_fraction() const
auto & set_observations_per_tree_fraction(double value)
Invariants: observations_per_tree_fraction > 0.0
observations_per_tree_fraction <= 1.0

std::int64_t tree_count

The number of trees in the forest. Default value: 100.

Getter & Setter: std::int64_t get_tree_count() const
auto & set_tree_count(std::int64_t value)
Invariants: tree_count > 0
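
The setter signatures in this section return `auto &`, which allows calls to be chained. The toy class below illustrates that pattern together with invariant checks for two of the properties listed above. It is a stand-in written for this sketch, not oneapi::dal::decision_forest::descriptor, and reporting violations with std::invalid_argument is an assumption.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy stand-in for illustration only; NOT the library's descriptor class.
class toy_descriptor {
public:
    std::int64_t get_tree_count() const { return tree_count_; }
    double get_observations_per_tree_fraction() const { return fraction_; }

    // Returning *this by reference is what lets setter calls be chained.
    auto& set_tree_count(std::int64_t value) {
        if (value <= 0) throw std::invalid_argument("tree_count must be > 0");
        tree_count_ = value;
        return *this;
    }

    auto& set_observations_per_tree_fraction(double value) {
        if (!(value > 0.0 && value <= 1.0))
            throw std::invalid_argument("observations_per_tree_fraction must be in (0, 1]");
        fraction_ = value;
        return *this;
    }

private:
    std::int64_t tree_count_ = 100; // default values as documented above
    double fraction_ = 1.0;
};

// Chained configuration in the style the setter signatures allow.
toy_descriptor example_config() {
    return toy_descriptor{}.set_tree_count(50).set_observations_per_tree_fraction(0.8);
}
```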

bool memory_saving_mode

The memory saving mode. Default value: false.

Getter & Setter: bool get_memory_saving_mode() const
auto & set_memory_saving_mode(bool value)

variable_importance_mode variable_importance_mode

The variable importance mode. Default value: variable_importance_mode::none.

Getter & Setter: variable_importance_mode get_variable_importance_mode() const
auto & set_variable_importance_mode(variable_importance_mode value)

std::int64_t max_bins

The maximal number of discrete bins to bucket continuous features. Used with method::hist split-finding method only. Increasing the number results in higher computation costs. Default value: 256.

Getter & Setter: std::int64_t get_max_bins() const
auto & set_max_bins(std::int64_t value)
Invariants: max_bins > 1

bool bootstrap

The bootstrap mode. If true, the training set for each tree is a bootstrap sample of the whole training set; if false, the whole dataset is used to build each tree. Default value: true.

Getter & Setter: bool get_bootstrap() const
auto & set_bootstrap(bool value)

infer_mode infer_mode

The infer mode. Used with task::classification only.

Getter & Setter: template <typename T = Task, typename None = detail::enable_if_classification_t<T>> infer_mode get_infer_mode() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_infer_mode(infer_mode value)

double min_weight_fraction_in_leaf_node

The min weight fraction in a leaf node. The minimum weighted fraction of the total sum of weights (of all input observations) required to be at a leaf node. Default value: 0.0.

Getter & Setter: double get_min_weight_fraction_in_leaf_node() const
auto & set_min_weight_fraction_in_leaf_node(double value)
Invariants: min_weight_fraction_in_leaf_node >= 0.0
min_weight_fraction_in_leaf_node <= 0.5
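
The condition this property describes can be written as a small check: a candidate leaf is acceptable only if the weights of the observations it holds add up to at least the given fraction of the total weight. The function name and signature are hypothetical and serve only to make the definition concrete.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// True when the observations routed to a candidate leaf carry enough total weight.
bool leaf_has_enough_weight(const std::vector<double>& leaf_weights,
                            double total_weight,
                            double min_weight_fraction) {
    assert(total_weight > 0.0);
    assert(min_weight_fraction >= 0.0 && min_weight_fraction <= 0.5);
    const double leaf_weight = std::accumulate(leaf_weights.begin(), leaf_weights.end(), 0.0);
    return leaf_weight >= min_weight_fraction * total_weight;
}
```

With the default value of 0.0 the check always passes, so the property has no effect unless it is raised.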

double min_impurity_decrease_in_split_node

The min impurity decrease in a split node is a threshold for stopping the tree growth early. A node will be split if its impurity is above the threshold, otherwise it is a leaf. Default value: 0.0.

Getter & Setter: double get_min_impurity_decrease_in_split_node() const
auto & set_min_impurity_decrease_in_split_node(double value)
Invariants: min_impurity_decrease_in_split_node >= 0.0

std::int64_t max_tree_depth

The maximal depth of the tree. If 0, nodes are expanded until all leaves are pure or until all leaves contain at most min_observations_in_leaf_node observations. Default value: 0.

Getter & Setter: std::int64_t get_max_tree_depth() const
auto & set_max_tree_depth(std::int64_t value)

std::int64_t seed

Seed for the random number generator used by the algorithm.

Getter & Setter: std::int64_t get_seed() const
auto & set_seed(std::int64_t value)

std::int64_t min_observations_in_split_node

The minimal number of observations in a split node. Default value: 2.

Getter & Setter: std::int64_t get_min_observations_in_split_node() const
auto & set_min_observations_in_split_node(std::int64_t value)
Invariants: min_observations_in_split_node > 1

error_metric_mode error_metric_mode

The error metric mode. Default value: error_metric_mode::none.

Getter & Setter: error_metric_mode get_error_metric_mode() const
auto & set_error_metric_mode(error_metric_mode value)

double impurity_threshold

The impurity threshold. A node is split only if the split induces a decrease of the impurity greater than or equal to this value. Default value: 0.0.

Getter & Setter: double get_impurity_threshold() const
auto & set_impurity_threshold(double value)
Invariants: impurity_threshold >= 0.0

splitter_mode splitter_mode

The splitter strategy. If splitter_mode::best, the best threshold for each feature is selected. If splitter_mode::random, the threshold is selected randomly. Default value: splitter_mode::best.

Getter & Setter: splitter_mode get_splitter_mode() const
auto & set_splitter_mode(splitter_mode value)

std::int64_t max_leaf_nodes

The maximal number of the leaf nodes. If 0, the number of leaf nodes is not limited. Default value: 0.

Getter & Setter: std::int64_t get_max_leaf_nodes() const
auto & set_max_leaf_nodes(std::int64_t value)

std::int64_t features_per_node

The number of features to consider when looking for the best split for a node. Default value: task::classification ? sqrt(p) : p/3, where p is the total number of features.

Getter & Setter: std::int64_t get_features_per_node() const
auto & set_features_per_node(std::int64_t value)
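
The default rule stated above, sqrt(p) for classification and p/3 for regression, can be sketched as a small function. The text does not say how the result is rounded, so truncation toward zero and a floor of one feature are assumptions of this sketch; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumptions for illustration: the value is truncated and never drops below 1.
std::int64_t default_features_per_node(std::int64_t p, bool classification) {
    assert(p > 0);
    const double raw = classification ? std::sqrt(static_cast<double>(p))
                                      : static_cast<double>(p) / 3.0;
    return std::max<std::int64_t>(1, static_cast<std::int64_t>(raw));
}
```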

voting_mode voting_mode

The voting mode. Used with task::classification only.

Getter & Setter: template <typename T = Task, typename None = detail::enable_if_classification_t<T>> voting_mode get_voting_mode() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_voting_mode(voting_mode value)

Method tags

struct dense

Tag-type that denotes dense computational method.

struct hist

Tag-type that denotes hist computational method.

using by_default = dense

Alias tag-type for dense computational method.

Task tags

struct classification

Tag-type that parameterizes entities used for solving classification problem.

struct regression

Tag-type that parameterizes entities used for solving regression problem.

using by_default = classification

Alias tag-type for classification task.

Model

template <typename Task = task::by_default> class model

Template Parameters: Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.

Constructors

model()

Creates a new instance of the class with the default property values.

Public Methods

std::int64_t get_tree_count() const

The number of trees in the forest.

template <typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const

The class count. Used with oneapi::dal::decision_forest::task::classification only.

template <typename Visitor> void traverse_depth_first(std::int64_t tree_idx, Visitor&& visitor) const

Performs a depth-first traversal of the tree with index tree_idx.

Parameters

tree_idx – Index of the tree to traverse.
visitor – This functor gets notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).
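
The visitor contract described above, one operator() overload per node kind with each returning bool, can be sketched on a toy tree. The node structs, the flat array layout, and the meaning given to the return value (false skips the subtree below a split) are all assumptions made for this illustration; the library's split_node_info and leaf_node_info are separate types with their own accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy node records, hypothetical and unrelated to the library's node info types.
struct toy_split { std::size_t feature; double threshold; std::size_t left, right; };
struct toy_leaf  { double response; };
struct toy_node  { bool is_leaf; toy_split split; toy_leaf leaf; };

// A visitor with one overload per node kind.
struct node_counter {
    std::size_t leaves = 0;
    std::size_t splits = 0;
    bool operator()(const toy_split&) { ++splits; return true; } // true = keep descending (assumed)
    bool operator()(const toy_leaf&)  { ++leaves; return true; }
};

// Pre-order depth-first walk over nodes stored in a flat array, root at index 0.
template <typename Visitor>
void toy_traverse_depth_first(const std::vector<toy_node>& tree, Visitor&& visitor) {
    assert(!tree.empty());
    std::vector<std::size_t> stack{ 0 };
    while (!stack.empty()) {
        const toy_node& node = tree[stack.back()];
        stack.pop_back();
        if (node.is_leaf) { visitor(node.leaf); continue; }
        if (!visitor(node.split)) continue;  // visitor declined: skip this subtree
        stack.push_back(node.split.right);   // pushed first so the left child is visited first
        stack.push_back(node.split.left);
    }
}
```

Because the visitor is taken by forwarding reference, a named visitor object passed in keeps the counts it accumulated during the walk.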

template <typename T, typename Visitor> void traverse_depth_first(T&& visitor_array) const

Performs a depth-first traversal of all trees.

Parameters: visitor_array – An array of functors that are notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).

template <typename Visitor> void traverse_breadth_first(std::int64_t tree_idx, Visitor&& visitor) const

Performs a breadth-first traversal of the tree with index tree_idx.

Parameters

tree_idx – Index of the tree to traverse.
visitor – This functor gets notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).

template <typename T, typename Visitor> void traverse_breadth_first(T&& visitor_array) const

Performs a breadth-first traversal of all trees.

Parameters: visitor_array – An array of functors that are notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).

Training train(...)

Input

template <typename Task = task::by_default> class train_input

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::regression.

Constructors

train_input(const table& data, const table& responses, const table& weights = table{})

Creates a new instance of the class with the given data, responses and weights property values.

Properties

const table& weights

The vector of weights for the training set. Default value: table{}.

Getter & Setter: const table & get_weights() const
auto & set_weights(const table &value)

const table& data

The training set. Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &value)

const table& responses

Vector of responses for the training set. Default value: table{}.

Getter & Setter: const table & get_responses() const
auto & set_responses(const table &value)

const table& labels

Vector of labels for the training set. Default value: table{}.

Getter & Setter: const table & get_labels() const
auto & set_labels(const table &value)

Result

template <typename Task = task::by_default> class train_result

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::regression.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const table& oob_err

A table containing the cumulative out-of-bag error value. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error. Default value: table{}.

Getter & Setter: const table & get_oob_err() const
auto & set_oob_err(const table &value)

const table& oob_err_accuracy

A table containing the cumulative out-of-bag error (accuracy) value. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error_accuracy. Default value: table{}.

Getter & Setter: const table & get_oob_err_accuracy() const
auto & set_oob_err_accuracy(const table &value)

const table& var_importance

A table containing the variable importance value for each feature. Computed when variable_importance_mode != variable_importance_mode::none. Default value: table{}.

Getter & Setter: const table & get_var_importance() const
auto & set_var_importance(const table &value)

const table& oob_err_per_observation

A table containing the out-of-bag error value per observation. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error_per_observation. Default value: table{}.

Getter & Setter: const table & get_oob_err_per_observation() const
auto & set_oob_err_per_observation(const table &value)

const table& oob_err_r2

A table containing the cumulative out-of-bag error (R2) value. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error_r2. Default value: table{}.

Getter & Setter: const table & get_oob_err_r2() const
auto & set_oob_err_r2(const table &value)

const table& oob_err_prediction

A table containing the prediction value per observation. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error_prediction. Default value: table{}.

Getter & Setter: const table & get_oob_err_prediction() const
auto & set_oob_err_prediction(const table &value)

const model<Task>& model

The trained Decision Forest model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table& oob_err_decision_function

A table containing the decision function value per observation. Computed when error_metric_mode is set with error_metric_mode::out_of_bag_error_decision_function. Default value: table{}.

Getter & Setter: const table & get_oob_err_decision_function() const
auto & set_oob_err_decision_function(const table &value)

Operation

template <typename Descriptor> decision_forest::train_result train(const Descriptor& desc, const decision_forest::train_input& input)

Parameters

desc – Decision Forest algorithm descriptor decision_forest::descriptor.
input – Input data for the training operation

Preconditions: input.data.is_empty == false
input.labels.is_empty == false
input.labels.column_count == 1
input.data.row_count == input.labels.row_count
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_variable_importance_mode() != variable_importance_mode::mda_raw && desc.get_variable_importance_mode() != variable_importance_mode::mda_scaled)
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_error_metric_mode() == error_metric_mode::none)
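
The preconditions above can be collected into a single predicate, sketched here with toy types. The enums mirror only the mode names used in this section, and the struct, its fields, and the mapping of "not empty" to a positive row count are assumptions for illustration rather than library types.

```cpp
#include <cstdint>

// Toy enums mirroring the mode names used in the preconditions; not the library's enums.
enum class toy_importance { none, mdi, mda_raw, mda_scaled };
enum class toy_error_metric { none, out_of_bag_error, out_of_bag_error_per_observation };

// Hypothetical summary of the inputs the preconditions refer to.
struct toy_train_setup {
    std::int64_t data_rows;
    std::int64_t label_rows;
    std::int64_t label_columns;
    bool bootstrap;
    toy_importance importance;
    toy_error_metric error_metric;
};

bool satisfies_train_preconditions(const toy_train_setup& s) {
    const bool shapes_ok = s.data_rows > 0 && s.label_rows > 0 &&
                           s.label_columns == 1 && s.data_rows == s.label_rows;
    const bool uses_mda = s.importance == toy_importance::mda_raw ||
                          s.importance == toy_importance::mda_scaled;
    // Without bootstrap every tree sees the whole dataset, so there are no
    // out-of-bag rows for MDA importance or out-of-bag error metrics to use.
    const bool oob_ok = s.bootstrap ||
                        (!uses_mda && s.error_metric == toy_error_metric::none);
    return shapes_ok && oob_ok;
}
```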

Inference infer(...)

Input

template <typename Task = task::by_default> class infer_input

Template Parameters: Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.

Constructors

infer_input(const model<Task>& trained_model, const table& data)

Creates a new instance of the class with the given model and data property values.

Properties

const table& data

The dataset for inference. Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &value)

const model<Task>& model

The trained Decision Forest model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Result

template <typename Task = task::by_default> class infer_result

Template Parameters: Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table& probabilities

A table with the predicted class probabilities for each observation.

Getter & Setter: template <typename T = Task, typename None = detail::enable_if_classification_t<T>> const table & get_probabilities() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_probabilities(const table &value)

const table& responses

The table with the predicted responses. Default value: table{}.

Getter & Setter: const table & get_responses() const
auto & set_responses(const table &value)

const table& labels

The table with the predicted labels. Default value: table{}.

Getter & Setter: const table & get_labels() const
auto & set_labels(const table &value)

Operation

template <typename Descriptor> decision_forest::infer_result infer(const Descriptor& desc, const decision_forest::infer_input& input)

Parameters

desc – Decision Forest algorithm descriptor decision_forest::descriptor.
input – Input data for the inference operation

Preconditions: input.data.is_empty == false
