Visible to Intel only — GUID: GUID-F9F1AAE8-DE13-4B6E-85BA-7E0D5EE71853
Visible to Intel only — GUID: GUID-F9F1AAE8-DE13-4B6E-85BA-7E0D5EE71853
k-Nearest Neighbors Classification (k-NN)
k-NN classification, regression, and search algorithms are based on finding the k nearest observations to the training set. For classification, the problem is to infer the class of a new feature vector by computing the majority vote of its k nearest observations from the training set. For regression, the problem is to infer the target value of a new feature vector by computing the average target value of its k nearest observations from the training set. For search, the problem is to identify the k nearest observations from the training set to a new feature vector. The nearest observations are computed based on the chosen distance metric.
Operation |
Computational methods |
Programming Interface |
|||
Mathematical formulation
Refer to Developer Guide: k-Nearest Neighbors Classification.
Programming Interface
All types and functions in this section are declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.
Enum classes
enumclassvoting_mode
- voting_mode::uniform
-
Uniform weights for neighbors for prediction voting.
- voting_mode::distance
-
Weight neighbors by the inverse of their distance.
Result options
classresult_option_id
Public Methods
constexprresult_option_id()=default
constexprresult_option_id(constresult_option_id_base&base)
Descriptor
template<typenameFloat=float,typenameMethod=method::by_default,typenameTask=task::by_default,typenameDistance=oneapi::dal::minkowski_distance::descriptor<Float>>classdescriptor
- Template Parameters
-
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::brute_force or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::regression, or task::search.
Distance – The descriptor of the distance used for computations. Can be minkowski_distance::descriptor or chebyshev_distance::descriptor.
Constructors
descriptor(std::int64_tclass_count, std::int64_tneighbor_count)
Creates a new instance of the class with the given class_count and neighbor_count property values.
template<typenameM=Method,typenameNone=detail::enable_if_brute_force_t<M>>descriptor(std::int64_tclass_count, std::int64_tneighbor_count, constdistance_t&distance)
Creates a new instance of the class with the given class_count, neighbor_count and distance property values. Used with method::brute_force only.
template<typenameT=Task,typenameNone=detail::enable_if_not_classification_t<T>>descriptor(std::int64_tneighbor_count)
Creates a new instance of the class with the given neighbor_count property value. Used with task::search and task::regression only.
template<typenameT=Task,typenameNone=detail::enable_if_not_classification_t<T>>descriptor(std::int64_tneighbor_count, constdistance_t&distance)
Creates a new instance of the class with the given neighbor_count and distance property values. Used with task::search and task::regression only.
Properties
constdistance_t&distance
Choose distance type for calculations. Used with method::brute_force only.
- Getter & Setter
-
template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> const distance_t & get_distance() const
template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> auto & set_distance(const distance_t &dist)
result_option_idresult_options
Choose which results should be computed and returned.
- Getter & Setter
-
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
std::int64_tclass_count
The number of classes c.
- Getter & Setter
-
std::int64_t get_class_count() const
auto & set_class_count(std::int64_t value)
- Invariants
-
class_count > 1
voting_modevoting_mode
The voting mode.
- Getter & Setter
-
voting_mode get_voting_mode() const
auto & set_voting_mode(voting_mode value)
std::int64_tneighbor_count
The number of neighbors k.
- Getter & Setter
-
std::int64_t get_neighbor_count() const
auto & set_neighbor_count(std::int64_t value)
- Invariants
-
neighbor_count > 0
Method tags
structbrute_force
Tag-type that denotes brute-force computational method.
structkd_tree
Tag-type that denotes k-d tree computational method.
usingby_default=brute_force
Alias tag-type for brute-force computational method.
Task tags
structclassification
Tag-type that parameterizes entities used for solving classification problem.
structregression
Tag-type that parameterizes entities used for solving the regression problem.
structsearch
Tag-type that parameterizes entities used for solving the search problem.
usingby_default=classification
Alias tag-type for classification task.
Model
template<typenameTask=task::by_default>classmodel
- Template Parameters
-
Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::search and task::regression.
Constructors
model()
Creates a new instance of the class with the default property values.
Training train(...)
Input
template<typenameTask=task::by_default>classtrain_input
- Template Parameters
-
Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.
Constructors
train_input(consttable&data, consttable&responses)
Creates a new instance of the class with the given data and responses property values.
train_input(consttable&data)
Properties
consttable&labels
Vector of labels y for the training set X. Default value: table{}.
- Getter & Setter
-
const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)
consttable&responses
Vector of responses y for the training set X. Default value: table{}.
- Getter & Setter
-
const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_responses(const table &responses)
consttable&data
The training set X. Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &data)
Result
template<typenameTask=task::by_default>classtrain_result
- Template Parameters
-
Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
constmodel<Task>&model
The trained k-NN model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
Operation
template<typenameDescriptor>knn::train_resulttrain(constDescriptor&desc, constknn::train_input&input)
- Parameters
-
desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the training operation
- Preconditions
-
input.data.has_data == true
input.labels.has_data == true
input.data.row_count == input.labels.row_count
input.labels.column_count == 1
input.labels[i] >= 0
input.labels[i] < desc.class_count
Inference infer(...)
Input
template<typenameTask=task::by_default>classinfer_input
- Template Parameters
-
Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.
Constructors
infer_input(consttable&data, constmodel<Task>&model)
Creates a new instance of the class with the given model and data property values.
Properties
constmodel<Task>&model
The trained k-NN model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &m)
consttable&data
The dataset for inference . Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &data)
Result
template<typenameTask=task::by_default>classinfer_result
- Template Parameters
-
Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
consttable&labels
The predicted labels. Default value: table{}.
- Getter & Setter
-
const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)
consttable&distances
Distances to nearest neighbors. Default value: table{}.
- Getter & Setter
-
const table & get_distances() const
auto & set_distances(const table &value)
consttable&indices
Indices of nearest neighbors. Default value: table{}.
- Getter & Setter
-
const table & get_indices() const
auto & set_indices(const table &value)
constresult_option_id&result_options
Result options that indicates availability of the properties.
- Getter & Setter
-
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
consttable&responses
The predicted responses. Default value: table{}.
- Getter & Setter
-
const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_not_search_t<T>> auto & set_responses(const table &value)
Operation
template<typenameDescriptor>knn::infer_resultinfer(constDescriptor&desc, constknn::infer_input&input)
- Parameters
-
desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the inference operation
Usage Example
Training
knn::model<> run_training(const table& data, const table& labels) { const std::int64_t class_count = 10; const std::int64_t neighbor_count = 5; const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count}; const auto result = train(knn_desc, data, labels); return result.get_model(); }
Inference
table run_inference(const knn::model<>& model, const table& new_data) { const std::int64_t class_count = 10; const std::int64_t neighbor_count = 5; const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count}; const auto result = infer(knn_desc, model, new_data); print_table("labels", result.get_labels()); }
Examples
oneAPI DPC++
Batch Processing:
oneAPI C++
Batch Processing:
Python* with DPC++ support
Distributed Processing: