k-Nearest Neighbors Classification (k-NN)

Intel® oneAPI Data Analytics Library Developer Guide and Reference

Download PDF

ID 772611

Date 3/22/2024

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-7E690C4E-545C-414A-A00A-395193DB48B5

View Details

k-Nearest Neighbors Classification (k-NN)

k-NN classification, regression, and search algorithms are based on finding the k nearest observations to the training set. For classification, the problem is to infer the class of a new feature vector by computing the majority vote of its k nearest observations from the training set. For regression, the problem is to infer the target value of a new feature vector by computing the average target value of its k nearest observations from the training set. For search, the problem is to identify the k nearest observations from the training set to a new feature vector. The nearest observations are computed based on the chosen distance metric.

Operation	Computational methods		Programming Interface
Training	Brute-force	k-d tree	train(…)	train_input	train_result
Inference	Brute-force	k-d tree	infer(…)	infer_input	infer_result

Mathematical formulation

Refer to Developer Guide: k-Nearest Neighbors Classification.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Enum classes

enumclassvoting_mode

voting_mode::uniform: Uniform weights for neighbors for prediction voting.
voting_mode::distance: Weight neighbors by the inverse of their distance.

Result options

classresult_option_id

Public Methods

constexprresult_option_id()=default

constexprresult_option_id(constresult_option_id_base&base)

Descriptor

template<typenameFloat=float,typenameMethod=method::by_default,typenameTask=task::by_default,typenameDistance=oneapi::dal::minkowski_distance::descriptor<Float>>classdescriptor

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::brute_force or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::regression, or task::search.
Distance – The descriptor of the distance used for computations. Can be minkowski_distance::descriptor or chebyshev_distance::descriptor.

Constructors

descriptor(std::int64_tclass_count, std::int64_tneighbor_count)

Creates a new instance of the class with the given class_count and neighbor_count property values.

template<typenameM=Method,typenameNone=detail::enable_if_brute_force_t<M>>descriptor(std::int64_tclass_count, std::int64_tneighbor_count, constdistance_t&distance)

Creates a new instance of the class with the given class_count, neighbor_count and distance property values. Used with method::brute_force only.

template<typenameT=Task,typenameNone=detail::enable_if_not_classification_t<T>>descriptor(std::int64_tneighbor_count)

Creates a new instance of the class with the given neighbor_count property value. Used with task::search and task::regression only.

template<typenameT=Task,typenameNone=detail::enable_if_not_classification_t<T>>descriptor(std::int64_tneighbor_count, constdistance_t&distance)

Creates a new instance of the class with the given neighbor_count and distance property values. Used with task::search and task::regression only.

Properties

std::int64_tclass_count

The number of classes c.

Getter & Setter: std::int64_t get_class_count() const
auto & set_class_count(std::int64_t value)
Invariants: class_count > 1

result_option_idresult_options

Choose which results should be computed and returned.

Getter & Setter: result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)

constdistance_t&distance

Choose distance type for calculations. Used with method::brute_force only.

Getter & Setter: template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> const distance_t & get_distance() const
template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> auto & set_distance(const distance_t &dist)

std::int64_tneighbor_count

The number of neighbors k.

Getter & Setter: std::int64_t get_neighbor_count() const
auto & set_neighbor_count(std::int64_t value)
Invariants: neighbor_count > 0

voting_modevoting_mode

The voting mode.

Getter & Setter: voting_mode get_voting_mode() const
auto & set_voting_mode(voting_mode value)

Method tags

structkd_tree

Tag-type that denotes k-d tree computational method.

structbrute_force

Tag-type that denotes brute-force computational method.

usingby_default=brute_force

Alias tag-type for brute-force computational method.

Task tags

structclassification

Tag-type that parameterizes entities used for solving classification problem.

structregression

Tag-type that parameterizes entities used for solving the regression problem.

structsearch

Tag-type that parameterizes entities used for solving the search problem.

usingby_default=classification

Alias tag-type for classification task.

Model

template<typenameTask=task::by_default>classmodel

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::search and task::regression.

Constructors

model()

Creates a new instance of the class with the default property values.

Training train(...)

Input

template<typenameTask=task::by_default>classtrain_input

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_input(consttable&data, consttable&responses)

Creates a new instance of the class with the given data and responses property values.

train_input(consttable&data)

Properties

consttable&data

The training set X. Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &data)

consttable&responses

Vector of responses y for the training set X. Default value: table{}.

Getter & Setter: const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_responses(const table &responses)

consttable&labels

Vector of labels y for the training set X. Default value: table{}.

Getter & Setter: const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)

Result

template<typenameTask=task::by_default>classtrain_result

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

constmodel<Task>&model

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Operation

template<typenameDescriptor>knn::train_resulttrain(constDescriptor&desc, constknn::train_input&input)

Parameters

desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the training operation

Preconditions: input.data.has_data  ==  true
input.labels.has_data  ==  true
input.data.row_count  ==  input.labels.row_count
input.labels.column_count  ==  1
input.labels[i]  >=  0
input.labels[i]  <  desc.class_count

Inference infer(...)

Input

template<typenameTask=task::by_default>classinfer_input

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_input(consttable&data, constmodel<Task>&model)

Creates a new instance of the class with the given model and data property values.

Properties

consttable&data

The dataset for inference . Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &data)

constmodel<Task>&model

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &m)

Result

template<typenameTask=task::by_default>classinfer_result

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

constresult_option_id&result_options

Result options that indicates availability of the properties.

Getter & Setter: const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)

consttable&responses

The predicted responses. Default value: table{}.

Getter & Setter: const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_not_search_t<T>> auto & set_responses(const table &value)

consttable&indices

Indices of nearest neighbors. Default value: table{}.

Getter & Setter: const table & get_indices() const
auto & set_indices(const table &value)

consttable&distances

Distances to nearest neighbors. Default value: table{}.

Getter & Setter: const table & get_distances() const
auto & set_distances(const table &value)

consttable&labels

The predicted labels. Default value: table{}.

Getter & Setter: const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)

Operation

template<typenameDescriptor>knn::infer_resultinfer(constDescriptor&desc, constknn::infer_input&input)

Parameters

desc – k-NN algorithm descriptor knn::descriptor
input – Input data for the inference operation

Preconditions: input.data.has_data == true
Postconditions: result.labels.row_count  ==  input.data.row_count
result.labels.column_count  ==  1
result.labels[i]  >=  0
result.labels[i]  <  desc.class_count

Usage Example

Training

knn::model<> run_training(const table& data,
                        const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference

table run_inference(const knn::model<>& model,
                  const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Examples

oneAPI DPC++

Batch Processing:

knn_cls_brute_force_dense_batch.cpp

oneAPI C++

Batch Processing:

Python* with DPC++ support

Distributed Processing:

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® oneAPI Data Analytics Library Developer Guide and Reference

k-Nearest Neighbors Classification (k-NN)

Mathematical formulation

Programming Interface

Usage Example

Examples