Principal Components Analysis (PCA)

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 3/22/2024

Principal Components Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors with possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
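
To make the idea of "directions of largest variance" concrete, here is a minimal standalone sketch for data with exactly two features. It does not use oneDAL: the names pca2d_result and pca_2d are hypothetical, and the closed-form 2x2 eigendecomposition is only an illustration of the covariance approach, not the library's implementation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of oneDAL): result of a two-feature PCA.
struct pca2d_result {
    std::array<double, 2> eigenvalues;                 // descending order
    std::array<std::array<double, 2>, 2> eigenvectors; // one unit eigenvector per row
};

// Computes the principal components of two-feature data from the sample
// covariance matrix [[sxx, sxy], [sxy, syy]]. Assumes at least two rows.
pca2d_result pca_2d(const std::vector<std::array<double, 2>>& data) {
    const double n = static_cast<double>(data.size());

    // Column means.
    double mx = 0.0, my = 0.0;
    for (const auto& row : data) {
        mx += row[0];
        my += row[1];
    }
    mx /= n;
    my /= n;

    // Sample covariance entries (divisor n - 1).
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const auto& row : data) {
        const double dx = row[0] - mx;
        const double dy = row[1] - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= (n - 1.0);
    sxy /= (n - 1.0);
    syy /= (n - 1.0);

    // Eigenvalues of a symmetric 2x2 matrix in closed form.
    const double half_trace = 0.5 * (sxx + syy);
    const double radius = std::hypot(0.5 * (sxx - syy), sxy);

    // Angle of the first principal axis: tan(2 * theta) = 2 * sxy / (sxx - syy).
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    const double c = std::cos(theta);
    const double s = std::sin(theta);

    pca2d_result result;
    result.eigenvalues = { half_trace + radius, half_trace - radius };
    result.eigenvectors[0] = { c, s };  // direction of largest variance
    result.eigenvectors[1] = { -s, c }; // orthogonal direction
    return result;
}
```

For points lying on the line y = x, the first eigenvector returned by this sketch points along that line and the second eigenvalue is zero, because the data has no spread in the orthogonal direction.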

Operation	Computational methods		Programming Interface
Training	Covariance	SVD	train(…)	train_input	train_result
Inference	Covariance	SVD	infer(…)	infer_input	infer_result
Partial Training	Covariance	SVD	partial_train(…)	partial_train_input	partial_train_result
Finalize Training	Covariance	SVD	finalize_train(…)	partial_train_result	train_result

Mathematical formulation

Refer to Developer Guide: Principal Components Analysis.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.

Enum classes

enum class normalization

normalization::none: No normalization is necessary, or the data is not normalized.
normalization::mean_center: Only mean centering is necessary, or the data is already centered.
normalization::zscore: Normalization is necessary, or the data is already normalized.
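
As a rough illustration of what mean centering and z-score normalization do to a single feature column, consider the standalone sketch below. The enum column_scaling and the function scale_column are hypothetical stand-ins written for this example; they are not oneDAL types, and the use of the sample standard deviation (divisor n - 1) is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three modes described above (not the oneDAL enum).
enum class column_scaling { none, mean_center, zscore };

// Returns a scaled copy of one feature column.
std::vector<double> scale_column(std::vector<double> column, column_scaling mode) {
    if (mode == column_scaling::none || column.size() < 2) {
        return column; // nothing to do, or too few values to estimate a spread
    }

    const double n = static_cast<double>(column.size());

    // Mean centering: subtract the column mean from every value.
    double mean = 0.0;
    for (const double v : column) {
        mean += v;
    }
    mean /= n;
    for (double& v : column) {
        v -= mean;
    }

    if (mode == column_scaling::zscore) {
        // Z-score: additionally divide by the sample standard deviation.
        double sum_sq = 0.0;
        for (const double v : column) {
            sum_sq += v * v;
        }
        const double stddev = std::sqrt(sum_sq / (n - 1.0));
        if (stddev > 0.0) {
            for (double& v : column) {
                v /= stddev;
            }
        }
    }
    return column;
}
```

With the column {2, 4, 6}, mean centering gives {-2, 0, 2} and z-score normalization gives {-1, 0, 1}.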

Descriptor

template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

descriptor(std::int64_t component_count = 0)

Creates a new instance of the class with the given component_count property value.

Public Methods

bool whiten() const

auto & set_whiten(bool value)

Properties

result_option_id result_options

Specifies which results are computed and returned.

Getter & Setter: result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)

normalization data_normalization

Default value: normalization::none.

Getter & Setter: normalization get_data_normalization() const
auto & set_data_normalization(normalization value)

std::int64_t component_count

The number of principal components r. If it is zero, the algorithm computes the eigenvectors for all features. Default value: 0.

Getter & Setter: std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
Invariants: component_count >= 0

normalization normalization_mode

Default value: normalization::zscore.

Getter & Setter: normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)

bool deterministic

Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are deterministic. Default value: true.

Getter & Setter: bool get_deterministic() const
auto & set_deterministic(bool value)
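
An eigenvector is only defined up to its sign, so v and -v are equally valid results. A sign-flip step picks one of the two by a fixed rule. The sketch below shows one common convention, making the component with the largest absolute value positive. The function name make_sign_deterministic is hypothetical, and this section does not state which rule the library itself applies, so treat the convention as an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneDAL): flips the sign of an eigenvector
// so that its largest-magnitude component is positive. On ties, the first
// such component decides.
void make_sign_deterministic(std::vector<double>& eigenvector) {
    if (eigenvector.empty()) {
        return;
    }

    // Find the component with the largest absolute value.
    std::size_t pivot = 0;
    for (std::size_t i = 1; i < eigenvector.size(); ++i) {
        if (std::fabs(eigenvector[i]) > std::fabs(eigenvector[pivot])) {
            pivot = i;
        }
    }

    // Negate the whole vector if that component is negative.
    if (eigenvector[pivot] < 0.0) {
        for (double& x : eigenvector) {
            x = -x;
        }
    }
}
```

Under this rule the vectors (0.6, -0.8) and (-0.6, 0.8) both normalize to (-0.6, 0.8), so repeated runs that differ only in sign produce the same output.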

Method tags

struct cov

Tag-type that denotes Covariance computational method.

struct precomputed

struct svd

Tag-type that denotes SVD computational method.

using by_default = cov

Alias tag-type for Covariance computational method.

Task tags

struct dim_reduction

Tag-type that parameterizes entities used for solving the dimensionality reduction problem.

using by_default = dim_reduction

Alias tag-type for dimensionality reduction task.

Model

template <typename Task = task::by_default> class model

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

model()

Creates a new instance of the class with the default property values.

Properties

const table & means

Means. Default value: table{}.

Getter & Setter: const table & get_means() const
auto & set_means(const table &value)

const table & eigenvectors

A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter: const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)

const table & eigenvalues

Eigenvalues. Default value: table{}.

Getter & Setter: const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)

const table & variances

Variances. Default value: table{}.

Getter & Setter: const table & get_variances() const
auto & set_variances(const table &value)

Training train(...)

Input

template <typename Task = task::by_default> class train_input

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_input()

train_input(const table & data)

Creates a new instance of the class with the given data property value.

Properties

const table & data

A table with the training data, where each row stores one feature vector. Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &data)

Result and Finalize Result

template <typename Task = task::by_default> class train_result

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const table & singular_values

A table that contains the singular values for the first r features. Default value: table{}.

Getter & Setter: const table & get_singular_values() const
auto & set_singular_values(const table &value)

const result_option_id & result_options

Result options that indicate the availability of the properties. Default value: default_result_options<Task>.

Getter & Setter: const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)

const table & means

A table that contains the mean values for the first r features. Default value: table{}.

Getter & Setter: const table & get_means() const
auto & set_means(const table &value)

const table & explained_variances_ratio

A table that contains the explained variance ratios for the first r features. Default value: table{}.

Getter & Setter: const table & get_explained_variances_ratio() const
auto & set_explained_variances_ratio(const table &value)

const table & eigenvectors

A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter: const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
Invariants: eigenvectors == model.eigenvectors

const table & eigenvalues

A table that contains the eigenvalues for the first r features. Default value: table{}.

Getter & Setter: const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)

const table & variances

A table that contains the variances for the first r features. Default value: table{}.

Getter & Setter: const table & get_variances() const
auto & set_variances(const table &value)

const model< Task > & model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Operation

template <typename Descriptor> pca::train_result train(const Descriptor & desc, const pca::train_input & input)

Parameters

desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation

Preconditions: input.data.has_data == true
input.data.column_count >= desc.component_count
Postconditions: result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count

Partial Training

Partial Input

template <typename Task = task::by_default> class partial_train_input

Constructors

partial_train_input()

partial_train_input(const table & data)

partial_train_input(const partial_train_result< Task > & prev, const table & data)

Properties

const table & data

Getter & Setter: const table & get_data() const
auto & set_data(const table &value)

const partial_train_result< Task > & prev

Getter & Setter: const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)

Partial Result and Finalize Input

template <typename Task = task::by_default> class partial_train_result

Constructors

partial_train_result()

Public Methods

std::int64_t get_auxiliary_table_count() const

Properties

const table & partial_n_rows

The number of observations (nobs). Default value: table{}.

Getter & Setter: const table & get_partial_n_rows() const
auto & set_partial_n_rows(const table &value)

const table & auxiliary_table

Getter & Setter: const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)

const table & partial_crossproduct

The crossproduct matrix. Default value: table{}.

Getter & Setter: const table & get_partial_crossproduct() const
auto & set_partial_crossproduct(const table &value)

const table & partial_sum

Sums. Default value: table{}.

Getter & Setter: const table & get_partial_sum() const
auto & set_partial_sum(const table &value)

Finalize Training

Inference infer(...)

Input

template <typename Task = task::by_default> class infer_input

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_input(const model< Task > & trained_model, const table & data)

Creates a new instance of the class with the given model and data property values.

Properties

const table & data

The dataset for inference. Default value: table{}.

Getter & Setter: const table & get_data() const
auto & set_data(const table &value)

const model< Task > & model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Result

template <typename Task = task::by_default> class infer_result

Template Parameters: Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table & transformed_data

A table that contains data projected to the r principal components. Default value: table{}.

Getter & Setter: const table & get_transformed_data() const
auto & set_transformed_data(const table &value)

Operation

template <typename Descriptor> pca::infer_result infer(const Descriptor & desc, const pca::infer_input & input)

Parameters

desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation

Preconditions: input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count
Postconditions: result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count

Usage Example

Training

pca::model<> run_training(const table& data) {
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}

Inference

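table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // Each row of the eigenvectors table is one eigenvector,
   // so its row count is the number of components.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_eigenvectors().get_row_count());

   const auto result = infer(pca_desc, model, new_data);

   print_table("transformed data", result.get_transformed_data());

   return result.get_transformed_data();
}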