Principal Components Analysis (PCA)
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features to a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions along which the data is most spread out.
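To make the idea of "directions of the largest variance" concrete, the following stand-alone sketch computes the principal components of two-dimensional data from its sample covariance matrix. It does not use oneDAL; the names pca2d and pca2d_result are hypothetical, and the closed-form 2x2 eigen decomposition is chosen only to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type: eigenvalues in descending order and the matching
// unit-length eigenvectors, one eigenvector per row.
struct pca2d_result {
    std::array<double, 2> eigenvalues;
    std::array<std::array<double, 2>, 2> eigenvectors;
};

// Computes PCA for 2-D points from the sample covariance matrix.
pca2d_result pca2d(const std::vector<std::array<double, 2>>& points) {
    assert(points.size() >= 2);
    const double n = static_cast<double>(points.size());

    // Per-feature means.
    double mx = 0.0, my = 0.0;
    for (const auto& p : points) {
        mx += p[0];
        my += p[1];
    }
    mx /= n;
    my /= n;

    // Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const auto& p : points) {
        const double dx = p[0] - mx;
        const double dy = p[1] - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= n - 1.0;
    sxy /= n - 1.0;
    syy /= n - 1.0;

    // Closed-form eigenvalues of a symmetric 2x2 matrix.
    const double half_trace = 0.5 * (sxx + syy);
    const double radius = std::hypot(0.5 * (sxx - syy), sxy);

    // Angle of the direction that carries the largest variance.
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    const double c = std::cos(theta);
    const double s = std::sin(theta);

    pca2d_result r;
    r.eigenvalues = { half_trace + radius, half_trace - radius };
    r.eigenvectors[0] = { c, s };  // first principal component
    r.eigenvectors[1] = { -s, c }; // second principal component, orthogonal to the first
    return r;
}
```

For points that lie on the line y = x, the first eigenvector points along that line and the second eigenvalue is zero, because the data has no spread in the orthogonal direction.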
Operation | Computational methods | Programming Interface
Mathematical formulation
Programming Interface
All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Enum classes
enum class normalization
- normalization::none
-
No normalization is necessary or data is not normalized.
- normalization::mean_center
-
Only mean centering is necessary, or data is already centered.
- normalization::zscore
-
Normalization is necessary, or data is already normalized.
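The three modes can be read as three preprocessing steps applied to each feature column. The sketch below is a stand-alone illustration that does not use oneDAL: norm_mode and normalize_column are hypothetical names, and the use of the sample standard deviation (division by n - 1) for z-scoring is an assumption, since this page does not say which estimator the library applies.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the three modes; not the oneDAL enum itself.
enum class norm_mode { none, mean_center, zscore };

// Returns a normalized copy of one feature column.
std::vector<double> normalize_column(const std::vector<double>& x, norm_mode mode) {
    assert(x.size() >= 2);
    if (mode == norm_mode::none) {
        return x; // data is used as is
    }

    const double n = static_cast<double>(x.size());
    double mean = 0.0;
    for (double v : x) {
        mean += v;
    }
    mean /= n;

    // Sample standard deviation (assumption: divides by n - 1).
    double ss = 0.0;
    for (double v : x) {
        ss += (v - mean) * (v - mean);
    }
    const double stddev = std::sqrt(ss / (n - 1.0));

    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) {
        double centered = v - mean;
        // A constant column has zero deviation and is only centered.
        if (mode == norm_mode::zscore && stddev > 0.0) {
            centered /= stddev;
        }
        out.push_back(centered);
    }
    return out;
}
```

For the column {2, 4, 6}, mean centering gives {-2, 0, 2} and z-scoring gives {-1, 0, 1}.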
Descriptor
template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
- Template Parameters
-
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
descriptor(std::int64_t component_count = 0)
Creates a new instance of the class with the given component_count property value.
Public Methods
bool whiten() const
auto & set_whiten(bool value)
Properties
result_option_id result_options
Choose which results should be computed and returned.
- Getter & Setter
-
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
normalization data_normalization
Default value: normalization::none.
- Getter & Setter
-
normalization get_data_normalization() const
auto & set_data_normalization(normalization value)
std::int64_t component_count
The number of principal components r. If it is zero, the algorithm computes the eigenvectors for all features, that is, r equals the number of features. Default value: 0.
- Getter & Setter
-
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
- Invariants
-
component_count >= 0
normalization normalization_mode
Default value: normalization::zscore.
- Getter & Setter
-
normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)
bool deterministic
Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are deterministic. Default value: true.
- Getter & Setter
-
bool get_deterministic() const
auto & set_deterministic(bool value)
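An eigenvector and its negation describe the same principal direction, so without a sign convention two runs may return vectors that differ only in sign. The sketch below shows one common convention: flip the vector so that its largest-magnitude component is positive. This is a stand-alone illustration; sign_flip is a hypothetical helper, and whether oneDAL uses exactly this rule is an assumption not confirmed by this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Flips the sign of an eigenvector in place so that its largest-magnitude
// component is positive. Both v and -v span the same principal direction.
void sign_flip(std::vector<double>& v) {
    assert(!v.empty());

    // Find the component with the largest absolute value.
    std::size_t arg_max = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (std::fabs(v[i]) > std::fabs(v[arg_max])) {
            arg_max = i;
        }
    }

    // Negate the whole vector if that component is negative.
    if (v[arg_max] < 0.0) {
        for (double& x : v) {
            x = -x;
        }
    }
}
```

With this rule, {0.6, -0.8} becomes {-0.6, 0.8}, while {0.8, 0.6} is left unchanged.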
Method tags
struct cov
Tag-type that denotes the Covariance computational method.
struct precomputed
struct svd
Tag-type that denotes the SVD computational method.
using by_default = cov
Alias tag-type for the Covariance computational method.
Task tags
struct dim_reduction
Tag-type that parameterizes entities used for solving a dimensionality reduction problem.
using by_default = dim_reduction
Alias tag-type for the dimensionality reduction task.
Model
template <typename Task = task::by_default> class model
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
const table & means
Means. Default value: table{}.
- Getter & Setter
-
const table & get_means() const
auto & set_means(const table &value)
const table & eigenvectors
A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
const table & eigenvalues
Eigenvalues. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table & variances
Variances. Default value: table{}.
- Getter & Setter
-
const table & get_variances() const
auto & set_variances(const table &value)
Training train(...)
Input
template <typename Task = task::by_default> class train_input
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_input()
train_input(const table &data)
Creates a new instance of the class with the given data property value.
Properties
const table & data
A table with the training data, where each row stores one feature vector. Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &data)
Result and Finalize Result
template <typename Task = task::by_default> class train_result
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const table & singular_values
A table that contains the singular values for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_singular_values() const
auto & set_singular_values(const table &value)
const result_option_id & result_options
Result options that indicate the availability of the properties. Default value: default_result_options<Task>.
- Getter & Setter
-
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
const table & means
A table that contains the mean values for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_means() const
auto & set_means(const table &value)
const table & explained_variances_ratio
A table that contains the explained variance ratios for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_explained_variances_ratio() const
auto & set_explained_variances_ratio(const table &value)
const table & eigenvectors
A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
- Invariants
-
eigenvectors == model.eigenvectors
const table & eigenvalues
A table that contains the eigenvalues for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table & variances
A table that contains the variances for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_variances() const
auto & set_variances(const table &value)
const model<Task> & model
The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
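The relationship between the eigenvalues and explained_variances_ratio properties listed above can be sketched as follows: each ratio is an eigenvalue divided by the total variance. This is a stand-alone illustration that does not use oneDAL. The helper name explained_variance_ratio is hypothetical, and taking the total over exactly the eigenvalues passed in is an assumption; this page does not state whether the library normalizes by the variance of all features or only of the first r components.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Divides each eigenvalue by the sum of all supplied eigenvalues.
// Assumption: the supplied eigenvalues cover the total variance of interest.
std::vector<double> explained_variance_ratio(const std::vector<double>& eigenvalues) {
    assert(!eigenvalues.empty());
    const double total = std::accumulate(eigenvalues.begin(), eigenvalues.end(), 0.0);
    assert(total > 0.0);

    std::vector<double> ratio;
    ratio.reserve(eigenvalues.size());
    for (double ev : eigenvalues) {
        ratio.push_back(ev / total);
    }
    return ratio;
}
```

For eigenvalues {3, 1}, the first component explains 0.75 of the variance and the second explains 0.25.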
Operation
template <typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)
- Parameters
-
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation
- Preconditions
-
input.data.has_data == true
input.data.column_count >= desc.component_count
- Postconditions
-
result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == 1
result.model.eigenvectors.column_count == desc.component_count
Partial Training
Partial Input
template <typename Task = task::by_default> class partial_train_input
Constructors
partial_train_input()
partial_train_input(const table &data)
partial_train_input(const partial_train_result<Task> &prev, const table &data)
Properties
const table & data
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &value)
const partial_train_result<Task> & prev
- Getter & Setter
-
const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)
Partial Result and Finalize Input
template <typename Task = task::by_default> class partial_train_result
Constructors
partial_train_result()
Public Methods
std::int64_t get_auxiliary_table_count() const
Properties
const table & partial_n_rows
The number of observations (nobs). Default value: table{}.
- Getter & Setter
-
const table & get_partial_n_rows() const
auto & set_partial_n_rows(const table &value)
const table & auxiliary_table
- Getter & Setter
-
const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)
const table & partial_crossproduct
The crossproduct matrix. Default value: table{}.
- Getter & Setter
-
const table & get_partial_crossproduct() const
auto & set_partial_crossproduct(const table &value)
const table & partial_sum
Sums. Default value: table{}.
- Getter & Setter
-
const table & get_partial_sum() const
auto & set_partial_sum(const table &value)
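The three partial quantities above (row count, sums, and cross-product matrix) are enough to recover a covariance matrix after any number of data blocks. The sketch below is a stand-alone illustration of that idea and does not use oneDAL. The type partial_state and the functions update and finalize_covariance are hypothetical, and the sketch assumes a raw (uncentered) cross-product; the library may store a centered one and merge it differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>; // row-major, one observation per row

// Hypothetical accumulator mirroring the three partial quantities.
struct partial_state {
    double n_rows = 0.0;
    std::vector<double> sum; // per-feature sums, length p
    matrix crossproduct;     // p x p sums of x_i * x_j (assumption: uncentered)
};

// Adds one block of observations to the running state.
void update(partial_state& st, const matrix& block) {
    assert(!block.empty());
    const std::size_t p = block.front().size();
    if (st.sum.empty()) {
        st.sum.assign(p, 0.0);
        st.crossproduct.assign(p, std::vector<double>(p, 0.0));
    }
    assert(st.sum.size() == p);

    for (const auto& row : block) {
        assert(row.size() == p);
        for (std::size_t i = 0; i < p; ++i) {
            st.sum[i] += row[i];
            for (std::size_t j = 0; j < p; ++j) {
                st.crossproduct[i][j] += row[i] * row[j];
            }
        }
    }
    st.n_rows += static_cast<double>(block.size());
}

// Converts the accumulated state into the sample covariance matrix:
// cov[i][j] = (crossproduct[i][j] - sum[i] * sum[j] / n) / (n - 1).
matrix finalize_covariance(const partial_state& st) {
    assert(st.n_rows >= 2.0);
    const std::size_t p = st.sum.size();
    matrix cov(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            cov[i][j] = (st.crossproduct[i][j] - st.sum[i] * st.sum[j] / st.n_rows)
                        / (st.n_rows - 1.0);
        }
    }
    return cov;
}
```

Calling update once per block and finalize_covariance at the end gives the same covariance matrix as a single pass over all rows, up to floating-point rounding, which is what makes block-by-block (online) training possible.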
Finalize Training
Inference infer(...)
Input
template <typename Task = task::by_default> class infer_input
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_input(const model<Task> &trained_model, const table &data)
Creates a new instance of the class with the given model and data property values.
Properties
const table & data
The dataset for inference. Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &value)
const model<Task> & model
The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
Result
template <typename Task = task::by_default> class infer_result
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table & transformed_data
A table that contains data projected onto the r principal components. Default value: table{}.
- Getter & Setter
-
const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
Operation
template <typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
- Parameters
-
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation
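Conceptually, inference projects each input row onto the eigenvectors stored in the model. The sketch below shows that projection as a stand-alone illustration that does not use oneDAL. The function project is hypothetical, and subtracting the means before projecting is an assumption; in the library this step presumably depends on the normalization settings of the descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>; // row-major

// Projects each row of data (n x p) onto the rows of eigenvectors (r x p)
// after subtracting means (length p). Returns an n x r matrix.
matrix project(const matrix& data,
               const std::vector<double>& means,
               const matrix& eigenvectors) {
    matrix out;
    out.reserve(data.size());
    for (const auto& row : data) {
        assert(row.size() == means.size());
        std::vector<double> scores;
        scores.reserve(eigenvectors.size());
        for (const auto& ev : eigenvectors) {
            assert(ev.size() == row.size());
            // Dot product of the centered row with one eigenvector.
            double dot = 0.0;
            for (std::size_t j = 0; j < row.size(); ++j) {
                dot += (row[j] - means[j]) * ev[j];
            }
            scores.push_back(dot);
        }
        out.push_back(scores);
    }
    return out;
}
```

With means {1, 1} and the eigenvectors {1, 0} and {0, 1}, the row {3, 4} is projected to {2, 3}. Passing fewer eigenvector rows than features is what reduces the dimensionality.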
Usage Example
Training
pca::model<> run_training(const table& data) {
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(5)
        .set_deterministic(true);

    const auto result = train(pca_desc, data);

    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_eigenvectors());

    return result.get_model();
}
Inference
table run_inference(const pca::model<>& model,
                    const table& new_data) {
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(model.get_component_count());

    const auto result = infer(pca_desc, model, new_data);

    print_table("transformed data", result.get_transformed_data());

    // Return the projected data so that the function matches its declared return type.
    return result.get_transformed_data();
}
Examples
oneAPI DPC++
Batch Processing:
Online Processing:
oneAPI C++
Batch Processing:
Online Processing: