Application Notes for Intel® oneAPI Math Kernel Library Summary Statistics

ID 772991
Date 12/04/2020
Public


Estimating Raw and Central Moments and Sums, Skewness, Excess Kurtosis, Variation, and Variance-Covariance/Correlation/Cross-Product Matrix

Summary Statistics offers the following methods to support computation of raw and central moments and sums, skewness, excess kurtosis (further referred to as kurtosis), variation, and variance-covariance/correlation/cross-product matrix:

  1. Method VSL_SS_METHOD_FAST is a performance-oriented implementation of an algorithm for estimate calculations.

  2. Method VSL_SS_METHOD_FAST_USER_MEAN is an implementation of an algorithm for estimate calculations when a user-defined mean is provided.

  3. Method VSL_SS_METHOD_1PASS is an implementation of a one-pass algorithm. In this case, all requested estimates are computed in a single pass over the data. For an example of such an algorithm, see [West79].

  4. Method VSL_SS_METHOD_CP_TO_COVCOR computes a variance-covariance and/or correlation matrix from the corresponding cross-product matrix.

  5. Method VSL_SS_METHOD_SUM_TO_MOM computes raw/central statistical moments, as well as kurtosis, skewness, and variation, from the corresponding raw/central sums.
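
To illustrate what computing estimates in a single pass over the data means, the following is a minimal sketch in plain C of a running update of the mean and the unbiased variance, in the spirit of the updating methods discussed in [West79]. It does not call the library and is not claimed to be the algorithm behind VSL_SS_METHOD_1PASS; the function name one_pass_mean_var is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.
 * Reads each observation exactly once and keeps a running mean and a
 * running sum of squared deviations from that mean. */
void one_pass_mean_var(const double *x, size_t n, double *mean, double *var)
{
    assert(x != NULL && mean != NULL && var != NULL);
    assert(n >= 2);                      /* unbiased variance needs n >= 2 */

    double m  = 0.0;                     /* running mean */
    double m2 = 0.0;                     /* running sum of squared deviations */

    for (size_t i = 0; i < n; ++i) {
        double delta = x[i] - m;         /* deviation from the old mean */
        m  += delta / (double)(i + 1);   /* update the mean */
        m2 += delta * (x[i] - m);        /* uses both old and new mean */
    }

    *mean = m;
    *var  = m2 / (double)(n - 1);
}
```

Because the update works with deviations from the current mean rather than with raw squares, it avoids subtracting two large, nearly equal quantities at the end of the computation.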

The VSL_SS_METHOD_FAST method for variance-covariance estimation can be numerically unstable for some datasets, such as a dataset drawn from a Gaussian distribution whose standard deviation is several orders of magnitude smaller than its mean. For such datasets, to estimate the variance-covariance matrix, the cross-product matrix, or another estimate that relies on the mean, use the one-pass algorithm supported by the library, or the two-pass algorithm [West79], whose building blocks are available in the library. In the latter case, you need to do the following:

  1. Compute the mean using Summary Statistics functions.

  2. Compute the variance-covariance, cross-product, or another estimate by providing the computed mean and applying the VSL_SS_METHOD_FAST_USER_MEAN method.
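
The two steps above can be sketched in plain C for the one-dimensional case. This is only an illustration of the two-pass idea under stated assumptions: it does not call Summary Statistics functions, and the helper names mean_of and variance_with_user_mean are hypothetical. In the library, the first step corresponds to computing the mean, and the second to supplying that mean together with VSL_SS_METHOD_FAST_USER_MEAN.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (hypothetical helper): first pass, compute the mean. */
double mean_of(const double *x, size_t n)
{
    assert(x != NULL && n >= 1);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum / (double)n;
}

/* Step 2 (hypothetical helper): second pass, compute the unbiased
 * variance around a mean supplied by the caller. */
double variance_with_user_mean(const double *x, size_t n, double mean)
{
    assert(x != NULL && n >= 2);
    double ss = 0.0;                     /* sum of squared deviations */
    for (size_t i = 0; i < n; ++i) {
        double d = x[i] - mean;          /* center first, then square */
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

For data such as 1000000004, 1000000007, 1000000013, 1000000016, where the spread is many orders of magnitude smaller than the mean, centering each observation before squaring keeps the accumulated values small, which is what makes the two-pass approach robust for this kind of dataset.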

Each estimate is stored as a one-dimensional array. The size of the array may differ depending on the type of the estimate, as follows:

Estimate Type:
  1. Raw and central moments

  2. Raw and central sums

  3. Kurtosis

  4. Skewness

  5. Variation

Size of the Array: Must be sufficient to store at least p elements, where p is the dimension of the task.

Estimate Type:
  1. Variance-covariance matrix

  2. Correlation matrix

  3. Cross-product matrix

Size of the Array: Depends on the storage format. For details, see the table "Storage formats of a variance-covariance/correlation/cross-product matrix" in the Summary Statistics section of [MKLMan].
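
As a rough guide to sizing these arrays, the sketch below computes the minimum number of elements for each kind of estimate. It assumes the usual conventions that a full matrix occupies p*p elements and a packed triangular matrix occupies p*(p+1)/2 elements; the storage_kind enumeration and both function names are hypothetical and are not library identifiers. The table in [MKLMan] remains the authoritative description of the storage formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage kinds used only in this sketch. */
typedef enum { STORAGE_FULL, STORAGE_PACKED } storage_kind;

/* Moments, sums, kurtosis, skewness, variation: at least p elements. */
size_t vector_estimate_size(size_t p)
{
    assert(p > 0);
    return p;
}

/* Variance-covariance, correlation, cross-product matrix.
 * Assumption: full storage holds p*p elements, packed triangular
 * storage holds p*(p+1)/2 elements. */
size_t matrix_estimate_size(size_t p, storage_kind kind)
{
    assert(p > 0);
    return (kind == STORAGE_FULL) ? p * p : p * (p + 1) / 2;
}
```

For a task of dimension p = 4, this gives 4 elements for each vector-type estimate, 16 elements for a full matrix, and 10 elements for a packed triangular matrix.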