vslSSEditMissingValues

Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686

Date 7/13/2023

Modifies pointers to arrays associated with the method of supporting missing values in a dataset.

Syntax

status = vslssseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)

status = vsldsseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)

Include Files

mkl_vsl.f90

Input Parameters

Name	Type	Description
task	Fortran: TYPE(VSL_SS_TASK)	Descriptor of the task
nparams	Fortran: INTEGER	Pointer to the number of method parameters
params	Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues	Pointer to the array of method parameters
init_estimates_n	Fortran: INTEGER	Pointer to the number of initial estimates for mean and a variance-covariance matrix
init_estimates	Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues	Pointer to the array that holds initial estimates for mean and a variance-covariance matrix
prior_n	Fortran: INTEGER	Pointer to the number of prior parameters
prior	Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues	Pointer to the array of prior parameters
simul_missing_vals_n	Fortran: INTEGER	Pointer to the size of the array that holds output of the Multiple Imputation method
simul_missing_vals	Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues	Pointer to the array of size k*m, where k is the total number of missing values, and m is the number of copies of missing values. The array holds m sets of simulated missing values for the matrix of observations.
estimates_n	Fortran: INTEGER	Pointer to the number of estimates to be returned by the routine
estimates	Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues	Pointer to the array that holds estimates of the mean and a variance-covariance matrix.

Output Parameters

Name	Type	Description
status	Fortran: INTEGER	Current status of the task

Description

The vslSSEditMissingValues routine replaces the following pointers in the task descriptor with the values passed as its parameters: pointers to the number and the array of the method parameters, pointers to the number and the array of initial mean/variance-covariance estimates, pointers to the number and the array of prior parameters, pointers to the number and the array of simulated missing values, and pointers to the number and the array of the intermediate mean/covariance estimates. If you pass a value of NULL for a specific input parameter, the value of that parameter in the task descriptor is unchanged.

Before you call the Summary Statistics routines to process missing values, preprocess the dataset and denote missing observations with one of the following predefined constants:

VSL_SS_SNAN, if the dataset is stored in single precision floating-point arithmetic
VSL_SS_DNAN, if the dataset is stored in double precision floating-point arithmetic

Intel® oneAPI Math Kernel Library (oneMKL) provides the VSL_SS_METHOD_MI method to support missing values in the dataset based on the Multiple Imputation (MI) approach described in [Schafer97]. The following components support Multiple Imputation:

Expectation Maximization (EM) algorithm to compute the start point for the Data Augmentation (DA) procedure
DA function

NOTE:

The DA component of the MI procedure is simulation-based and uses the VSL_BRNG_MCG59 basic random number generator with predefined seed = 2⁵⁰ and the Gaussian distribution generator (ICDF method) available in Intel® oneAPI Math Kernel Library (oneMKL) [Gaussian].

Pack the parameters of the MI algorithm into the params array. Table "Structure of the Array of MI Parameters" describes the params structure.

Structure of the Array of MI Parameters
Array Position	Algorithm Parameter	Description
0	em_iter_num	Maximal number of iterations for the EM algorithm. By default, this value is 50.
1	da_iter_num	Maximal number of iterations for the DA algorithm. By default, this value is 30.
2	ε	Stopping criterion for the EM algorithm. The algorithm terminates if the maximum absolute value of the element-wise difference between the previous and current parameter values is less than ε. By default, this value is 0.001.
3	m	Number of sets to impute
4	missing_vals_num	Total number of missing values in the dataset
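
The table above can be sketched in code as follows. This is a minimal illustration, assuming a double-precision task (the vsldsseditmissingvalues variant) and example values m = 2 and missing_vals_num = 3; the program and variable names are hypothetical, and no oneMKL routine is called. Because Fortran arrays start at index 1 by default, array position k in the table corresponds to params(k+1).

```fortran
program pack_mi_params
  implicit none
  ! Double precision; corresponds to REAL(KIND=8) on compilers where
  ! that kind value denotes double precision.
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nparams = 5
  integer, parameter :: m = 2                 ! example: number of sets to impute
  integer, parameter :: missing_vals_num = 3  ! example: total number of missing values
  real(dp) :: params(nparams)

  ! Table position k is stored in params(k+1).
  params(1) = 50.0_dp                     ! em_iter_num (documented default)
  params(2) = 30.0_dp                     ! da_iter_num (documented default)
  params(3) = 0.001_dp                    ! epsilon, EM stopping criterion (documented default)
  params(4) = real(m, dp)                 ! m
  params(5) = real(missing_vals_num, dp)  ! missing_vals_num

  print '(5(f8.3,1x))', params
end program pack_mi_params
```

All five entries are stored as floating-point values because params is a REAL array, even though four of them are counts.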

You can also pass initial estimates into the EM algorithm by packing both the vector of means and the variance-covariance matrix into the one-dimensional array init_estimates. The size of the array should be at least p + p(p + 1)/2, where p is the dimension of the task. For i = 0, ..., p-1, init_estimates[i] contains the initial estimate of the i-th mean. The remaining positions of the array are occupied by the upper triangular part of the variance-covariance matrix.
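
A minimal sketch of this packing for p = 4 follows, using zero means and an identity matrix as example values. The program and variable names are hypothetical and no oneMKL routine is called. The text does not state the order in which the upper triangle is stored; the row-by-row order used here is an assumption, so check it against the Summary Statistics Application Notes before relying on it.

```fortran
program pack_init_estimates
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: p = 4                               ! example task dimension
  integer, parameter :: init_estimates_n = p + p*(p + 1)/2  ! 14 for p = 4
  real(dp) :: mean(p), cov(p, p), init_estimates(init_estimates_n)
  integer :: i, j, k

  ! Example values: zero means and an identity variance-covariance matrix.
  mean = 0.0_dp
  cov = 0.0_dp
  do i = 1, p
    cov(i, i) = 1.0_dp
  end do

  ! Positions 0..p-1 of the text are init_estimates(1:p) in Fortran.
  init_estimates(1:p) = mean

  ! ASSUMPTION: the upper triangle is packed row by row.
  k = p
  do i = 1, p
    do j = i, p
      k = k + 1
      init_estimates(k) = cov(i, j)
    end do
  end do

  ! The loop must fill exactly p + p(p+1)/2 positions.
  if (k /= init_estimates_n) error stop 'unexpected packed length'
  print '(a,i0)', 'init_estimates_n = ', init_estimates_n
  print '(14(f4.1,1x))', init_estimates
end program pack_init_estimates
```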

If you provide no initial estimates for the EM algorithm, the editor uses the default values, that is, the vector of zero means and the identity matrix as the variance-covariance matrix. You can also pass prior parameters for μ and Σ into the library: μ₀, τ, m, and Λ⁻¹. Pack these parameters as a one-dimensional array prior with a size of at least

(p² + 3p + 4)/2.

The storage format is as follows:

prior[0], ..., prior[p-1] contain the elements of the vector μ₀.
prior[p] contains the parameter τ.
prior[p+1] contains the parameter m.
The remaining positions are occupied by the upper triangular part of the inverse matrix Λ⁻¹.

If you provide no prior parameters, the editor uses their default values:

The array of p zeros is used as μ₀.
τ is set to 0.
m is set to p.
The zero matrix is used as an initial approximation of Λ⁻¹.
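
The storage format and the default values above can be sketched as follows for p = 4. The program and variable names are hypothetical, and no oneMKL routine is called; the sketch only fills an array with the documented defaults and checks that the documented size matches the sum of its parts.

```fortran
program pack_prior_defaults
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: p = 4                        ! example task dimension
  integer, parameter :: prior_n = (p*p + 3*p + 4)/2  ! 16 for p = 4
  real(dp) :: prior(prior_n)

  ! The size is p (mu_0) + 1 (tau) + 1 (m) + p(p+1)/2 (upper triangle).
  if (prior_n /= p + 2 + p*(p + 1)/2) error stop 'size mismatch'

  ! Positions 0..p-1 of the text are prior(1:p) in Fortran.
  prior(1:p) = 0.0_dp                ! mu_0: array of p zeros
  prior(p + 1) = 0.0_dp              ! tau is set to 0
  prior(p + 2) = real(p, dp)         ! prior parameter m is set to p
  prior(p + 3:prior_n) = 0.0_dp      ! upper triangle of the zero matrix

  print '(a,i0)', 'prior_n = ', prior_n
  print '(16(f4.1,1x))', prior
end program pack_prior_defaults
```

Because the default matrix is all zeros, this sketch does not depend on the order in which the upper triangle is stored.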

The EditMissingValues editor returns m sets of imputed values and/or a sequence of parameter estimates drawn during the DA procedure.

The editor returns the imputed values as the simul_missing_vals array. The size of the array should be sufficient to hold m sets of missing_vals_num values each, that is, at least m*missing_vals_num in total. The editor packs the imputed values one by one in the order of their appearance in the matrix of observations.

For example, consider a task of dimension 4. The total number of observations n is 10. The second observation vector is missing variables 1 and 2, and the seventh observation vector is missing variable 1. The number of sets to impute is m=2. Then simul_missing_vals[0] and simul_missing_vals[1] contain the first and the second points for the second observation vector, and simul_missing_vals[2] holds the first point for the seventh observation. Positions 3, 4, and 5 are formed similarly for the second set.
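
The layout in this example can be sketched as a short program that lists, for each zero-based position used in the text, the set and the cell it belongs to. The program name and the obs and var arrays are hypothetical helpers for illustration; no oneMKL routine is called.

```fortran
program imputed_layout
  implicit none
  integer, parameter :: m = 2                 ! number of sets to impute
  integer, parameter :: missing_vals_num = 3  ! total number of missing values
  ! Missing cells in the order of their appearance in the matrix of observations.
  integer, parameter :: obs(missing_vals_num) = [2, 2, 7]  ! observation number
  integer, parameter :: var(missing_vals_num) = [1, 2, 1]  ! variable number
  integer :: s, k, pos

  do s = 1, m
    do k = 1, missing_vals_num
      ! Zero-based position as used in the text; the Fortran index is pos + 1.
      pos = (s - 1)*missing_vals_num + (k - 1)
      print '(a,i0,a,i0,a,i0,a,i0)', 'position ', pos, ': set ', s, &
            ', observation ', obs(k), ', variable ', var(k)
    end do
  end do
end program imputed_layout
```

The program lists positions 0 to 2 for the first set and positions 3 to 5 for the second set, matching the example.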

To estimate convergence of the DA algorithm and choose a proper value of the number of DA iterations, request the sequence of parameter estimates that are produced during the DA procedure. The editor returns the sequence of parameters as a single array. The size of the array is

m*da_iter_num*(p+(p²+p)/2)

where

m is the number of sets of values to impute.
da_iter_num is the number of DA iterations.
The value p+(p²+p)/2 determines the size of the memory to hold one set of the parameter estimates.

In each set of the parameters, the vector of means occupies the first p positions and the remaining (p²+p)/2 positions are intended for the upper triangular part of the variance-covariance matrix.
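
As a worked instance of the size formula, the following sketch computes the required length of the estimates array for p = 4, m = 2, and da_iter_num = 30. The program and variable names are hypothetical, and no oneMKL routine is called. The text does not state how the m*da_iter_num sets are ordered relative to each other, so the sketch covers only the total size and the layout within one set.

```fortran
program estimates_size
  implicit none
  integer, parameter :: p = 4             ! example task dimension
  integer, parameter :: m = 2             ! example number of sets to impute
  integer, parameter :: da_iter_num = 30  ! example number of DA iterations
  ! One set of parameter estimates: p means plus the upper triangle.
  integer, parameter :: set_size = p + (p*p + p)/2          ! 4 + 10 = 14
  integer, parameter :: estimates_n = m*da_iter_num*set_size  ! 2*30*14 = 840

  print '(a,i0)', 'size of one set of estimates: ', set_size
  print '(a,i0)', 'estimates_n:                  ', estimates_n
  ! Within one set, zero-based offsets 0..p-1 hold the means and
  ! offsets p..set_size-1 hold the upper triangular part.
  print '(a,i0,a,i0)', 'means at offsets 0..', p - 1, &
        ', upper triangle at offsets ', p
  print '(a,i0)', 'last offset of the upper triangle: ', set_size - 1
end program estimates_size
```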

Upon successful generation of m sets of imputed values, you can place them in cells of the data matrix with missing values and use the Summary Statistics routines to analyze and get estimates for each of the m complete datasets.

NOTE:

The Intel® oneAPI Math Kernel Library (oneMKL) implementation of the MI algorithm rewrites cells of the dataset that contain the VSL_SS_SNAN/VSL_SS_DNAN values. If you want to use the Summary Statistics routines to process the data with missing values again, mask the positions of the empty cells.
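
One way to build the m complete datasets is sketched below. It assumes that the positions of the missing cells were recorded in a logical mask before the computation, because the marker values are overwritten, and that the matrix is held as x(p, n) with one observation per column, so that scanning it in memory order visits the cells observation by observation. The mask, the storage layout, the names, and the numeric values are illustrative assumptions; the simul_missing_vals contents here are made-up stand-ins for values the library would return, and no oneMKL routine is called.

```fortran
program fill_imputed
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: p = 4, n = 10         ! dimension, number of observations
  integer, parameter :: m = 2                 ! number of sets to impute
  integer, parameter :: missing_vals_num = 3  ! total number of missing values
  real(dp) :: x(p, n), completed(p, n)
  logical :: is_missing(p, n)
  real(dp) :: simul_missing_vals(m*missing_vals_num)
  integer :: s, i, j, k

  x = 1.0_dp                     ! stand-in for the observed data
  is_missing = .false.           ! mask recorded before the computation
  is_missing(1, 2) = .true.      ! observation 2, variable 1
  is_missing(2, 2) = .true.      ! observation 2, variable 2
  is_missing(1, 7) = .true.      ! observation 7, variable 1

  ! Made-up stand-ins for the imputed values: set 1, then set 2.
  simul_missing_vals = [0.1_dp, 0.2_dp, 0.3_dp, 1.1_dp, 1.2_dp, 1.3_dp]

  do s = 1, m
    completed = x
    k = (s - 1)*missing_vals_num  ! values of set s start after this offset
    do j = 1, n                   ! observations, in order of appearance
      do i = 1, p                 ! variables within an observation
        if (is_missing(i, j)) then
          k = k + 1
          completed(i, j) = simul_missing_vals(k)
        end if
      end do
    end do
    print '(a,i0,a,3(f4.1,1x))', 'set ', s, ': ', &
          completed(1, 2), completed(2, 2), completed(1, 7)
  end do
end program fill_imputed
```

Each pass of the outer loop yields one complete dataset that can be analyzed separately; keeping the mask also makes it possible to restore the missing-value markers afterwards.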

See additional details of the algorithm usage model in the Intel® oneAPI Math Kernel Library (oneMKL) Summary Statistics Application Notes document [SS Notes].
