vslSSEditMissingValues
Modifies pointers to arrays associated with the method of supporting missing values in a dataset.
status = vslssseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)
status = vsldsseditmissingvalues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates)
- mkl_vsl.f90
Name | Type | Description |
---|---|---|
task | Fortran: TYPE(VSL_SS_TASK) | Descriptor of the task |
nparams | Fortran: INTEGER | Pointer to the number of method parameters |
params | Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues | Pointer to the array of method parameters |
init_estimates_n | Fortran: INTEGER | Pointer to the number of initial estimates for mean and a variance-covariance matrix |
init_estimates | Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues | Pointer to the array that holds initial estimates for mean and a variance-covariance matrix |
prior_n | Fortran: INTEGER | Pointer to the number of prior parameters |
prior | Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues | Pointer to the array of prior parameters |
simul_missing_vals_n | Fortran: INTEGER | Pointer to the size of the array that holds output of the Multiple Imputation method |
simul_missing_vals | Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues | Pointer to the array of size k*m, where k is the total number of missing values, and m is the number of copies of missing values. The array holds m sets of simulated missing values for the matrix of observations. |
estimates_n | Fortran: INTEGER | Pointer to the number of estimates to be returned by the routine |
estimates | Fortran: REAL(KIND=4) DIMENSION(*) for vslssseditmissingvalues; REAL(KIND=8) DIMENSION(*) for vsldsseditmissingvalues | Pointer to the array that holds estimates of the mean and a variance-covariance matrix. |
Name | Type | Description |
---|---|---|
status | Fortran: INTEGER | Current status of the task |
The vslSSEditMissingValues routine uses the values passed as its parameters to replace the following pointers in the task descriptor: pointers to the number and the array of method parameters, pointers to the number and the array of initial mean/variance-covariance estimates, pointers to the number and the array of prior parameters, pointers to the number and the array of simulated missing values, and pointers to the number and the array of intermediate mean/variance-covariance estimates. If you pass a value of NULL for a specific input parameter, the value of that parameter in the task descriptor is unchanged.
Before you call the Summary Statistics routines to process missing values, preprocess the dataset and denote missing observations with one of the following predefined constants:
VSL_SS_SNAN, if the dataset is stored in single precision floating-point arithmetic
VSL_SS_DNAN, if the dataset is stored in double precision floating-point arithmetic
Intel® oneAPI Math Kernel Library provides the VSL_SS_METHOD_MI method to support missing values in the dataset based on the Multiple Imputation (MI) approach described in [Schafer97]. The following components support Multiple Imputation:
Expectation Maximization (EM) algorithm to compute the start point for the Data Augmentation (DA) procedure
DA function
The DA component of the MI procedure is simulation-based and uses the VSL_BRNG_MCG59 basic random number generator with predefined seed = 250 and the Gaussian distribution generator (ICDF method) available in Intel® oneAPI Math Kernel Library [Gaussian].
Pack the parameters of the MI algorithm into the params array. Table "Structure of the Array of MI Parameters" describes the params structure.
Array Position | Algorithm Parameter | Description |
---|---|---|
0 | em_iter_num | Maximum number of iterations for the EM algorithm. By default, this value is 50. |
1 | da_iter_num | Maximum number of iterations for the DA algorithm. By default, this value is 30. |
2 | ε | Stopping criterion for the EM algorithm. The algorithm terminates if the maximum absolute value of the element-wise difference between the previous and current parameter values is less than ε. By default, this value is 0.001. |
3 | m | Number of sets to impute |
4 | missing_vals_num | Total number of missing values in the dataset |
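The position order in the table can be sketched as a small packing helper. This is an illustration of the layout only, written in Python rather than Fortran, and it does not call the library. The function name pack_mi_params is hypothetical; the defaults are the ones listed in the table.

```python
def pack_mi_params(m, missing_vals_num,
                   em_iter_num=50, da_iter_num=30, eps=0.001):
    """Return the 5-element params array in the documented position order.

    Hypothetical helper: only the order of the positions and the default
    values come from the table; the name and signature are illustrative.
    """
    if m < 1:
        raise ValueError("m (number of sets to impute) must be at least 1")
    if missing_vals_num < 0:
        raise ValueError("missing_vals_num must be non-negative")
    # params is a real-valued array, so every entry is stored as a float.
    return [
        float(em_iter_num),       # position 0: EM iteration limit
        float(da_iter_num),       # position 1: DA iteration limit
        float(eps),               # position 2: EM stopping criterion
        float(m),                 # position 3: number of sets to impute
        float(missing_vals_num),  # position 4: total number of missing values
    ]


params = pack_mi_params(m=2, missing_vals_num=3)
nparams = len(params)
```

With this layout, nparams is 5 and the array can be copied into the REAL(KIND=4) or REAL(KIND=8) array passed as params.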
You can also pass initial estimates into the EM algorithm by packing both the vector of means and the variance-covariance matrix as a one-dimensional array init_estimates. The size of the array should be at least p + p(p + 1)/2, where p is the dimension of the task. For i = 0, ..., p-1, init_estimates[i] contains the initial estimate of the i-th mean. The remaining positions of the array are occupied by the upper triangular part of the variance-covariance matrix.
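As a sketch of this packing, the following Python helper builds an array of exactly p + p(p + 1)/2 elements. The helper names are hypothetical, and the row-by-row order of the upper triangle is an assumption made for the example, since the text above does not state the element order inside the triangular part.

```python
def pack_init_estimates(means, cov):
    """Pack means followed by the upper triangle of cov into one list.

    Assumption: the upper triangle is written row by row
    (cov[0][0], cov[0][1], ..., cov[0][p-1], cov[1][1], ...).
    """
    p = len(means)
    packed = [float(x) for x in means]      # positions 0 .. p-1
    for i in range(p):
        for j in range(i, p):               # diagonal and above only
            packed.append(float(cov[i][j]))
    assert len(packed) == p + p * (p + 1) // 2
    return packed


def default_init_estimates(p):
    """Zero means and a matrix with ones on the diagonal, packed as above."""
    diag_ones = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    return pack_init_estimates([0.0] * p, diag_ones)
```

For p = 2, default_init_estimates returns five values: two means followed by the three upper-triangular entries.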
If you provide no initial estimates for the EM algorithm, the editor uses the default values, that is, the vector of zero means and the identity matrix as a variance-covariance matrix. You can also pass prior parameters for μ and Σ into the library: μ0, τ, m, and Λ⁻¹. Pack these parameters as a one-dimensional array prior with a size of at least
(p² + 3p + 4)/2.
The storage format is as follows:
prior[0], ..., prior[p-1] contain the elements of the vector μ0.
prior[p] contains the parameter τ.
prior[p+1] contains the parameter m.
The remaining positions are occupied by the upper-triangular part of the inverted matrix Λ⁻¹.
If you provide no prior parameters, the editor uses their default values:
The array of p zeros is used as μ0.
τ is set to 0.
m is set to p.
The zero matrix is used as an initial approximation of Λ⁻¹.
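The storage format and defaults for the prior array can be sketched in Python as follows. The helper names are hypothetical, and the row-by-row order of the upper triangle is an assumption of the example. The length check confirms that p + 2 + p(p + 1)/2 equals the stated minimum size.

```python
def pack_prior(mu0, tau, m_prior, lambda_inv):
    """Pack mu0, tau, m and the upper triangle of the inverted matrix.

    Assumption: the upper triangle is stored row by row.
    m_prior is the prior parameter m, not the number of sets to impute.
    """
    p = len(mu0)
    packed = [float(x) for x in mu0]         # prior[0] .. prior[p-1]
    packed.append(float(tau))                # prior[p]
    packed.append(float(m_prior))            # prior[p+1]
    for i in range(p):
        for j in range(i, p):                # remaining positions
            packed.append(float(lambda_inv[i][j]))
    assert len(packed) == (p * p + 3 * p + 4) // 2
    return packed


def default_prior(p):
    """Defaults listed above: zero mu0, tau = 0, m = p, zero matrix."""
    zero_matrix = [[0.0] * p for _ in range(p)]
    return pack_prior([0.0] * p, 0.0, p, zero_matrix)
```

For a task of dimension 4, the packed array has 16 elements.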
The EditMissingValues editor returns m sets of imputed values and/or a sequence of parameter estimates drawn during the DA procedure.
The editor returns the imputed values as the simul_missing_vals array. The size of the array should be sufficient to hold m sets each of the missing_vals_num size, that is, at least m*missing_vals_num in total. The editor packs the imputed values one by one in the order of their appearance in the matrix of observations.
For example, consider a task of dimension 4. The total number of observations n is 10. The second observation vector misses variables 1 and 2, and the seventh observation vector lacks variable 1, so missing_vals_num is 3. The number of sets to impute is m=2. Then, simul_missing_vals[0] and simul_missing_vals[1] contain the first-set values for the two missing variables of the second observation vector, and simul_missing_vals[2] holds the first-set value for the seventh observation. Positions 3, 4, and 5 hold the second set in the same order.
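The position arithmetic in this example can be written out as a short Python sketch. The function name imputed_position is hypothetical; the formula follows from the statement that the array holds m consecutive sets of missing_vals_num values each.

```python
def imputed_position(set_index, missing_index, missing_vals_num):
    """Zero-based position in simul_missing_vals.

    set_index:     which imputed set (0 .. m-1)
    missing_index: which missing value, in order of appearance
                   (0 .. missing_vals_num-1)
    """
    return set_index * missing_vals_num + missing_index


# Missing cells from the example, as (observation, variable), 1-based.
missing_cells = [(2, 1), (2, 2), (7, 1)]
m = 2

# Map each array position to (set number, observation, variable).
layout = {
    imputed_position(s, j, len(missing_cells)): (s + 1, obs, var)
    for s in range(m)
    for j, (obs, var) in enumerate(missing_cells)
}

for pos in sorted(layout):
    print(pos, layout[pos])
```

Positions 0 to 2 map to set 1 and positions 3 to 5 map to set 2, each in the order of the missing cells.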
To estimate convergence of the DA algorithm and choose a proper value of the number of DA iterations, request the sequence of parameter estimates that are produced during the DA procedure. The editor returns the sequence of parameters as a single array. The size of the array is
m*da_iter_num*(p+(p²+p)/2)
where
m is the number of sets of values to impute.
da_iter_num is the number of DA iterations.
The value p+(p²+p)/2 determines the size of the memory to hold one set of the parameter estimates.
In each set of the parameters, the vector of means occupies the first p positions and the remaining (p²+p)/2 positions are intended for the upper triangular part of the variance-covariance matrix.
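The size formula can be checked with a few lines of Python. The function names are hypothetical. The sketch splits a single parameter set into its two parts and makes no assumption about the order in which the sets follow each other in the array, which the text above does not specify.

```python
def param_set_size(p):
    """Elements in one set: p means plus (p^2 + p)/2 triangular entries."""
    return p + (p * p + p) // 2


def estimates_size(m, da_iter_num, p):
    """Total number of elements in the estimates array."""
    return m * da_iter_num * param_set_size(p)


def split_param_set(param_set, p):
    """Split one set into (means, packed upper triangle)."""
    if len(param_set) != param_set_size(p):
        raise ValueError("unexpected length for one parameter set")
    return param_set[:p], param_set[p:]


# Dimension 4, two imputed sets, default 30 DA iterations.
print(param_set_size(4), estimates_size(2, 30, 4))
```

For p = 4, one set takes 14 elements, so m = 2 and da_iter_num = 30 require an array of 840 elements.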
Upon successful generation of m sets of imputed values, you can place them in cells of the data matrix with missing values and use the Summary Statistics routines to analyze and get estimates for each of the m complete datasets.
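A minimal Python sketch of this step is shown below. It assumes the data matrix is held as a list of rows with one observation per row, that missing cells are marked with NaN as a stand-in for VSL_SS_SNAN/VSL_SS_DNAN, and that the order of appearance means scanning observation by observation. The function name complete_datasets is hypothetical.

```python
import math


def complete_datasets(data, simul_missing_vals, m):
    """Return m copies of data with the missing cells filled in.

    Assumptions: NaN marks a missing cell, and imputed values are
    ordered by observation, then by variable within the observation.
    """
    cells = [(i, j)
             for i, row in enumerate(data)
             for j, value in enumerate(row)
             if math.isnan(value)]
    k = len(cells)
    if len(simul_missing_vals) < m * k:
        raise ValueError("simul_missing_vals is too short for m sets")

    completed = []
    for s in range(m):
        filled = [list(row) for row in data]      # leave the input untouched
        for idx, (i, j) in enumerate(cells):
            filled[i][j] = simul_missing_vals[s * k + idx]
        completed.append(filled)
    return completed
```

Each returned matrix can then be analyzed separately as one of the m complete datasets.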
The Intel® oneAPI Math Kernel Library implementation of the MI algorithm rewrites cells of the dataset that contain the VSL_SS_SNAN/VSL_SS_DNAN values. If you want to use the Summary Statistics routines to process the data with missing values again, mask the positions of the empty cells.
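One way to keep track of the empty cells is to record their positions before the computation and mark them again afterwards. The Python sketch below illustrates that bookkeeping under the assumption that NaN stands in for the VSL_SS_SNAN/VSL_SS_DNAN markers; both function names are hypothetical, and the library itself is not called.

```python
import math


def missing_positions(data):
    """Return (row, column) pairs of the cells currently marked as missing."""
    return [(i, j)
            for i, row in enumerate(data)
            for j, value in enumerate(row)
            if math.isnan(value)]


def restore_missing(data, positions):
    """Mark the recorded cells as missing again, in place."""
    for i, j in positions:
        data[i][j] = float("nan")


# Record the mask, let the cells be overwritten, then restore the markers.
data = [[1.0, float("nan")], [3.0, 4.0]]
mask = missing_positions(data)
data[0][1] = 2.5             # stands in for a value written during imputation
restore_missing(data, mask)
```

After restore_missing, the dataset carries the same missing-value markers as before the first run.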
See additional details of the algorithm usage model in the Intel® oneAPI Math Kernel Library Summary Statistics Application Notes document [SS Notes].