Processing Data in Blocks
Summary Statistics enables block-based data analysis that can help you:
compute statistical estimates for out-of-memory datasets, splitting them into blocks
analyze in-memory data arrays that become available block by block
tune your applications for out-of-memory data support
To compute statistical estimates for out-of-memory datasets, do the following:
Set the estimates of interest to zero, or to any other meaningful initial value:
for( i = 0; i < p; i++ )
{
    Xmean[i] = 0.0;
    Raw2Mom[i] = 0.0;
    Central2Mom[i] = 0.0;
    for( j = 0; j < p; j++ )
    {
        Cov[i][j] = 0.0;
    }
}
Initialize array W of size 2 with zero values.
This array holds accumulated weights that are important for correct computation of the estimates:
W[0] = 0.0; W[1] = 0.0;
Get the first portion of the dataset into array X, and the corresponding weights into array weights:
GetNextDataChunk( X, weights );
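GetNextDataChunk is supplied by the application. As an illustration only, a reader over an in-memory source might look like the sketch below. The ChunkSource type, the explicit nblock argument, the unit weights, and the return value are all assumptions made for this example, so the signature differs from the two-argument call used in this document. The sketch also assumes that the p values of each observation are stored contiguously.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state: a large source consumed block by block. */
typedef struct {
    const double *data;  /* source observations, p values per observation */
    size_t p;            /* dimension of each observation */
    size_t total;        /* total number of observations in the source */
    size_t pos;          /* number of observations already handed out */
} ChunkSource;

/* Copies up to nblock observations into X and assigns unit weights.
   Returns the number of observations copied (0 when the source is exhausted). */
size_t get_next_data_chunk(ChunkSource *src, size_t nblock,
                           double *X, double *weights)
{
    size_t left = src->total - src->pos;
    size_t n = left < nblock ? left : nblock;

    memcpy(X, src->data + src->pos * src->p, n * src->p * sizeof(double));
    for (size_t i = 0; i < n; i++) {
        weights[i] = 1.0;
    }
    src->pos += n;
    return n;
}
```

Returning the number of observations lets the caller detect a short final block and the end of the data without a separate chunk counter.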
Follow the common usage model of the Summary Statistics algorithms:
/* Create a task */
xstorage = VSL_SS_MATRIX_STORAGE_COLS;
errcode = vsldSSNewTask( &task, &p, &nblock, &xstorage, X, weights, indices );

/* Edit the task parameters */
errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );
covstorage = VSL_SS_MATRIX_STORAGE_FULL;
/* Cov is assumed to be a contiguous p-by-p array */
errcode = vsldSSEditCovCor( task, Xmean, (double*)Cov, &covstorage, 0, 0 );

/* Compute the estimates for the dataset split into chunks */
estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
for( nchunk = 0; ; nchunk++ )
{
    /* Update the estimates with the chunk currently stored in X */
    errcode = vsldSSCompute( task, estimates, VSL_SS_1PASS_METHOD );
    if ( nchunk >= N ) break;
    /* Overwrite X and weights with the next chunk */
    GetNextDataChunk( X, weights );
}

/* Deallocate task resources */
errcode = vslSSDeleteTask( &task );
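The role of the accumulated weights array W can be illustrated outside the library. The following is a minimal plain-C sketch, not Intel MKL code: update_mean_block is a hypothetical helper that folds one weighted block into a running mean. It assumes positive weights, and it assumes that W[0] holds the sum of weights and W[1] the sum of squared weights seen so far.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: merges one block of n weighted
   observations x[0..n-1] into a running mean.
   Assumption: W[0] = accumulated sum of weights,
               W[1] = accumulated sum of squared weights. */
void update_mean_block(const double *x, const double *w, size_t n,
                       double *mean, double W[2])
{
    double block_w = 0.0, block_w2 = 0.0, block_sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        block_w   += w[i];
        block_w2  += w[i] * w[i];
        block_sum += w[i] * x[i];
    }
    if (block_w == 0.0) {
        return;  /* empty or zero-weight block: nothing to merge */
    }

    /* Weight the old mean by everything seen so far, then add the new block. */
    double total_w = W[0] + block_w;
    *mean = (W[0] * (*mean) + block_sum) / total_w;

    W[0] = total_w;
    W[1] += block_w2;
}
```

Because W persists between calls, each new block contributes in proportion to its weight. Resetting W between blocks would make the result depend only on the last block.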
Summary Statistics also lets you read the next data block into a different array. The overall computation scheme remains the same; you only need to provide the address of the new data block to the library:
double* NextXChunk[N];

estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
for( nchunk = 0; ; nchunk++ )
{
    errcode = vsldSSCompute( task, estimates, VSL_SS_1PASS_METHOD );
    if ( nchunk >= N ) break;
    /* Read the next chunk into its own array */
    GetNextDataChunk( NextXChunk[nchunk], weights );
    /* Point the task at the new array of observations */
    errcode = vsldSSEditTask( task, VSL_SS_ED_OBSERV, NextXChunk[nchunk] );
}
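The idea of re-pointing a task at a new block, rather than copying data into a fixed array, can be shown with a toy model. The sketch below is not the library's implementation: ToyTask, toy_edit_observ, toy_compute, and toy_mean_over_chunks are hypothetical names. They only mimic the pattern of storing an address, computing, and then replacing the address. The sketch assumes at least one chunk and equally sized chunks.

```c
#include <stddef.h>

/* Toy stand-in for a task: it stores an address, not a copy of the data. */
typedef struct {
    const double *x;  /* address of the current block of observations */
    size_t n;         /* number of observations per block */
    double sum;       /* running sum across all processed blocks */
    double count;     /* running number of processed observations */
} ToyTask;

/* Replaces the address of the observations held by the task. */
void toy_edit_observ(ToyTask *t, const double *x)
{
    t->x = x;
}

/* Folds the block the task currently points at into the running totals. */
void toy_compute(ToyTask *t)
{
    for (size_t i = 0; i < t->n; i++) {
        t->sum += t->x[i];
    }
    t->count += (double)t->n;
}

/* Processes n_chunks blocks of n observations each without copying them. */
double toy_mean_over_chunks(const double *const chunks[],
                            size_t n_chunks, size_t n)
{
    ToyTask t = { chunks[0], n, 0.0, 0.0 };

    for (size_t k = 0; ; k++) {
        toy_compute(&t);
        if (k + 1 >= n_chunks) break;
        toy_edit_observ(&t, chunks[k + 1]);  /* switch to the next block */
    }
    return t.sum / t.count;
}
```

Keeping each block in its own array avoids overwriting data that may still be needed elsewhere, at the cost of holding several blocks in memory at once.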
For the list of estimators that support processing datasets in blocks, see Table VS Summary Statistics Estimates Obtained with Compute Routine in the Summary Statistics section of [MKLMan].