Visible to Intel only — GUID: GUID-6627011A-2A05-48D3-9F61-4E6EE195BC12
Distributed Processing
The distributed processing mode assumes that the data set R is split into nblocks blocks across computation nodes.
Parameters
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:
Parameter | Default Value | Description |
---|---|---|
algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
method | fastCSR | Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm. |
nFactors | 10 | The total number of factors. |
fullNUsers | 0 | The total number of users m. |
partition | Not applicable | A numeric table of size either 1 x 1 that provides the number of input data parts or (nblocks + 1) x 1, where nblocks is the number of input data parts and the i-th element contains the offset of the transposed i-th data part to be computed by the initialization algorithm. |
engine | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally at the initialization step. |
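The partition parameter accepts two forms. The following minimal sketch in plain Python shows how either form could be turned into per-block index ranges. It is not the oneDAL API: `block_offsets` and `block_ranges` are hypothetical helpers, the even-split rule used when only the number of blocks is given is an assumption, and so is the expectation that explicit offsets hold nblocks + 1 values starting at 0 and ending at the total size.

```python
from typing import List, Sequence, Tuple


def block_offsets(partition: Sequence[int], n_total: int) -> List[int]:
    """Return nblocks + 1 boundaries that split n_total indices into blocks.

    Hypothetical helper, not part of oneDAL. It models the two forms of the
    `partition` parameter:
    - a single value: the number of blocks; indices are split as evenly as
      possible (ASSUMPTION: the documentation does not state the split rule);
    - nblocks + 1 values: explicit offsets, used as given (ASSUMPTION: they
      start at 0 and end at n_total).
    """
    if len(partition) == 1:
        nblocks = int(partition[0])
        if nblocks <= 0:
            raise ValueError("the number of blocks must be positive")
        base, extra = divmod(n_total, nblocks)
        offsets = [0]
        for i in range(nblocks):
            # The first `extra` blocks receive one additional index.
            offsets.append(offsets[-1] + base + (1 if i < extra else 0))
        return offsets

    offsets = [int(x) for x in partition]
    if offsets[0] != 0 or offsets[-1] != n_total or offsets != sorted(offsets):
        raise ValueError("offsets must start at 0, end at n_total, and not decrease")
    return offsets


def block_ranges(partition: Sequence[int], n_total: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) index range owned by each block."""
    offsets = block_offsets(partition, n_total)
    return list(zip(offsets[:-1], offsets[1:]))


# 10 indices split into 3 blocks, then 10 indices with explicit offsets.
print(block_ranges([3], 10))         # [(0, 4), (4, 7), (7, 10)]
print(block_ranges([0, 2, 10], 10))  # [(0, 2), (2, 10)]
```

The single-value form is convenient when blocks can be of roughly equal size; the explicit-offsets form lets you match blocks to an existing data layout.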
To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process illustrated by the following diagram:
Step 1 - on Local Nodes
Input
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
Input ID | Input |
---|---|
dataColumnSlice | A numeric table with a part of the input data set: each node holds a subset of rows of the full transposed input data set R^T. The input should be an object of CSRNumericTable class. |
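To illustrate what dataColumnSlice holds, the sketch below cuts one node's slice out of a small ratings matrix R. It is a plain-Python model, not the oneDAL API: sparse matrices are stored as {(row, column): value} dictionaries instead of CSRNumericTable objects, indices are zero-based, the helper name `data_column_slice` is hypothetical, and each node is assumed to own a contiguous range of items (rows of R^T).

```python
from typing import Dict, Tuple

# Sparse matrix as {(row, column): value}; a stand-in for CSRNumericTable.
Sparse = Dict[Tuple[int, int], float]


def data_column_slice(ratings: Sparse, item_start: int, item_end: int) -> Sparse:
    """Rows item_start..item_end-1 of R^T (columns of R), renumbered from 0.

    `ratings` stores R with (user, item) keys, so the R entry at (user, item)
    becomes the entry at (item - item_start, user) of the local slice.
    """
    return {
        (item - item_start, user): value
        for (user, item), value in ratings.items()
        if item_start <= item < item_end
    }


# R: 3 users x 4 items. This node owns items 2 and 3.
R: Sparse = {(0, 0): 5.0, (0, 3): 1.0, (1, 1): 3.0, (2, 2): 4.0, (2, 3): 2.0}
local_slice = data_column_slice(R, 2, 4)
print(local_slice)  # {(1, 0): 1.0, (0, 2): 4.0, (1, 2): 2.0}
```

The resulting slice has 2 rows (the node's items) and 3 columns (all users), which is why the local table spans every user even though it holds only some items.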
Output
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the factors of the items to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.
User Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 x 1 that contains the value of the starting offset of the user factors stored on the i-th node.
For more details, see Algorithms.
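The following sketch models, on one node, the two partial results that Step 1 passes to Step 3. It is an illustration under assumptions, not the oneDAL implementation: `init_for_step3` is a hypothetical name, sparse tables are {(row, column): value} dictionaries, the users stored on each node are given as half-open (start, end) ranges, and the rule that node k needs exactly the local items rated by at least one of its users is assumed from how ALS updates user factors rather than taken from this page.

```python
from typing import Dict, List, Tuple

# Sparse matrix as {(row, column): value}; a stand-in for CSRNumericTable.
Sparse = Dict[Tuple[int, int], float]


def init_for_step3(
    local_slice: Sparse, user_ranges: List[Tuple[int, int]]
) -> Tuple[Dict[int, List[int]], Dict[int, int]]:
    """Hypothetical model of outputOfInitForComputeStep3 and offsets.

    local_slice: local items x all users (a slice of R^T).
    user_ranges: [start, end) range of global user indices stored on each node.
    Returns:
      indices[k]: local item rows rated by at least one user of node k
                  (ASSUMPTION about which item factors node k needs);
      offsets[k]: first global user index stored on node k.
    """
    indices = {
        k: sorted({item for (item, user) in local_slice if start <= user < end})
        for k, (start, end) in enumerate(user_ranges)
    }
    offsets = {k: start for k, (start, _end) in enumerate(user_ranges)}
    return indices, offsets


# Local slice of R^T (2 items x 3 users); users 0-1 live on node 0, user 2 on node 1.
local_slice: Sparse = {(1, 0): 1.0, (0, 2): 4.0, (1, 2): 2.0}
indices, offsets = init_for_step3(local_slice, [(0, 2), (2, 3)])
print(indices)  # {0: [1], 1: [0, 1]}
print(offsets)  # {0: 0, 1: 2}
```

Sending only the needed item indices keeps the Step 3 transfer proportional to the actual rating pattern rather than to the full number of local items.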
Partial Result ID | Result |
---|---|
partialModel | The model with initialized item factors. The result can only be an object of the PartialModel class. |
outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
outputOfStep1ForStep2 | A key-value data collection of size nblocks that contains the parts of the input numeric table: the j-th element of this collection is a numeric table whose dimensions are defined by the partition parameter. |
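A minimal sketch of how a node could cut its local slice into the nblocks parts of outputOfStep1ForStep2, assuming the j-th part keeps the columns (users) that the partition assigns to node j, renumbered from 0. `split_for_step2` is a hypothetical helper and dictionaries stand in for numeric tables; this is not the oneDAL API.

```python
from typing import Dict, List, Tuple

# Sparse matrix as {(row, column): value}; a stand-in for CSRNumericTable.
Sparse = Dict[Tuple[int, int], float]


def split_for_step2(
    local_slice: Sparse, user_ranges: List[Tuple[int, int]]
) -> Dict[int, Sparse]:
    """Split local items x all users into one part per node, by user range.

    Part j keeps the users in user_ranges[j], renumbered so that the first
    user of node j becomes column 0 (ASSUMPTION about the layout).
    """
    parts: Dict[int, Sparse] = {j: {} for j in range(len(user_ranges))}
    for (item, user), value in local_slice.items():
        for j, (start, end) in enumerate(user_ranges):
            if start <= user < end:
                parts[j][(item, user - start)] = value
                break
    return parts


local_slice: Sparse = {(1, 0): 1.0, (0, 2): 4.0, (1, 2): 2.0}
parts = split_for_step2(local_slice, [(0, 2), (2, 3)])
print(parts)  # {0: {(1, 0): 1.0}, 1: {(0, 0): 4.0, (1, 0): 2.0}}
```

Under this layout, part j has as many rows as the node has local items and as many columns as node j has users.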
Step 2 - on Local Nodes
Input
This step uses the results of the previous step.
Input ID | Input |
---|---|
inputOfStep2FromStep1 | A key-value data collection of size nblocks that contains the parts of the input data set, one numeric table per element. Each numeric table in the collection should be an object of CSRNumericTable class. |
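Assuming that node j's inputOfStep2FromStep1 collects the j-th element of every node's outputOfStep1ForStep2, building it is an all-to-all exchange. The sketch below models that exchange in a single process with dictionaries keyed by node index; in a real deployment the transfer would go through your own communication layer (for example, MPI). `exchange_step1_outputs` is a hypothetical helper, not part of oneDAL.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def exchange_step1_outputs(
    step1_outputs: Dict[int, Dict[int, T]]
) -> Dict[int, Dict[int, T]]:
    """Model an all-to-all exchange in one process.

    step1_outputs[i][j] is the part that node i produced for node j.
    The result maps node j to {i: step1_outputs[i][j]}, i.e. its Step 2 input.
    """
    step2_inputs: Dict[int, Dict[int, T]] = {}
    for i, parts in sorted(step1_outputs.items()):
        for j, part in parts.items():
            step2_inputs.setdefault(j, {})[i] = part
    return step2_inputs


# Two nodes; strings stand in for the numeric tables being exchanged.
outputs = {0: {0: "part 0->0", 1: "part 0->1"}, 1: {0: "part 1->0", 1: "part 1->1"}}
inputs = exchange_step1_outputs(outputs)
print(inputs[1])  # {0: 'part 0->1', 1: 'part 1->1'}
```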
Output
In this step, implicit ALS initialization calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the user factors to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.
Item Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 x 1 that contains the value of the starting offset of the item factors stored on the i-th node.
For more details, see Algorithms.
Partial Result ID | Result |
---|---|
dataRowSlice | A numeric table with the mining data: the j-th node gets a subset of rows of the full input data set R. |
outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
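To show how dataRowSlice could relate to the exchanged parts, the sketch below rebuilds node j's rows of R from the parts it received in Step 2. It assumes that the part from node i has node i's local items as rows and node j's local users as columns, and that `item_offsets` holds each node's first global item index (the role the Item Offsets collection plays). The name `data_row_slice` is hypothetical and dictionaries stand in for CSR tables; the actual oneDAL computation may differ.

```python
from typing import Dict, Tuple

# Sparse matrix as {(row, column): value}; a stand-in for CSRNumericTable.
Sparse = Dict[Tuple[int, int], float]


def data_row_slice(received: Dict[int, Sparse], item_offsets: Dict[int, int]) -> Sparse:
    """Transpose and concatenate received parts into local users x all items.

    received[i]: part from node i, rows = node i's local items,
                 columns = this node's local users.
    item_offsets[i]: first global item index stored on node i.
    """
    rows: Sparse = {}
    for i, part in received.items():
        for (local_item, local_user), value in part.items():
            # Transpose the entry and shift the item to its global index.
            rows[(local_user, item_offsets[i] + local_item)] = value
    return rows


# This node stores one user (local index 0). Node 0 owns items 0-1 and sent
# nothing for this user; node 1 owns items 2-3 and sent two ratings.
received: Dict[int, Sparse] = {0: {}, 1: {(0, 0): 4.0, (1, 0): 2.0}}
print(data_row_slice(received, {0: 0, 1: 2}))  # {(0, 2): 4.0, (0, 3): 2.0}
```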