DPCT1018
Message
The <API name> was migrated, but due to <reason>, the generated code performance may be sub-optimal.
Detailed Help
This warning appears in the following cases:
Migration of the cublasSetMatrix function. Intel® DPC++ Compatibility Tool replaced the cublasSetMatrix with memory copying from the host to the device. When the rows parameter of the cublasSetMatrix is smaller than the lda parameter, the generated code copies more data (lda*cols) than the actual data available in the matrix (rows*cols).
To improve performance, consider changing the values of lda and ldb so that they are as close to rows as the data layout allows; the copy then covers less padding. If the rows parameter is greater than or equal to lda, no action is required for this code.
Migration of the cublasSetVector function. Intel® DPC++ Compatibility Tool replaced the cublasSetVector with memory copying from the host to the device. When the incx parameter of the cublasSetVector equals the incy parameter, but is greater than 1, the generated code copies more data (incx*n) than the actual data available in the vector (n). To improve performance, consider changing the values of incx and incy.
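The overhead in the two cases above can be estimated with simple arithmetic. The following is a minimal sketch in plain C++; the function names are hypothetical and are not part of cuBLAS or the dpct helper headers. It only restates the element counts given in the descriptions above.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.

// Elements covered by the migrated cublasSetMatrix copy: lda * cols.
constexpr std::size_t matrix_elems_copied(std::size_t lda, std::size_t cols) {
    return lda * cols;
}

// Elements that hold actual matrix data: rows * cols.
constexpr std::size_t matrix_elems_useful(std::size_t rows, std::size_t cols) {
    return rows * cols;
}

// Elements covered by the migrated cublasSetVector copy when
// incx == incy and both are greater than 1: incx * n.
constexpr std::size_t vector_elems_copied(std::size_t incx, std::size_t n) {
    return incx * n;
}
```

For instance, a vector with n = 128 and incx = incy = 128 leads to a copy that covers 16384 elements in order to transfer 128 useful ones. Whether this matters in practice depends on the transfer size and the device, so it is worth measuring before changing the data layout.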
Suggestions to Fix
If the rows parameter of cublasSetMatrix is smaller than the lda parameter and you observe performance issues, consider reducing lda and ldb toward rows. This requires storing the matrix with less padding between columns.
If the incx parameter of cublasSetVector equals the incy parameter but is greater than 1, and you observe performance issues, consider reducing incx and incy, ideally to 1. This requires storing the vector elements contiguously.
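One possible way to reduce lda to rows is to repack the host matrix before the transfer. The sketch below is plain C++ and assumes the column-major layout used by cuBLAS; pack_columns is a hypothetical helper, not a function provided by the Intel® DPC++ Compatibility Tool or by cuBLAS.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a column-major matrix with leading
// dimension lda into a tightly packed buffer whose leading dimension
// equals rows, so no padding remains between columns.
std::vector<float> pack_columns(const float* src, std::size_t rows,
                                std::size_t cols, std::size_t lda) {
    std::vector<float> packed(rows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            // Column c starts at offset c * lda in the padded source
            // and at offset c * rows in the packed destination.
            packed[c * rows + r] = src[c * lda + r];
        }
    }
    return packed;
}
```

The packed buffer can then be transferred with lda and ldb both equal to rows. Repacking adds an extra pass over the data on the host, so it is likely to pay off only when the padding is large relative to rows.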
For example, this original CUDA* code:
void foo() {
    const int element_num = 128;
    const int h_inc = 128;
    const int d_inc = 128;
    cublasSetVector(element_num, sizeof(float), data, h_inc, d_data, d_inc);
}
results in the following migrated SYCL* code:
void foo() {
    const int element_num = 128;
    const int h_inc = 128;
    const int d_inc = 128;
    /*
    DPCT1018:0: The cublasSetVector was migrated, but due to parameter h_inc
    equals to parameter d_inc but greater than 1, the generated code performance
    may be sub-optimal.
    */
    dpct::matrix_mem_copy((void *)d_data, (void *)data, d_inc, h_inc, 1,
                          element_num, sizeof(float));
}
which is rewritten to:
void foo() {
    const int element_num = 128;
    // Store the data in d_data contiguously and change h_inc and d_inc from 128 to 1.
    const int h_inc = 1;
    const int d_inc = 1;
    // There is now no padding between elements, so memcpy can be used directly.
    dpct::get_default_queue().memcpy(d_data, data, sizeof(float) * element_num).wait();
}
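The rewritten code above is only valid if the host data is actually stored contiguously. If the existing host buffer is strided, one option is to gather it into a contiguous buffer first. The following is a minimal sketch in plain C++; pack_strided is a hypothetical helper introduced here for illustration and is not part of the dpct helper headers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: gathers n elements that are spaced inc elements
// apart in src into a contiguous buffer (an increment of 1).
std::vector<float> pack_strided(const float* src, std::size_t n,
                                std::size_t inc) {
    std::vector<float> packed(n);
    for (std::size_t i = 0; i < n; ++i) {
        packed[i] = src[i * inc];
    }
    return packed;
}
```

The pointer returned by packed.data() could then serve as the contiguous host source for the memcpy call. Code on the device side that previously read d_data with a stride would need to be updated to use an increment of 1 as well.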