Visible to Intel only — GUID: GUID-F8214457-87BB-4532-A26E-04E8089AD7D0
DPCT1091
Message
The function dpct::segmented_reduce only supports DPC++ native binary operation. Replace "dpct_placeholder" with a DPC++ native binary operation.
Detailed Help
dpct::segmented_reduce supports the following native binary operations:
sycl::plus
sycl::bit_or
sycl::bit_xor
sycl::bit_and
sycl::maximum
sycl::minimum
sycl::multiplies
Suggestions to Fix
Review and rewrite the code manually.
For example, this original CUDA* code:
struct UserMin {
  template <typename T>
  __device__ __host__ __forceinline__ T operator()(const T &a,
                                                   const T &b) const {
    return (b < a) ? b : a;
  }
};

void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
         UserMin min_op, int initial_value) {
  size_t temp_storage_size;
  void *temp_storage = nullptr;
  cub::DeviceSegmentedReduce::Reduce(temp_storage, temp_storage_size, device_in,
                                     device_out, num_segments, device_offsets,
                                     device_offsets + 1, min_op, initial_value);
  cudaMalloc(&temp_storage, temp_storage_size);
  cub::DeviceSegmentedReduce::Reduce(temp_storage, temp_storage_size, device_in,
                                     device_out, num_segments, device_offsets,
                                     device_offsets + 1, min_op, initial_value);
  cudaDeviceSynchronize();
  cudaFree(temp_storage);
}
results in the following migrated SYCL code:
struct UserMin {
  template <typename T>
  __dpct_inline__ T operator()(const T &a, const T &b) const {
    return (b < a) ? b : a;
  }
};

void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
         UserMin min_op, int initial_value) {
  dpct::device_ext &dev_ct1 = dpct::get_current_device();
  sycl::queue &q_ct1 = dev_ct1.in_order_queue();
  /*
  DPCT1026:0: The call to cub::DeviceSegmentedReduce::Reduce was removed
  because this call is redundant in SYCL.
  */
  /*
  DPCT1092:1: Consider replacing work-group size 128 with different value for
  specific hardware for better performance.
  */
  /*
  DPCT1091:2: The function dpct::segmented_reduce only supports DPC++ native
  binary operation. Replace "dpct_placeholder" with a DPC++ native binary
  operation.
  */
  dpct::device::segmented_reduce<128>(
      q_ct1, device_in, device_out, num_segments, device_offsets,
      device_offsets + 1, dpct_placeholder, initial_value);
  dev_ct1.queues_wait_and_throw();
}
which is rewritten to:
void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
         UserMin min_op, int initial_value) {
  dpct::device_ext &dev_ct1 = dpct::get_current_device();
  sycl::queue &q_ct1 = dev_ct1.in_order_queue();
  int max_work_group_size = dev_ct1.get_max_work_group_size();
  if (max_work_group_size >= 256)
    dpct::device::segmented_reduce<256>(
        q_ct1, device_in, device_out, num_segments, device_offsets,
        device_offsets + 1, sycl::minimum(), initial_value);
  else
    dpct::device::segmented_reduce<128>(
        q_ct1, device_in, device_out, num_segments, device_offsets,
        device_offsets + 1, sycl::minimum(), initial_value);
  dev_ct1.queues_wait_and_throw();
}