Get Started with Intel® oneAPI Collective Communications Library
Intel® oneAPI Collective Communications Library (oneCCL) is a scalable, high-performance communication library for Deep Learning (DL) and Machine Learning (ML) workloads. It builds on ideas that originated in the Intel® Machine Learning Scaling Library and extends its design and API to cover new features and use cases.
System Requirements
Refer to the oneCCL System Requirements page.
Install
See the Intel® oneAPI Toolkits Installation Guide for Linux* OS to learn how to install oneCCL.
Before You Begin
After installing oneCCL, set the environment variables:
To load only the oneCCL package, run:
source <install_dir>/ccl/latest/env/vars.sh
To load all installed oneAPI components, run:
source <install_dir>/setvars.sh
You can also modify the oneCCL setup by using two flags when sourcing the vars.sh script:
--ccl-configuration=[cpu_gpu_dpcpp|cpu] - selects the oneCCL build: the SYCL-based version, cpu_gpu_dpcpp (default), or the CPU-only version, cpu, which does not require SYCL runtime libraries.
--ccl-bundled-mpi=[yes|no] - controls whether oneCCL uses the bundled Intel® MPI Library. The default value is yes.
To use the bundled Intel® MPI Library, run:
source intel/oneapi/ccl/2021.11/env/vars.sh --ccl-bundled-mpi=yes
With this setting, oneCCL uses the bundled Intel® MPI implementation, which may override a user-supplied MPI setup.
To use an MPI implementation other than Intel® MPI, such as MPICH, run:
source intel/oneapi/ccl/2021.11/env/vars.sh --ccl-bundled-mpi=no
For more information about setvars.sh, see Use the setvars and oneapi-vars Scripts with Linux*.
Sample Application
The sample code below shows how to use the oneCCL API to perform an allreduce operation on SYCL USM (Unified Shared Memory) device allocations.
#include <cstdlib>
#include <iostream>
#include <vector>

#include <mpi.h>
#include <sycl/sycl.hpp>

#include "oneapi/ccl.hpp"

/* finalize MPI at exit unless it has already been finalized */
void mpi_finalize() {
    int is_finalized = 0;
    MPI_Finalized(&is_finalized);
    if (!is_finalized) {
        MPI_Finalize();
    }
}

int main(int argc, char* argv[]) {
    constexpr size_t count = 10 * 1024 * 1024;
    int size = 0;
    int rank = 0;

    ccl::init();

    MPI_Init(nullptr, nullptr);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    atexit(mpi_finalize);

    /* select the default device; default_selector_v replaces the deprecated sycl::default_selector class in SYCL 2020 */
    sycl::queue q(sycl::default_selector_v);
    std::cout << "Running on " << q.get_device().get_info<sycl::info::device::name>() << "\n";

    /* create kvs: rank 0 creates the main kvs and broadcasts its address to all other ranks */
    ccl::shared_ptr_class<ccl::kvs> kvs;
    ccl::kvs::address_type main_addr;
    if (rank == 0) {
        kvs = ccl::create_main_kvs();
        main_addr = kvs->get_address();
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
    }
    else {
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
        kvs = ccl::create_kvs(main_addr);
    }

    /* create communicator */
    auto dev = ccl::create_device(q.get_device());
    auto ctx = ccl::create_context(q.get_context());
    auto comm = ccl::create_communicator(size, rank, dev, ctx, kvs);

    /* create stream */
    auto stream = ccl::create_stream(q);

    /* allocate USM device buffers */
    auto send_buf = sycl::malloc_device<int>(count, q);
    auto recv_buf = sycl::malloc_device<int>(count, q);

    /* initialize the buffers on the device side */
    auto e = q.submit([&](auto& h) {
        h.parallel_for(count, [=](auto id) {
            send_buf[id] = rank + id + 1;
            recv_buf[id] = -1;
        });
    });

    /* expected reduction term: 1 + 2 + ... + size */
    int check_sum = 0;
    for (int i = 1; i <= size; ++i) {
        check_sum += i;
    }

    /* do not wait for the kernel to complete; pass it to the operation as a dependency */
    std::vector<ccl::event> deps;
    deps.push_back(ccl::create_event(e));

    /* invoke allreduce */
    auto attr = ccl::create_operation_attr<ccl::allreduce_attr>();
    ccl::allreduce(send_buf, recv_buf, count, ccl::reduction::sum, comm, stream, attr, deps).wait();

    /* check recv_buf on the device side; every element of check_buf is written: 0 = correct, -1 = mismatch */
    sycl::buffer<int> check_buf(count);
    q.submit([&](auto& h) {
        sycl::accessor check_buf_acc(check_buf, h, sycl::write_only);
        h.parallel_for(count, [=](auto id) {
            check_buf_acc[id] = (recv_buf[id] == static_cast<int>(check_sum + size * id)) ? 0 : -1;
        });
    });
    q.wait_and_throw();

    /* print out the result of the test on the host side */
    {
        sycl::host_accessor check_buf_acc(check_buf, sycl::read_only);
        size_t i;
        for (i = 0; i < count; i++) {
            if (check_buf_acc[i] == -1) {
                std::cout << "FAILED\n";
                break;
            }
        }
        if (i == count) {
            std::cout << "PASSED\n";
        }
    }

    sycl::free(send_buf, q);
    sycl::free(recv_buf, q);
}
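The check in the sample compares recv_buf[id] with check_sum + size * id. Rank r fills send_buf[id] with r + id + 1, so summing over ranks 0 to size - 1 gives (1 + 2 + ... + size) + size * id. The following host-only C++ sketch needs no oneCCL, MPI, or SYCL. Its function names are hypothetical helpers, not part of the oneCCL API. It simulates the per-rank send buffers, computes what a sum allreduce would leave in every rank's recv_buf, and compares that result with the closed form. It uses long long so that the simulation itself cannot overflow.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: element-wise sum of every rank's send buffer, which is what a
// sum allreduce leaves in each rank's receive buffer. Assumes all buffers have equal length.
std::vector<long long> reference_allreduce_sum(const std::vector<std::vector<long long>>& send_bufs) {
    std::vector<long long> result(send_bufs.empty() ? 0 : send_bufs.front().size(), 0);
    for (const auto& buf : send_bufs) {
        for (std::size_t i = 0; i < result.size(); ++i) {
            result[i] += buf[i];
        }
    }
    return result;
}

// Send buffer contents used by the sample: send_buf[id] = rank + id + 1.
std::vector<long long> make_send_buf(int rank, std::size_t count) {
    std::vector<long long> buf(count);
    for (std::size_t id = 0; id < count; ++id) {
        buf[id] = rank + static_cast<long long>(id) + 1;
    }
    return buf;
}

// Closed form checked by the sample: check_sum + size * id, with check_sum = 1 + 2 + ... + size.
long long expected_value(int size, std::size_t id) {
    const long long check_sum = static_cast<long long>(size) * (size + 1) / 2;
    return check_sum + static_cast<long long>(size) * static_cast<long long>(id);
}

// Simulates `size` ranks and returns true if the reference result matches the closed form.
bool closed_form_matches(int size, std::size_t count) {
    std::vector<std::vector<long long>> send_bufs;
    for (int rank = 0; rank < size; ++rank) {
        send_bufs.push_back(make_send_buf(rank, count));
    }
    const auto recv = reference_allreduce_sum(send_bufs);
    for (std::size_t id = 0; id < count; ++id) {
        if (recv[id] != expected_value(size, id)) {
            return false;
        }
    }
    return true;
}
```

For example, with size = 4 and id = 3 the four ranks contribute 4 + 5 + 6 + 7 = 22. That equals check_sum (10) plus size * id (12).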
Prerequisites
oneCCL with SYCL support is installed and the oneCCL environment is set up (see installation instructions)
Intel® MPI Library is installed and the MPI environment is set up
Run the Sample
Use the Intel® oneAPI DPC++/C++ Compiler driver (icpx) with the -fsycl option to build the sample:
icpx -fsycl -o sample sample.cpp -lccl -lmpi
Run the sample:
mpiexec <parameters> ./sample
Where <parameters> represents optional mpiexec parameters, such as node count, processes per node, hosts, and so on.
Compile and Build Applications with pkg-config
The pkg-config tool is widely used to simplify building software with library dependencies. It provides command-line options for compiling and linking applications against a library. Intel® oneAPI Collective Communications Library provides pkg-config metadata files for this tool starting with the oneCCL 2021.4 release.
The oneCCL pkg-config metadata files cover both configurations of oneCCL: with and without SYCL support.
Compile
To compile a test sample.cpp program with oneCCL, run:
icpx -fsycl -o sample sample.cpp $(pkg-config --libs --cflags ccl)
--cflags provides the include path to the API directory:
pkg-config --cflags ccl
The output:
-I/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//include/ -I/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//include/cpu_gpu_icpx
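The ../..// segments in this output appear to come from paths that are built relative to the directory of the pkg-config file. The compiler accepts them as they are. To see which directories they actually name, you can normalize them lexically. The minimal C++17 sketch below does this with std::filesystem::path::lexically_normal, which rewrites only the string and does not access the file system. The helper name normalize_pkgconfig_path is hypothetical and is not part of oneCCL or pkg-config.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: collapses "dir/.." pairs and repeated slashes in a path printed
// by pkg-config. Purely lexical, so the path does not need to exist on this machine.
std::string normalize_pkgconfig_path(const std::string& raw) {
    return std::filesystem::path(raw).lexically_normal().generic_string();
}
```

Applied to the two include paths above, the sketch yields /opt/intel/oneapi/mpi/latest/include/ and /opt/intel/oneapi/ccl/latest/include/cpu_gpu_icpx.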
--libs provides the oneCCL library name, its dependencies (such as SYCL and MPI), and the library search paths:
pkg-config --libs ccl
The output:
-L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/ -L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/release/ -L/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//lib/cpu_gpu_icpx -lccl -lsycl -lmpi -lmpicxx -lmpifort
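The --libs output mixes library search paths (-L) with library names (-l), and --cflags contributes include paths (-I). Grouping the tokens by option makes the output easier to read. The C++ sketch below is a hypothetical helper. PkgConfigFlags and parse_pkgconfig_output are illustrative names, not part of oneCCL or pkg-config. The sketch splits on whitespace and therefore assumes that no path is quoted or contains spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: flags from one pkg-config output line, grouped by option.
struct PkgConfigFlags {
    std::vector<std::string> include_dirs;  // from -I<dir>
    std::vector<std::string> library_dirs;  // from -L<dir>
    std::vector<std::string> libraries;     // from -l<name>
    std::vector<std::string> other;         // any other token, kept verbatim
};

// Hypothetical parser: splits a whitespace-separated pkg-config line into -I, -L, and -l parts.
// Simplification: does not handle quoted paths or paths containing spaces.
PkgConfigFlags parse_pkgconfig_output(const std::string& line) {
    PkgConfigFlags flags;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token.rfind("-I", 0) == 0) {
            flags.include_dirs.push_back(token.substr(2));
        } else if (token.rfind("-L", 0) == 0) {
            flags.library_dirs.push_back(token.substr(2));
        } else if (token.rfind("-l", 0) == 0) {
            flags.libraries.push_back(token.substr(2));
        } else {
            flags.other.push_back(token);
        }
    }
    return flags;
}
```

The sketch groups the --libs line above into three library search paths and the libraries ccl, sycl, mpi, mpicxx, and mpifort.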
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.