Get Started with Intel® oneAPI Collective Communications Library
Intel® oneAPI Collective Communications Library (oneCCL) is a scalable, high-performance communication library for Deep Learning (DL) and Machine Learning (ML) workloads. It builds on ideas that originated in the Intel® Machine Learning Scaling Library and expands the design and API to cover new features and use cases.
Before You Begin
Before you start using oneCCL, make sure to set up the library environment. There are two ways to set up the environment:
Using a standalone oneCCL package installed into <ccl_install_dir>:
source <ccl_install_dir>/env/setvars.sh
Using oneCCL from the Intel® oneAPI Base Toolkit installed into <toolkit_install_dir> (/opt/intel/oneapi by default):
source <toolkit_install_dir>/setvars.sh
System Requirements
Refer to the oneCCL System Requirements page.
Sample Application
The sample code below shows how to use the oneCCL API to perform an allreduce operation on SYCL USM (Unified Shared Memory) device buffers.
#include <cstdlib>
#include <iostream>
#include <vector>

#include <mpi.h>

#include "oneapi/ccl.hpp"

/* finalize MPI at exit unless it has already been finalized */
void mpi_finalize() {
    int is_finalized = 0;
    MPI_Finalized(&is_finalized);

    if (!is_finalized) {
        MPI_Finalize();
    }
}

int main(int argc, char* argv[]) {
    constexpr size_t count = 10 * 1024 * 1024;

    int size = 0;
    int rank = 0;

    ccl::init();

    MPI_Init(nullptr, nullptr);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    atexit(mpi_finalize);

    sycl::default_selector device_selector;
    sycl::queue q(device_selector);
    std::cout << "Running on " << q.get_device().get_info<sycl::info::device::name>() << "\n";

    /* create kvs: rank 0 creates the main kvs and broadcasts its address to the other ranks */
    ccl::shared_ptr_class<ccl::kvs> kvs;
    ccl::kvs::address_type main_addr;
    if (rank == 0) {
        kvs = ccl::create_main_kvs();
        main_addr = kvs->get_address();
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
    }
    else {
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
        kvs = ccl::create_kvs(main_addr);
    }

    /* create communicator */
    auto dev = ccl::create_device(q.get_device());
    auto ctx = ccl::create_context(q.get_context());
    auto comm = ccl::create_communicator(size, rank, dev, ctx, kvs);

    /* create stream */
    auto stream = ccl::create_stream(q);

    /* create buffers */
    auto send_buf = sycl::malloc_device<int>(count, q);
    auto recv_buf = sycl::malloc_device<int>(count, q);

    /* open buffers and modify them on the device side */
    auto e = q.submit([&](auto& h) {
        h.parallel_for(count, [=](auto id) {
            send_buf[id] = rank + id + 1;
            recv_buf[id] = -1;
        });
    });

    /* check_sum = 1 + 2 + ... + size, the rank-dependent part of the expected result */
    int check_sum = 0;
    for (int i = 1; i <= size; ++i) {
        check_sum += i;
    }

    /* do not wait completion of kernel and provide it as dependency for operation */
    std::vector<ccl::event> deps;
    deps.push_back(ccl::create_event(e));

    /* invoke allreduce */
    auto attr = ccl::create_operation_attr<ccl::allreduce_attr>();
    ccl::allreduce(send_buf, recv_buf, count, ccl::reduction::sum, comm, stream, attr, deps).wait();

    /* open recv_buf and check its correctness on the device side */
    sycl::buffer<int> check_buf(count);
    q.submit([&](auto& h) {
        sycl::accessor check_buf_acc(check_buf, h, sycl::write_only);
        h.parallel_for(count, [=](auto id) {
            /* write every element so that the host never reads uninitialized data */
            if (recv_buf[id] != static_cast<int>(check_sum + size * id)) {
                check_buf_acc[id] = -1;
            }
            else {
                check_buf_acc[id] = 0;
            }
        });
    });

    q.wait_and_throw();

    /* print out the result of the test on the host side */
    {
        sycl::host_accessor check_buf_acc(check_buf, sycl::read_only);
        size_t i;
        for (i = 0; i < count; i++) {
            if (check_buf_acc[i] == -1) {
                std::cout << "FAILED\n";
                break;
            }
        }
        if (i == count) {
            std::cout << "PASSED\n";
        }
    }

    sycl::free(send_buf, q);
    sycl::free(recv_buf, q);
}
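The correctness check in the sample relies on a small piece of arithmetic: rank r fills send_buf[id] with r + id + 1, so the sum over all ranks is (1 + 2 + ... + size) + size * id, which is the check_sum + size * id expression the kernel compares against. The following host-only sketch reproduces that arithmetic without oneCCL, MPI, or SYCL. The function names send_value, simulate_allreduce_sum, and expected_value are hypothetical helpers introduced for illustration and are not part of the oneCCL API; the loop only models the result of a sum reduction and says nothing about how the library computes it.

```cpp
#include <cstddef>
#include <vector>

// Value that rank `rank` writes to send_buf[id] in the sample (hypothetical helper).
long long send_value(int rank, std::size_t id) {
    return static_cast<long long>(rank) + static_cast<long long>(id) + 1;
}

// Host-side model of a sum allreduce: every rank would receive this
// element-wise sum of all ranks' send buffers (hypothetical helper).
std::vector<long long> simulate_allreduce_sum(int size, std::size_t count) {
    std::vector<long long> recv(count, 0);
    for (int rank = 0; rank < size; ++rank) {
        for (std::size_t id = 0; id < count; ++id) {
            recv[id] += send_value(rank, id);
        }
    }
    return recv;
}

// Closed form used by the sample's check: check_sum + size * id,
// where check_sum = 1 + 2 + ... + size (hypothetical helper).
long long expected_value(int size, std::size_t id) {
    long long check_sum = 0;
    for (int i = 1; i <= size; ++i) {
        check_sum += i;
    }
    return check_sum + static_cast<long long>(size) * static_cast<long long>(id);
}
```

For example, with 4 ranks, element 3 receives 4 + 5 + 6 + 7 = 22 from the simulated reduction, and expected_value(4, 3) gives 10 + 4 * 3 = 22 as well.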
Prerequisites
oneCCL with SYCL support is installed and the oneCCL environment is set up (see installation instructions)
Intel® MPI Library is installed and MPI environment is set up
Run the sample
Use the C++ driver with the -fsycl option to build the sample:
Linux* OS
icpx -fsycl -o sample sample.cpp -lccl -lmpi
Windows* OS
icx-cl -fsycl -o sample sample.cpp -lccl -lmpi
Run the sample:
mpiexec <parameters> ./sample
where <parameters> represents optional mpiexec parameters such as node count, processes per node, hosts, and so on.
Compile and build applications with pkg-config
The pkg-config tool is widely used to simplify building software with library dependencies. It provides command line options for compiling and linking applications to a library. Intel® oneAPI Collective Communications Library provides pkg-config metadata files for this tool starting with the oneCCL 2021.4 release.
The oneCCL pkg-config metadata files cover both configurations of oneCCL: with and without SYCL support.
Set up the environment
Set up the environment before using the pkg-config tool. To do this, use one of the following options (commands are given for a Linux install to the standard /opt/intel/oneapi location):
Intel® oneAPI Base Toolkit setvars.sh script:
source /opt/intel/oneapi/setvars.sh
oneCCL setvars.sh script (the prerequisites for this option are listed below):
source /opt/intel/oneapi/ccl/latest/env/setvars.sh
Prerequisites for the setup with oneCCL setvars.sh
To set up the environment with the oneCCL setvars.sh script, install the following additional dependencies:
Intel® MPI Library (for both configurations of oneCCL: with and without SYCL support)
Intel® oneAPI DPC++/C++ Compiler (for oneCCL with SYCL support only)
Compile a program using pkg-config
To compile a test sample.cpp program with oneCCL, run:
icpx -o sample sample.cpp $(pkg-config --libs --cflags ccl-cpu_gpu_icpx)
--cflags provides the include path to the API directory:
pkg-config --cflags ccl-cpu_gpu_icpx
The output:
-I/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//include/ -I/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//include/cpu_gpu_icpx
--libs provides the oneCCL library name, all other dependencies (such as SYCL and MPI), and the search paths needed to find them:
pkg-config --libs ccl-cpu_gpu_icpx
The output:
-L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/ -L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/release/ -L/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//lib/cpu_gpu_icpx -lccl -lsycl -lmpi -lmpicxx -lmpifort
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.