Introduction
The Using Intel oneMKL with R article discusses building R with the Intel® oneAPI Math Kernel Library (oneMKL) BLAS and LAPACK to improve the performance of those parts of R that rely on matrix computations. Since the BLAS and LAPACK routines are a long-standing set of functions with a standard API, the R build scripts have a built-in facility to use these. Setting R to use oneMKL can provide significant performance improvement on parts of R, but there are other functions in oneMKL that can provide great performance as well. This article will give you a quick tutorial on how to extend R by writing a wrapper for a function in oneMKL and call it from your R script.
The wrapper
Let's begin by selecting a simple function in oneMKL that is not a part of the BLAS or LAPACK. For this exercise we'll choose the vdPow function which takes two arrays, x and y, and creates a new array z whose nth element is x[n] to the y[n] power.
#include "R.h"
#include "Rdefines.h"
#include "Rinternals.h"
#include "mkl.h"
SEXP mkl_vdpow(SEXP N, SEXP X, SEXP Y) {
SEXP Z;
int *n;
int nn;
double *x, *y, *z;
/* Convert input arguments to C types */
PROTECT(N = AS_INTEGER(N));
PROTECT(X = AS_NUMERIC(X));
PROTECT(Y = AS_NUMERIC(Y));
n = INTEGER_POINTER(N);
x = NUMERIC_POINTER(X);
y = NUMERIC_POINTER(Y);
/* Allocate memory to store results */
nn = *n;
PROTECT(Z = NEW_NUMERIC(nn));
z = NUMERIC_POINTER(Z);
vdPow(nn, x, y, z);
UNPROTECT(4);
return Z;
}
Building the wrapper
On Windows this is easily done by obtaining the Windows package and Rtools from CRAN. I downloaded R 4.4.1 and Rtools version 4.4 and then created a file to set my environment as follows:
SET "MINGW_ROOT=C:\rtools44\x86_64-w64-mingw32.static.posix"
SET "PATH=%MINGW_ROOT%\bin;%MINGW_ROOT%\libexec\gcc\x86_64-w64-mingw32.static.posix\13.2.0;%PATH%"
"C:\Program Files (x86)\Intel\oneAPI\compiler\2024.2\env\vars.bat"
"C:\Program Files (x86)\Intel\oneAPI\mkl\2024.2\env\vars.bat"
In the above instructions, the compiler environment variables are sourced for the OpenMP* runtime.
I then build using the following commands:
SET "R_HOME=C:\Program Files\R\R-4.4.1"
SET "PATH=%R_HOME%\bin\x64;%PATH%"
gcc -c -O2 -Wall -I"%R_HOME%\include" -I"%MKLROOT%\include" pow_wrp.c -o pow_wrp.o
gcc -shared -Wl,-soname,pow_wrp.so -o pow_wrp.so pow_wrp.o "%R_HOME%\bin\x64\R.dll" "%MKLROOT%\lib\mkl_rt.lib"
Calling the function from R
Here then is a simple script that calls the function we've just built
dyn.load("c:/full_path_or_directory_on_path/pow_wrp.so") mkl_pow <- function(n, x, y) .External("mkl_vdpow", n, x, y) n <- 1000 x <- runif(n, min=2, max=10) y <- runif(n, min=-2, max=-1) z <- mkl_pow(n, x, y)
Performance
Let's see what sort of performance difference the use of Intel® oneMKL can make by comparing this new power function with the simple means available within R. To do the similar operation in R we could use a loop to perform each scalar operation. The following is the script running with Rscript on my laptop (Intel® oneMKL 2024.2.0, R-4.4.1, 4-core Intel® Core™ i7-1187G7 processor (3.00GHz) 16GB RAM, Windows* 11):
dyn.load("c:/test/pow_R/pow_wrp.so") mkl_pow <- function(n, x, y) .Call("mkl_vdpow", n, x, y) n <- 1000000 x <- runif(n, min=2, max=10) y <- runif(n, min=-2, max=-1) start <- proc.time() z <- mkl_pow(n, x, y) end1 <- proc.time() - start end1 i <- n start <- proc.time() repeat{ z[i] <- x[i]^y[i] i <- i - 1 if (i==0) break() } end2 <- proc.time() - start end2
The output of the above script in the Command Prompt is as following,
C:\test\pow_R>Rscript test.R user system elapsed 0.00 0.00 0.01 user system elapsed 0.42 0.00 0.22
For this very large problem size the oneMKL function makes a large difference. The loop approach takes 0.22 seconds while the oneMKL function takes 0.01 seconds. For smaller problems the performance will be less, but for systems with higher core-counts, compute intensive functions can show even greater speed-up.
Given the potential for large performance improvements, it may be worthwhile finding out the functions available in oneMKL.