Introduction
The Using Intel oneMKL with R article discusses building R with the Intel® oneAPI Math Kernel Library (oneMKL) BLAS and LAPACK to improve the performance of those parts of R that rely on matrix computations. Since the BLAS and LAPACK routines are a long-standing set of functions with a standard API, the R build scripts have a built-in facility to use these. Setting R to use oneMKL can provide significant performance improvement on parts of R, but there are other functions in oneMKL that can provide great performance as well. This article will give you a quick tutorial on how to extend R by writing a wrapper for a function in oneMKL and call it from your R script.
The wrapper
Let's begin by selecting a simple function in oneMKL that is not a part of the BLAS or LAPACK. For this exercise we'll choose the vdPow function which takes two arrays, x and y, and creates a new array z whose nth element is x[n] to the y[n] power.
Building the wrapper
On Windows this is easily done by obtaining the Windows package and Rtools from CRAN. I downloaded R 4.4.1 and Rtools version 4.4 and then created a file to set my environment as follows:
In the above instructions, the compiler environment variables are sourced for the OpenMP* runtime.
I then build using the following commands:
Calling the function from R
Here then is a simple script that calls the function we've just built
dyn.load("c:/full_path_or_directory_on_path/pow_wrp.so") mkl_pow <- function(n, x, y) .External("mkl_vdpow", n, x, y) n <- 1000 x <- runif(n, min=2, max=10) y <- runif(n, min=-2, max=-1) z <- mkl_pow(n, x, y)
Performance
Let's see what sort of performance difference the use of Intel® oneMKL can make by comparing this new power function with the simple means available within R. To do the similar operation in R we could use a loop to perform each scalar operation. The following is the script running with Rscript on my laptop (Intel® oneMKL 2024.2.0, R-4.4.1, 4-core Intel® Core™ i7-1187G7 processor (3.00GHz) 16GB RAM, Windows* 11):
dyn.load("c:/test/pow_R/pow_wrp.so") mkl_pow <- function(n, x, y) .Call("mkl_vdpow", n, x, y) n <- 1000000 x <- runif(n, min=2, max=10) y <- runif(n, min=-2, max=-1) start <- proc.time() z <- mkl_pow(n, x, y) end1 <- proc.time() - start end1 i <- n start <- proc.time() repeat{ z[i] <- x[i]^y[i] i <- i - 1 if (i==0) break() } end2 <- proc.time() - start end2
The output of the above script in the Command Prompt is as following,
C:\test\pow_R>Rscript test.R user system elapsed 0.00 0.00 0.01 user system elapsed 0.42 0.00 0.22
For this very large problem size the oneMKL function makes a large difference. The loop approach takes 0.22 seconds while the oneMKL function takes 0.01 seconds. For smaller problems the performance will be less, but for systems with higher core-counts, compute intensive functions can show even greater speed-up.
Given the potential for large performance improvements, it may be worthwhile finding out the functions available in oneMKL.