Part 2: Techniques for Accelerating NumPy & SciPy
Intel® Distribution for Python* delivers significant speedups for NumPy and SciPy, which in turn improve the performance of the computational packages built on top of them.
Hi. My name is Oleksandr. Today, I'll be talking about techniques used to accelerate performance of NumPy and SciPy in the Intel® Distribution for Python*.
NumPy and SciPy are of central importance for scientific and numerical computing. Enhancing their performance translates into improved performance of downstream computational packages. Intel Distribution for Python is [a] ready-made binary distribution available on Windows*, macOS*, and Linux*, optimized for performance on Intel® hardware. Shipping it as a distribution gets the optimized software to customers faster.
The distribution is powered by Anaconda*. There is a conda* tarball for every package that comes with the distribution, and it contains the conda recipe used to build it, including source code patches and build scripts. However, the ultimate goal is for us to upstream these changes for the benefit of the entire community.
Optimizations include the use of performance libraries, like the Intel® Math Kernel Library, to optimize [Basic Linear Algebra Subprograms] BLAS and LAPACK operations, [fast Fourier transform] FFT computations, and random number generation. Optimizations also include the use of Intel® C and Fortran Compilers [sic] to enable better use of vectorization, specifically when applying universal functions to [INAUDIBLE]. This is further enhanced by the use of aligned memory allocation and threaded memory copying.
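NumPy does not implement dense linear algebra itself; it forwards matrix products and solves to whichever BLAS/LAPACK library it was built against, which is why linking against MKL speeds these operations up without code changes. The sketch below, which runs on any NumPy build, prints the detected libraries with `np.show_config()` and then calls two operations that go through BLAS and LAPACK. Which library actually services the calls depends on your installation; nothing here is specific to the Intel distribution.

```python
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build is linked against
# (MKL in an MKL-based build, often OpenBLAS in other builds).
np.show_config()

rng = np.random.default_rng(0)
n = 500
a = rng.standard_normal((n, n))
m = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Dense matrix-matrix product: dispatched to a BLAS matrix-multiply routine.
c = a @ m

# Dense linear solve: dispatched to a LAPACK solver.
x = np.linalg.solve(a, b)

# Sanity check: the relative residual of the solve should be tiny.
residual = np.linalg.norm(a @ x - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")
```

Because the dispatch happens inside NumPy, the same script benefits from a faster BLAS/LAPACK simply by running it in a different environment.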
Using MKL [sic] gives significant performance advantages on the latest hardware, thanks to MKL [sic] developers having prerelease access to that hardware. Intel Distribution for Python exposes [a] Python interface to MKL's [sic] FFT functionality, enhancing the numpy.fft and scipy.fftpack submodules.
The interface works directly with single- and double-precision NumPy arrays, offers full support for arbitrary strides (eliminating the need to copy the input into a contiguous array), allows for in-place and out-of-place modes, and natively supports multidimensional transforms.
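To make "arbitrary strides" and "multidimensional transforms" concrete, here is a small sketch using the standard `numpy.fft` API on a non-contiguous view. It assumes nothing about the MKL-backed interface itself: the calls are the same with or without it, and the point described above is that an MKL-backed build can consume the strided view directly rather than first copying it. The results are checked against transforms of an explicit contiguous copy.

```python
import numpy as np

rng = np.random.default_rng(42)
base = rng.standard_normal((128, 256))

# A strided, non-contiguous view: every other column of `base`.
strided = base[:, ::2]
print("C-contiguous:", strided.flags["C_CONTIGUOUS"])

# 1-D transforms along the last axis, and a full 2-D transform.
f1 = np.fft.fft(strided, axis=-1)
f2 = np.fft.fft2(strided)

# Reference results computed from an explicit contiguous copy.
contiguous = np.ascontiguousarray(strided)
ref1 = np.fft.fft(contiguous, axis=-1)
ref2 = np.fft.fft2(contiguous)
print(np.allclose(f1, ref1), np.allclose(f2, ref2))

# Round trip: the inverse 2-D transform recovers the original data.
back = np.fft.ifft2(f2).real
print(np.allclose(back, strided))
```

The explicit `np.ascontiguousarray` copy is exactly the step that stride-aware FFT code makes unnecessary.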
The performance [of] NumPy's and SciPy's FFT in the Intel Distribution for Python approaches that of native code. Universal functions offer a convenient way to apply the same transformation to every element of the argument array.
The Intel compiler [sic] was used to generate vector instructions for these loops with automatic runtime dispatching for different hardware architectures. The Intel compiler's [sic] Short Vector Math Library (SVML) allowed [it] to vectorize transcendental functions. Loops over large arrays were further threaded with the help of MKL's [sic] vector math library.
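A ufunc such as `np.exp` runs its element loop in compiled code, which is the loop a vectorizing compiler (and SVML, for transcendental functions like `exp` and `sin`) can speed up. The sketch below contrasts it with a per-element Python loop and shows the `out=` argument, which writes into a preallocated array instead of creating a temporary. It uses only standard NumPy; the timings it prints depend entirely on your machine and build, so no particular speedup is implied.

```python
import math
import timeit

import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)

def exp_loop(values):
    # One interpreter round-trip per element.
    return [math.exp(v) for v in values]

# Universal function: the whole element loop runs in compiled code.
y = np.exp(x)

# Ufuncs can write into a preallocated output array, avoiding temporaries.
out = np.empty_like(x)
np.sin(x, out=out)             # out = sin(x)
np.multiply(out, y, out=out)   # out = sin(x) * exp(x), computed in place

# Same values as the Python loop, on a small slice.
print(np.allclose(y[:1000], exp_loop(x[:1000])))

# Rough timing comparison on 100,000 elements (machine-dependent).
chunk = x[:100_000]
t_loop = timeit.timeit(lambda: exp_loop(chunk), number=3)
t_ufunc = timeit.timeit(lambda: np.exp(chunk), number=3)
print(f"python loop: {t_loop:.3f} s, ufunc: {t_ufunc:.3f} s")
```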
Vectorized transformations are more efficient when applied to elements of an aligned array. For example, the performance of training the deep belief network key auto using Intel® Theano* [sic] increased by about 50 percent after the memory optimizations were applied. Furthermore, copying of large segments of memory has been threaded using MKL's [sic] BLAS service functions.
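The distribution's aligned allocator is internal, so the following is only an illustration of what "aligned" means at the Python level. It uses a common over-allocate-and-offset idiom: allocate a few extra bytes, then slice so the data pointer lands on a multiple of the requested alignment. The helpers `aligned_empty` and `is_aligned` and the 64-byte default are hypothetical choices for this sketch, not part of NumPy or of the Intel distribution.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, alignment=64):
    """Hypothetical helper: uninitialized array whose data pointer is a
    multiple of `alignment` bytes (over-allocate, then slice)."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Extra `alignment` bytes guarantee an aligned start exists in the buffer.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-raw.ctypes.data) % alignment
    # The view keeps `raw` alive through its `.base` chain.
    return raw[offset:offset + nbytes].view(dtype).reshape(shape)

def is_aligned(arr, alignment=64):
    """Hypothetical helper: True if the array's data pointer is aligned."""
    return arr.ctypes.data % alignment == 0

a = aligned_empty((1000, 3))
a[:] = 1.0

# Copy into a second aligned buffer; np.copyto writes into an existing array.
b = aligned_empty(a.shape)
np.copyto(b, a)

print(is_aligned(a), is_aligned(b), a.shape, a.dtype)
```

Over-allocating wastes at most `alignment` bytes per array, which is negligible for the large arrays where alignment matters.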
Intel Distribution for Python also comes with [an] MKL-based [sic] random number generation package, exposed as numpy.random_intel, which is a drop-in replacement for numpy.random, although their streams are not [INAUDIBLE] identical.
Not only does the package offer up to 60 times better performance at sampling, but it also exposes all of MKL's [sic] algorithms for basic random number generation, including the family of Mersenne Twister algorithms, MT2203, and stateless algorithms.
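The transcript does not show the numpy.random_intel API, so this sketch does not use it. Instead it illustrates the same idea with stock NumPy's `Generator`, which puts one sampling API on top of interchangeable basic generators: Mersenne Twister (`MT19937`) and the counter-based `Philox`. As with the drop-in replacement described above, switching the basic generator leaves the calling code unchanged but produces a different stream. Samples are drawn in bulk, in a single call, which is where fast sampling pays off.

```python
import numpy as np

seed = 12345

# One high-level API over two different basic generators.
gen_mt = np.random.Generator(np.random.MT19937(seed))
gen_philox = np.random.Generator(np.random.Philox(seed))  # counter-based

# Bulk sampling: a million draws per call, generated in compiled code.
a = gen_mt.standard_normal(1_000_000)
b = gen_philox.standard_normal(1_000_000)

# Different algorithms give different streams...
print("same stream:", np.array_equal(a[:5], b[:5]))
# ...but samples from the same distribution.
print(f"means: {a.mean():.3f}, {b.mean():.3f}")

# Same algorithm and seed reproduce the same stream.
again = np.random.Generator(np.random.MT19937(seed)).standard_normal(5)
print("reproducible:", np.array_equal(a[:5], again))

# Non-overlapping parallel streams via jump-ahead.
streams = [np.random.Generator(np.random.Philox(seed).jumped(i)) for i in range(1, 5)]
draws = [s.random(3) for s in streams]
```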
To learn more about NumPy and SciPy in [the] Intel Distribution for Python, follow the links below. Thanks for watching.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.