Parallel Universe Magazine - Issue 46, October 2021


The Parallel Universe Magazine

By Henry A. Gabb

Recent issues of The Parallel Universe have emphasized oneAPI; namely, DPC++ and component libraries like oneMKL. This issue focuses on data science; in particular, training machine learning and deep learning models. Our feature article, Getting Started with Habana Gaudi for Deep Learning Training, describes the Gaudi HPU (Habana Processing Unit) architecture and shows you how to use it. Speeding Up the Databricks Runtime for Machine Learning discusses Intel optimizations for running artificial intelligence workloads in the cloud. A Novel Scale-Out Training Solution for Deep Learning Recommender Systems presents the results of a recent collaboration with Facebook to improve the scalability of training. Finally, Cost Matters: On the Importance of Cost-Aware Hyperparameter Optimization presents the results of a recent collaboration with Facebook and Amazon to improve hyperparameter tuning.

From there, we look at another important part of the end-to-end data analytics pipeline: graph analytics. Intel has a long history in graph processing research and active collaborations with many of the top practitioners, e.g., the GraphBLAS specification, the LDBC Graphalytics benchmark, comprehensive graph analytics analyses, and the PIUMA architecture for efficient and scalable graph analysis. Data scientists have a great package, NetworkX, for graph and network analysis, but it’s not known for performance. Fortunately, our friends at Katana Graph just released a high-performance, parallel graph analytics library for Python programmers. Katana’s High-Performance Graph Analytics Library offers an alternative for compute-intensive operations on extremely large graphs.
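To make the performance gap concrete, here is a minimal Python sketch (my illustration, not code from the article) of the kind of single-core, pure-Python NetworkX computation that a parallel library like Katana’s is meant to accelerate; the graph size and the choice of betweenness centrality are assumptions for illustration only.

import time
import networkx as nx

# Build a modest random graph; the graphs Katana targets are far larger.
G = nx.erdos_renyi_graph(n=2_000, p=0.01, seed=42)

# Betweenness centrality is a classic compute-intensive graph kernel.
# NetworkX computes it in pure Python on a single core.
start = time.perf_counter()
bc = nx.betweenness_centrality(G)
elapsed = time.perf_counter() - start
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges: {elapsed:.1f} s")

Even at this toy scale the computation takes noticeable time; on graphs with billions of edges, a single-threaded Python implementation becomes impractical, which is exactly the gap a parallel library addresses.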

The R programming language is popular with data scientists and statisticians, but like NetworkX, it’s not known for performance. Accelerate R Code with Intel® oneAPI Math Kernel Library shows you how to improve performance simply by linking the R programming environment to oneMKL. No code changes are required.

We close this issue with a follow-up to a previous article on vectorization, Optimization of Scan Operations Using Explicit Vectorization. Optimizing the Maxloc Operation Using AVX-512 Vector Instructions is another how-to guide on using vector intrinsics to accelerate common kernels; in this case, the maxloc reduction.
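For readers unfamiliar with the kernel, maxloc returns both the maximum value of an array and the index where it occurs. The following scalar Python sketch (my illustration; the article itself works with AVX-512 intrinsics) shows the loop-carried dependency, the running value/index pair, that makes this reduction trickier to vectorize than a plain maximum:

import numpy as np

def maxloc(a):
    # Track the running maximum and its index: a loop-carried
    # dependency that a vectorized version must handle with
    # per-lane compares and blends.
    best, best_idx = a[0], 0
    for i in range(1, len(a)):
        if a[i] > best:
            best, best_idx = a[i], i
    return best, best_idx

a = np.random.default_rng(0).random(1_000_000)
val, idx = maxloc(a)
assert idx == np.argmax(a)  # NumPy's vectorized equivalent
print(val, idx)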

As always, don’t forget to check out Tech.Decoded for more information on Intel solutions for code modernization, visual computing, data center and cloud computing, data science, systems and IoT development, and heterogeneous parallel programming with oneAPI.

Henry A. Gabb

October 2021