Accelerate End-to-End AI Pipelines Using the Intel® AI Analytics Toolkit
Accelerate End-to-End AI Pipelines Using AI Tools
Overview
Managing data across AI pipelines is challenging when every stage needs to run at maximum performance. AI Tools give developers, data scientists, and researchers a comprehensive set of tools that help accelerate end-to-end data science and machine learning pipelines on Intel® architecture. They provide:
- High-performance machine and deep learning
- Fast and accurate training and inference
- Ability to scale up and scale out seamlessly
- Interoperability with Intel's latest optimizations in a single integrated package
- Out-of-the-box performance by changing just one line of code for scikit-learn* and Modin* (no code changes are needed for other components)
As a result, you see real-world results with up to 59x faster end-to-end workloads [1], 2.4x faster training, and 3x faster TensorFlow* workloads [2].
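The one-line changes mentioned in the list above can be sketched as follows. This is a minimal illustration, not Intel's reference code: it assumes the Intel® Extension for Scikit-learn* package (`sklearnex`) and Modin* may or may not be installed, and the helper names `enable_sklearn_acceleration` and `load_dataframe_library` are hypothetical, introduced only so the sketch degrades gracefully when a package is missing.

```python
import importlib


def enable_sklearn_acceleration() -> bool:
    """Try to patch scikit-learn in place; return True if the patch was applied.

    Hypothetical helper. With the extension installed, the usual one-line change is:
        from sklearnex import patch_sklearn; patch_sklearn()
    It should run before scikit-learn estimators are imported.
    """
    try:
        sklearnex = importlib.import_module("sklearnex")
    except ImportError:
        return False  # extension not installed; stock scikit-learn stays in use
    sklearnex.patch_sklearn()
    return True


def load_dataframe_library():
    """Return Modin's pandas-compatible module if available, else pandas, else None.

    Hypothetical helper. With Modin installed, the usual one-line change is:
        import modin.pandas as pd   # instead of: import pandas as pd
    """
    for name in ("modin.pandas", "pandas"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


accelerated = enable_sklearn_acceleration()
pd = load_dataframe_library()
print("scikit-learn patched:", accelerated)
print("data frame library:", getattr(pd, "__name__", None))
```

In an environment where both packages are present, the rest of a pipeline can keep using the familiar scikit-learn and pandas APIs unchanged; the helpers above only make the fallback behavior explicit.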
1. Configuration and workload details: Intel® oneAPI Data Analytics Library (oneDAL) 2021.1, scikit-learn* 0.23.1, Intel® Distribution for Python* 3.8; Intel® Xeon® Platinum 8280L processor at 2.7 GHz, 2 sockets, 28 cores per socket, 10M samples, 10 features, 100 clusters, 100 iterations, float32 data type. May not reflect all publicly available security updates. Testing date: October 23, 2020.
2. Q=query. The dataset contains up to 1.1 billion individual taxi trips in New York City (NYC) from January 2009 through June 2015, covering yellow and green taxis. The NYC taxi workload ingests the large dataset into a data frame and queries it. Configuration and workload details for 20 million rows: dual-socket Intel Xeon Platinum 8280L processors (S2600WFT platform), 28 cores per socket, hyperthreading: on, turbo: on, non-uniform memory access (NUMA) nodes per socket = 2. BIOS: SE5C620.86B.02.01.0013.121520200651, kernel: 5.4.0-65-generic, microcode: 0x4003003, operating system: Ubuntu* 20.04.1 LTS, CPU governor: performance, transparent huge pages: enabled, system double data rate (DDR) memory configuration (slots, cap, speed): 12 slots, 32 GB, 2933 MHz, total memory per node: 384 GB DDR RAM, boot drive: INTEL SSDSC2BB800G7. 1 billion rows: dual-socket Intel Xeon Platinum 8260M processor, 24 cores per socket, 2.50 GHz base frequency, DRAM memory: 384 GB 12 x 32 GB DDR4 Samsung* at 2666 MT/s 1.2V, Intel® Optane™ memory: 3 TB, 12 x 256 GB at 2666 MT/s, kernel: 4.15.091-generic, operating system: Ubuntu 20.04.4. May not reflect all publicly available security updates. Testing date: February 19, 2021.
3. Hardware support varies by individual tool. Architecture support is expanded over time.
4. Software and workloads. Python: 3.7.9 on CPU and 3.7.9 on GPU, scikit-learn: 0.24.1 on CPUs, Intel® Extension for Scikit-learn*: 2021.2.2 on CPUs, NVIDIA RAPIDS*: 0.17 on GPUs; NVIDIA CUDA Toolkit*: 11.0.221 on GPUs. Platform 1: 3rd generation Intel Xeon Platinum 8380 processor, 2 sockets, 40 cores per socket, hyperthreading: on, turbo: on, RAM: 512 GB (16 slots, 32 GB, 3200 MHz). Testing as of March 2021. Platform 2: 2nd generation Intel Xeon Platinum 8280L processor, 2 sockets, 28 cores per socket, hyperthreading: on, turbo: on, RAM: 384 GB (12 slots, 32 GB, 2933 MHz). Testing as of February 2021.
5. Configuration details: 2x Intel Xeon Platinum 8280L processor at 28 cores, operating system: Ubuntu 20.04.1 LTS mitigated, 384 GB RAM (12 x 32 GB, 2,933 MHz), kernel: 5.4.0-65-generic, microcode: 0x4003003, CPU governor: performance, software: scikit-learn 0.24.1 accelerated by daal4py 202L2, Modin* 0.8.3, OmniSciDB v5.4.1, Python 3.9.7. Census data (21721922, 45): Dataset is from IPUMS USA, University of Minnesota (Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: V10.0 [dataset]).
6. Configuration details: 2x Intel Xeon Platinum 8280L processor at 28 cores, operating system: Ubuntu 20.04.1 LTS mitigated, 384 GB RAM (12 x 32 GB, 2,933 MHz), kernel: 5.4.0-65-generic, microcode: 0x4003003, CPU governor: performance, software: scikit-learn 0.24.1, pandas 1.2.2, XGBoost 13.3, Python 3.9.7. PLAsTiCC data training set: (1421705, 6), test set: (189022127, 6). Dataset is from the Kaggle challenge PLAsTiCC Astronomical Classification. May not reflect all publicly available security updates. Testing date: February 19, 2021.
7. Configuration details: Dual-socket Intel Xeon Platinum 8280L processor, 28 cores, hyperthreading: on, turbo: on, total memory: 256 GB, system BIOS: SE5C620.86B.02.01.0012.070720200218, TensorFlow version: 2.5RC3, compiler and libraries: GNU Compiler Collection (GCC)* 7.5.0, oneAPI Deep Neural Network Library (oneDNN) v2.2.0, data type: FP32. Testing date: May 9, 2021.
8. Konfoong Biotech International Accelerates Tuberculosis Detection with AI
9. AI-Based Solution Helps Accelerate the Diagnosis of Lung Diseases
10. Testing conducted by AsiaInfo*, internal Intel tests, and Beijing Telecom, 2020. Target hardware platform enabling: SMG target hardware: Intel® Xeon® platform over a telecom carrier cloud (CLX 4214, 5218, 6248, 6230 deployed in a Beijing telecom). AsiaInfo test: Intel® Xeon® Gold 5218 processor at 2.30 GHz, 2 x 16 core, compared to the Intel® Xeon® E5-2650 processor at 2.20 GHz, 2 x 12 core. Intel internal test: Intel Xeon 6252 processor at 2.10 GHz, 48-core single machine, and the Intel Xeon Gold 8280 processor, 4-node cluster. Performance numbers are based on an AsiaInfo test: Intel Xeon Gold 5218 processor at 2.30 GHz, 2 x 16 core. Software and tools enablement: Intel® Distribution for Python* and XGBoost for Intel Xeon platform optimization for the machine learning workload (AI Kit 2020 Gold). Big data analysis pipeline solution: data source > data distribution > machine learning algorithm with Apache Spark* v2.4.3, Apache Hadoop* v2.7, Scala v2.12 in addition to Analytics Zoo v0.81 and XGBoost v0.9 and v1.10.
11. Reveal Hidden Possibilities
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Performance varies by use, configuration, and other factors. Learn more at intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See the backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
Accelerate data science and AI pipelines, from preprocessing through machine learning, and provide interoperability for efficient model development.