Modin*
Scale your pandas workflows by changing a single line of code.
Accelerate pandas DataFrame Processing
Modin* is a drop-in replacement for pandas that enables data scientists to scale to distributed DataFrame processing without changing API code. Beginning with the 2024.2 release of AI Tools, Intel upstreams all optimizations to the open source Modin project.
Using this library, you can:
- Process terabytes of data on a single workstation
- Scale from a single workstation to the cloud using the same code
- Focus more on data analysis and less on learning new APIs
Modin is part of the end-to-end suite of Intel® AI and machine learning development tools and resources.
Download as Part of the Toolkit
Modin is available in the AI Tools Selector, which provides accelerated machine learning and data analytics pipelines with optimized deep learning frameworks and high-performing Python* libraries.
Download the Stand-Alone Version
A stand-alone download of Modin is available. You can install it using a package manager or build it from source.
Features
Accelerated DataFrame Processing
- Speed up the extract, transform, and load (ETL) process for large DataFrames.
- Automatically use all of the processing cores available on your machine.
Optimized for Intel Hardware
- Scale to terabytes of data on a single data science workstation.
- Analyze large datasets (over one billion rows) using performant end-to-end analytics frameworks that take advantage of the compute power of current and future Intel hardware.
Compatible with Existing APIs and Engines
- Change one line of code to use your existing pandas API calls, no matter the scale: instead of import pandas as pd, use import modin.pandas as pd (see the sketch after this list).
- Use Ray, Dask*, or Message Passing Interface (MPI) compute engines to distribute the data without having to write engine-specific code.
- Continue to use the rest of your Python ecosystem code, such as NumPy, XGBoost, and scikit-learn*.
- Use the same notebook to scale from your local machine to the cloud.
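The snippet below is a minimal sketch of both points: the single-line import change and an explicit engine selection through modin.config. The engine value, file path, and column names are assumptions for illustration; Modin also selects an installed engine automatically if you set nothing.

```python
import modin.config as cfg
cfg.Engine.put("ray")            # or "dask"; "unidist" runs on MPI

# The one-line change: instead of `import pandas as pd`
import modin.pandas as pd

# Existing pandas code runs unchanged and uses all available cores.
df = pd.read_csv("sales.csv")    # placeholder path
print(df.groupby("region")["revenue"].sum())
```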
Demos
Note Some articles and samples may still refer to compute engines that are no longer supported. Starting with Modin 0.31.0, the supported compute engines are Ray, Dask, and MPI.
Use Case: Fraud Detection
Follow this step-by-step tutorial to learn how to use Modin to preprocess, analyze, and transform a credit card transaction dataset for use in a fraud detection application.
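As a hypothetical sketch of what that preprocessing might look like with Modin (the file name and column names such as amount, merchant_category, and is_fraud are assumptions, not the tutorial's actual schema):

```python
import numpy as np
import modin.pandas as pd

# Load and clean the transaction data; Modin distributes the work across cores.
df = pd.read_csv("credit_card_transactions.csv")     # placeholder path
df = df.dropna(subset=["amount", "merchant_category"])

# Simple feature engineering and a quick look at fraud rates by category.
df["log_amount"] = df["amount"].apply(np.log1p)
print(df.groupby("merchant_category")["is_fraud"].mean()
        .sort_values(ascending=False).head())
```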
Seamlessly Scale pandas Workloads with a Single Code-Line Change
Learn how Modin scales pandas workloads using the same APIs, with a live demonstration that walks you through the tools and process.
Build an End-to-End Machine Learning Workflow Using Modin and scikit-learn*
Follow along with code snippets to build and run an end-to-end machine learning workload using Modin and Intel® Extension for Scikit-learn* with US census data from 1970 to 2010.
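A rough sketch of the shape of such a workflow, not the sample's exact code: the dataset path and the income target column are assumptions, and Ridge regression stands in for whatever estimator the sample uses.

```python
from sklearnex import patch_sklearn
patch_sklearn()   # enable Intel Extension for Scikit-learn before importing sklearn estimators

import modin.pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

df = pd.read_csv("census_1970_2010.csv")             # placeholder path
X = df.drop(columns=["income"]).to_numpy()           # "income" is an assumed target column
y = df["income"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_train, y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))
```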
Scale Your pandas Workflow with Modin
Data scientists no longer have to learn new APIs and rewrite code when their datasets require parallel processing or terabytes of data. See benchmark results that show speedups for a variety of datasets.
In the News
Modin Reaches 10 Million Downloads
The Modin community has scaled rapidly due to the library's ability to speed up processing of large datasets and the ease of getting started. Hear this perspective from a production user's point of view. (Note that as of June 2024, Modin has surpassed 20 million downloads.)
Scale Interactive Data Science with Modin and Ray
Learn about the technology that underpins the ability of Modin to scale, how to apply Modin in practice, and how it compares to alternative solutions.
Documentation & Code Samples
Specifications
Processors:
- Intel® Core™ processors
- Intel® Xeon® processors
Operating systems:
- Linux*
- Windows*
Languages:
- Python
Get Help
Your success is our success. Access this support resource when you need assistance.
For additional help, see our general oneAPI Support.
Related Products
Stay Up to Date on AI Workload Optimizations
Sign up to receive hand-curated technical articles, tutorials, developer tools, training opportunities, and more to help you accelerate and optimize your end-to-end AI and data science workflows. Take a chance and subscribe. You can change your mind at any time.