By Rachel Oberman
Please check out the official documentation for the latest updates.
About Modin*
Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive with the tools they love. It requires only a single-line code change and includes exclusive optimizations for Intel hardware. The library is designed as a drop-in replacement for pandas and uses the same API.
For more information on the purpose and functionality of the Modin package, please refer to the Modin documentation.
Supported Installation Options
Install via Individual Component
There are multiple options to install Modin from Anaconda.
Linux*, Windows*, and macOS* are supported (x86 architecture only); see the Modin Installation Guide for more details.
Install from Anaconda:
- Recommended installation call, which installs all available backends:
conda install -c conda-forge modin-all
Package Name in Intel® Channel | Engine(s) | Supported OSs |
---|---|---|
modin-all (recommended) | Dask*, Ray*, MPI | Linux* |
modin-ray (stable backend) | Ray* | Linux*, Windows* |
modin | Dask* | Linux*, Windows*, MacOS* |
modin-dask | Dask* | Linux*, Windows*, MacOS* |
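To help pick a package from the table above, the sketch below encodes the table's OS column and checks the current machine against it. It is only an illustration: PACKAGE_OS_SUPPORT, current_os, and supported_packages are hypothetical names, and it assumes "x86 architecture" means 64-bit x86 (reported by Python as x86_64 or AMD64).

```python
import platform
import sys
from typing import List

# OS support per conda package, transcribed from the table above.
PACKAGE_OS_SUPPORT = {
    "modin-all": {"linux"},
    "modin-ray": {"linux", "windows"},
    "modin": {"linux", "windows", "macos"},
    "modin-dask": {"linux", "windows", "macos"},
}

# Assumption: "x86 only" means 64-bit x86 as reported by platform.machine().
X86_MACHINES = {"x86_64", "amd64"}


def current_os() -> str:
    """Map sys.platform to the OS names used in the table."""
    if sys.platform.startswith("linux"):
        return "linux"
    if sys.platform.startswith("win"):
        return "windows"
    if sys.platform == "darwin":
        return "macos"
    return sys.platform


def supported_packages(os_name: str, machine: str) -> List[str]:
    """Return the table's packages listed for os_name, or [] on non-x86 machines."""
    if machine.lower() not in X86_MACHINES:
        return []
    return [pkg for pkg, oses in PACKAGE_OS_SUPPORT.items() if os_name in oses]


print(supported_packages(current_os(), platform.machine()))
```

On an Apple Silicon Mac, for example, platform.machine() reports arm64, so the sketch returns an empty list.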
PyPI Installation
To install Modin with pip from PyPI, follow the “PyPI” instructions in the relevant Modin documentation.
Build From Source
To build Modin from source, follow the “Build From Source” instructions in the relevant Modin documentation.
Getting Started with Modin*: Sanity Check
Once Modin is installed, run the following command(s) to verify that the installation was successful and that Modin is ready to use.
From the command line, run the command for each Modin engine that you installed:
Ray Engine
python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Ray'); df = pd.DataFrame([1]);print(df+1)"
Dask Engine
python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Dask'); df = pd.DataFrame([1]);print(df+1)"
Sanity Check Results
For each command, if Modin is properly installed, the following dataframe will be printed:
   0
0  2
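The one-line commands above can also be wrapped in a small script that reports pass or fail instead of relying on reading the printed dataframe. This is a sketch, not a Modin tool: modin_sanity_check is a hypothetical helper, and it treats any error while setting the engine or building the dataframe (for example, a missing ray or dask package) as a failed check.

```python
import importlib.util


def modin_sanity_check(engine: str) -> bool:
    """Run the DataFrame([1]) + 1 check on the given Modin engine.

    Returns True if the single cell equals 2, and False if Modin is not
    installed or the engine could not be used.
    """
    if importlib.util.find_spec("modin") is None:
        print("Modin is not installed in this environment")
        return False
    try:
        import modin.config as cfg

        cfg.Engine.put(engine)  # e.g. "Ray" or "Dask", as in the commands above
        import modin.pandas as pd

        result = pd.DataFrame([1]) + 1
    except Exception as exc:  # any setup error counts as a failed check
        print(f"Engine {engine!r} could not be used: {exc}")
        return False
    print(result)
    return int(result.iloc[0, 0]) == 2
```

For example, modin_sanity_check("Ray") should print the same dataframe shown above and return True when Modin and Ray are both installed.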
Configuring the Compute Engine
Once Modin is installed, you can use the following options to choose the compute engine that Modin uses to distribute and accelerate pandas API calls in your workload.
Ray Engine
There are a few ways to enable the Ray engine in Modin. The engine is selected through modin.config.Engine (environment variable MODIN_ENGINE), and this works the same way before and after Modin 0.12:
- In your Python script with a few lines of code, setting the engine before creating any dataframes:
import modin.config as cfg
cfg.Engine.put('Ray')
import modin.pandas as pd
- Or setting the following environment variable before starting Python:
export MODIN_ENGINE=ray
Modin 0.12 renamed the MODIN_BACKEND setting to MODIN_STORAGE_FORMAT (modin.config.StorageFormat). Either setting selects the storage format, which defaults to pandas, not the compute engine, so it does not need to be changed to enable Ray.
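If exporting a shell variable is inconvenient (for example, in a notebook), MODIN_ENGINE can also be set from Python itself. The sketch below is an illustration, not part of Modin: set_modin_engine and SUPPORTED_ENGINES are hypothetical names, and it assumes Modin reads MODIN_ENGINE when its configuration is first accessed, so the call is placed before importing modin.pandas. The same helper works for "dask".

```python
import os

# Engines covered by this guide; Modin accepts a few other values as well.
SUPPORTED_ENGINES = {"ray", "dask"}


def set_modin_engine(engine: str) -> str:
    """Set MODIN_ENGINE for this process and return the normalized name.

    Call this before importing modin.pandas so Modin sees the value
    when it first reads its configuration.
    """
    name = engine.strip().lower()
    if name not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {sorted(SUPPORTED_ENGINES)}"
        )
    os.environ["MODIN_ENGINE"] = name
    return name


set_modin_engine("Ray")
# import modin.pandas as pd  # imported afterwards, so Modin picks up MODIN_ENGINE=ray
```

Validating the name up front gives a clear error for typos such as "rey" instead of failing later inside Modin.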
Dask Engine
There are a few ways to enable the Dask engine in Modin. The engine is selected through modin.config.Engine (environment variable MODIN_ENGINE), and this works the same way before and after Modin 0.12:
- In your Python script with a few lines of code, setting the engine before creating any dataframes:
import modin.config as cfg
cfg.Engine.put('Dask')
import modin.pandas as pd
- Or setting the following environment variable before starting Python:
export MODIN_ENGINE=dask
Modin 0.12 renamed the MODIN_BACKEND setting to MODIN_STORAGE_FORMAT (modin.config.StorageFormat). Either setting selects the storage format, which defaults to pandas, not the compute engine, so it does not need to be changed to enable Dask.
If you have only installed a single compute engine with Modin, Modin will use that as the default engine and you can skip this step.
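To see which engine Modin is likely to fall back to, you can check which engine packages are importable in your environment. This is a rough approximation, not Modin's own selection code: installed_engines and choose_engine are hypothetical helpers, and the preference order (Ray before Dask) and the requirement that Dask needs both the dask and distributed packages are assumptions made for illustration.

```python
import importlib.util
import os
from typing import List, Optional

# Assumed preference order and the top-level packages each engine needs.
ENGINE_MODULES = {
    "ray": ("ray",),
    "dask": ("dask", "distributed"),
}


def installed_engines() -> List[str]:
    """Return engines whose packages are importable, in preference order."""
    return [
        engine
        for engine, modules in ENGINE_MODULES.items()
        if all(importlib.util.find_spec(name) is not None for name in modules)
    ]


def choose_engine() -> Optional[str]:
    """Honor an explicit MODIN_ENGINE; otherwise pick the first installed engine."""
    explicit = os.environ.get("MODIN_ENGINE")
    if explicit:
        return explicit.lower()
    engines = installed_engines()
    return engines[0] if engines else None


print("Importable engines:", installed_engines())
print("Engine this sketch would pick:", choose_engine())
```

If only one engine is importable, the sketch picks it, which matches the single-engine case described above.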
Support
If you have further questions or need support optimizing your workload, submit your queries to the Issues page.
Useful Resources
- Modin GitHub
- Modin Documentation
- Modin Getting Started Code Sample
- Anaconda Blog: Scale your pandas workflow with Modin – no rewrite required
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at Performance Index.