Modin* Getting Started Guide

ID 739197
Updated 6/27/2024

By Rachel Oberman

Please check out the official documentation for the latest updates.

About Modin*

Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive with the tools they already use. Adopting it takes a single-line code change, and it includes exclusive optimizations for Intel hardware. The library is fully compatible with the Pandas API.

For more information on the purpose and functionality of the Modin package, please refer to the Modin documentation.
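The single-line change mentioned above is usually the import statement. The sketch below is illustrative rather than taken from the Modin documentation: it assumes Modin and at least one engine are installed, and the fallback to plain pandas and the helper name `column_sums` are additions made only so the script still runs where Modin is missing.

```python
# Before: import pandas as pd
# After (the one-line change):
try:
    import modin.pandas as pd
except ImportError:
    # Illustrative fallback, not part of Modin: use plain pandas if present.
    try:
        import pandas as pd
    except ImportError:
        pd = None


def column_sums(pd_module, data):
    """Build a dataframe from a dict of columns and return per-column sums."""
    df = pd_module.DataFrame(data)
    return df.sum().to_dict()


if pd is not None:
    # Everything after the import is ordinary Pandas-style code.
    print(column_sums(pd, {"a": [1, 2, 3], "b": [4, 5, 6]}))
```

Because only the import differs, existing Pandas code can be tried under Modin without touching the rest of the script.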

Supported Installation Options

Install via Individual Component

There are multiple options to install Modin from Anaconda.

Linux*, Windows*, and macOS* are supported (x86 architecture only). See the Modin Installation Guide for details.

Install from Anaconda:

  • Recommended Installation Call:
    conda install -c conda-forge modin-all
    • Installs all available backends

Package Name in Intel® Channel    Engine(s)           Supported OSs
modin-all (recommended)           Dask*, Ray*, MPI    Linux*
modin-ray (stable backend)        Ray*                Linux*, Windows*
modin                             Dask*               Linux*, Windows*, macOS*
modin-dask                        Dask*               Linux*, Windows*, macOS*

PyPI Installation

To install Modin from PyPI, view the “PyPI” instructions from the relevant Modin documentation.

Build From Source

To build Modin from source, view the “Build From Source” instructions from the relevant Modin documentation.

Getting Started with Modin*: Sanity Check

Once Modin is installed, verify that the installation succeeded and that Modin optimizations are ready to use. Run the command below that matches the Modin engine(s) you installed:

Ray Engine

python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Ray'); df = pd.DataFrame([1]);print(df+1)"

Dask Engine

python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Dask'); df = pd.DataFrame([1]);print(df+1)"

Check Sanity Check Results

For each command, if Modin is properly installed, the following dataframe will be printed:

   0
0  2
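The same check can be scripted so that both engines are tested in one run. This is a minimal sketch, not an official tool: it reuses the one-liner from this guide, runs it in a fresh interpreter, and compares the last two printed tokens with the expected dataframe. The helper names `sanity_command` and `sanity_check` and the 120-second timeout are assumptions made for illustration.

```python
import subprocess
import sys


def sanity_command(engine):
    """Return the one-liner from this guide for the given engine name."""
    code = (
        "import modin.pandas as pd, modin.config as cfg; "
        f"cfg.Engine.put('{engine}'); "
        "df = pd.DataFrame([1]); print(df+1)"
    )
    return [sys.executable, "-c", code]


def sanity_check(engine, timeout=120):
    """Run the check in a fresh interpreter; True if the expected frame is printed."""
    try:
        proc = subprocess.run(
            sanity_command(engine), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    # The printed dataframe ends with the row "0  2" (index 0, value 2).
    return proc.returncode == 0 and proc.stdout.split()[-2:] == ["0", "2"]


if __name__ == "__main__":
    for engine in ("Ray", "Dask"):
        print(engine, "OK" if sanity_check(engine) else "FAILED")
```

An engine that is not installed is reported as FAILED rather than raising, so the script can be run regardless of which package you chose.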

Configuring the Compute Engine

Once Modin is installed, use one of the following methods to select the compute engine that Modin uses to distribute and optimize Pandas API functions for your workload.

Ray Engine

There are a few ways to enable the Ray backend in Modin:

  • In your Python script with a few lines of code:
    import modin.pandas as pd
    import modin.config as cfg
    cfg.Engine.put('Ray')
  • Setting the following environment variable:
    export MODIN_ENGINE=ray

The engine is selected the same way before and after Modin 0.12. That release renamed the separate storage format setting from MODIN_BACKEND to MODIN_STORAGE_FORMAT (cfg.StorageFormat in Python); it takes a storage format such as pandas, not an engine name.

Dask Engine

There are a few ways to enable the Dask backend in Modin:

  • In your Python script with a few lines of code:
    import modin.pandas as pd
    import modin.config as cfg
    cfg.Engine.put('Dask')
  • Setting the following environment variable:
    export MODIN_ENGINE=dask

As with Ray, this applies both before and after Modin 0.12. The MODIN_STORAGE_FORMAT variable (MODIN_BACKEND before 0.12) sets the storage format, not the engine.

If you have only installed a single compute engine with Modin, Modin will use that as the default engine and you can skip this step.
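When a script has to run on machines with different engines installed, the choice can be made at startup. This is a minimal sketch under stated assumptions: `pick_engine` and `configure_modin` are hypothetical helpers, and the module names `ray` and `dask` are assumed to be the import names of the two engines. Only `cfg.Engine.put` and `cfg.Engine.get` come from Modin itself.

```python
import importlib.util


def pick_engine(preferred=("Ray", "Dask")):
    """Return the first preferred engine whose module is installed, or None."""
    for engine in preferred:
        # Assumption: the engine's import name is its lowercase label.
        if importlib.util.find_spec(engine.lower()) is not None:
            return engine
    return None


def configure_modin(preferred=("Ray", "Dask")):
    """Set Modin's engine and return the name Modin reports, or None."""
    if importlib.util.find_spec("modin") is None:
        return None  # Modin itself is not installed
    engine = pick_engine(preferred)
    if engine is None:
        return None  # no supported engine found
    import modin.config as cfg

    cfg.Engine.put(engine)
    return cfg.Engine.get()


print("Active engine:", configure_modin())
```

Reordering `preferred` changes which engine wins when both are installed.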

Support

If you have further questions or need support on your workload optimization, submit your queries to the Issues page.
