Get Started with Intel® Optane™ DC Persistent Memory
Learn how to manage your Intel® Optane™ DC persistent memory modules with open source tools like ipmctl and ndctl!
Hello. My name is Kelly Lyon from Intel. In this video, we'll learn how to manage your Intel® Optane™ DC persistent memory modules with open source tools like ipmctl and ndctl. This includes discovering and verifying the topology, configuration, settings, and capacities of your modules; provisioning persistent memory in Memory Mode or App Direct mode; and creating regions and namespaces to expose persistent memory to applications.
If you haven't yet installed persistent memory modules on your server, check out the video provided in the links, then continue your setup with these software tools. So what are ipmctl and ndctl? Though there is some overlap in functionality, there are a few key differences to be aware of. For instance, ipmctl can manage goals and regions and view performance metrics for the persistent memory modules.
This functionality, which is a feature specific to Intel® persistent memory, is necessary for selecting which operating mode to use. More information about operating modes can be found in the video "Persistent Memory Operating Modes Explained," made by my co-worker, Eduardo. Both ipmctl and ndctl are open source and are available on Linux*. ipmctl works on Windows* too, but ndctl does not, since similar functionality is already built into Windows PowerShell.
Before we get started, you will need a Linux distribution with kernel 4.2 or later. If you are specifically interested in reliability, availability, and serviceability features, we recommend Linux kernel 4.19 or later. A link to the ipmctl binary is provided in the video description, or you can build from source, which can be found on the ipmctl GitHub page. You can also build ndctl from source or use the binaries already included in popular Linux distributions' package repositories.
We should briefly cover a couple of basic persistent memory concepts. A region, or interleave set, is a grouping of one or more persistent memory modules. Regions can be created as either non-interleaved, meaning one region per persistent memory module, or interleaved, which creates one large region over all modules in a CPU socket. Many users choose to have one fully interleaved region across their persistent memory modules because this allows for increased bandwidth.
Regions can be created or modified using ipmctl or via an option in the BIOS. Regions can be divided up into one or more namespaces. A namespace defines a contiguously addressed range of non-volatile memory, conceptually similar to a hard disk partition, SCSI logical unit, or NVM Express namespace. A namespace is the unit of persistent memory storage that appears in the /dev directory as a device which can be used for input and output. Intel recommends using the ndctl tool for creating namespaces.
Now let's get started provisioning our hardware. We'll start by demonstrating how to set configuration goals and regions with ipmctl. Then we'll use ndctl to create namespaces within these regions. First we'll use the ipmctl show topology command to display the total memory resources available. The examples shown in this video are from a system that has two sockets, fully populated with twelve 16 GB DDR4 DIMMs and twelve 128 GB Intel Optane DC persistent memory modules.
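For reference, that first query is a single command; this is a sketch, run with root privileges, and the output will of course reflect your own configuration:

$ sudo ipmctl show -topology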
We can quickly check the total capacity, health, and firmware versions of our persistent memory modules with the show dimm command. Now, to provision Intel Optane DC persistent memory modules, we must first use ipmctl to define a goal, which specifies whether the modules are to be used in Memory Mode, App Direct mode, or Mixed Mode. The goal is stored on the persistent memory modules for the BIOS to read on the next reboot.
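Before moving on to goals, the module check mentioned above is, roughly:

$ sudo ipmctl show -dimm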
The default call to ipmctl create goal sets the goal to App Direct with a fully interleaved region; that is to say, the two commands sketched below are equivalent. Whenever you set a new goal, the output will look similar to what is displayed here: there is a preview of what will be applied, and you must type y to acknowledge and continue. It then prints out the goal with a note at the bottom stating that you must reboot before it takes effect.
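Concretely, the two equivalent commands look roughly like this (as always with goals, a reboot is required before they take effect):

$ sudo ipmctl create -goal
$ sudo ipmctl create -goal PersistentMemoryType=AppDirect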
To specify a goal of App Direct with non-interleaved regions, use the flag PersistentMemoryType=AppDirectNotInterleaved. When provisioning capacity for Memory Mode, you can specify what percentage of the total space will be used in this mode. The first example shows 100% of the capacity being set for Memory Mode. If you specify any value less than 100%, then you are actually provisioning your persistent memory into Mixed Mode. In this example, we show 60% of the capacity being partitioned for Memory Mode; the rest is automatically configured as an interleaved region for App Direct mode.
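As a sketch, the goal variants just described look roughly like this:

# App Direct with one non-interleaved region per module
$ sudo ipmctl create -goal PersistentMemoryType=AppDirectNotInterleaved

# 100% of capacity in Memory Mode
$ sudo ipmctl create -goal MemoryMode=100

# Mixed Mode: 60% Memory Mode, the remaining 40% interleaved App Direct
$ sudo ipmctl create -goal MemoryMode=60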
If there happens to already be a goal in place, you can see that goal with the ipmctl show goal command and delete it with ipmctl delete goal. Then you can set a new goal if you wish. Once you restart the system, the goal is put into effect and then cleared until you set a new one. You can check your current mode or confirm a mode change after restart with the show memory resources and show region commands. If the mode is App Direct, a single region per socket is created, and the App Direct capacity should be almost all of the total capacity seen in the show memory resources output.
For Memory Mode, there will be no regions at all, and the memory capacity will be nearly all of the total capacity shown by show memory resources. If your mode is mixed, with capacity split between Memory Mode and App Direct mode, you will see two regions, but their capacity will not be the full capacity of your modules. In this example, we set Memory Mode to 60% of the capacity. From the output of show memory resources, you can see that memory capacity holds about 60% of the total capacity and App Direct capacity holds the other 40%.
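As a quick reference, the goal-management and verification commands just mentioned look roughly like this:

$ sudo ipmctl show -goal            # view a pending goal
$ sudo ipmctl delete -goal          # clear a pending goal

# After the reboot, confirm the resulting configuration
$ sudo ipmctl show -memoryresources
$ sudo ipmctl show -region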
So far, we've shown how ipmctl is used to create regions, where a region is raw persistent memory capacity that is not visible to the OS or the application. Similar to partitioning the raw space on a solid state drive, we now need to create namespaces on top of regions that we can expose to applications. From this point, we will use ndctl.
When a namespace is created, a persistent memory device is created at /dev/pmemN, where N starts at zero. ndctl supports creating namespaces in four different modes: Filesystem-DAX, Device-DAX, Sector, and Raw. The two most common are Filesystem-DAX and Sector.
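Each of these modes is selected with the --mode option to ndctl create-namespace; as a sketch:

$ sudo ndctl create-namespace --mode=fsdax     # Filesystem-DAX (the default)
$ sudo ndctl create-namespace --mode=devdax    # Device-DAX
$ sudo ndctl create-namespace --mode=sector    # Sector
$ sudo ndctl create-namespace --mode=raw       # Raw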
File System Direct Access, or FSDAX for short, is the default namespace mode created when calling ndctl create namespace with no options. It creates a block device that supports the DAX capabilities of the XFS and EXT4 Linux file systems. DAX removes the page cache from the I/O path and allows the mmap() system call to establish direct mappings to persistent memory media. This capability enables workloads or working sets that would exceed the capacity of the page cache to scale up to the capacity of persistent memory.
Workloads that fit in the page cache or perform bulk data transfers may not see benefits from DAX; when in doubt, though, pick this mode. The figure shown here depicts a typical Filesystem-DAX configuration with three modules that have been interleaved: all available capacity has been used to create a single region, namespace, and DAX-enabled file system. The namespace naming convention is commonly x.y, where x is the region number and y is the namespace number; for example, namespace0.0 is the first namespace in region 0.
To create a namespace in FS-DAX mode, we first create an app direct goal using ipmctl as we described earlier. After restart, we create a namespace. The default call to ndctl create namespace creates a namespace in FS-DAX mode. In this mode, direct access to persistent memory is provided by mounting the file system with the DAX option.
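A minimal sketch of that sequence (goal, reboot, then namespace):

$ sudo ipmctl create -goal PersistentMemoryType=AppDirect
# reboot, then:
$ sudo ndctl create-namespace --mode=fsdax     # same result as the default, no-option call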
Now we can create the file system and mount it. Both EXT4 and XFS file systems support direct access. You can see how easy it is to configure regions and namespaces on which we created a DAX-aware file system for our application. Optionally, in the call to create namespace, we could use the size flag to specify the size of our namespace and create multiple namespaces per region. This allows us to create multiple file systems per region for individual application requirements.
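Here is a rough sketch of those steps, assuming the namespace created the block device /dev/pmem0 and using /mnt/pmem0 as an example mount point (EXT4 shown; XFS works the same way):

$ sudo mkfs.ext4 /dev/pmem0
$ sudo mkdir -p /mnt/pmem0
$ sudo mount -o dax /dev/pmem0 /mnt/pmem0

# Optional: use --size to carve out a fixed-size namespace (32G is just an example)
$ sudo ndctl create-namespace --mode=fsdax --size=32G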
Sector mode, also known as storage over app direct, is used to host applications that are not prepared for torn sectors after a crash or for hosting legacy filesystems that do not checksum metadata. The expected usage for this mode is for small boot volumes. This mode is compatible with operating systems that do not natively support persistent memory.
To create a namespace to use with the traditional POSIX standard I/O APIs, use sector mode. In this example, the default sector size is 4K; Linux also supports 512-byte sectors for legacy applications and file systems that do not use 4K. But remember, sector mode does not support direct access. To create the namespace, first create a region in App Direct mode that is byte addressable. After rebooting, create the namespace in sector mode, then create the file system and mount it.
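As a sketch, assuming the sector namespace shows up as the block device /dev/pmem0s and using /mnt/pmem0 as an example mount point:

$ sudo ipmctl create -goal PersistentMemoryType=AppDirect
# reboot, then:
$ sudo ndctl create-namespace --mode=sector
$ sudo mkfs.ext4 /dev/pmem0s
$ sudo mount /dev/pmem0s /mnt/pmem0     # no dax option; sector mode does not support direct access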
As with our Filesystem-DAX example, it takes only a few simple steps to create a namespace backed by persistent memory that provides atomic block semantics, just like an SSD or an NVMe device. Now we can install our application that is not persistent-memory-aware.
Additionally, the following ndctl commands may come in handy when managing your regions and namespaces. The first one lists all DIMMs. By default, ndctl lists only enabled or active DIMMs, regions, and namespaces; adding the -i flag shows both enabled and disabled devices. Similarly, the second command lists all regions.
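Roughly, those listing commands are:

$ ndctl list --dimms         # short form: ndctl list -D
$ ndctl list --regions       # short form: ndctl list -R

# Add -i to include disabled (idle) devices as well, for example:
$ ndctl list --dimms -i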
If you want to create a new goal, you must first disable and destroy any previously created namespaces, using the ndctl disable-namespace and ndctl destroy-namespace commands. There you have it. With your configuration mode selected and your file system mounted, you're now ready to get started running your applications. You can always find further support and more information on the Intel Developer Zone or by chatting directly with other developers on our Persistent Memory Google group. Make sure to check out the links, and thank you for watching.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.