Step 4: Set Up User Permissions for Using the Device Files for Intel GPUs
You need to modify some of the system parameters on nodes that use Intel GPUs for computing tasks:
Allow Long-running GPU Kernels
By default, the GPU driver assumes that tasks run on the GPU for only a few seconds. Many cluster applications have offload kernels that use the GPU for more than a few minutes. To let these applications use the GPU as part of their execution, you need to change a few kernel parameters:
Disable i915.enable_hangcheck
Disable hangcheck to enable long-running GPU tasks as follows:
To temporarily disable hangcheck until the next reboot, execute the following command:
sudo sh -c "echo N > /sys/module/i915/parameters/enable_hangcheck"
To permanently disable hangcheck across multiple reboots, add the following text to the Linux kernel boot parameters in GRUB_CMDLINE_LINUX_DEFAULT:
i915.enable_hangcheck=0
For more details, see the GPU: Disable Hangcheck section of the Intel oneAPI Installation Guide.
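The permanent setting can also be scripted. The following is a minimal sketch, not an official Intel procedure: add_kernel_param is a hypothetical helper that appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT only when it is not already present. It assumes the variable is defined on a single line with a double-quoted value, as in a typical /etc/default/grub.

```shell
# add_kernel_param FILE PARAM   (hypothetical helper, not part of any Intel tool)
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line in FILE.
# Assumes a single-line, double-quoted value; does nothing if PARAM is present.
add_kernel_param() {
    grub_file=$1
    kernel_param=$2

    # Skip the edit when the parameter is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$kernel_param"; then
        return 0
    fi

    # Insert the parameter before the closing quote. The result goes through
    # a temporary file so the original is replaced only if sed succeeds.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${kernel_param}\"|" \
        "$grub_file" > "$grub_file.tmp" && mv "$grub_file.tmp" "$grub_file"
}

# Example (run as root, after backing up the file):
#   add_kernel_param /etc/default/grub i915.enable_hangcheck=0
```

After the edit, regenerate the GRUB configuration and reboot. The command depends on the distribution: for example, sudo update-grub on Ubuntu, or sudo grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-like systems (the output path varies with the distribution and boot mode). After the reboot, cat /proc/cmdline shows whether the parameter was picked up.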
Lengthen the Timeout Interval i915.request_timeout_ms
Linux kernels 5.12 or later reset the GPU if a compute kernel runs longer than a few seconds. To avoid a device reset during long-running kernels, increase the value of the i915.request_timeout_ms parameter as follows.
To temporarily increase the timeout interval until the next reboot, execute the following command:
sudo sh -c "echo 200000 > /sys/module/i915/parameters/request_timeout_ms"
To permanently increase the timeout interval, add the following text to the Linux kernel boot parameters in GRUB_CMDLINE_LINUX_DEFAULT:
i915.request_timeout_ms=200000
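To confirm what the driver is currently using, the parameter files can be read back. This is a small sketch under the assumption that the i915 module exposes its parameters in /sys/module/i915/parameters, as in the commands above. show_i915_params is a hypothetical helper name, and the optional directory argument exists only so the function can be tried on a copy of the files.

```shell
# show_i915_params [DIR]   (hypothetical helper)
# Prints the current value of the two i915 parameters discussed in this
# section. A parameter file that cannot be read is reported as not available.
show_i915_params() {
    params_dir=${1:-/sys/module/i915/parameters}

    for name in enable_hangcheck request_timeout_ms; do
        if [ -r "$params_dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$params_dir/$name")"
        else
            printf '%s=<not available>\n' "$name"
        fi
    done
}

# Example:
#   show_i915_params
```

With the settings from this section applied, the expected values are N for enable_hangcheck and 200000 for request_timeout_ms.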
Disable Preemption Timeout
For the compute device in use, set the preemption timeout value to zero by creating a udev rule file named /etc/udev/rules.d/99-i915-disable-preempt.rules with the following contents:
ACTION=="add|bind",SUBSYSTEM=="pci",DRIVER=="i915",RUN+="/bin/bash -c 'for i in /sys/$DEVPATH/drm/card?/engine/[rc]cs*/preempt_timeout_ms; do echo 0 > $i; done'"
After the udev rule has been set up, the preemption timeout will be set to zero after the next reboot. The rule can be manually triggered without a reboot by using the udevadm command as follows:
udevadm trigger -s pci --action=add
Use this command to verify that the preemption timeout is set correctly:
find /sys/devices -regex '.*/drm/card[0-9]*/engine/[rc]cs[0-9]*/preempt_timeout_ms' -exec echo {} \; -exec cat {} \;
Modify Heartbeat Interval
As with the preemption timeout, set the heartbeat interval to zero by creating a udev rule file named /etc/udev/rules.d/99-i915-disable-heartbeat.rules with the following contents:
ACTION=="add|bind",SUBSYSTEM=="pci",DRIVER=="i915",RUN+="/bin/bash -c 'for i in /sys/$DEVPATH/drm/card?/engine/[rc]cs*/heartbeat_interval_ms; do echo 0 > $i; done'"
After the udev rule has been set up, the heartbeat interval will be set to zero after the next reboot. The rule can be manually triggered without a reboot by using the udevadm command as follows:
udevadm trigger -s pci --action=add
Use this command to verify that the heartbeat interval parameter is set correctly:
find /sys/devices -regex '.*/drm/card[0-9]*/engine/[rc]cs[0-9]*/heartbeat_interval_ms' -exec echo {} \; -exec cat {} \;
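On nodes with many engines it can be easier to list only the engines that still have a non-zero value. The sketch below is one way to do that. nonzero_engine_params is a hypothetical helper, and it assumes the same sysfs layout (drm/card*/engine/[rc]cs*/) that the udev rules above rely on. The optional second argument exists only so the function can be tried on a test directory.

```shell
# nonzero_engine_params NAME [SYSFS_ROOT]   (hypothetical helper)
# Prints "<path> <value>" for every render/compute engine whose NAME
# parameter is not 0. No output means no non-zero value was found.
nonzero_engine_params() {
    param_name=$1
    sysfs_root=${2:-/sys/devices}

    find "$sysfs_root" -type f \
        -path "*/drm/card*/engine/[rc]cs*/$param_name" 2>/dev/null |
    while IFS= read -r param_file; do
        value=$(cat "$param_file")
        if [ "$value" != "0" ]; then
            printf '%s %s\n' "$param_file" "$value"
        fi
    done
}

# Examples:
#   nonzero_engine_params preempt_timeout_ms
#   nonzero_engine_params heartbeat_interval_ms
```

Empty output also results when no matching engine files exist at all, so this check complements the find commands above rather than replacing them.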
Set Up for Maximum MPI/XeLink Performance on Ubuntu*
If your nodes are running Ubuntu* OS, set the value of the kernel.yama.ptrace_scope sysctl variable to 0 to be able to collect data with VTune and to obtain the best performance with Intel MPI and XeLink. See the Intel® VTune™ Profiler User Guide for instructions on how to do this.
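As a rough illustration of what such a setting looks like in practice, the sketch below persists a sysctl value in a configuration file. It is not taken from the VTune guide: persist_sysctl is a hypothetical helper, and the default file name /etc/sysctl.d/99-intel-gpu.conf is an assumption chosen for this example.

```shell
# persist_sysctl KEY VALUE [FILE]   (hypothetical helper)
# Writes "KEY = VALUE" to a sysctl configuration file, replacing any
# earlier line for the same key. The default file name is an assumption.
persist_sysctl() {
    sysctl_key=$1
    sysctl_value=$2
    conf_file=${3:-/etc/sysctl.d/99-intel-gpu.conf}

    touch "$conf_file"
    # Keep every line except an existing assignment of this key.
    # (The dots in the key are treated as regex wildcards, which is
    # slightly loose but harmless for sysctl names.)
    grep -v "^[[:space:]]*${sysctl_key}[[:space:]]*=" "$conf_file" > "$conf_file.tmp"
    printf '%s = %s\n' "$sysctl_key" "$sysctl_value" >> "$conf_file.tmp"
    mv "$conf_file.tmp" "$conf_file"
}

# Example (run as root):
#   persist_sysctl kernel.yama.ptrace_scope 0
```

To apply the value without a reboot, run sudo sysctl -w kernel.yama.ptrace_scope=0, or reload all configuration files with sudo sysctl --system.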
Set Permissions on Graphics Devices
To be able to use an Intel GPU, users must have the permission to access the following device files:
/dev/dri/card* are used for direct rendering devices and provide full privileged access to the GPU software
/dev/dri/renderD* provide non-privileged access to the GPU hardware, which is typically sufficient for compute tasks performed by non-privileged users
By default, access to the device files is controlled by one of the following groups:
“render” local group (on Ubuntu* 19 and higher, CentOS* 8, and Fedora* 31), which was introduced on RHEL* 8.x and Ubuntu* 19.x for users who need less-privileged use of the GPU for tasks such as computation
“video” local group (on Ubuntu* 18, Fedora* 30, and SLES* 15 SP1), which gives much more privileged access to the GPU
You have three options to enable non-privileged users to run compute tasks on nodes with Intel GPUs:
Assign each user who might want to compute on Intel GPUs to the local “render” or “video” group (depending on the OS version) on every node with an Intel GPU. This may be impractical if you have a large number of GPU nodes, a volatile userbase, or use a system image for your cluster nodes that is not updated often (updates are the only time you could add additional users to the local “render” or “video” groups).
Assign user permissions based on allocation type and job queue.
Control access to the GPUs on a node using a udev rule. To achieve this, create a udev rule /etc/udev/rules.d/99-i915-change-group.rules with the following contents:
SUBSYSTEM=="drm", KERNEL=="card*", GROUP="<group_ID>"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="<group_ID>"
Replace the <group_ID> placeholder with the name or ID of the group that should own the device files. If the group is served by a remote authentication service, <group_ID> may need to be the numerical group ID rather than the familiar group name. You can look up this numerical value with the command getent group <group> once the system is fully restarted and open for use.
Example: getent group users
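Only the numerical ID is needed for the udev rule, and it is the third colon-separated field of the getent output. A minimal sketch, where group_gid is a hypothetical helper name:

```shell
# group_gid NAME   (hypothetical helper)
# Prints the numerical ID of group NAME as resolved by getent.
# A group database entry has the form name:password:GID:members.
# Prints nothing if the group cannot be resolved.
group_gid() {
    getent group "$1" | cut -d: -f3
}

# Example:
#   group_gid users
```

The printed number can then be used in place of <group_ID> in the rule file.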
After the rule has been set up, the GPU device will be available for the selected group of users after the next reboot. The rule can be manually triggered with the udevadm command as follows:
udevadm trigger -s drm --action=add
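To check which group currently owns the device files, before or after triggering the rule, the nodes can be listed with their group and permission bits. The following is a sketch only: list_dri_nodes is a hypothetical helper, it relies on the GNU coreutils stat command, and the optional directory argument exists only so the function can be tried outside /dev/dri.

```shell
# list_dri_nodes [DIR]   (hypothetical helper; needs GNU coreutils stat)
# Prints one line per card*/renderD* node with its owning group and
# octal permission bits.
list_dri_nodes() {
    dri_dir=${1:-/dev/dri}

    for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$node" ] || continue
        stat -c '%n group=%G mode=%a' "$node"
    done
}

# Example:
#   list_dri_nodes
```

If the udev rule took effect, each listed node shows the group chosen for <group_ID>.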
Set Up for GPU Profiling Using Intel® VTune™ Profiler
To use Intel® VTune™ Profiler on nodes with Intel GPUs, set up the dev.i915.perf_stream_paranoid option as documented in Set Up System for GPU Analysis, and assign the VTune sampling driver and the FTrace file system to the same group.
Despite the warnings seen by most VTune users, it is not necessary to rebuild the i915 driver with CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y as suggested by the above document. Adequate information about compute jobs can be obtained without it.
Set Up for GPU Debugging Using the Intel® Distribution for GDB*
To set up and use the Intel® Distribution for GDB* to debug code running on an Intel GPU, refer to Get Started with Intel® Distribution for GDB* on Linux* OS Host. See also GPU Debugging.
Summary
In summary, the above changes can be listed as follows:
GRUB_CMDLINE_LINUX_DEFAULT additions:
Route PCI errors to the driver instead of to the BIOS, where they would cause a system-level exception on the CPU: "pcie_ports=native"
Use the PCI performance configuration: "pci=pcie_bus_perf"
Enable the GPU debugging interface (possible security issues): "drm.debug=0xa i915.debug_eu=1"
Disable GPU hang checks: "i915.enable_hangcheck=0"
On Linux kernels 5.12 or higher: i915.request_timeout_ms=200000
UDEV Rules
Disable preemption timeout: set preempt_timeout_ms=0 on all engines
Disable heartbeat interval: set heartbeat_interval_ms=0 on the same engines
Set ownership of card to network group rather than local groups “video” and “render” for cards on shared systems
SUBSYSTEM=="drm", KERNEL=="card*", GROUP="<group_ID>"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="<group_ID>"
Allow VTune Data collection on GPU (possible security issues): dev.i915.perf_stream_paranoid=0
For VTune and good MPI performance on XeLink: kernel.yama.ptrace_scope=0
Allow VTune driver-less performance event collection (possible security issues): kernel.perf_event_paranoid=0