Data Collection
Before Intel® Cluster Checker can identify issues, it must first gather data from the cluster. Intel® Cluster Checker uses providers to collect data from the system and stores that data in a database. Framework definitions determine what data to collect by defining a set of data providers to run.
Running Data Collection
The clck program triggers data collection followed immediately by analysis. The standalone clck-collect command, which triggers data collection only, remains available for those who would rather collect the data now and perform the analysis at a later point in time. Most users will not need to run this command independently, but the option is available.
A typical invocation of the collect command is:
clck-collect <options>
By default, Intel® Cluster Checker will collect and analyze data to evaluate the health of the cluster using the health_base framework definition.
It is advised to limit data collection with privileged access (root or sudo/admin), as it can cause problems with some of the data providers. Data collected with privileged access could expose system-level details, so databases collected as root or another privileged user should be scrutinized before being shared and should have restricted access on the cluster. (By default, databases are located in a hidden folder, .clck/, in the user's home directory.) Running MPI benchmarks as the root user is not recommended. There may be cases in which running as root is necessary, such as when a provider needs a tool that is not available to a non-privileged user, but limiting use of root as much as possible is recommended. Intel® Cluster Checker appends the term admin or priv to framework definition names to identify those that require privileged access to run correctly.
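For example, one way to restrict access to the default database location mentioned above is to make the directory readable only by its owner (a minimal sketch; adjust the path if you use a non-default database location):

chmod 700 ~/.clck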
Framework Definitions
Framework Definitions, further detailed in the Framework Definitions chapter, select which data providers run when clck or clck-collect is invoked. A Framework Definition can be specified on the command line with the -F / --framework-definition option. For example, to run myFramework.xml, use the following command:
clck-collect <options> -F /path/to/myFramework.xml
Custom Framework Definitions can also be specified in the configuration file /opt/intel/clck/20xy/etc/clck.xml or /opt/intel/oneapi/clck/latest/etc/clck.xml. The following example shows how to declare the use of two custom Framework Definitions:
<configuration>
  <plugins>
    <framework_definitions>
      <framework_definition>/path/to/CustomFWD1.xml</framework_definition>
      <framework_definition>/path/to/CustomFWD2.xml</framework_definition>
    </framework_definitions>
  </plugins>
  ...
</configuration>
For more information about Framework Definitions, see the Framework Definitions section in the Reference.
Selecting Nodes
The nodefile contains a list of cluster node hostnames, one per line. For compute nodes, the nodefile is a simple list of nodes; for instance, the nodefile provided by a cluster resource manager typically contains just compute nodes and may be used as-is. Intel® Xeon Phi™ coprocessors should be included in the nodefile as independent nodes. For example, a 5-node cluster:
[user]# cat mynodefile
compute-node-01
compute-node-02
compute-node-03
compute-node-04
compute-node-05
The nodefile is specified using the -f command line option.
clck -f ./mynodefile
In some cases, nodes in the nodefile need to be annotated. The # symbol may be used to introduce comments in a nodefile. Annotations are specially formatted comments containing an annotation keyword followed by a colon and a value. Annotations may alter the data collection behavior.
When running clck via a Slurm workload manager, you do not need to include a nodefile. Intel® Cluster Checker will query Slurm and create its own nodefile list.
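As an illustration, a minimal Slurm batch script might look like the following; the #SBATCH resource values are placeholders, and clck builds the node list from the job allocation:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=01:00:00
# No -f nodefile is passed: Intel(R) Cluster Checker queries Slurm for the
# nodes in this allocation and generates its own node list.
clck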
Node Roles
The role annotation keyword is used to assign a node to one or more roles. A role describes the intended functionality of a node. For example, a node might be a compute node. If no role is explicitly assigned, by default a node is assumed to be a compute node. The role annotation may be repeated to assign a node multiple roles.
For example, the following nodefile defines 4 active nodes: node1 is a head and compute node; node2, node3, and node4 are compute nodes; and node5 is disabled.
node1 # role: head role: compute
node2 # role: compute
node3 # implicitly assumed to be a compute node
node4
#node5
Some data providers will only run on nodes with certain roles. For example, data providers that measure performance typically only run on compute or enhanced nodes.
Valid node role values are described below.
boot - Provides software imaging / provisioning capabilities.
compute - Is a compute resource (mutually exclusive with enhanced).
enhanced - Provides enhanced compute resources, for example, contains additional memory (mutually exclusive with compute).
external - Provides an external network interface.
head - Alias for the union of boot, external, job_schedule, login, network_address, and storage.
job_schedule - Provides resource manager / job scheduling capabilities.
login - Is an interactive login system.
network_address - Provides network address to the cluster, for example, DHCP.
storage - Provides network storage to the cluster, like NFS.
Subclusters
Some clusters contain groups of nodes, or subclusters, that are homogeneous within the subcluster but differ from the rest of the cluster. For example, one subcluster may be connected with Cornelis™ Omni-Path™ Host Fabric Interface while the rest of the cluster uses Ethernet.
The subcluster annotation keyword is used to assign a node to a subcluster. A node may only belong to a single subcluster. If no subcluster is explicitly assigned, the node is placed into the default subcluster. The subcluster name is an arbitrary string.
For example, the following nodefile defines 2 subclusters, each with 4 compute nodes:
node1 # subcluster: eth
node2 # subcluster: eth
node3 # subcluster: eth
node4 # subcluster: eth
node5 # subcluster: ib
node6 # subcluster: ib
node7 # subcluster: ib
node8 # subcluster: ib
By default, cluster data providers will not span across subclusters. To override this behavior, use the following clck-collect command line option:
-S / --ignore-subclusters
With this option, subclusters are ignored when running cluster data providers; that is, cluster data providers will span all defined subclusters rather than being confined to a single subcluster.
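For example, to run collection with a nodefile while allowing cluster data providers to span subclusters:

clck-collect -f ./mynodefile -S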
Collect Missing or Old Data
A fully populated database is necessary for a complete analysis. However, when the database is already partially populated, it is unnecessary to run a full data collection. To avoid re-collecting valid data, use the data re-collection feature, which collects only the data that is missing or old.
To use this feature, run clck-collect or clck with the -C or --re-collect-data command line option. This option takes no parameters and causes Intel® Cluster Checker to only collect data that is missing or old. This option is useful to avoid running a full data collection when the database is already populated while still ensuring that all data is present and up to date. If data is missing or old for one or more nodes, that data will be re-collected on all specified (or detected) nodes.
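For example, to collect only the missing or old data for the nodes listed in a nodefile:

clck -f ./mynodefile -C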
Note on deprecation: Intel® Cluster Checker will deprecate the re-collect functionality available on the command line and through the configuration file. Rather than collecting only old or missing data, clck will then run the full data collection phase for the associated framework definitions (FWD).
Environment Propagation
With certain collect extensions, Intel® Cluster Checker automatically propagates the environment in which it is run to the cluster nodes. This is currently supported by:
pdsh
This is done by copying and exporting all environment variables except the following:
HOST
HOSTTYPE
HOSTNAME
MACHTYPE
OSTYPE
PMI_RANK
PMI_SIZE
PMI_FD
MPI_LOCALRANKID
MPI_LOCALNRANKS
DISPLAY
SHLVL
BASH_FUNC
PWD
_=*
PROFILEREAD
LC_*
A__*
Functions
This feature can be turned off:
through the environment, by running:
export CLCK_TURN_OFF_ENV_PROPAGATION=true
through the configuration file clck.xml (or whichever configuration file is used), by setting:
<turn-off-environment-propagation>on</turn-off-environment-propagation>
or by running with the -e flag
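For example, either of the following disables environment propagation (the nodefile name is reused from the earlier examples):

# Disable propagation via the environment for subsequent runs
export CLCK_TURN_OFF_ENV_PROPAGATION=true
clck -f ./mynodefile

# Or disable it for a single run with the -e flag
clck -f ./mynodefile -e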
If environment propagation is turned off, some providers and Framework Definitions will no longer run as expected (for example, the hpl_cluster_performance framework definition will not run under root and might cause Cluster Checker to hang). Please use this option with caution.
Configuration File Options
The following variables alter the behavior of data collection when set as options in the configuration file.
Extensions
Collect extensions determine how Intel® Cluster Checker collects data. To change which collect extension is used, edit the file /opt/intel/clck/<version>/etc/clck.xml or /opt/intel/oneapi/clck/<version>/etc/clck.xml. The syntax for selecting a collect extension is as follows:
<collector>
  <extension>mpi.so</extension>
</collector>
For a single run, the -p or --collect-method flag may be used to specify an extension. Unless the extension is in the default location, the full path must be used.
clck-collect -p pdsh.so
To view the collect extensions available by default (located in the installation), use the -i or --collect-info flag.
clck -i
Currently, Intel® Cluster Checker uses pdsh by default. The available collect extensions are pdsh (pdsh.so) and Intel® MPI Library or MPICH (mpi.so), both of which are located at /opt/intel/clck/2019x/collect/intel64 or /opt/intel/oneapi/clck/latest/collect/intel64.
Use of MPI varieties other than Intel® MPI Library and MPICH is not expected to work.
Note that when you choose a specific MPI, that MPI is used both for launching Cluster Checker and for running any MPI workloads in the requested framework definitions. Using MPICH to run framework definitions containing Intel MPI Benchmarks (IMB) or the HPCG benchmark will not work. The IMB benchmarks are found in framework definitions starting with imb_ and in a handful of other framework definitions that run benchmarks, such as ‘health_extended_user’ or ‘select_solutions_sim_mod_benchmarks_plus_2018.0’.
In order for Intel® MPI Library or MPICH to be used successfully, the mpi.so extension must be uncommented in the clck.xml file, and the $PATH and $LD_LIBRARY_PATH information for the desired MPI must be correct.
For Intel® MPI Library, ensure the appropriate vars.sh/vars.csh script is sourced:
source /opt/intel/oneapi/setvars.sh
For MPICH (advanced), ensure PATH and LD_LIBRARY_PATH are configured as defined by the MPICH Installer's Guide. For example, for Bash:
export PATH=/path/to/mpich/bin:$PATH
export LD_LIBRARY_PATH=/path/to/mpich/libraries:$LD_LIBRARY_PATH
CLCK_COLLECT_DATABASE_CLOSE_DELAY
Specify the amount of time to wait after data collection has finished for data to arrive.
Environment variable syntax: CLCK_COLLECT_DATABASE_CLOSE_DELAY=value, where value is the number of seconds to wait after data collection has finished for any remaining data to be accumulated. The value must be greater than 0. The default value is 1 second.
All data that is in the accumulate queue will always be written to the database, but some data may still be on the wire when data collection has finished. This option provides a method to wait an additional amount of time for data to be received by the accumulate server before exiting. Clusters with very slow networks or a very large number of nodes may need to increase this value from the default.
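For example, to wait 10 seconds (an illustrative value) for remaining data after collection finishes:

export CLCK_COLLECT_DATABASE_CLOSE_DELAY=10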
CLCK_COLLECT_LINGER
Similarly, the CLCK_COLLECT_LINGER environment variable is the number of milliseconds the database socket (via CZMQ*) will wait for messages. By default, it waits indefinitely.
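For example, to wait at most 5000 milliseconds (an illustrative value) rather than indefinitely:

export CLCK_COLLECT_LINGER=5000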
CLCK_COLLECTION_TIMEOUT
Specify the amount of time to wait for a collect extension to finish before closing.
Environment variable syntax: CLCK_COLLECTION_TIMEOUT=value, where value is the number of seconds to wait for the extension to finish. The value must be greater than 0. The default value is 1 week.
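For example, to time out a collect extension after one hour (3600 seconds, an illustrative value) instead of the one-week default:

export CLCK_COLLECTION_TIMEOUT=3600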
Configurable options for Select Solution Framework Definitions
Intel® Cluster Checker includes a number of XML files that help define certain aspects and tasks. For Intel® Select Solution for Simulation and Modeling and Intel® Select Solution for Redhat OpenShift, Intel® Cluster Checker includes Framework Definitions (FWD) used to validate deployments. These files can be edited to change the parameters associated with the different runs.
Note: We strongly recommend not making changes to these Select Solution framework definitions. The variables selected are for validation purposes and were chosen to meet Intel Select Solution requirements. Unless specifically requested by Intel, making changes to them will invalidate any effort to verify your Select Solution.
Individual Select Solution framework definitions also have a list of defined items they will report on. These are not something we expect a user to change, but they can be discovered by reviewing the <clck>/etc/fwd/select_solutions_*.xml files; the postproc section of each file links to another XML file found in <clck>/etc/postproc/select_solutions*.xml.
In addition to variables that can be changed, the Select Solution Framework Definitions also define what items are reported, in files located under <clck_root>/etc/fwd/select_solution*.xml. This includes things like HPL Linpack performance, message rate, and network bandwidth. Each has its own reporting section that can be altered, but changing this output is strongly discouraged and is not covered in more detail here.
Now let's review the different framework definitions for Select Solutions and what they contain.
FWD: select_solutions_sim_mod_user_base_2018.0
The framework definition for the Simulation & Modeling 2018 base configuration contains 6 sections with different variables for the different workloads this framework will run. The variables in question can be found in the file <clck>/etc/providers/select_solutions_sim_mod_base_2018.0.xml.
The timeout variable specifies, in seconds, how long Cluster Checker will wait for the test to complete.
The memory_usage and iterations variables control how much RAM dgemm will use and how many times it will run.
The options flag found in a few tests is passed to the mpirun command used to launch the application.
percent_memory and nb are variables used for defining a useful HPL Linpack run.
Stream has use_physical_cores and use_affinity options for running the benchmark.
[clck@node 2021.3.1]$ cat etc/providers/select_solutions_sim_mod_base_2018.0.xml
<?xml version="1.0" encoding="UTF-8"?>
<collector>
  <provider>
    <dgemm>
      <memory_usage>55</memory_usage>
      <iterations>45</iterations>
      <timeout scale="constant">1800</timeout>
    </dgemm>
    <hpcg_cluster>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
      <timeout scale="linear">1800</timeout>
    </hpcg_cluster>
    <hpcg_single>
      <timeout scale="constant">3600</timeout>
    </hpcg_single>
    <hpl_cluster>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
      <percent_memory>81</percent_memory>
      <nb>384</nb>
      <timeout scale="linear">14400</timeout>
    </hpl_cluster>
    <imb_pingpong>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
    </imb_pingpong>
    <stream>
      <use_physical_cores>yes</use_physical_cores>
      <use_affinity>yes</use_affinity>
    </stream>
  </provider>
</collector>
FWD: select_solutions_sim_mod_users_plus_2018.0
The framework definition for the Simulation & Modeling 2018 plus-sized configuration contains 6 sections with different variables for the different workloads this framework will run. The variables in question can be found in the file <clck>/etc/providers/select_solutions_sim_mod_plus_2018.0.xml.
The timeout variable specifies, in seconds, how long Cluster Checker will wait for the test to complete.
The memory_usage and iterations variables control how much RAM dgemm will use and how many times it will run.
The options flag found in a few tests is passed to the mpirun command used to launch the application.
percent_memory and nb are variables used for defining a useful HPL Linpack run.
Stream has use_physical_cores and use_affinity options for running the benchmark.
[clck@node 2021.3.1]$ cat etc/providers/select_solutions_sim_mod_plus_2018.0.xml
<?xml version="1.0" encoding="UTF-8"?>
<collector>
  <provider>
    <dgemm>
      <memory_usage>55</memory_usage>
      <iterations>45</iterations>
      <timeout scale="constant">1800</timeout>
    </dgemm>
    <hpcg_cluster>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
      <timeout scale="linear">1800</timeout>
    </hpcg_cluster>
    <hpcg_single>
      <timeout scale="constant">3600</timeout>
    </hpcg_single>
    <hpl_cluster>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
      <percent_memory>81</percent_memory>
      <nb>384</nb>
      <timeout scale="linear">14400</timeout>
    </hpl_cluster>
    <imb_pingpong>
      <options>-genv PSM2_MQ_RNDV_HFI_WINDOW=4194304</options>
    </imb_pingpong>
    <stream>
      <use_physical_cores>yes</use_physical_cores>
      <use_affinity>yes</use_affinity>
    </stream>
  </provider>
</collector>
FWD: select_solutions_sim_mod_user_plus_2021.0
The framework definition for the Simulation & Modeling 2021 plus-sized configuration contains 4 sections with different variables for the different workloads this framework will run. The variables in question can be found in the file <clck>/etc/providers/select_solutions_sim_mod_plus_2021.0.xml.
The timeout variable specifies, in seconds, how long Cluster Checker will wait for the test to complete.
percent_memory and nb are variables used for defining a useful HPL Linpack run.
Stream has use_physical_cores and use_affinity options for running the benchmark.
[clck@node 2021.3.1]$ cat etc/providers/select_solutions_sim_mod_plus_2021.0.xml
<?xml version="1.0" encoding="UTF-8"?>
<collector>
  <provider>
    <hpcg_cluster>
      <timeout scale="linear">1800</timeout>
    </hpcg_cluster>
    <hpcg_single>
      <timeout scale="constant">3600</timeout>
    </hpcg_single>
    <hpl_cluster>
      <percent_memory>81</percent_memory>
      <nb>384</nb>
      <timeout scale="linear">21600</timeout>
    </hpl_cluster>
    <stream>
      <use_physical_cores>yes</use_physical_cores>
      <use_affinity>yes</use_affinity>
    </stream>
  </provider>
</collector>
FWD: select_solutions_sim_mod_priv_base_2018.0, select_solutions_sim_mod_priv_plus_2018.0, select_solutions_sim_mod_priv_plus_2021.0, select_solutions_sim_mod_priv_plus_second_gen_xeon_sp
These four framework definitions for Simulation & Modeling are intended for privileged users only. They all contain the same variables but have a different definition for each version of the Select Solution. Each contains 2 sections with different variables for the different validation efforts for this framework. The variables in question can be found in the files <clck>/etc/kb/select_solutions_sim_mod_priv_base_expected_data_2018.0.xml, <clck>/etc/kb/select_solutions_sim_mod_priv_plus_expected_data_2018.0.xml, <clck>/etc/kb/select_solutions_sim_mod_priv_plus_expected_data_2021.0.xml, and <clck>/etc/kb/select_solutions_sim_mod_priv_plus_second_gen_xeon_sp_expected_data.xml.
The model-name variable identifies the minimum CPU type expected for this solution.
physical-total defines a minimum amount of memory per server.
physical-per-core defines a minimum amount of memory per core.
[clck@cmarvel-2 2021.3.1]$ cat etc/kb/select_solutions_sim_mod_priv_base_expected_data_2018.0.xml
<?xml version="1.0" encoding="UTF-8"?>
<report>
  <CPU>
    <model-name role="compute">Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz</model-name>
    <model-name role="enhanced">Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz</model-name>
  </CPU>
  <MEMORY>
    <physical-total role="compute">96</physical-total>
    <physical-total role="enhanced">96</physical-total>
    <physical-per-core role="compute">2</physical-per-core>
    <physical-per-core role="enhanced">2</physical-per-core>
  </MEMORY>
</report>
FWD: select_solutions_redhat_openshift_base and select_solutions_redhat_openshift_plus
These two frameworks are used to validate the Intel® Select Solution for RedHat OpenShift, specifically the minimum configuration options for the solution. The files for these items can be found at etc/kb/select_solutions_redhat_openshift_expected_data_base.xml and etc/kb/select_solutions_redhat_openshift_expected_data_plus.xml.
The variables defined here relate to the different ‘roles’ played by the different systems in the solution, each with its own associated requirements.
model-name covers the minimum CPU requirement for the solution.
physical-total is the total memory requirement in bytes.
node-count-x, where x is the node type, gives how many servers of that type are present.
[clck@cmarvel-2 2021.3.1]$ cat etc/kb/select_solutions_redhat_openshift_expected_data_base.xml
<?xml version="1.0" encoding="UTF-8"?>
<report>
  <CPU>
    <model-name role="storage">Intel(R) Xeon(R) Silver 5115 CPU @ 2.40GHz</model-name>
    <model-name role="application">Intel(R) Xeon(R) Silver 4414 CPU @ 2.20GHz</model-name>
    <model-name role="control">Intel(R) Xeon(R) Silver 4414 CPU @ 2.20GHz</model-name>
    <sockets>1</sockets>
  </CPU>
  <MEMORY>
    <!--Values are in Byte-->
    <physical-total role="storage">192</physical-total><!--192GB-->
    <physical-total role="application">384</physical-total><!--384GB-->
    <physical-total role="control">192</physical-total><!--192GB-->
  </MEMORY>
  <ROLES-storage>
    <node-count-storage>3</node-count-storage>
  </ROLES-storage>
  <ROLES-control>
    <node-count-control>6</node-count-control>
  </ROLES-control>
  <ROLES-application>
    <node-count-application>4</node-count-application>
  </ROLES-application>
</report>
Custom libfabric Provider
Intel® Cluster Checker attempts to support many different interconnects. Some of the framework definitions use Intel® MPI Library to run MPI benchmarks; for example, the Select Solutions framework select_solutions_sim_mod_benchmarks_base_2018.0 or IMB frameworks such as imb_pingpong_fabric_performance. In some scenarios it may be desirable to use a different Libfabric OFI provider when running your MPI application, including those run through Cluster Checker.
To override what Cluster Checker selects, set the environment variable I_MPI_OFI_PROVIDER to a specific libfabric provider. A list of what your server supports can be discovered by running the fi_info command. We suggest setting I_MPI_OFI_PROVIDER in your .bashrc file or job submission script. In the example below, replace the value ‘sockets’ with a libfabric provider listed in the output of the fi_info command. For instance, on an Amazon Web Services cluster supporting EFA (Elastic Fabric Adapter), I_MPI_OFI_PROVIDER=efa would be used.
export I_MPI_OFI_PROVIDER=sockets
By default, Intel® Cluster Checker and Intel® MPI Library will choose an optimized fabric provider, but there are scenarios where it is worthwhile to override those defaults for testing.
Note: Intel® TrueScale InfiniBand is only supported by Intel® MPI Library 2018; newer versions of Intel® MPI Library do not support TrueScale. TrueScale also has limited support for newer operating systems. When collecting data with TrueScale IB, be sure the environment variables I_MPI_FABRICS=tmi and I_MPI_TMI_PROVIDER=psm are present, either in the Slurm script or exported in the environment.
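For example, the variables can be exported before running data collection (or added to the Slurm script):

export I_MPI_FABRICS=tmi
export I_MPI_TMI_PROVIDER=psm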