Authors: Roman Dementiev and Angela D. Schmid
Dear Software Tuning, Performance Optimization & Platform Monitoring community,
The recent and upcoming 2nd, 3rd, 4th, 5th and 6th generation Intel® Core™ processors (previously codenamed Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Skylake) expose model-specific counters for monitoring requests to DRAM.
The counters are implemented in the memory controller and monitor transaction requests from various sources, e.g. the processor cores, the graphics engine, or other I/O agents. They are read through memory-mapped I/O (MMIO) from physical memory, at the BAR-relative offsets specified in Table 1. Memory traffic metrics can be derived as follows:
- Data read from DRAM in number of bytes: UNC_IMC_DRAM_DATA_READS*64
- Data written to DRAM in number of bytes: UNC_IMC_DRAM_DATA_WRITES*64
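A bandwidth figure comes from sampling the two data counters twice and dividing the byte delta by the elapsed time. The following Python sketch assumes the counters are free-running, 32 bits wide, and wrap to zero. The 4-byte spacing of the offsets in Table 1 is consistent with this, but the width is not stated here. `counter_delta` and `dram_traffic` are hypothetical helper names.

```python
# Sketch: DRAM traffic from two samples of the counters
# UNC_IMC_DRAM_DATA_READS and UNC_IMC_DRAM_DATA_WRITES.
# Assumption: both counters are free-running, 32 bits wide, and wrap to zero.

COUNTER_BITS = 32        # assumed counter width
BYTES_PER_CAS = 64       # each RdCAS/WrCAS moves one 64-byte cache line


def counter_delta(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Events between two samples; correct if the counter wrapped at most once."""
    return (after - before) % (1 << bits)


def dram_traffic(reads_before: int, reads_after: int,
                 writes_before: int, writes_after: int,
                 seconds: float) -> tuple:
    """Return (bytes read, bytes written, combined bandwidth in bytes/s)."""
    bytes_read = counter_delta(reads_before, reads_after) * BYTES_PER_CAS
    bytes_written = counter_delta(writes_before, writes_after) * BYTES_PER_CAS
    return bytes_read, bytes_written, (bytes_read + bytes_written) / seconds


# Example: reads advanced by 1,000,000; writes wrapped from 0xFFFFFF00 to 0x100.
print(dram_traffic(100, 1_000_100, 0xFFFFFF00, 0x100, 1.0))
# (64000000, 32768, 64032768.0)
```

The modulo handles a single wraparound between samples. Under the 32-bit assumption, a counter wraps after 2^32 × 64 bytes (256 GiB), roughly every 11 seconds at a sustained 25 GB/s. Sampling intervals should therefore be kept well below that.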
Users and developers may take advantage of Intel tools to easily access the counters or derived memory performance metrics:
- Intel® VTune™ Amplifier XE 2013 Update 5
- Intel® Performance Counter Monitor Version 2.4 (tool and sample implementation with source code)
Table 1. Addresses of DRAM Counters.
The DRAM counters below are model-specific, meaning they may change or may not be supported in future processors. The BAR (base address register) is located in PCI configuration space at Bus 0, Device 0, Function 0, Offset 048H.
Counter | Address | Description |
UNC_IMC_DRAM_GT_REQUESTS | BAR + 0x5040 | Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from the GT (graphics) engine. Each partial write request counts as a separate request. However, same-cache-line partial write requests are combined into a single 64-byte DRAM data transfer. Therefore, multiplying the number of requests by 64 bytes overstates GT memory bandwidth; the error is proportional to the number of same-cache-line partial writes combined. |
UNC_IMC_DRAM_IA_REQUESTS | BAR + 0x5044 | Counts every read/write request (demand and HW prefetch) entering the Memory Controller to DRAM (sum of all channels) from the IA (processor) cores. Each partial write request counts as a separate request. However, same-cache-line partial write requests are combined into a single 64-byte DRAM data transfer. Therefore, multiplying the number of requests by 64 bytes overstates IA memory bandwidth; the error is proportional to the number of same-cache-line partial writes combined. |
UNC_IMC_DRAM_IO_REQUESTS | BAR + 0x5048 | Counts every read/write request entering the Memory Controller to DRAM (sum of all channels) from all I/O sources (e.g. PCIe, Display Engine, USB audio, etc.). Each partial write request counts as a separate request. However, same-cache-line partial write requests are combined into a single 64-byte DRAM data transfer. Therefore, multiplying the number of requests by 64 bytes overstates I/O memory bandwidth; the error is proportional to the number of same-cache-line partial writes combined. |
UNC_IMC_DRAM_DATA_READS | BAR + 0x5050 | Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). Each read results in a 64-byte data transfer from DRAM. Use for accurate memory bandwidth calculations. |
UNC_IMC_DRAM_DATA_WRITES | BAR + 0x5054 | Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). Each write results in a 64-byte data transfer to DRAM. Use for accurate memory bandwidth calculations. |
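The access procedure implied by Table 1 has two steps: read the BAR from PCI configuration space, then read each counter at BAR plus its offset. It can be sketched in Python on Linux. This minimal sketch rests on several assumptions not stated above:

- the sysfs path for device 0:0.0;
- that the BAR register is 64 bits wide, little-endian, with bit 0 as an enable flag and an at-least-4 KiB-aligned base;
- that the counters are 32 bits wide;
- that root can map the region through `/dev/mem` (some kernel configurations may refuse).

All function and constant names are hypothetical.

```python
import mmap
import struct

# Location of the BAR register (Table 1): Bus 0, Device 0, Function 0, offset 048H.
BAR_CONFIG_OFFSET = 0x48
# Linux sysfs view of that device's config space (assumption: Linux host).
HOST_BRIDGE_CONFIG = "/sys/bus/pci/devices/0000:00:00.0/config"

# Counter addresses from Table 1, relative to the BAR.
COUNTER_OFFSETS = {
    "UNC_IMC_DRAM_GT_REQUESTS": 0x5040,
    "UNC_IMC_DRAM_IA_REQUESTS": 0x5044,
    "UNC_IMC_DRAM_IO_REQUESTS": 0x5048,
    "UNC_IMC_DRAM_DATA_READS": 0x5050,
    "UNC_IMC_DRAM_DATA_WRITES": 0x5054,
}
WINDOW_OFFSET = 0x5000   # 4 KiB page that holds all five counters
WINDOW_SIZE = 0x1000


def bar_from_config(config: bytes) -> int:
    """Extract the BAR from a config-space dump.

    Assumptions: 64-bit little-endian register, bit 0 is an enable flag,
    base address at least 4 KiB-aligned (low 12 bits are cleared).
    """
    (raw,) = struct.unpack_from("<Q", config, BAR_CONFIG_OFFSET)
    if not raw & 1:
        raise RuntimeError("memory controller BAR is not enabled")
    return raw & ~0xFFF


def decode_counters(window) -> dict:
    """Read each counter as an aligned native 32-bit word from a buffer mapped at BAR + 0x5000."""
    with memoryview(window) as raw, raw.cast("I") as words:
        return {name: words[(off - WINDOW_OFFSET) // 4]
                for name, off in COUNTER_OFFSETS.items()}


def read_counters() -> dict:
    """Sample all five counters (needs root for config space and /dev/mem)."""
    # Non-root reads of sysfs config space are normally cut off after 64 bytes.
    with open(HOST_BRIDGE_CONFIG, "rb") as f:
        bar = bar_from_config(f.read(256))
    with open("/dev/mem", "rb") as mem, mmap.mmap(
            mem.fileno(), WINDOW_SIZE, mmap.MAP_SHARED, mmap.PROT_READ,
            offset=bar + WINDOW_OFFSET) as mm:
        return decode_counters(mm)
```

To get DRAM bandwidth, sample `read_counters()` twice and apply the byte formulas at the top to the `UNC_IMC_DRAM_DATA_READS` and `UNC_IMC_DRAM_DATA_WRITES` values. The three `*_REQUESTS` counters are better suited to comparing traffic sources than to computing bytes, for the partial-write reason given in the table.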
Regards,
Roman Dementiev
Staff Application Engineer
Intel Corporation
Angela D. Schmid
Performance Engineer
Intel Corporation