Abstract
This document is a Runtime Optimization Blueprint that shows how large code pages can improve runtime performance. The intended audience is runtime implementers, customers, and providers deploying runtimes at scale. The Overview section introduces the problem: runtimes suffer from high Instruction Translation Lookaside Buffer (ITLB) miss stalls (on average, 7% of CPU cycles are stalled across seven commonly used runtimes). The Diagnosis section illustrates how to diagnose this problem using Performance Monitoring Unit (PMU) counters on Intel® architecture processors, together with sample tools. The Solution section provides an Intel reference implementation as well as other approaches to solving the problem. The Solution Integration section describes how to integrate the reference implementation into runtimes. The Case Studies section details how this optimization improves performance and reduces ITLB misses (up to 50%) in three applications in three environments. The last section summarizes the blueprint and provides a call to action for runtime developers and implementers.