1.1.3. Throughput for Reads
Number of completion packets = 512/256 = 2
Overhead for a 3 dword TLP Header with no ECRC = 2*20 = 40 bytes
Maximum Throughput % = 512/(512 + 40) = 92.8%.
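The arithmetic above can be reproduced with a short script. This is a minimal sketch: the function name and its parameters are hypothetical, and the 20 bytes of overhead per completion packet is taken directly from the calculation above. Like that calculation, it ignores DLLPs and PLPs.

```python
def read_throughput_efficiency(read_bytes, completion_payload_bytes, overhead_per_tlp=20):
    """Return the payload fraction (0..1) of the completion traffic for one read."""
    # Number of completion packets needed to return the data (ceiling division).
    num_completions = -(-read_bytes // completion_payload_bytes)
    # Each completion packet carries a fixed overhead (20 bytes in the text above).
    overhead_bytes = num_completions * overhead_per_tlp
    return read_bytes / (read_bytes + overhead_bytes)


# 512-byte read returned as two 256-byte completions.
efficiency = read_throughput_efficiency(512, 256)
print(f"{efficiency:.1%}")  # 92.8%
```

Smaller completion payloads lower the result because the fixed per-packet overhead is paid more often.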
These calculations do not take into account any Data Link Layer Packets (DLLPs) or Physical Layer Packets (PLPs).
The PCI Express Base Specification defines a read completion boundary (RCB) parameter. The RCB parameter determines the naturally aligned address boundaries on which a read request may be serviced with multiple completions. For a root complex, the RCB is either 64 bytes or 128 bytes. For all other PCI Express devices, the RCB is 128 bytes.
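To illustrate the RCB rule, the following sketch lists the naturally aligned addresses inside a read request at which a completer is permitted to split the returned data. The function name and the example address are hypothetical. A completer is not required to split at every one of these addresses; they are only the candidate boundaries.

```python
def rcb_split_points(start_addr, length, rcb=128):
    """Return the RCB-aligned addresses strictly inside [start_addr, start_addr + length)."""
    end = start_addr + length
    # First naturally aligned boundary strictly above the start address.
    first = (start_addr // rcb + 1) * rcb
    return list(range(first, end, rcb))


# Hypothetical 256-byte read starting at 0x1010 with a 128-byte RCB.
print([hex(a) for a in rcb_split_points(0x1010, 256)])  # ['0x1080', '0x1100']
```

A request that does not cross any RCB boundary yields an empty list, meaning it is returned in a single completion.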
Read throughput depends on the round-trip delay between the following two times:
- The time when the application logic issues a read request
- The time when all of the completion data has been returned.
To maximize throughput, the application must issue enough read requests and process enough read completions, or issue enough non-posted header credits, to cover this delay.
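One way to estimate how many read requests are "enough" is to divide the amount of data the link can deliver during one round trip by the read request size. The sketch below assumes this simple bandwidth-delay model. The function name and all numeric values are hypothetical and are not measurements from any particular system.

```python
import math


def outstanding_reads_needed(link_bytes_per_ns, round_trip_ns, read_request_bytes):
    """Estimate how many read requests must be in flight to hide the round-trip delay."""
    # Bytes the link can deliver during one round trip (bandwidth-delay product).
    bytes_in_flight = link_bytes_per_ns * round_trip_ns
    # Each outstanding request accounts for read_request_bytes of that total.
    return math.ceil(bytes_in_flight / read_request_bytes)


# Hypothetical numbers: 2 bytes/ns usable bandwidth, 1000 ns round trip, 512-byte reads.
print(outstanding_reads_needed(2, 1000, 512))  # 4
```

A longer round trip or a smaller read request size increases the number of requests that must be outstanding at once.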
The following figure shows a timing diagram for memory read requests (MRd) and completions (CplD). The requester waits for a completion before making a subsequent read request, resulting in lower throughput.
The following timing diagram eliminates the delay for completions with the exception of the first read. This strategy maintains a high throughput.
The requester must maintain maximum throughput for the completion data packets by selecting appropriate settings for completions in the RX buffer. All versions of Intel’s PCIe IP cores offer five settings for the "RX Buffer credit allocation performance for requests" parameter. This parameter specifies the distribution of flow control header, data, and completion credits in the RX buffer. Use this parameter to allocate credits in the way that best suits the anticipated workload.
A final constraint on the throughput is the number of outstanding read requests supported. The outstanding requests are limited by the number of header tags and the maximum read request size. The maximum read request size is controlled by the Device Control register (bits 14:12) in the PCIe Configuration Space. The Application Layer assigns header tags to non-posted requests to identify completion data. The "Number of tags supported" parameter specifies the number of tags available. A minimum number of tags is required to maintain sustained read throughput. This number is system dependent. On a Windows system, eight tags are usually enough to ensure continuous read completion with no gap for a 4 KByte read request. The High Performance Request Timing Diagram uses 4 tags. The first tag is reused for the fifth read.
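As a sketch of how these two limits combine, the following code decodes the maximum read request size from bits 14:12 of the Device Control register and multiplies it by the number of tags. The function names and the example register value are hypothetical. The field encoding (000b = 128 bytes, doubling up to 101b = 4096 bytes) follows the PCI Express Base Specification.

```python
def max_read_request_size(device_control):
    """Decode the maximum read request size, in bytes, from a Device Control value."""
    # Bits 14:12 hold the Max_Read_Request_Size field.
    field = (device_control >> 12) & 0x7
    if field > 0b101:
        raise ValueError(f"reserved Max_Read_Request_Size encoding: {field:#05b}")
    # 000b = 128 bytes, and each increment doubles the size up to 4096 bytes.
    return 128 << field


def max_outstanding_read_bytes(device_control, num_tags):
    """Upper bound on read data in flight: one maximum-size request per tag."""
    return num_tags * max_read_request_size(device_control)


# Hypothetical register value with bits 14:12 = 101b, and eight tags.
print(max_read_request_size(0x5000))          # 4096
print(max_outstanding_read_bytes(0x5000, 8))  # 32768
```

If this upper bound is smaller than the data the link can deliver during one round trip, the tags run out before the first completion returns and gaps appear in the completion stream.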