Throughput for Reads
PCI Express uses a split transaction model for reads. The read transaction includes the following steps:
- The requester sends a Memory Read Request.
- The completer sends out the ACK DLLP to acknowledge the Memory Read Request.
- The completer returns a Completion with Data. The completer can split the Completion into multiple completion packets.
Read throughput is typically lower than write throughput because a read requires two transactions, a request and a completion, where a write of the same amount of data requires only one. Read throughput depends on the round-trip delay between the time the Application Layer issues a Memory Read Request and the time the completer returns the requested data. To maximize throughput, the application must keep enough read requests outstanding to cover this delay.
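As a rough illustration of covering the delay, the sketch below estimates how many read requests must be outstanding so that the requested data fills the round-trip time. It is a simplified model under stated assumptions: the function name, the round-trip delay, and the link rate are hypothetical inputs, and packet header overhead and flow control are ignored.

```python
import math

def reads_needed(round_trip_ns, link_bytes_per_ns, request_bytes):
    """Estimate the outstanding read requests needed to cover the round-trip delay.

    Assumes an idealized link: no header overhead, no flow-control stalls.
    """
    # Bytes the link could deliver during one round trip.
    bytes_in_flight = round_trip_ns * link_bytes_per_ns
    # At least one request is always needed; round up to whole requests.
    return max(1, math.ceil(bytes_in_flight / request_bytes))

# Hypothetical numbers: 1000 ns round trip, 4 bytes/ns link, 512-byte requests.
print(reads_needed(1000, 4.0, 512))
```

With these example inputs the link can carry 4000 bytes per round trip, so eight 512-byte requests are needed; with fewer outstanding, the link sits idle for part of each round trip.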
The figures below show the timing for Memory Read Requests (MRd) and Completions with Data (CplD). In the first figure, the requester waits for each completion before issuing the next request, which results in lower throughput. In the second figure, the requester issues multiple outstanding read requests, which eliminates the delay after the first data returns and results in higher throughput.
To maintain maximum throughput for the completion data packets, the requester must optimize the following settings:
- The number of completions in the RX buffer
- The rate at which the Application Layer issues read requests and processes the completion data
Read Request Size
Another factor that affects throughput is the read request size. If a requester requires 4 KB of data, it can issue four 1 KB read requests or a single 4 KB read request. The single 4 KB request results in higher throughput than the four 1 KB reads. The read request size is limited by the Maximum Read Request Size field, bits [14:12] of the Device Control register.
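The following sketch shows how the Maximum Read Request Size limit could be decoded from a Device Control register value. The function name is hypothetical, and how the register value is obtained is outside the scope of the example. The encoding follows the PCI Express definition of the field: 000b is 128 bytes, and each increment doubles the size up to 101b (4096 bytes); the remaining encodings are reserved.

```python
def max_read_request_size(device_control):
    """Return the Maximum Read Request Size in bytes from a Device Control value."""
    # Bits [14:12] hold the 3-bit Maximum Read Request Size field.
    field = (device_control >> 12) & 0x7
    if field > 5:
        # Encodings 110b and 111b are reserved.
        raise ValueError("reserved Maximum Read Request Size encoding")
    return 128 << field

# Example register value with bits [14:12] = 010b, which decodes to 512 bytes.
print(max_read_request_size(0x2000))
```

A requester that needs 4 KB of data with this setting would have to split the read into eight 512-byte requests.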
Outstanding Read Requests
A final factor that can affect throughput is the number of outstanding read requests. Each outstanding read request requires its own header tag, so the number of requests a requester can have outstanding is limited by the number of header tags available. The maximum number of header tags depends on the RX Buffer credit allocation - performance for received requests parameter in the Hard IP for PCI Express IP core Parameter Editor.
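To illustrate how the tag limit caps throughput, the sketch below computes an upper bound on read throughput for a given number of header tags. It is an idealized model, not a description of the IP core's behavior: the function name and all numeric inputs are hypothetical, and header overhead and flow-control credits are ignored.

```python
def read_throughput_bound(tags, request_bytes, round_trip_ns, link_bytes_per_ns):
    """Upper bound on read throughput in bytes per nanosecond.

    At most `tags` requests can be outstanding, so at most
    tags * request_bytes can return per round trip.
    """
    tag_limited = tags * request_bytes / round_trip_ns
    # Throughput can never exceed what the link itself can carry.
    return min(link_bytes_per_ns, tag_limited)

# Hypothetical numbers: 512-byte requests, 1000 ns round trip, 4 bytes/ns link.
for tags in (4, 8, 16):
    print(tags, read_throughput_bound(tags, 512, 1000, 4.0))
```

In this example, 4 tags leave the link roughly half used, while 8 or more tags are enough for the link rate to become the limit.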