Hard Processor System Technical Reference Manual: Agilex™ 5 SoCs

ID 814346
Date 4/01/2024
Public



5.1.6.2.5. DMA TX Data Transfer Operation

The TX DMA performs the data transfer operations as follows:
  1. The descriptor fetch engine reads a valid descriptor (OWN=1), either from system memory or from the descriptor pre-fetch cache, and passes it to the data transfer engine.
  2. The data transfer engine processes the descriptor control bits and buffer sizes, and calculates the size of the data transfer to request according to the register settings (the TxPBL field). It checks that space is available in the corresponding TX queue before issuing the data transfer request.
  3. After the request is accepted by the host interface (for scheduling on the bus), this engine calculates the next data transfer size and issues another request. The second request can be for one of the following:
    1. The next burst of data from the same buffer, if the first request covered only part of the buffer.
    2. A burst of data from buffer 2, if the first request completed the transfer of buffer 1.
    3. A burst of data from buffer 1 of the next descriptor, if the first request completed buffer 2 (or buffer 2 was empty).
    4. The start of a new packet transfer from a buffer of the next descriptor, if the first request completed a packet transfer.
  4. At any given time, up to two data transfer requests can be outstanding.
  5. A request is considered complete once the requested data is fetched and written to the corresponding TX queue in the MTL. The engine can then issue the next request, as described in step 3.
  6. After the valid buffers of a descriptor are fetched, the descriptor is freed up and pushed to the descriptor write-back engine for closure, and the next descriptor from the pre-fetch cache is accepted for processing (step 2).
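
The per-descriptor burst accounting in steps 2 through 6 can be sketched in C. This is a minimal illustration only: the `tx_desc` layout, the two-buffer convention, and the helper names (`next_burst`, `bursts_for_desc`) are assumptions for the sketch, not the actual Agilex 5 EMAC descriptor format. It shows how a buffer is split into bursts of at most TxPBL beats, and how a descriptor with OWN=0 is never fetched.

```c
#include <stdint.h>

/* Hypothetical TX descriptor -- field names are illustrative,
 * not the exact Agilex 5 EMAC descriptor layout. */
typedef struct {
    uint32_t buf1_len;  /* bytes in buffer 1 (0 = unused)              */
    uint32_t buf2_len;  /* bytes in buffer 2 (0 = unused)              */
    int      own;       /* OWN=1: owned by the DMA, ready to be fetched */
} tx_desc;

/* Size of the next burst request: at most PBL beats of bus_width
 * bytes each, capped by the bytes remaining in the current buffer. */
static uint32_t next_burst(uint32_t remaining, uint32_t pbl, uint32_t bus_width)
{
    uint32_t max_burst = pbl * bus_width;
    return remaining < max_burst ? remaining : max_burst;
}

/* Count how many burst requests one descriptor generates (steps 2-3):
 * buffer 1 is drained first, then buffer 2, each split into
 * PBL-sized bursts. A descriptor the DMA does not own yields none. */
static unsigned bursts_for_desc(const tx_desc *d, uint32_t pbl, uint32_t bus_width)
{
    if (!d->own)
        return 0;                /* OWN=0: descriptor not ready, skip */
    unsigned n = 0;
    uint32_t lens[2] = { d->buf1_len, d->buf2_len };
    for (int i = 0; i < 2; i++) {
        uint32_t remaining = lens[i];
        while (remaining) {
            remaining -= next_burst(remaining, pbl, bus_width);
            n++;
        }
    }
    return n;
}
```

For example, with PBL=8 and an 8-byte bus (64 bytes per burst), a descriptor carrying a 100-byte buffer 1 and a 40-byte buffer 2 generates three burst requests: 64 + 36 bytes for buffer 1, then 40 bytes for buffer 2.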

All of the above operations are implemented in a pipelined fashion to avoid bottlenecks and overheads due to descriptor reads or writes. Moreover, the engine can request two packet transfers in sequence (for packets smaller than the programmed PBL) without completing the fetch of the first one. This hides the system latency of a read command behind the data transfer of the previous command and thus improves throughput.
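
The throughput benefit of allowing two outstanding requests can be illustrated with a small timing model. This is a simplified sketch, not a cycle-accurate model of the hardware: `lat` stands for the system read latency of a request, `xfer` for the time the returned data occupies the TX-queue write path (transfers are serialized), and `max_out` for the maximum number of in-flight requests.

```c
/* Simplified timing model of the TX DMA request pipeline.
 * A request issued at time t returns data at t + lat; writing that
 * data to the TX queue then takes xfer time units, serialized across
 * requests. At most max_out requests may be outstanding (issued but
 * not yet complete). Returns the completion time of the last request.
 * Sketch only: n is capped at 64 requests. */
static unsigned total_time(unsigned n, unsigned lat, unsigned xfer, unsigned max_out)
{
    unsigned complete[64];
    unsigned prev_end = 0;

    if (n > 64)
        n = 64;
    for (unsigned i = 0; i < n; i++) {
        /* Request i may issue once fewer than max_out are in flight. */
        unsigned issue  = (i >= max_out) ? complete[i - max_out] : 0;
        unsigned arrive = issue + lat;                  /* read latency */
        unsigned start  = arrive > prev_end ? arrive : prev_end;
        complete[i] = start + xfer;                     /* queue write  */
        prev_end = complete[i];
    }
    return n ? prev_end : 0;
}
```

With, say, four requests, a read latency of 10 units, and 5 units per queue write, one outstanding request at a time costs 4 x (10 + 5) = 60 units, while two outstanding requests overlap each read latency with the previous transfer and finish in 35 units, which is the latency-hiding effect the paragraph above describes.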