This problem is due to a datapath race condition. The DMA read mover "Done" status update and the completion data are split internally across two different paths/buffers. The data takes a longer path to the Avalon®-MM slave than the status update does, so the status update can arrive first.
This datapath race condition is easily observed in simulation, where the read mover "Done" status is reported a few clock cycles before the data transfer completes. In real hardware systems, however, this early status is not a problem because of system latency.
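The timing relationship described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the latency values, the function names, and the idea of a single "reaction latency" for whatever consumes the status are all hypothetical and are not taken from the IP or its documentation.

```python
from dataclasses import dataclass

# Hypothetical path latencies in clock cycles (illustrative values only).
STATUS_PATH_CYCLES = 2   # shorter path: "Done" status update
DATA_PATH_CYCLES = 5     # longer path: completion data to the Avalon-MM slave


@dataclass
class Arrival:
    done_cycle: int   # cycle at which the "Done" status is visible
    data_cycle: int   # cycle at which the last data reaches the slave

    @property
    def done_leads_by(self) -> int:
        """Number of cycles by which "Done" precedes the data."""
        return self.data_cycle - self.done_cycle


def simulate_read_completion(issue_cycle: int,
                             status_latency: int = STATUS_PATH_CYCLES,
                             data_latency: int = DATA_PATH_CYCLES) -> Arrival:
    """Model the two paths as fixed delays starting from the same cycle."""
    return Arrival(done_cycle=issue_cycle + status_latency,
                   data_cycle=issue_cycle + data_latency)


def data_is_valid_when_read(arrival: Arrival, reaction_latency: int) -> bool:
    """True if a consumer that reacts `reaction_latency` cycles after
    seeing "Done" reads the data only after it has arrived."""
    return arrival.done_cycle + reaction_latency >= arrival.data_cycle


if __name__ == "__main__":
    arrival = simulate_read_completion(issue_cycle=0)
    print("Done leads data by", arrival.done_leads_by, "cycles")
    # A testbench that checks the data in the same cycle as "Done" sees the race.
    print("Immediate check sees valid data:", data_is_valid_when_read(arrival, 0))
    # A consumer with a longer reaction latency does not.
    print("Delayed check sees valid data:", data_is_valid_when_read(arrival, 10))
```

In this model, a check performed in the same cycle as "Done" observes stale data, which corresponds to what a simulation can show. A consumer whose reaction latency exceeds the gap between the two paths does not observe the race, which corresponds to the hardware case.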