8.5. Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor
Intel® M20K memory blocks have two physical ports. The number of logical ports available in each M20K block depends on the degree of pumping, that is, the clock frequency of the M20K blocks relative to the rest of the design.
Consider the following code example, where the singlepump attribute is applied to a local memory system, lmem, which has three read accesses and one write access. The singlepump attribute indicates that the M20K blocks run at the same frequency as the rest of the design. Each single-pumped M20K block therefore exposes only its two physical ports, so serving three reads and one write requires three replicates of lmem, which max_replicates(3) permits.
__kernel void three_copies(int raddr, int waddr) {
  int __attribute__((memory,
                     numbanks(1),
                     singlepump,
                     max_replicates(3)))
  lmem[16];

  lmem[waddr] = lmem[raddr] + lmem[raddr + 1] + lmem[raddr + 2];
  // do something with lmem
}
If you include the doublepump kernel attribute in your local variable declaration, you specify that the M20K memory blocks run at twice the frequency of the rest of the design.
__kernel void three_copies(int raddr, int waddr) {
  int __attribute__((memory,
                     numbanks(1),
                     doublepump))
  lmem[16];

  lmem[waddr] = lmem[raddr] + lmem[raddr + 1] + lmem[raddr + 2];
  // do something with lmem
}
Each double-pumped M20K block has four logical ports available. As a result, a single M20K block is enough to implement the three read ports and one write port of lmem.
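The port arithmetic behind the two examples can be sketched in plain C. This is a simplified model, not the compiler's actual algorithm: it assumes that every load needs its own logical port and that every store occupies one port on each replicate. The function names logical_ports and replicates_needed are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Logical ports on one M20K block: two physical ports times the pump
   factor (1 for singlepump, 2 for doublepump). */
int logical_ports(int pump_factor) {
    assert(pump_factor == 1 || pump_factor == 2);
    return 2 * pump_factor;
}

/* Replicates needed under the simplified model described above.
   Returns 0 when the stores alone leave no port free for loads,
   in which case replication cannot help. */
int replicates_needed(int loads, int stores, int pump_factor) {
    int read_ports = logical_ports(pump_factor) - stores;
    if (read_ports <= 0) {
        return 0;
    }
    return (loads + read_ports - 1) / read_ports; /* ceiling division */
}
```

Under this model, replicates_needed(3, 1, 1) gives 3 for the singlepump kernel and replicates_needed(3, 1, 2) gives 1 for the doublepump kernel, which matches the replicate counts described in this section.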
- Double pumping the memory increases resource overhead. Use the doublepump kernel attribute only if it results in actual M20K savings, improves performance, or both.
- Stores must be connected to every replicate. Hence, if there are more than three stores, the memory is not replicated. Local memory replication works well with a single store.
- Because the entire memory system is replicated, M20K usage can grow substantially when the replicated memory is large.
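To get a feel for how replication scales M20K usage, the following C sketch computes a capacity-only lower bound. It assumes 20,480 bits per M20K block and ignores the width and depth configurations, banking, and other details the compiler takes into account, so the real count can be higher. The names M20K_BITS, m20k_per_copy, and m20k_total are hypothetical.

```c
#include <assert.h>

#define M20K_BITS 20480L /* capacity of one M20K block, in bits */

/* Lower bound on the M20K blocks needed for one copy of a memory
   with the given depth (elements) and element width (bits). */
long m20k_per_copy(long depth, long width_bits) {
    assert(depth > 0 && width_bits > 0);
    long bits = depth * width_bits;
    return (bits + M20K_BITS - 1) / M20K_BITS; /* ceiling division */
}

/* Replication copies the entire memory system, so the bound
   scales linearly with the number of replicates. */
long m20k_total(long depth, long width_bits, long replicates) {
    assert(replicates > 0);
    return replicates * m20k_per_copy(depth, width_bits);
}
```

For the 16-element int array used in this section, one copy fits in a single block, so three replicates need at least three blocks. For a 4096-element int array, one copy needs at least 7 blocks and three replicates need at least 21.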