3.18. 855830: Loads of Mismatched Size May not be Single-Copy Atomic
Description
The Cortex*-A53 MPCore* processor supports single-copy atomic load and store accesses as described in the Arm* architecture documentation. However, in some unusual code sequences, this erratum can cause a CPU that executes a store, and later a load to the same address with a different access size, to load data that does not meet the requirements of a single-copy atomic load.
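The erratum can occur in a system with at least two CPUs when the following sequence takes place:
- A store instruction executes on the first CPU.
- A store instruction executes on the second CPU. This store must be a smaller access size than the store on the first CPU, and must address bytes accessed by the first CPU. The address must also be aligned to the access size.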
- The store instruction on the second CPU does not allocate into the cache because of any one of the following conditions:
  - The memory address is marked as transient
  - The write allocate hint in the translation table is not set, or the memory is marked as non-cacheable
  - The CPU recently executed a stream of stores and has subsequently switched dynamically into a no-write-allocate mode
- A load instruction executes on the second CPU. The load must be a larger access size than the store from the same CPU, and at least some bytes of the load must be at addresses written by that store. The address must also be aligned to the access size.
The Arm* architecture requires that the load is single-copy atomic. However, under the conditions described, the load might observe a combination of the two stores, indicating that the store on the first CPU was serialized first. If the load is then repeated, it might see only the data from the first CPU's store, indicating that the store on the first CPU was serialized second. The two observations imply contradictory orderings of the same pair of stores.
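The access pattern described above can be sketched in C as follows. This is only an illustration under stated assumptions: the union `shared_word` and the functions `first_cpu_store` and `second_cpu_store_then_load` are hypothetical names, and the `__atomic` builtins are GCC/Clang extensions used here because ISO C11 atomics do not define mixed-size accesses to one object. The sketch shows the shape of the code only. It does not reproduce the erratum, which also depends on the cache allocation conditions listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared location: one 64-bit word that can also be
 * addressed as two naturally aligned 32-bit halves. */
typedef union {
    _Alignas(8) uint64_t whole;
    uint32_t half[2];
} shared_word;

/* First CPU: a relaxed (unordered) 64-bit store to the whole word. */
static void first_cpu_store(shared_word *w, uint64_t value)
{
    __atomic_store_n(&w->whole, value, __ATOMIC_RELAXED);
}

/* Second CPU: a smaller (32-bit) relaxed store, followed by a larger
 * (64-bit) relaxed load that covers the bytes just stored. */
static uint64_t second_cpu_store_then_load(shared_word *w, uint32_t value)
{
    /* Both accesses must be aligned to their access size. */
    assert(((uintptr_t)w % sizeof(uint64_t)) == 0);

    __atomic_store_n(&w->half[0], value, __ATOMIC_RELAXED);
    return __atomic_load_n(&w->whole, __ATOMIC_RELAXED);
}
```

In a real program the two functions would run concurrently on different CPUs. Code that never mixes access sizes on the same location, or that orders its stores, does not have this shape.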
Impact
Concurrent, unordered stores are uncommon in multi-threaded code. In the ISO/IEC 9899:2011 (C11) standard, they are restricted to the family of relaxed atomics. Using load and store instructions of different sizes to access the same data is also uncommon. For these reasons, the majority of multi-threaded software does not meet the conditions for this erratum.
Workaround
Most multi-threaded software does not satisfy the conditions of this erratum and therefore does not require a workaround. If a workaround is required, replace the store on the second CPU with a store-release instruction.
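In C source, one way to request a store-release is to give the smaller store release ordering. The following is a minimal sketch, assuming GCC or Clang: the `__atomic` builtins are compiler extensions, and the names `shared_word` and `second_cpu_store_release_then_load` are hypothetical. AArch64 compilers typically emit a store-release instruction (STLR) for a release-ordered store, but the generated code should be checked in the disassembly rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared location: one 64-bit word that can also be
 * addressed as two naturally aligned 32-bit halves. */
typedef union {
    _Alignas(8) uint64_t whole;
    uint32_t half[2];
} shared_word;

/* Second CPU with the workaround applied: the smaller store uses
 * release ordering instead of relaxed ordering. The larger load
 * that follows is unchanged. */
static uint64_t second_cpu_store_release_then_load(shared_word *w,
                                                   uint32_t value)
{
    /* Both accesses must be aligned to their access size. */
    assert(((uintptr_t)w % sizeof(uint64_t)) == 0);

    __atomic_store_n(&w->half[0], value, __ATOMIC_RELEASE);
    return __atomic_load_n(&w->whole, __ATOMIC_RELAXED);
}
```

Only the store on the second CPU changes. The store on the first CPU and the load are left as they were.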