Software Occlusion Culling Code Sample

This video takes a quick look at an occlusion culling code sample. Occlusion culling is commonly used to improve performance in games. Although culling can be done on the GPU, this sample provides an alternative that runs the culling on the CPU.

Hello. I'm John from Intel. As a game developer, you understand the issues with occlusion in your games. Objects that can't be seen are still sent to the GPU for rendering, where they cost work but have no visual effect. If we can lower the computational cost of occluded objects, we can improve the application's performance without any impact on scene quality.

The Z-buffer test is the last stage in the rendering pipeline at which object fragments can be identified as occluded.

Culling geometry on the CPU can bring additional benefits over culling it on the GPU. The GPU is often the bottleneck in games, and moving the culling work to the CPU can relieve that bottleneck. It also saves the cost of submitting occlusion queries to the GPU, which includes the Direct3D API calls and the transfer of the occluders between the CPU and the GPU.

This sample code, available at the GitHub link below, uses frustum culling. It is optimized with Streaming SIMD Extensions (SSE) and is multi-threaded, achieving up to an 8x speedup compared to rendering the same scene without culling. In this approach, we use the CPU to rasterize the depth buffer, then use axis-aligned bounding box tests to determine whether each object is occluded. Occluded objects are removed from the rendering pipeline to reduce overhead.
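
The frustum culling mentioned above can be illustrated with a common AABB-versus-plane test. This is a minimal C++ sketch, not the sample's own code: the `Plane` and `AABB` structs and the `outsideFrustum` function are hypothetical, and it assumes the six frustum planes have already been extracted with their normals pointing into the frustum.

```cpp
#include <array>
#include <cassert>

// Hypothetical types for this sketch.
// Plane: nx*x + ny*y + nz*z + d = 0, normal assumed to point INTO the frustum.
struct Plane { float nx, ny, nz, d; };
struct AABB  { float minX, minY, minZ, maxX, maxY, maxZ; };

// Returns true if the box lies entirely outside at least one frustum plane.
// For each plane we pick the box corner furthest along the plane normal
// (the "positive vertex"); if even that corner is behind the plane, the
// whole box is behind it and can be culled.
bool outsideFrustum(const AABB& b, const std::array<Plane, 6>& planes) {
    assert(b.minX <= b.maxX && b.minY <= b.maxY && b.minZ <= b.maxZ);
    for (const Plane& p : planes) {
        float px = p.nx >= 0.0f ? b.maxX : b.minX;
        float py = p.ny >= 0.0f ? b.maxY : b.minY;
        float pz = p.nz >= 0.0f ? b.maxZ : b.minZ;
        if (p.nx * px + p.ny * py + p.nz * pz + p.d < 0.0f) {
            return true;  // completely behind this plane
        }
    }
    return false;  // inside or straddling: keep it for the occlusion test
}
```

The test is conservative: a box that straddles a plane, or sits just outside a corner of the frustum, is kept rather than culled, so it moves on to the occlusion test instead of disappearing incorrectly.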

Visible and partially visible objects are not culled; we submit them to the GPU for rendering. Additionally, in this sample, the software rasterizer is vectorized using SSE and multi-threaded to improve performance.
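
As a rough illustration of how SSE speeds up per-pixel depth work, the hypothetical helper below compares four depth-buffer values per instruction against an occludee's nearest depth, with a scalar loop for any leftover pixels. It assumes smaller depth values are closer to the camera and treats ties as visible; the sample's actual SIMD code may be organized quite differently.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: returns true if any pixel in row[x0, x1) could show
// an object whose nearest depth is boxNearest, i.e. the stored occluder
// depth is not strictly closer than the object (smaller depth = closer).
bool spanVisibleSSE(const float* row, int x0, int x1, float boxNearest) {
    assert(x0 <= x1);
    const __m128 boxZ = _mm_set1_ps(boxNearest);  // broadcast to 4 lanes
    int x = x0;
    // Process 4 pixels per iteration.
    for (; x + 4 <= x1; x += 4) {
        __m128 stored = _mm_loadu_ps(row + x);        // unaligned 4-float load
        __m128 mask = _mm_cmpge_ps(stored, boxZ);     // lane set if stored >= box
        if (_mm_movemask_ps(mask) != 0) return true;  // any lane visible?
    }
    // Scalar tail for the remaining 0-3 pixels.
    for (; x < x1; ++x) {
        if (row[x] >= boxNearest) return true;
    }
    return false;  // every pixel in the span is covered by a closer occluder
}
```

The `_mm_movemask_ps` call collapses the four comparison results into a bitmask, so a single branch per four pixels decides whether the scan can stop early.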

Software occlusion culling has two steps: depth buffer rasterization and depth test culling. This is a screenshot of the software occlusion culling sample. The castle walls and the ground plane are the occluders in the scene. It would have been better to hand-pick just the large castle walls as occluders, but we wanted to make sure that the algorithm worked with a realistic scene without expensive content changes. Once the depth buffer has been rasterized on the CPU, all the occludees in the scene are depth tested to determine which of them can be culled.
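
The first step, depth buffer rasterization, can be sketched as a scalar triangle rasterizer that writes only depth. This is a simplified sketch under stated assumptions, not the sample's code: occluder vertices are already transformed to screen space (x and y in pixels, z as post-projection depth with smaller values closer), the buffer is cleared to 1.0, and `DepthBuffer` and `rasterizeOccluder` are hypothetical names. The sample itself vectorizes this step with SSE and spreads it across threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };  // screen-space x, y (pixels) and depth z

// Hypothetical depth-only buffer, cleared to the far value 1.0.
struct DepthBuffer {
    int width, height;
    std::vector<float> depth;
    DepthBuffer(int w, int h)
        : width(w), height(h), depth(std::size_t(w) * h, 1.0f) {
        assert(w > 0 && h > 0);
    }
    float at(int x, int y) const { return depth[std::size_t(y) * width + x]; }
};

// Twice the signed area of triangle (a, b, p): tells which side of edge ab p is on.
static float edge(const Vec3& a, const Vec3& b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterizes one occluder triangle, keeping the nearest depth per pixel.
void rasterizeOccluder(DepthBuffer& db, Vec3 v0, Vec3 v1, Vec3 v2) {
    float area = edge(v0, v1, v2.x, v2.y);
    if (area == 0.0f) return;                              // degenerate triangle
    if (area < 0.0f) { std::swap(v1, v2); area = -area; }  // normalize winding
    // Triangle bounding box, clamped to the screen.
    int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    int maxX = std::min(db.width - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    int maxY = std::min(db.height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            float px = x + 0.5f, py = y + 0.5f;  // sample at the pixel center
            float w0 = edge(v1, v2, px, py);     // barycentric weight of v0 (x area)
            float w1 = edge(v2, v0, px, py);     // weight of v1
            float w2 = edge(v0, v1, px, py);     // weight of v2
            if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) continue;  // outside
            float z = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area;
            float& d = db.depth[std::size_t(y) * db.width + x];
            d = std::min(d, z);  // keep the closest occluder depth
        }
    }
}
```

Only depth is written, because the later culling step needs nothing else from the occluders.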

For depth test culling, we use object-space axis-aligned bounding boxes, or AABBs, to test the occludees against the CPU-generated depth buffer. The algorithm treats all objects in the scene as occludees. Testing the AABB instead of the object itself makes these tests conservative: if the AABB is occluded, then the object contained inside it is also occluded and can be culled.
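
The depth test step can be approximated as follows. The hypothetical `aabbOccluded` function assumes the eight corners of an object's AABB have already been transformed from object space to screen space and all lie in front of the camera. It covers the box with its screen-space bounding rectangle and compares the box's nearest depth against every depth-buffer pixel in that rectangle. This is a simplified variant for illustration, not the sample's actual implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: an AABB corner after projection (pixels, pixels, depth).
// Smaller depth = closer to the camera.
struct ScreenPoint { float x, y, z; };

// Returns true only if every pixel under the box's screen rectangle holds an
// occluder depth strictly closer than the box's nearest corner.
bool aabbOccluded(const std::vector<float>& depth, int width, int height,
                  const std::array<ScreenPoint, 8>& corners) {
    assert(depth.size() == std::size_t(width) * height);
    float minX = corners[0].x, maxX = corners[0].x;
    float minY = corners[0].y, maxY = corners[0].y;
    float nearestZ = corners[0].z;
    for (const ScreenPoint& c : corners) {
        minX = std::min(minX, c.x); maxX = std::max(maxX, c.x);
        minY = std::min(minY, c.y); maxY = std::max(maxY, c.y);
        nearestZ = std::min(nearestZ, c.z);
    }
    // Pixel rectangle covering the projected box, clamped to the screen.
    int x0 = std::max(0, (int)std::floor(minX));
    int x1 = std::min(width - 1, (int)std::ceil(maxX));
    int y0 = std::max(0, (int)std::floor(minY));
    int y1 = std::min(height - 1, (int)std::ceil(maxY));
    if (x0 > x1 || y0 > y1) return false;  // off-screen: leave to frustum culling
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            if (depth[std::size_t(y) * width + x] >= nearestZ) {
                return false;  // no closer occluder here: box may be visible
            }
        }
    }
    return true;  // hidden behind occluders at every covered pixel
}
```

Because the rectangle over-covers the box and only the nearest corner depth is compared, this test errs on the side of reporting objects as visible, which fits the rule that visible and partially visible objects are never culled.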

This has been just a brief introduction. The article at the link has much more information, including specific optimizations, performance evaluations, and steps for running the sample code. Thank you for watching, and please let us know how well the code works for you.