The source of all the chaos: Image Enhancement by Unsharp Masking the Depth Buffer
Since then, a lot of work has been done on refining SSAO. A lot of people think SSAO is not worth the cost, but it all depends on the method and your scene. For visualization purposes, I would argue that Luft’s method (unsharp masking the depth buffer) is worth the cost: it produces a useful distinction between foreground and background objects, it resembles physical lighting so it does not require significant training to interpret, and its runtime depends on the number of pixels rather than on scene complexity.
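To make the idea concrete, here is a minimal sketch of unsharp masking a depth buffer. It is not Luft’s exact formulation; it assumes depth grows with distance from the camera, a separable Gaussian blur with clamped borders, a grayscale color buffer in [0, 1], and darkening only (the negative part of blurred depth minus depth). The names `unsharp_mask_depth`, `sigma` and `strength` are hypothetical. Every pixel is touched a fixed number of times per pass, so the cost depends on resolution and blur radius, not on how many objects are in the scene.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius

def blur(depth, sigma):
    """Separable Gaussian blur of a 2D list, clamping coordinates at the borders."""
    h, w = len(depth), len(depth[0])
    kernel, r = gaussian_kernel(sigma)
    # Horizontal pass.
    tmp = [[sum(kernel[k + r] * depth[y][min(max(x + k, 0), w - 1)] for k in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass.
    return [[sum(kernel[k + r] * tmp[min(max(y + k, 0), h - 1)][x] for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def unsharp_mask_depth(color, depth, sigma=2.0, strength=1.0):
    """Darken pixels whose depth is larger (farther) than their blurred neighborhood."""
    blurred = blur(depth, sigma)
    out = []
    for y in range(len(depth)):
        row = []
        for x in range(len(depth[0])):
            delta = blurred[y][x] - depth[y][x]  # blurred depth minus depth
            darken = min(delta, 0.0)             # keep only the negative part
            row.append(min(max(color[y][x] + strength * darken, 0.0), 1.0))
        out.append(row)
    return out

# A near square (depth 0.2) in front of a far background (depth 1.0).
size = 9
depth = [[0.2 if 3 <= x <= 5 and 3 <= y <= 5 else 1.0 for x in range(size)] for y in range(size)]
color = [[0.8] * size for _ in range(size)]
shaded = unsharp_mask_depth(color, depth, sigma=1.0, strength=1.0)
# Background pixels right next to the square come out darker than distant ones;
# pixels on the square itself keep their original color.
```

The darkening appears only on the far side of each depth edge, which is what separates foreground from background without any knowledge of the geometry that produced the depth buffer.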
IEEE Vis 2009 had an interesting paper called “Depth-Dependent Halos: Illustrative Rendering of Dense Line Data”. It creates halos around line data whose thickness varies with depth. The paper is very well put together, with some cool 3D YouTube videos; time to bust out your red/cyan glasses :) Unfortunately, I think the technique compares poorly to SSAO, particularly unsharp masking of the depth buffer. SSAO is a simpler solution that applies to any geometry, including billboarded objects, and produces visually similar results with a better runtime, one that depends on the number of pixels rather than on the geometry. I’m not sure why someone would prefer the technique described in Depth-Dependent Halos. Unsharp masking the depth buffer essentially produces depth-dependent halos on its own!
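A quick 1D sketch backs up that last claim. It assumes the same darkening-only unsharp mask (negative part of blurred depth minus depth), a Gaussian blur with clamped borders, and depth that grows with distance; the scanline setup and function names are hypothetical. A thin line at depth 0.2 is placed over a background at two different depths. Because the blur is linear and leaves constant regions unchanged, the darkening on the background next to the line is proportional to the depth gap: a line far in front of the background gets a proportionally stronger halo, and against any fixed visibility threshold the visible halo also gets wider.

```python
import math

def blur_1d(values, sigma):
    """Normalized Gaussian blur of a 1D list with clamped borders."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    n = len(values)
    return [sum(weights[k + radius] / total * values[min(max(i + k, 0), n - 1)]
                for k in range(-radius, radius + 1)) for i in range(n)]

def halo_profile(depth, sigma=1.5):
    """Darkening amount per pixel: the negative part of blurred(D) - D, as a positive number."""
    blurred = blur_1d(depth, sigma)
    return [max(d - b, 0.0) for b, d in zip(blurred, depth)]

def scanline(background_depth, line_depth=0.2, width=21):
    """A 3-pixel-wide line crossing the middle of a scanline."""
    row = [background_depth] * width
    for i in range(9, 12):
        row[i] = line_depth
    return row

near = halo_profile(scanline(0.4))  # depth gap 0.2
far = halo_profile(scanline(1.0))   # depth gap 0.8
# Pixel 8 sits just outside the line: its darkening is about 4x larger in `far`.
print(far[8] / near[8])
```

The line pixels themselves receive no darkening, so the halo is drawn purely on whatever lies behind the line.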
Anyway, there is one great megaguide to SSAO techniques chronicled by Nick Porcino: http://nickporcino.com/meshula-net-archive/posts/post145.html
In addition, there is one great butchering of SSAO techniques showing where they suck and how you can overcome the problems: http://directtovideo.wordpress.com/2010/01/15/ambient-occlusion-in-frameranger/
There were some interesting developments on SSAO at SIGGRAPH ’10. One shows how you can use a fairly routine kernel-marching technique to improve SSAO performance by keeping the overlapping sections of the kernel in fast shared memory. This is a standard optimization in compute codes: it raises the ratio of arithmetic operations to memory accesses and also reduces bandwidth. It is implemented with CUDA/OpenCL/DirectCompute (here). I’m a bit perplexed that there is a performance difference at all, because in CUDA the texture cache already prefetches based on spatial neighborhood, and I believe the same is true of the graphics pipeline.

The other development was not a screen-space technique; it puts the geometry shader to a cool new use by creating ambient occlusion volumes (here). It is a drop-in, shader-only implementation that provides ambient occlusion, but it seems to need a bit more research because of massive amounts of overdraw. I haven’t had time to look yet, but if this method only creates AOVs for visible elements, that’s a step in the right direction, though still scary if you are working with detailed scenes with lots of geometry (particle systems, for instance). Perhaps a hybrid approach would work well: AOVs for detailed AO, and SSAO for the rest of the scene.
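Returning to the shared-memory kernel from SIGGRAPH ’10, here is a minimal CPU sketch of the tiling idea, written in Python only to count memory traffic; it is not the authors’ code. The hypothetical `naive_filter` fetches every kernel tap from the image (standing in for global memory), while `tiled_filter` copies each tile plus a `radius`-wide apron into a local buffer once, standing in for the shared memory a CUDA/OpenCL/DirectCompute thread block would use. Both perform the same multiply-adds in the same order, so their outputs match exactly, but image reads drop from (2r+1)^2 per pixel to (tile+2r)^2 / tile^2 per pixel.

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))

def naive_filter(image, radius, weights):
    """Every output pixel fetches its whole (2r+1)^2 footprint from the image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    reads = 0
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sample = image[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
                    reads += 1
                    acc += weights[dy + radius][dx + radius] * sample
            out[y][x] = acc
    return out, reads

def tiled_filter(image, radius, weights, tile):
    """Each tile loads its (tile + 2r)^2 apron once into a 'shared' buffer;
    all kernel taps for pixels in that tile then read from the buffer."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    reads = 0
    size = tile + 2 * radius
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            shared = [[image[clamp(ty - radius + j, 0, h - 1)][clamp(tx - radius + i, 0, w - 1)]
                       for i in range(size)] for j in range(size)]
            reads += size * size
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    acc = 0.0
                    for dy in range(-radius, radius + 1):
                        for dx in range(-radius, radius + 1):
                            sample = shared[y - ty + radius + dy][x - tx + radius + dx]
                            acc += weights[dy + radius][dx + radius] * sample
                    out[y][x] = acc
    return out, reads

radius, tile = 2, 8
n = 2 * radius + 1
weights = [[1.0 / (n * n)] * n for _ in range(n)]
image = [[((x * 7 + y * 13) % 17) / 16.0 for x in range(32)] for y in range(32)]
a, naive_reads = naive_filter(image, radius, weights)
b, tiled_reads = tiled_filter(image, radius, weights, tile)
print(naive_reads, tiled_reads)  # prints "25600 2304"
```

With radius 2 and 8×8 tiles that is 25 image reads per pixel versus 2.25, roughly an 11× cut for the same arithmetic. The counter ignores caching entirely, which is exactly the caveat raised above: a texture cache that already serves most neighboring taps would shrink the real-world gap.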