Title: Efficient Computation of Sum-products on GPUs Through Software-Managed Cache (In Proceedings)
In: Proceedings of the 22nd ACM International Conference on Supercomputing
Author(s): Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens
Keyword(s): Sum-product, GPGPU, CUDA, Software-managed cache
Year: June 2008
Location: Island of Kos, Aegean Sea, Greece
Date: June 7-12, 2008
Organization: The 22nd ACM International Conference on Supercomputing

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms.

We apply this technique to a GPU-based solver for the sum-product, or marginalize a product of functions (MPF), problem, which arises in a wide variety of real-life applications in artificial intelligence, statistics, image processing, and digital communications. Our motivation for accelerating MPF comes from the analysis of genetic diseases, which in some cases requires years to complete on modern CPUs. Computing MPF is similar to computing the chain matrix product of multi-dimensional matrices, but is more difficult due to a complex data-dependent access pattern, high data reuse, and a low compute-to-memory access ratio.
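To make the MPF operation concrete, the following is a minimal CPU-side sketch in Python, not the GPU solver described in the paper. It assumes functions are stored as lookup tables over small discrete variables; the names `mpf`, `factors`, `domains`, and `keep` are hypothetical and chosen only for illustration. For two functions f(a, b) and g(b, c), summing out b reduces to an ordinary matrix product, which is the analogy drawn above.

```python
from itertools import product


def mpf(factors, domains, keep):
    """Naive sum-product: multiply all factors, then sum out variables not in `keep`.

    factors: list of (variables, table) pairs; each table maps a tuple of
             values (ordered as in `variables`) to a number.
    domains: dict mapping each variable name to its domain size.
    keep:    tuple of variable names that remain in the result.
    """
    all_vars = sorted(domains)
    result = {}
    # Enumerate every joint assignment of all variables.
    for values in product(*(range(domains[v]) for v in all_vars)):
        assignment = dict(zip(all_vars, values))
        term = 1.0
        for variables, table in factors:
            term *= table[tuple(assignment[v] for v in variables)]
        # Accumulate into the entry indexed by the kept variables only.
        key = tuple(assignment[v] for v in keep)
        result[key] = result.get(key, 0.0) + term
    return result


# f(a, b) and g(b, c) as 2x2 tables (illustrative values).
f = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
g = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}

# h(a, c) = sum over b of f(a, b) * g(b, c)
h = mpf([(("a", "b"), f), (("b", "c"), g)], {"a": 2, "b": 2, "c": 2}, keep=("a", "c"))
```

In this sketch every entry of `f` and `g` is read once per matching joint assignment, so each input value is reused many times; this is the kind of data reuse that a cache, software-managed or otherwise, is meant to exploit.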

Our GPU-based MPF solver achieves up to a 2700-fold speedup on random data and a 270-fold speedup on real-life genetic analysis datasets on an NVIDIA GeForce 8800GTX GPU, compared with an optimized CPU version running on a 2.4 GHz Intel Core 2 with a 4 MB L2 cache.

Note: Research support for M. Silberstein, A. Patney, and J. Owens from the SciDAC Institute for Ultrascale Visualization and NSF Award 0541448 is gratefully acknowledged.