|Title||Efficient Computation of Sum-products on GPUs Through Software-Managed Cache (In Proceedings)|
|in||Proceedings of the 22nd ACM International Conference on Supercomputing|
|Author(s)||Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens|
|Keyword(s)||Sum-product, GPGPU, CUDA, Software-managed cache|
|Location||Island of Kos, Aegean Sea, Greece|
|Date||June 7-12, 2008|
|Organization||The 22nd ACM International Conference on Supercomputing|
We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms.
We apply this technique to a GPU-based solver for the sum-product problem, also known as marginalize a product of functions (MPF), which arises in a wide variety of real-life applications in artificial intelligence, statistics, image processing, and digital communications. Our motivation for accelerating MPF comes from the analysis of genetic diseases, which in some cases requires years to complete on modern CPUs. Computing MPF is similar to computing the chain matrix product of multi-dimensional matrices, but is more difficult due to a complex data-dependent access pattern, high data reuse, and a low compute-to-memory access ratio.
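To make the analogy with matrix products concrete, the following is a minimal sketch of a single MPF step on two small tables. It is an illustration only, not the paper's GPU solver: the function name `marginalize_product`, the dictionary-based table layout, and the numeric values are all assumptions chosen for the example.

```python
from itertools import product


def marginalize_product(f, g, dom_a, dom_b, dom_c):
    """Return h(a, c) = sum over b of f(a, b) * g(b, c).

    f and g are tables stored as dicts keyed by value tuples
    (a hypothetical layout chosen for readability).
    """
    h = {}
    for a, c in product(dom_a, dom_c):
        # Multiply the two functions and sum out the shared variable b.
        h[(a, c)] = sum(f[(a, b)] * g[(b, c)] for b in dom_b)
    return h


# Two tables over binary variables (illustrative values only).
f = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
g = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}

h = marginalize_product(f, g, [0, 1], [0, 1], [0, 1])
```

With two-variable tables and one shared variable, the result coincides with the ordinary matrix product of the two tables. In the general problem, functions have more variables and overlap in less regular ways, and every entry of `f` and `g` is read several times, which is the kind of data reuse the abstract refers to.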
Our GPU-based MPF solver achieves up to a 2700-fold speedup on random data and a 270-fold speedup on real-life genetic analysis datasets on an NVIDIA GeForce 8800 GTX GPU, compared with an optimized CPU version running on a 2.4 GHz Intel Core 2 with a 4 MB L2 cache.
|Note||Research support for M. Silberstein, A. Patney, and J. Owens from the SciDAC Institute for Ultrascale Visualization and NSF Award 0541448 is gratefully acknowledged.|