|Title||An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU (In Proceedings)|
|in||Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium|
|Author(s)||Andrew Davidson, Yao Zhang, John D. Owens|
|Keyword(s)||auto-tuning, tridiagonal solvers, gpu computing|
We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage characteristic of our method, together with various workloads and GPUs of different capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search: (1) apply algorithmic knowledge to decouple tuning parameters, and (2) estimate search starting points based on GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool that improves performance by up to 5x, saves 17% and 32% of execution time on average for static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 5-11x.
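The split-then-solve idea can be illustrated with a minimal CPU sketch in Python. It is not the GPU implementation described above. It assumes parallel cyclic reduction (PCR) as the splitting step and the Thomas algorithm as the small-system solver, neither of which the abstract names, and the parameter `on_chip_size` is a hypothetical stand-in for a switch point between stages. No pivoting is performed, so the sketch assumes a diagonally dominant system.

```python
def pcr_step(a, b, c, d, stride):
    """One PCR step: afterwards equation i couples only x[i-2*stride], x[i], x[i+2*stride]."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):  # iterations are independent (one GPU thread each, conceptually)
        lo, hi = i - stride, i + stride
        if lo >= 0:  # eliminate x[lo] using equation lo
            k1 = a[i] / b[lo]
            na[i] = -a[lo] * k1
            nb[i] -= c[lo] * k1
            nd[i] -= d[lo] * k1
        if hi < n:  # eliminate x[hi] using equation hi
            k2 = c[i] / b[hi]
            nc[i] = -c[hi] * k2
            nb[i] -= a[hi] * k2
            nd[i] -= d[hi] * k2
    return na, nb, nc, nd


def thomas(a, b, c, d):
    """Sequential Thomas algorithm for one small tridiagonal system."""
    m = len(b)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for j in range(1, m):  # forward elimination
        denom = b[j] - a[j] * cp[j - 1]
        cp[j] = c[j] / denom
        dp[j] = (d[j] - a[j] * dp[j - 1]) / denom
    x = [0.0] * m
    x[-1] = dp[-1]
    for j in range(m - 2, -1, -1):  # back substitution
        x[j] = dp[j] - cp[j] * x[j + 1]
    return x


def solve_tridiagonal(a, b, c, d, on_chip_size):
    """Split with PCR until each subsystem has at most on_chip_size unknowns, then solve each.

    a, b, c are the sub-, main, and super-diagonals (a[0] and c[-1] are ignored);
    on_chip_size is a hypothetical switch-point parameter, not a name from the paper.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[-1] = 0.0, 0.0
    stride = 1
    while (n + stride - 1) // stride > on_chip_size:
        a, b, c, d = pcr_step(a, b, c, d, stride)
        stride *= 2  # each step doubles the number of independent subsystems
    x = [0.0] * n
    for r in range(min(stride, n)):  # subsystem r holds unknowns r, r+stride, ...
        x[r::stride] = thomas(a[r::stride], b[r::stride], c[r::stride], d[r::stride])
    return x
```

In this sketch, a larger `on_chip_size` means fewer splitting steps and longer sequential solves, while a smaller one means the opposite. A trade-off of this kind is what a tuner selecting switch points would have to explore for a given workload and device.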