Title Acceleration of 2-D Compressible Flow Solvers with Graphics Processing Unit Clusters (Article)
in Journal of Aerospace Computing, Information, and Communication
Author(s) Everett H. Phillips, Yao Zhang, Roger L. Davis, John D. Owens
Editor(s) Mike Hinchey
Year August 2011
Volume 8
Number 8
Publisher The American Institute of Aeronautics and Astronautics (AIAA)
Pages237--249
Abstract We demonstrate the first use of graphics processing unit clusters for engineering-level compressible flow computations with two separately developed solvers. Each solver adopts the finite-volume approach on multiblock, structured, nonuniform, quadrilateral meshes. We use a second-order accurate integration scheme, with a second-order accurate dual-time integration, using steady-flow acceleration techniques such as local time-stepping and nonlinear multigrid. The first solver is based on an existing multiblock Fortran code, MBFLO, which is augmented with graphics processing unit acceleration for the laminar and unsteady flow subroutines. The second solver is built from scratch for graphics processing unit acceleration, but is limited to the steady Euler solutions over simple geometries. The Euler solver is optimized by reducing the memory bandwidth, recomputing intermediate values, and combining functions to increase producer-consumer locality. These two approaches are used to highlight trade-offs between development effort and performance when accelerating an application with graphics processing units. On test cases with up to 6.4 million grid points, our graphics processing unit solvers outperform their central processing unit counterparts by up to 20×. In our graphics processing unit strong scalability test, 32 graphics processing units scale up to 22× over a single graphics processing unit, which is 16× faster than 32 central processing unit cores, or 500× faster than a single central processing unit core. It is confirmed that graphics processing unit clusters are effective for engineering computational fluid dynamics applications if their order-of-magnitude increase in performance is paired with large problem sizes to amortize the cost of communication.
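The scalability figures quoted in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: it assumes the numbers 22, 16, and 500 are speedup factors, treats the "up to" values as exact, and uses a hypothetical helper name, `parallel_efficiency`, that does not come from the paper.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Strong-scaling efficiency: achieved speedup divided by ideal speedup."""
    return speedup / workers


# Figures as quoted in the abstract (assumed here to be speedup factors).
gpus = 32
gpu32_over_gpu1 = 22.0     # 32 GPUs vs. a single GPU
gpu32_over_cpu32 = 16.0    # 32 GPUs vs. 32 CPU cores
gpu32_over_cpu1 = 500.0    # 32 GPUs vs. a single CPU core

# Efficiency of the 32-GPU strong-scaling run.
gpu_efficiency = parallel_efficiency(gpu32_over_gpu1, gpus)

# Implied speedup of 32 CPU cores over one core, and of one GPU over one core.
cpu32_over_cpu1 = gpu32_over_cpu1 / gpu32_over_cpu32
gpu1_over_cpu1 = gpu32_over_cpu1 / gpu32_over_gpu1

print(f"GPU strong-scaling efficiency: {gpu_efficiency:.1%}")
print(f"Implied 32-core CPU speedup:   {cpu32_over_cpu1:.2f}x")
print(f"Implied 1 GPU vs 1 CPU core:   {gpu1_over_cpu1:.1f}x")
```

Under these assumptions the 32-GPU run reaches roughly 69% of ideal strong scaling, and the implied single-GPU advantage over a single CPU core (about 23×) is of the same order as the "up to 20" figure quoted for the test cases. Because the published values are rounded upper bounds, these derived numbers are rough consistency checks rather than results from the paper.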