Abstract 
We demonstrate the first use of graphics processing unit clusters for engineering-level compressible flow computations with two separately developed solvers. Each solver adopts the finite-volume approach on multi-block, structured, non-uniform, quadrilateral meshes. We use a second-order accurate integration scheme, with a second-order accurate dual-time integration, using steady-flow acceleration techniques such as local time-stepping and nonlinear multigrid. The first solver is based on an existing multi-block Fortran code, MBFLO, which is augmented with graphics processing unit acceleration for the laminar and unsteady flow subroutines. The second solver is built from scratch for graphics processing unit acceleration, but is limited to the steady Euler solutions over simple geometries. The Euler solver is optimized by reducing the memory bandwidth, recomputing intermediate values, and combining functions to increase producer-consumer locality. These two approaches are used to highlight trade-offs between development effort and performance when accelerating an application with graphics processing units. On test cases with up to 6.4 million grid points, our graphics processing unit solvers outperform their central processing unit counterparts by up to 20×. In our graphics processing unit strong scalability test, 32 graphics processing units scale up to 22× over a single graphics processing unit, which is 16× faster than 32 central processing unit cores, or 500× faster than a single central processing unit core. It is confirmed that graphics processing unit clusters are effective for engineering computational fluid dynamics applications if their order-of-magnitude increase in performance is paired with large problem sizes to amortize the cost of communication.
