|Title||Efficient Parallel Merge Sort for Fixed and
Variable Length Keys
(In Proceedings) |
|in||Innovative Parallel Computing|
Andrew Davidson, David Tarjan, Michael Garland, John D. Owens |
|Keyword(s)||Comparison Sort, GPU Computing, SIMD, Strings|
|Location||San Jose, CA|
We design a high-performance parallel merge sort for highly parallel systems. Our merge sort is designed to use more register communication (not shared memory), and does not suffer from over-segmentation as opposed to previous comparison based sorts. Using these techniques we are able to achieve a sorting rate of 250 MKeys/sec, which is about 2.5 times faster than Thrust merge sort performance, and 70% faster than non-stable state-of-the-art GPU merge sorts.
Building on this sorting algorithm, we develop a scheme for sorting
variable-length key/value pairs, with a special emphasis on string keys.
Sorting non-uniform, unaligned data such as strings is a fundamental
step in a variety of algorithms, yet it has received comparatively
little attention. To our knowledge, our system is the first published
description of an efficient string sort for GPUs. We are able to sort
strings at a rate of 70 MStrings/s on one dataset and up to 1.25 GB/s on
another dataset using a GTX 580.