Title: Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors (In Proceedings)
In: Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding
Author(s): Kshitij Gupta, John D. Owens
Year: 2009
Location: Merano, Italy
Date: December 13-17, 2009
Abstract: In this paper we focus on optimizing compute- and memory-bandwidth-intensive GMM computations for low-end, small-form-factor devices running on GPU-like parallel processors. With special emphasis on tackling the memory bandwidth issue, which is exacerbated by the lack of CPU-like caches providing temporal locality on GPU-like parallel processors, we propose modifications to three well-known GMM computation reduction techniques. We find considerable locality at the frame, CI-GMM, and mixture layers of GMM compute, and show how it can be extracted by following a chunk-based technique of processing multiple frames for every load of a GMM. On a 1,000-word, command-and-control, continuous-speech task, we achieve compute and memory bandwidth savings of over 60% and 90%, respectively, with some degradation in accuracy, when compared to existing GPU-based fast GMM computation techniques.
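The core idea summarized in the abstract, amortizing each load of a GMM's parameters over a chunk of consecutive frames instead of reloading the model per frame, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the diagonal-covariance assumption, and the chunk size are illustrative choices, and NumPy stands in for GPU kernels.

```python
import numpy as np

def gmm_loglik_chunked(frames, means, inv_vars, log_weights, chunk_size=8):
    """Score every frame against one diagonal-covariance GMM.

    Frames are processed in chunks so the GMM parameters (means,
    inverse variances, mixture weights) are touched once per chunk
    rather than once per frame -- a sketch of the chunk-based reuse
    idea described in the abstract (names and sizes are illustrative).
    """
    T, D = frames.shape          # T frames, D-dimensional features
    # Per-component constant term of the diagonal Gaussian log-density.
    log_const = -0.5 * (D * np.log(2 * np.pi) - np.log(inv_vars).sum(axis=1))
    out = np.empty(T)
    for start in range(0, T, chunk_size):
        x = frames[start:start + chunk_size]              # (C, D) chunk
        # (C, M, D) differences; GMM parameters reused across the chunk.
        diff = x[:, None, :] - means[None, :, :]
        # Squared Mahalanobis term per (frame, component).
        expo = -0.5 * np.einsum('cmd,md->cm', diff * diff, inv_vars)
        comp = log_weights[None, :] + log_const[None, :] + expo
        # Log-sum-exp over mixture components for numerical stability.
        mx = comp.max(axis=1, keepdims=True)
        out[start:start + chunk_size] = (
            mx[:, 0] + np.log(np.exp(comp - mx).sum(axis=1))
        )
    return out
```

Interchanging the loops this way trades a small working buffer of frames for a large reduction in parameter traffic, which is the memory-bandwidth saving the abstract reports.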