Compiling High Performance Recursive Filters
dc.contributor.author | Chaurasia, Gaurav | en_US |
dc.contributor.author | Ragan-Kelley, Jonathan | en_US |
dc.contributor.author | Paris, Sylvain | en_US |
dc.contributor.author | Drettakis, George | en_US |
dc.contributor.author | Durand, Frédo | en_US |
dc.contributor.editor | Petrik Clarberg and Elmar Eisemann | en_US |
dc.date.accessioned | 2016-01-19T10:32:56Z | |
dc.date.available | 2016-01-19T10:32:56Z | |
dc.date.issued | 2015 | en_US |
dc.description.abstract | Infinite impulse response (IIR) or recursive filters, are essential for image processing because they turn expensive large-footprint convolutions into operations that have a constant cost per pixel regardless of kernel size. However, their recursive nature constrains the order in which pixels can be computed, severely limiting both parallelism within a filter and memory locality across multiple filters. Prior research has developed algorithms that can compute IIR filters with image tiles. Using a divide-and-recombine strategy inspired by parallel prefix sum, they expose greater parallelism and exploit producer-consumer locality in pipelines of IIR filters over multidimensional images. While the principles are simple, it is hard, given a recursive filter, to derive a corresponding tile-parallel algorithm, and even harder to implement and debug it. We show that parallel and locality-aware implementations of IIR filter pipelines can be obtained through program transformations, which we mechanize through a domain-specific compiler. We show that the composition of a small set of transformations suffices to cover the space of possible strategies. We also demonstrate that the tiled implementations can be automatically scheduled in hardwarespecific manners using a small set of generic heuristics. The programmer specifies the basic recursive filters, and the choice of transformation requires only a few lines of code. Our compiler then generates high-performance implementations that are an order of magnitude faster than standard GPU implementations, and outperform hand tuned tiled implementations of specialized algorithms which require orders of magnitude more programming effort-a few lines of code instead of a few thousand lines per pipeline. | en_US |
dc.description.sectionheaders | High-Performance Data Processing | en_US |
dc.description.seriesinformation | High-Performance Graphics | en_US |
dc.identifier.doi | 10.1145/2790060.2790063 | en_US |
dc.identifier.isbn | 978-1-4503-3707-6 | en_US |
dc.identifier.pages | 85-94 | en_US |
dc.identifier.uri | https://doi.org/10.1145/2790060.2790063 | en_US |
dc.publisher | ACM Siggraph | en_US |
dc.subject | Image processing | en_US |
dc.subject | IIR filter | en_US |
dc.subject | GPU computation | en_US |
dc.subject | parallelism | en_US |
dc.subject | memory locality | en_US |
dc.subject | compiler | en_US |
dc.subject | domain | en_US |
dc.subject | specific language | en_US |
dc.subject | high performance | en_US |
dc.title | Compiling High Performance Recursive Filters | en_US |