High-Performance Graphics 2013

Permanent URI for this collection

https://diglib7.eg.org/handle/10.2312/14924

Preface and Table of Contents

Advanced Rasterization

PixelPie: Maximal Poisson-disk Sampling with Rasterization

Ip, Cheuk Yiu; Yalçın, M. Adil; Luebke, David; Varshney, Amitabh

Advanced Rasterization

Theory and Analysis of Higher-Order Motion Blur Rasterization

Gribel, Carl Johan; Munkberg, Jacob; Hasselgren, Jon; Akenine-Möller, Tomas

Shadows

Screen-Space Far-Field Ambient Obscurance

Timonen, Ville

Advanced Rasterization

Out-of-Core Construction of Sparse Voxel Octrees

Baert, Jeroen; Lagae, Ares; Dutré, Philip

Fast Interactive Systems

Real-time Local Displacement using Dynamic GPU Memory Management

Schäfer, Henry; Keinert, Benjamin; Stamminger, Marc

Shadows

Imperfect Voxelized Shadow Volumes

Wyman, Chris; Dai, Zeng

Fast Interactive Systems

Lazy Incremental Computation for Efficient Scene Graph Rendering

Wörister, Michael; Steinlechner, Harald; Maierhofer, Stefan; Tobler, Robert F.

Fast Interactive Systems

Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling

Loop, Charles; Zhang, Cha; Zhang, Zhengyou

Building Acceleration Structures for Ray Tracing

Efficient BVH Construction via Approximate Agglomerative Clustering

Gu, Yan; He, Yong; Fatahalian, Kayvon; Blelloch, Guy

Ray Tracing Hardware and Techniques

An Energy and Bandwidth Efficient Ray Tracing Architecture

Kopta, Daniel; Shkurko, Konstantin; Spjut, Josef; Brunvand, Erik; Davis, Al

Building Acceleration Structures for Ray Tracing

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Karras, Tero; Aila, Timo

Ray Tracing Hardware and Techniques

SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing

Lee, Won-Jong; Shin, Youngsam; Lee, Jaedon; Kim, Jin-Woo; Nah, Jae-Ho; Jung, Seokyoon; Lee, Shihwa; Park, Hyun-Sang; Han, Tack-Don

Ray Tracing Hardware and Techniques

Efficient Divide-And-Conquer Ray Tracing using Ray Sampling

Nabata, Kosuke; Iwasaki, Kei; Dobashi, Yoshinori; Nishita, Tomoyuki

Building Acceleration Structures for Ray Tracing

On Quality Metrics of Bounding Volume Hierarchies

Aila, Timo; Karras, Tero; Laine, Samuli

Ray Tracing Hardware and Techniques

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs

Laine, Samuli; Karras, Tero; Aila, Timo

BibTeX (High-Performance Graphics 2013)

@inproceedings{10.2312:EGGH/HPG13/001-frontmatter,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Preface and Table of Contents}},

author = {},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.2312/EGGH/HPG13/001-frontmatter}

}

@inproceedings{10.1145:2492045.2492047,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{PixelPie: Maximal Poisson-disk Sampling with Rasterization}},

author = {Ip, Cheuk Yiu and 
Yalçın, M. Adil and
Luebke, David and 
Varshney, Amitabh
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492047}

}

@inproceedings{10.1145:2492045.2492046,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Theory and Analysis of Higher-Order Motion Blur Rasterization}},

author = {Gribel, Carl Johan and 
Munkberg, Jacob and 
Hasselgren, Jon and 
Akenine-Möller, Tomas
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492046}

}

@inproceedings{10.1145:2492045.2492049,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Screen-Space Far-Field Ambient Obscurance}},

author = {Timonen, Ville
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492049}

}

@inproceedings{10.1145:2492045.2492048,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Out-of-Core Construction of Sparse Voxel Octrees}},

author = {Baert, Jeroen and 
Lagae, Ares and 
Dutré, Philip
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492048}

}

@inproceedings{10.1145:2492045.2492052,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Real-time Local Displacement using Dynamic GPU Memory Management}},

author = {Schäfer, Henry and 
Keinert, Benjamin and 
Stamminger, Marc
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492052}

}

@inproceedings{10.1145:2492045.2492050,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Imperfect Voxelized Shadow Volumes}},

author = {Wyman, Chris and 
Dai, Zeng
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492050}

}

@inproceedings{10.1145:2492045.2492051,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Lazy Incremental Computation for Efficient Scene Graph Rendering}},

author = {Wörister, Michael and 
Steinlechner, Harald and 
Maierhofer, Stefan and 
Tobler, Robert F.
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492051}

}

@inproceedings{10.1145:2492045.2492053,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling}},

author = {Loop, Charles and 
Zhang, Cha and 
Zhang, Zhengyou
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492053}

}

@inproceedings{10.1145:2492045.2492054,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Efficient BVH Construction via Approximate Agglomerative Clustering}},

author = {Gu, Yan and 
He, Yong and 
Fatahalian, Kayvon and 
Blelloch, Guy
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492054}

}

@inproceedings{10.1145:2492045.2492058,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{An Energy and Bandwidth Efficient Ray Tracing Architecture}},

author = {Kopta, Daniel and 
Shkurko, Konstantin and 
Spjut, Josef and 
Brunvand, Erik and 
Davis, Al
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492058}

}

@inproceedings{10.1145:2492045.2492055,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Fast Parallel Construction of High-Quality Bounding Volume Hierarchies}},

author = {Karras, Tero and 
Aila, Timo
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492055}

}

@inproceedings{10.1145:2492045.2492057,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing}},

author = {Lee, Won-Jong and 
Shin, Youngsam and 
Lee, Jaedon and 
Kim, Jin-Woo and 
Nah, Jae-Ho and 
Jung, Seokyoon and 
Lee, Shihwa and 
Park, Hyun-Sang and 
Han, Tack-Don
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492057}

}

@inproceedings{10.1145:2492045.2492059,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Efficient Divide-And-Conquer Ray Tracing using Ray Sampling}},

author = {Nabata, Kosuke and 
Iwasaki, Kei and 
Dobashi, Yoshinori and 
Nishita, Tomoyuki
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492059}

}

@inproceedings{10.1145:2492045.2492056,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{On Quality Metrics of Bounding Volume Hierarchies}},

author = {Aila, Timo and 
Karras, Tero and 
Laine, Samuli
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492056}

}

@inproceedings{10.1145:2492045.2492060,

booktitle = {Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},

editor = {Kayvon Fatahalian and Christian Theobalt
},
title = {{Megakernels Considered Harmful: Wavefront Path Tracing on GPUs}},

author = {Laine, Samuli and 
Karras, Tero and 
Aila, Timo
},
year = {2013},

publisher = {ACM},

ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},

DOI = {10.1145/2492045.2492060}

}

Preface and Table of Contents
(ACM, 2013) Kayvon Fatahalian and Christian Theobalt
PixelPie: Maximal Poisson-disk Sampling with Rasterization
(ACM, 2013) Ip, Cheuk Yiu; Yalçın, M. Adil; Luebke, David; Varshney, Amitabh; Kayvon Fatahalian and Christian Theobalt
We present PixelPie, a highly parallel geometric formulation of the Poisson-disk sampling problem on the graphics pipeline. Traditionally, generating a distribution by throwing darts and removing conflicts has been viewed as an inherently sequential process. In this paper, we present an efficient Poisson-disk sampling algorithm that uses rasterization in a highly parallel manner. Our technique is an iterative two-step process. The first step of each iteration involves rasterization of random darts at varying depths. The second step involves culling conflicted darts. Successive iterations identify and fill in the empty regions to obtain maximal distributions. Our approach maps well to the parallel and optimized graphics functions on the GPU and can be easily extended to perform importance sampling. Our implementation can generate Poisson-disk samples at the rate of nearly 7 million samples per second on a GeForce GTX 580 and is significantly faster than the state-of-the-art maximal Poisson-disk sampling techniques.
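The throw-then-cull loop described in this abstract can be illustrated with a small CPU sketch. This is not the authors' GPU implementation: the function name, parameters, and the sorted-by-depth visiting order (standing in for a depth test) are assumptions made for illustration, and the sketch does not detect empty regions, so it only approaches a maximal distribution.

```python
import random

def poisson_disk_by_priority(radius, darts_per_round=200, rounds=20, seed=0):
    """Hypothetical CPU analogue of iterative dart throwing with culling.

    Each round throws random darts in the unit square, each with a random
    depth. Darts are visited in depth order (smallest depth wins, loosely
    mimicking a depth test) and a dart is kept only if no kept sample lies
    closer than `radius`.
    """
    rng = random.Random(seed)
    samples = []
    r2 = radius * radius
    for _ in range(rounds):
        # Step 1: throw darts as (depth, x, y) triples.
        darts = [(rng.random(), rng.random(), rng.random())
                 for _ in range(darts_per_round)]
        darts.sort()
        # Step 2: cull darts that conflict with an already kept sample.
        for _depth, x, y in darts:
            if all((x - sx) ** 2 + (y - sy) ** 2 >= r2 for sx, sy in samples):
                samples.append((x, y))
    return samples
```

Later rounds accept fewer darts as the domain fills up, which is the behavior the iteration relies on.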
Theory and Analysis of Higher-Order Motion Blur Rasterization
(ACM, 2013) Gribel, Carl Johan; Munkberg, Jacob; Hasselgren, Jon; Akenine-Möller, Tomas; Kayvon Fatahalian and Christian Theobalt
A common assumption in motion blur rendering is that the triangle vertices move in straight lines. In this paper, we focus on scenarios where this assumption is no longer valid, such as motion due to fast rotation and other non-linear characteristics. To that end, we present a higher-order representation of vertex motion based on Bézier curves, which allows for more complex motion paths, and we derive the necessary mathematics for these. In addition, we extend previous work to handle higher-order motion by developing a new tile vs. triangle overlap test. We find that our tile-based rasterizer outperforms all other methods in terms of sample test efficiency, and that our generalization of an interval-based rasterizer is often fastest in terms of wall clock rendering time. In addition, we use our tile test to improve rasterization performance by up to a factor of 5 for semi-analytical motion blur rendering.
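To make the difference between straight-line and Bézier vertex motion concrete, the sketch below evaluates a vertex position over the shutter interval with de Casteljau's algorithm. The control points are made-up values, and the helper name is an assumption; the paper's rasterization and overlap tests are not modeled here.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at t in [0, 1] using de Casteljau's algorithm."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeatedly interpolate between neighbouring points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical vertex path: start, one interior control point, end.
path = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0)]

# Two control points give the usual straight-line motion.
linear_mid = bezier_point([path[0], path[-1]], 0.5)
# Three control points give a quadratic path that bends toward the middle one.
curved_mid = bezier_point(path, 0.5)
```

With these values the straight-line midpoint stays on the segment between the endpoints, while the quadratic midpoint is pulled toward the interior control point.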
Screen-Space Far-Field Ambient Obscurance
(ACM, 2013) Timonen, Ville; Kayvon Fatahalian and Christian Theobalt
Ambient obscurance (AO) is an effective approximation of global illumination, and its screen-space (SSAO) versions that operate on depth buffers only are widely used in real-time applications. We present an SSAO method that allows the obscurance effect to be determined from the entire depth buffer for each pixel. Our contribution is two-fold: Firstly, we build an obscurance estimator that accurately converges to ray traced reference results on the same screen-space geometry. Secondly, we generate an intermediate representation of the depth field which, when sampled, gives local peaks of the geometry from the point of view of the receiver. Only a small number of such samples are required to capture AO effects without undersampling artefacts that plague previous methods. Our method is unaffected by the radius of the AO effect or by the complexity of the falloff function and produces results within a few percent of a ray traced screen-space reference at constant real-time frame rates.
Out-of-Core Construction of Sparse Voxel Octrees
(ACM, 2013) Baert, Jeroen; Lagae, Ares; Dutré, Philip; Kayvon Fatahalian and Christian Theobalt
Voxel-based rendering has recently received significant attention due to its potential in the context of efficiently rendering massively large and highly detailed scenes. Unfortunately, few or no scenes are available in the form of sparse voxel octrees. In this paper, we present an out-of-core algorithm for constructing a sparse voxel octree from a triangle mesh. Our algorithm allows the input triangle mesh, the output sparse voxel octree, and, most importantly, the intermediate high-resolution 3D voxel grid, to be larger than available memory. We demonstrate that our out-of-core algorithm can construct sparse voxel octrees from triangle meshes using only a fraction of the memory required by an in-core algorithm in roughly the same time, and that our out-of-core algorithm can also handle extremely large triangle meshes.
Real-time Local Displacement using Dynamic GPU Memory Management
(ACM, 2013) Schäfer, Henry; Keinert, Benjamin; Stamminger, Marc; Kayvon Fatahalian and Christian Theobalt
We propose a novel method for local displacement events in large scenes, such as scratches, footsteps, or sculpting operations. Deformations are stored as displacements for vertices generated by hardware tessellation. Adaptive mesh refinement, application of the displacement and all involved memory management happen completely on the GPU. We show various extensions to our approach, such as on-the-fly normal computation and multi-resolution editing. In typical game scenes we perform local deformations at arbitrary positions in far less than one millisecond. This makes the method particularly suited for games and interactive sculpting applications.
Imperfect Voxelized Shadow Volumes
(ACM, 2013) Wyman, Chris; Dai, Zeng; Kayvon Fatahalian and Christian Theobalt
Voxelized shadow volumes [Wyman 2011] provide a discretized view-dependent representation of shadow volumes, but are limited to point or directional lights. We extend them to allow dynamic volumetric visibility from area light sources using imperfect shadow volumes. We show a coarser visibility sampling suffices for area lights. Combining this coarser resolution with a parallel shadow volume construction enables interactive rendering of dynamic volumetric shadows from area lights in homogeneous single-scattering media, at under 4x the cost of hard volumetric shadows.
Lazy Incremental Computation for Efficient Scene Graph Rendering
(ACM, 2013) Wörister, Michael; Steinlechner, Harald; Maierhofer, Stefan; Tobler, Robert F.; Kayvon Fatahalian and Christian Theobalt
In order to provide a highly performant rendering system while maintaining a scene graph structure with a high level of abstraction, we introduce improved rendering caches that can be updated incrementally without any scene graph traversal. The basis of this novel system is the use of a dependency graph that can be synthesized from the scene graph and links all sources of changes to the affected parts of rendering caches. By using and extending concepts from incremental computation we minimize the computational overhead for performing the necessary updates due to changes in any inputs. This makes it possible to provide a high-level semantic scene graph, while retaining the opportunity to apply a number of known optimizations to the rendering caches even for dynamic scenes. Our evaluation shows that the resulting rendering system is highly competitive and provides good rendering performance for scenes ranging from completely static geometry all the way to completely dynamic geometry.
Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling
(ACM, 2013) Loop, Charles; Zhang, Cha; Zhang, Zhengyou; Kayvon Fatahalian and Christian Theobalt
We present a system for real-time, high-resolution, sparse voxelization of an image-based surface model. Our approach consists of a coarse-to-fine voxel representation and a collection of parallel processing steps. Voxels are stored as a list of unsigned integer triples. An oracle kernel decides, for each voxel in parallel, whether to keep or cull its voxel from the list based on an image consistency criterion of its projection across cameras. After a prefix sum scan, kept voxels are subdivided and the process repeats until projected voxels are pixel size. These voxels are drawn to a render target and shaded as a weighted combination of their projections into a set of calibrated RGB images. We apply this technique to the problem of smooth visual hull reconstruction of human subjects based on a set of live image streams. We demonstrate that human upper body shapes can be reconstructed to giga voxel resolution at greater than 30 fps on modern graphics hardware.
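The cull-then-subdivide loop over a list of integer triples can be sketched on the CPU as follows. The oracle here is a hypothetical geometric test (overlap with a sphere surface), standing in for the image-consistency criterion in the abstract; the list comprehension replaces the parallel prefix-sum compaction, and a fixed level count replaces the pixel-size stopping rule.

```python
def refine_voxels(keep, levels):
    """Coarse-to-fine refinement over a list of integer (x, y, z) triples.

    `keep(voxel, level)` is a stand-in oracle deciding whether a voxel survives.
    """
    voxels = [(0, 0, 0)]  # a single root voxel at level 0
    for level in range(levels):
        # Cull: sequential stand-in for the per-voxel oracle plus compaction.
        kept = [v for v in voxels if keep(v, level)]
        # Subdivide every surviving voxel into its 8 children.
        voxels = [(2 * x + dx, 2 * y + dy, 2 * z + dz)
                  for (x, y, z) in kept
                  for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
    return voxels

def sphere_oracle(voxel, level, radius=0.4):
    """Hypothetical oracle: keep voxels whose cube overlaps the surface of a
    sphere of the given radius centered in the unit cube."""
    size = 1.0 / (1 << level)
    dmin2 = dmax2 = 0.0
    for v in voxel:
        lo, hi = v * size, (v + 1) * size
        near = max(lo - 0.5, 0.0, 0.5 - hi)        # closest approach on this axis
        far = max(abs(lo - 0.5), abs(hi - 0.5))    # farthest corner on this axis
        dmin2 += near * near
        dmax2 += far * far
    return dmin2 <= radius * radius <= dmax2
```

Calling `refine_voxels(sphere_oracle, 4)` returns voxel coordinates on a 16 x 16 x 16 grid, with far fewer entries than the dense grid because empty regions are dropped at coarse levels.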
Efficient BVH Construction via Approximate Agglomerative Clustering
(ACM, 2013) Gu, Yan; He, Yong; Fatahalian, Kayvon; Blelloch, Guy; Kayvon Fatahalian and Christian Theobalt
We introduce Approximate Agglomerative Clustering (AAC), an efficient, easily parallelizable algorithm for generating high-quality bounding volume hierarchies using agglomerative clustering. The main idea of AAC is to compute an approximation to the true greedy agglomerative clustering solution by restricting the set of candidates inspected when identifying neighboring geometry in the scene. The result is a simple algorithm that often produces higher quality hierarchies (in terms of subsequent ray tracing cost) than a full sweep SAH build yet executes in less time than the widely used top-down, approximate SAH build algorithm based on binning.
An Energy and Bandwidth Efficient Ray Tracing Architecture
(ACM, 2013) Kopta, Daniel; Shkurko, Konstantin; Spjut, Josef; Brunvand, Erik; Davis, Al; Kayvon Fatahalian and Christian Theobalt
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases the L1 hit rate and reduces off-chip memory accesses substantially. Second, we employ reconfigurable special-purpose pipelines that are constructed dynamically under program control. These pipelines use shared execution units (XUs) that can be configured to support the common compute kernels that are the foundation of the ray tracing algorithm, such as acceleration structure traversal and triangle intersection. This reduces the overhead incurred by memory and register accesses. These two synergistic features yield a ray tracing architecture that significantly reduces both power consumption and off-chip memory traffic when compared to a more traditional cache-only approach.
Fast Parallel Construction of High-Quality Bounding Volume Hierarchies
(ACM, 2013) Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
We propose a new massively parallel algorithm for constructing high-quality bounding volume hierarchies (BVHs) for ray tracing. The algorithm is based on modifying an existing BVH to improve its quality, and executes in linear time at a rate of almost 40M triangles/sec on NVIDIA GTX Titan. We also propose an improved approach for parallel splitting of triangles prior to tree construction. Averaged over 20 test scenes, the resulting trees offer over 90% of the ray tracing performance of the best offline construction method (SBVH), while previous fast GPU algorithms offer only about 50%. Compared to state-of-the-art, our method offers a significant improvement in the majority of practical workloads that need to construct the BVH for each frame. On the average, it gives the best overall performance when tracing between 7 million and 60 billion rays per frame. This covers most interactive applications, product and architectural design, and even movie rendering.
SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing
(ACM, 2013) Lee, Won-Jong; Shin, Youngsam; Lee, Jaedon; Kim, Jin-Woo; Nah, Jae-Ho; Jung, Seokyoon; Lee, Shihwa; Park, Hyun-Sang; Han, Tack-Don; Kayvon Fatahalian and Christian Theobalt
Recently, with the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs/GPUs, real-time ray tracing has attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because of inadequate computing power, memory bandwidth, and flexibility in mobile GPUs. In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing) in which a fast compact hardware accelerator and a flexible programmable shader are combined. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible and high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.
Efficient Divide-And-Conquer Ray Tracing using Ray Sampling
(ACM, 2013) Nabata, Kosuke; Iwasaki, Kei; Dobashi, Yoshinori; Nishita, Tomoyuki; Kayvon Fatahalian and Christian Theobalt
Divide-and-conquer ray tracing (DACRT) methods solve intersection problems between large numbers of rays and primitives by recursively subdividing the problem size until it can be easily solved. Previous DACRT methods subdivide the intersection problem based on the distribution of primitives only, and do not exploit the distribution of rays, which results in a decrease of the rendering performance especially for high resolution images with antialiasing. We propose an efficient DACRT method that exploits the distribution of rays by sampling the rays to construct an acceleration data structure. To accelerate ray traversals, we have derived a new cost metric which is used to avoid inefficient subdivision of the intersection problem where the number of rays is not sufficiently reduced. Our method accelerates the tracing of many types of rays (primary rays, less coherent secondary rays, random rays for path tracing) by a factor of up to 2 using ray sampling.
On Quality Metrics of Bounding Volume Hierarchies
(ACM, 2013) Aila, Timo; Karras, Tero; Laine, Samuli; Kayvon Fatahalian and Christian Theobalt
The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures. We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.
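For readers unfamiliar with the surface area heuristic, the sketch below computes a commonly used form of the SAH cost of a BVH: each node contributes its surface area relative to the root, weighted by a traversal constant for inner nodes and by a per-triangle constant times the triangle count for leaves. The node layout (a dict with `lo`, `hi`, and either `children` or `tris`) and the constants are assumptions for illustration; the paper's exact formulation and its two additional metrics are not reproduced here.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(node, root_area, c_inner=1.0, c_tri=1.0):
    """SAH cost of the subtree rooted at `node` (hypothetical dict layout)."""
    p = surface_area(node["lo"], node["hi"]) / root_area
    if "children" in node:
        # Inner node: traversal cost plus the cost of all children.
        return c_inner * p + sum(sah_cost(c, root_area, c_inner, c_tri)
                                 for c in node["children"])
    # Leaf: intersection cost scales with the number of triangles.
    return c_tri * p * node["tris"]

# Tiny made-up tree: a unit cube split into two halves along x.
tree = {
    "lo": (0.0, 0.0, 0.0), "hi": (1.0, 1.0, 1.0),
    "children": [
        {"lo": (0.0, 0.0, 0.0), "hi": (0.5, 1.0, 1.0), "tris": 2},
        {"lo": (0.5, 0.0, 0.0), "hi": (1.0, 1.0, 1.0), "tris": 3},
    ],
}
cost = sah_cost(tree, surface_area(tree["lo"], tree["hi"]))
```

A lower value predicts cheaper tracing under the heuristic's assumptions, which is the prediction the abstract reports to be imperfect in practice.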
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs
(ACM, 2013) Laine, Samuli; Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.
