High-Performance Graphics 2013

Preface and Table of Contents

Advanced Rasterization

PixelPie: Maximal Poisson-disk Sampling with Rasterization

Ip, Cheuk Yiu
Yalçın, M. Adil
Luebke, David
Varshney, Amitabh
Advanced Rasterization

Theory and Analysis of Higher-Order Motion Blur Rasterization

Gribel, Carl Johan
Munkberg, Jacob
Hasselgren, Jon
Akenine-Möller, Tomas
Shadows

Screen-Space Far-Field Ambient Obscurance

Timonen, Ville
Advanced Rasterization

Out-of-Core Construction of Sparse Voxel Octrees

Baert, Jeroen
Lagae, Ares
Dutré, Philip
Fast Interactive Systems

Real-time Local Displacement using Dynamic GPU Memory Management

Schäfer, Henry
Keinert, Benjamin
Stamminger, Marc
Shadows

Imperfect Voxelized Shadow Volumes

Wyman, Chris
Dai, Zeng
Fast Interactive Systems

Lazy Incremental Computation for Efficient Scene Graph Rendering

Wörister, Michael
Steinlechner, Harald
Maierhofer, Stefan
Tobler, Robert F.
Fast Interactive Systems

Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling

Loop, Charles
Zhang, Cha
Zhang, Zhengyou
Building Acceleration Structures for Ray Tracing

Efficient BVH Construction via Approximate Agglomerative Clustering

Gu, Yan
He, Yong
Fatahalian, Kayvon
Blelloch, Guy
Ray Tracing Hardware and Techniques

An Energy and Bandwidth Efficient Ray Tracing Architecture

Kopta, Daniel
Shkurko, Konstantin
Spjut, Josef
Brunvand, Erik
Davis, Al
Building Acceleration Structures for Ray Tracing

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Karras, Tero
Aila, Timo
Ray Tracing Hardware and Techniques

SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing

Lee, Won-Jong
Shin, Youngsam
Lee, Jaedon
Kim, Jin-Woo
Nah, Jae-Ho
Jung, Seokyoon
Lee, Shihwa
Park, Hyun-Sang
Han, Tack-Don
Ray Tracing Hardware and Techniques

Efficient Divide-And-Conquer Ray Tracing using Ray Sampling

Nabata, Kosuke
Iwasaki, Kei
Dobashi, Yoshinori
Nishita, Tomoyuki
Building Acceleration Structures for Ray Tracing

On Quality Metrics of Bounding Volume Hierarchies

Aila, Timo
Karras, Tero
Laine, Samuli
Ray Tracing Hardware and Techniques

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs

Laine, Samuli
Karras, Tero
Aila, Timo


BibTeX (High-Performance Graphics 2013)
@inproceedings{
10.2312:EGGH/HPG13/001-frontmatter,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Preface and Table of Contents}},
year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.2312/EGGH/HPG13/001-frontmatter}
}
@inproceedings{
10.1145:2492045.2492047,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
PixelPie: Maximal Poisson-disk Sampling with Rasterization}},
author = {
Ip, Cheuk Yiu
 and
Yalçın, M. Adil
 and
Luebke, David
 and
Varshney, Amitabh
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492047}
}
@inproceedings{
10.1145:2492045.2492046,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Theory and Analysis of Higher-Order Motion Blur Rasterization}},
author = {
Gribel, Carl Johan
 and
Munkberg, Jacob
 and
Hasselgren, Jon
 and
Akenine-Möller, Tomas
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492046}
}
@inproceedings{
10.1145:2492045.2492049,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Screen-Space Far-Field Ambient Obscurance}},
author = {
Timonen, Ville
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492049}
}
@inproceedings{
10.1145:2492045.2492048,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Out-of-Core Construction of Sparse Voxel Octrees}},
author = {
Baert, Jeroen
 and
Lagae, Ares
 and
Dutré, Philip
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492048}
}
@inproceedings{
10.1145:2492045.2492052,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Real-time Local Displacement using Dynamic GPU Memory Management}},
author = {
Schäfer, Henry
 and
Keinert, Benjamin
 and
Stamminger, Marc
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492052}
}
@inproceedings{
10.1145:2492045.2492050,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Imperfect Voxelized Shadow Volumes}},
author = {
Wyman, Chris
 and
Dai, Zeng
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492050}
}
@inproceedings{
10.1145:2492045.2492051,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Lazy Incremental Computation for Efficient Scene Graph Rendering}},
author = {
Wörister, Michael
 and
Steinlechner, Harald
 and
Maierhofer, Stefan
 and
Tobler, Robert F.
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492051}
}
@inproceedings{
10.1145:2492045.2492053,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling}},
author = {
Loop, Charles
 and
Zhang, Cha
 and
Zhang, Zhengyou
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492053}
}
@inproceedings{
10.1145:2492045.2492054,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Efficient BVH Construction via Approximate Agglomerative Clustering}},
author = {
Gu, Yan
 and
He, Yong
 and
Fatahalian, Kayvon
 and
Blelloch, Guy
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492054}
}
@inproceedings{
10.1145:2492045.2492058,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
An Energy and Bandwidth Efficient Ray Tracing Architecture}},
author = {
Kopta, Daniel
 and
Shkurko, Konstantin
 and
Spjut, Josef
 and
Brunvand, Erik
 and
Davis, Al
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492058}
}
@inproceedings{
10.1145:2492045.2492055,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Fast Parallel Construction of High-Quality Bounding Volume Hierarchies}},
author = {
Karras, Tero
 and
Aila, Timo
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492055}
}
@inproceedings{
10.1145:2492045.2492057,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing}},
author = {
Lee, Won-Jong
 and
Shin, Youngsam
 and
Lee, Jaedon
 and
Kim, Jin-Woo
 and
Nah, Jae-Ho
 and
Jung, Seokyoon
 and
Lee, Shihwa
 and
Park, Hyun-Sang
 and
Han, Tack-Don
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492057}
}
@inproceedings{
10.1145:2492045.2492059,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Efficient Divide-And-Conquer Ray Tracing using Ray Sampling}},
author = {
Nabata, Kosuke
 and
Iwasaki, Kei
 and
Dobashi, Yoshinori
 and
Nishita, Tomoyuki
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492059}
}
@inproceedings{
10.1145:2492045.2492056,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
On Quality Metrics of Bounding Volume Hierarchies}},
author = {
Aila, Timo
 and
Karras, Tero
 and
Laine, Samuli
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492056}
}
@inproceedings{
10.1145:2492045.2492060,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Kayvon Fatahalian and Christian Theobalt
}, title = {{
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs}},
author = {
Laine, Samuli
 and
Karras, Tero
 and
Aila, Timo
}, year = {
2013},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-2135-8},
DOI = {
10.1145/2492045.2492060}
}

Recent Submissions

  • Item
    Preface and Table of Contents
    (ACM, 2013) Kayvon Fatahalian and Christian Theobalt
  • Item
    PixelPie: Maximal Poisson-disk Sampling with Rasterization
    (ACM, 2013) Ip, Cheuk Yiu; Yalçın, M. Adil; Luebke, David; Varshney, Amitabh; Kayvon Fatahalian and Christian Theobalt
    We present PixelPie, a highly parallel geometric formulation of the Poisson-disk sampling problem on the graphics pipeline. Traditionally, generating a distribution by throwing darts and removing conflicts has been viewed as an inherently sequential process. In this paper, we present an efficient Poisson-disk sampling algorithm that uses rasterization in a highly parallel manner. Our technique is an iterative two step process. The first step of each iteration involves rasterization of random darts at varying depths. The second step involves culling conflicted darts. Successive iterations identify and fill in the empty regions to obtain maximal distributions. Our approach maps well to the parallel and optimized graphics functions on the GPU and can be easily extended to perform importance sampling. Our implementation can generate Poisson-disk samples at the rate of nearly 7 million samples per second on a GeForce GTX 580 and is significantly faster than the state-of-the-art maximal Poisson-disk sampling techniques.
  • Item
    Theory and Analysis of Higher-Order Motion Blur Rasterization
    (ACM, 2013) Gribel, Carl Johan; Munkberg, Jacob; Hasselgren, Jon; Akenine-Möller, Tomas; Kayvon Fatahalian and Christian Theobalt
    A common assumption in motion blur rendering is that the triangle vertices move in straight lines. In this paper, we focus on scenarios where this assumption is no longer valid, such as motion due to fast rotation and other non-linear characteristics. To that end, we present a higher-order representation of vertex motion based on Bézier curves, which allows for more complex motion paths, and we derive the necessary mathematics for these. In addition, we extend previous work to handle higher-order motion by developing a new tile vs. triangle overlap test. We find that our tile-based rasterizer outperforms all other methods in terms of sample test efficiency, and that our generalization of an interval-based rasterizer is often fastest in terms of wall clock rendering time. In addition, we use our tile test to improve rasterization performance by up to a factor of 5 for semi-analytical motion blur rendering.
  • Item
    Screen-Space Far-Field Ambient Obscurance
    (ACM, 2013) Timonen, Ville; Kayvon Fatahalian and Christian Theobalt
    Ambient obscurance (AO) is an effective approximation of global illumination, and its screen-space (SSAO) versions that operate on depth buffers only are widely used in real-time applications. We present an SSAO method that allows the obscurance effect to be determined from the entire depth buffer for each pixel. Our contribution is two-fold: Firstly, we build an obscurance estimator that accurately converges to ray traced reference results on the same screen-space geometry. Secondly, we generate an intermediate representation of the depth field which, when sampled, gives local peaks of the geometry from the point of view of the receiver. Only a small number of such samples are required to capture AO effects without undersampling artefacts that plague previous methods. Our method is unaffected by the radius of the AO effect or by the complexity of the falloff function and produces results within a few percent of a ray traced screen-space reference at constant real-time frame rates.
  • Item
    Out-of-Core Construction of Sparse Voxel Octrees
    (ACM, 2013) Baert, Jeroen; Lagae, Ares; Dutré, Philip; Kayvon Fatahalian and Christian Theobalt
    Voxel-based rendering has recently received significant attention due to its potential in the context of efficiently rendering massively large and highly detailed scenes. Unfortunately, few or no scenes are available in the form of sparse voxel octrees. In this paper, we present an out-of-core algorithm for constructing a sparse voxel octree from a triangle mesh. Our algorithm allows the input triangle mesh, the output sparse voxel octree, and, most importantly, the intermediate high-resolution 3D voxel grid, to be larger than available memory. We demonstrate that our out-of-core algorithm can construct sparse voxel octrees from triangle meshes using only a fraction of the memory required by an in-core algorithm in roughly the same time, and that our out-of-core algorithm can also handle extremely large triangle meshes.
  • Item
    Real-time Local Displacement using Dynamic GPU Memory Management
    (ACM, 2013) Schäfer, Henry; Keinert, Benjamin; Stamminger, Marc; Kayvon Fatahalian and Christian Theobalt
    We propose a novel method for local displacement events in large scenes, such as scratches, footsteps, or sculpting operations. Deformations are stored as displacements for vertices generated by hardware tessellation. Adaptive mesh refinement, application of the displacement and all involved memory management happen completely on the GPU. We show various extensions to our approach, such as on-the-fly normal computation and multi-resolution editing. In typical game scenes we perform local deformations at arbitrary positions in far less than one millisecond. This makes the method particularly suited for games and interactive sculpting applications.
  • Item
    Imperfect Voxelized Shadow Volumes
    (ACM, 2013) Wyman, Chris; Dai, Zeng; Kayvon Fatahalian and Christian Theobalt
    Voxelized shadow volumes [Wyman 2011] provide a discretized view-dependent representation of shadow volumes, but are limited to point or directional lights. We extend them to allow dynamic volumetric visibility from area light sources using imperfect shadow volumes. We show a coarser visibility sampling suffices for area lights. Combining this coarser resolution with a parallel shadow volume construction enables interactive rendering of dynamic volumetric shadows from area lights in homogeneous single-scattering media, at under 4x the cost of hard volumetric shadows.
  • Item
    Lazy Incremental Computation for Efficient Scene Graph Rendering
    (ACM, 2013) Wörister, Michael; Steinlechner, Harald; Maierhofer, Stefan; Tobler, Robert F.; Kayvon Fatahalian and Christian Theobalt
    In order to provide a highly performant rendering system while maintaining a scene graph structure with a high level of abstraction, we introduce improved rendering caches that can be updated incrementally without any scene graph traversal. The basis of this novel system is the use of a dependency graph that can be synthesized from the scene graph and links all sources of changes to the affected parts of rendering caches. By using and extending concepts from incremental computation we minimize the computational overhead for performing the necessary updates due to changes in any inputs. This makes it possible to provide a high-level semantic scene graph, while retaining the opportunity to apply a number of known optimizations to the rendering caches even for dynamic scenes. Our evaluation shows that the resulting rendering system is highly competitive and provides good rendering performance for scenes ranging from completely static geometry all the way to completely dynamic geometry.
  • Item
    Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling
    (ACM, 2013) Loop, Charles; Zhang, Cha; Zhang, Zhengyou; Kayvon Fatahalian and Christian Theobalt
    We present a system for real-time, high-resolution, sparse voxelization of an image-based surface model. Our approach consists of a coarse-to-fine voxel representation and a collection of parallel processing steps. Voxels are stored as a list of unsigned integer triples. An oracle kernel decides, for each voxel in parallel, whether to keep or cull its voxel from the list based on an image consistency criterion of its projection across cameras. After a prefix sum scan, kept voxels are subdivided and the process repeats until projected voxels are pixel size. These voxels are drawn to a render target and shaded as a weighted combination of their projections into a set of calibrated RGB images. We apply this technique to the problem of smooth visual hull reconstruction of human subjects based on a set of live image streams. We demonstrate that human upper body shapes can be reconstructed to giga voxel resolution at greater than 30 fps on modern graphics hardware.
  • Item
    Efficient BVH Construction via Approximate Agglomerative Clustering
    (ACM, 2013) Gu, Yan; He, Yong; Fatahalian, Kayvon; Blelloch, Guy; Kayvon Fatahalian and Christian Theobalt
    We introduce Approximate Agglomerative Clustering (AAC), an efficient, easily parallelizable algorithm for generating high-quality bounding volume hierarchies using agglomerative clustering. The main idea of AAC is to compute an approximation to the true greedy agglomerative clustering solution by restricting the set of candidates inspected when identifying neighboring geometry in the scene. The result is a simple algorithm that often produces higher quality hierarchies (in terms of subsequent ray tracing cost) than a full sweep SAH build yet executes in less time than the widely used top-down, approximate SAH build algorithm based on binning.
  • Item
    An Energy and Bandwidth Efficient Ray Tracing Architecture
    (ACM, 2013) Kopta, Daniel; Shkurko, Konstantin; Spjut, Josef; Brunvand, Erik; Davis, Al; Kayvon Fatahalian and Christian Theobalt
    We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases the L1 hit rate and reduces off-chip memory accesses substantially. Second, we employ reconfigurable special-purpose pipelines that are constructed dynamically under program control. These pipelines use shared execution units (XUs) that can be configured to support the common compute kernels that are the foundation of the ray tracing algorithm, such as acceleration structure traversal and triangle intersection. This reduces the overhead incurred by memory and register accesses. These two synergistic features yield a ray tracing architecture that significantly reduces both power consumption and off-chip memory traffic when compared to a more traditional cache-only approach.
  • Item
    Fast Parallel Construction of High-Quality Bounding Volume Hierarchies
    (ACM, 2013) Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
    We propose a new massively parallel algorithm for constructing high-quality bounding volume hierarchies (BVHs) for ray tracing. The algorithm is based on modifying an existing BVH to improve its quality, and executes in linear time at a rate of almost 40M triangles/sec on NVIDIA GTX Titan. We also propose an improved approach for parallel splitting of triangles prior to tree construction. Averaged over 20 test scenes, the resulting trees offer over 90% of the ray tracing performance of the best offline construction method (SBVH), while previous fast GPU algorithms offer only about 50%. Compared to state-of-the-art, our method offers a significant improvement in the majority of practical workloads that need to construct the BVH for each frame. On the average, it gives the best overall performance when tracing between 7 million and 60 billion rays per frame. This covers most interactive applications, product and architectural design, and even movie rendering.
  • Item
    SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing
    (ACM, 2013) Lee, Won-Jong; Shin, Youngsam; Lee, Jaedon; Kim, Jin-Woo; Nah, Jae-Ho; Jung, Seokyoon; Lee, Shihwa; Park, Hyun-Sang; Han, Tack-Don; Kayvon Fatahalian and Christian Theobalt
    Recently, with the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs/GPUs, real-time ray tracing has attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because of inadequate computing power, memory bandwidth, and flexibility in mobile GPUs. In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing) in which a fast compact hardware accelerator and a flexible programmable shader are combined. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible and high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.
  • Item
    Efficient Divide-And-Conquer Ray Tracing using Ray Sampling
    (ACM, 2013) Nabata, Kosuke; Iwasaki, Kei; Dobashi, Yoshinori; Nishita, Tomoyuki; Kayvon Fatahalian and Christian Theobalt
    Divide-and-conquer ray tracing (DACRT) methods solve intersection problems between large numbers of rays and primitives by recursively subdividing the problem size until it can be easily solved. Previous DACRT methods subdivide the intersection problem based on the distribution of primitives only, and do not exploit the distribution of rays, which results in a decrease of the rendering performance especially for high resolution images with antialiasing. We propose an efficient DACRT method that exploits the distribution of rays by sampling the rays to construct an acceleration data structure. To accelerate ray traversals, we have derived a new cost metric which is used to avoid inefficient subdivision of the intersection problem where the number of rays is not sufficiently reduced. Our method accelerates the tracing of many types of rays (primary rays, less coherent secondary rays, random rays for path tracing) by a factor of up to 2 using ray sampling.
  • Item
    On Quality Metrics of Bounding Volume Hierarchies
    (ACM, 2013) Aila, Timo; Karras, Tero; Laine, Samuli; Kayvon Fatahalian and Christian Theobalt
    The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures. We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.
  • Item
    Megakernels Considered Harmful: Wavefront Path Tracing on GPUs
    (ACM, 2013) Laine, Samuli; Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
    When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.
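
The PixelPie abstract in this collection describes an iterative two-step process: throw random darts at varying depths, then cull the conflicted ones. The sketch below is a rough CPU illustration of that idea only, not the authors' rasterization-based GPU method. The function name `poisson_disk_batches`, its parameters, and the unit-square domain are assumptions made for illustration. It also does not identify empty regions between iterations, so the result is a valid Poisson-disk set but is not guaranteed to be maximal.

```python
import math
import random


def poisson_disk_batches(radius, batch_size=64, iterations=200, seed=0):
    """Batch dart throwing on the unit square (hypothetical CPU sketch)."""
    rng = random.Random(seed)
    accepted = []  # samples kept so far; every pair is at least `radius` apart

    def conflicts(p, q):
        return math.dist(p, q) < radius

    for _ in range(iterations):
        # Step 1: throw a batch of darts, each with a random depth.
        darts = [(rng.random(), (rng.random(), rng.random()))
                 for _ in range(batch_size)]
        # Sorting by depth gives a strict priority order within the batch.
        darts.sort()

        # Step 2: cull conflicted darts.
        survivors = []
        for i, (_, p) in enumerate(darts):
            # Conflict with a sample accepted in an earlier iteration.
            if any(conflicts(p, q) for q in accepted):
                continue
            # Conflict with any shallower dart of this batch, even one that
            # was itself culled (conservative, but keeps the test local).
            if any(conflicts(p, q) for _, q in darts[:i]):
                continue
            survivors.append(p)
        accepted.extend(survivors)
    return accepted


if __name__ == "__main__":
    samples = poisson_disk_batches(radius=0.1)
    print(len(samples), "samples, all at least 0.1 apart")
```

Giving each dart a depth turns conflict resolution into a purely local comparison, which is what lets a whole batch be tested at once; in this sketch the sort order stands in for the depth test that the abstract attributes to rasterization.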