High-Performance Graphics 2011

Approximate Global Illumination

SSLPV: Subsurface Light Propagation Volumes

Børlum, Jesper
Christensen, Brian Bunch
Kjeldsen, Thomas Kim
Mikkelsen, Peter Trier
Noe, Karsten Østergaard
Rimestad, Jens
Mosegaard, Jesper
Approximate Global Illumination

The Alchemy Screen-Space Ambient Obscurance Algorithm

McGuire, Morgan
Osman, Brian
Bukowski, Michael
Hennessy, Padraic

Preface and Table of Contents

Approximate Global Illumination

Real-Time Diffuse Global Illumination Using Radiance Hints

Papaioannou, Georgios
Parallel Ray Tracing

Active Thread Compaction for GPU Path Tracing

Wald, Ingo
Parallel Ray Tracing

Improving SIMD Efficiency for Parallel Monte Carlo Light Transport on the GPU

Antwerpen, Dietger van
GPU Computing & Computational Graphics

Randomized Selection on the GPU

Monroe, Laura
Wendelberger, Joanne
Michalak, Sarah
Acceleration Structures

MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur

Gruenschloß, Leonhard
Stich, Martin
Nawaz, Sehera
Keller, Alexander
Parallel Ray Tracing

Voxelized Shadow Volumes

Wyman, Chris
GPU Computing & Computational Graphics

High-Performance Software Rasterization on GPUs

Laine, Samuli
Karras, Tero
Acceleration Structures

SAH KD-Tree Construction on GPU

Wu, Zhefeng
Zhao, Fukai
Liu, Xinguo
Acceleration Structures

Simpler and Faster HLBVH with Work Queues

Garanzha, Kirill
Pantaleoni, Jacopo
McAllister, David
Geometric Computations

Rapid Simplification of Multi-Attribute Meshes

Willmott, Andrew
Rethinking Rasterization

Depth Buffer Compression for Stochastic Motion Blur Rasterization

Andersson, Magnus
Hasselgren, Jon
Akenine-Moeller, Tomas
Rethinking Rasterization

Adaptive Transparency

Salvi, Marco
Montgomery, Jefferson
Lefohn, Aaron
Geometric Computations

An Inexpensive Bounding Representation for Offsets of Quadratic Curves

Ruf, Erik
Hardware & Textures

Precision Selection for Energy-Efficient Pixel Shaders

Pool, Jeff
Lastra, Anselmo
Singh, Montek
Geometric Computations

Farthest-Point Optimized Point Sets with Maximized Minimum Distance

Schlömer, Thomas
Heck, Daniel
Deussen, Oliver
Rethinking Rasterization

Hierarchical Stochastic Motion Blur Rasterization

Munkberg, Jacob
Clarberg, Petrik
Hasselgren, Jon
Toth, Robert
Sugihara, Masamichi
Akenine-Moeller, Tomas
GPU Computing & Computational Graphics

VoxelPipe: A Programmable Pipeline for 3D Voxelization

Pantaleoni, Jacopo
Hardware & Textures

Lossless Compression of Already Compressed Textures

Ström, Jacob
Wennersten, Per
Hardware & Textures

Primitive Processing and Advanced Shading Architecture for Embedded Space

Kazakov, Max
Ohbuchi, Eisaku


BibTeX (High-Performance Graphics 2011)
@inproceedings{
10.1145:2018323.2018325,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
SSLPV: Subsurface Light Propagation Volumes}},
author = {
Børlum, Jesper
 and
Christensen, Brian Bunch
 and
Kjeldsen, Thomas Kim
 and
Mikkelsen, Peter Trier
 and
Noe, Karsten Østergaard
 and
Rimestad, Jens
 and
Mosegaard, Jesper
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018325}
}
@inproceedings{
10.1145:2018323.2018327,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
The Alchemy Screen-Space Ambient Obscurance Algorithm}},
author = {
McGuire, Morgan
 and
Osman, Brian
 and
Bukowski, Michael
 and
Hennessy, Padraic
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018327}
}
@inproceedings{
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Preface and Table of Contents}},
author = {}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {}
}
@inproceedings{
10.1145:2018323.2018326,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Real-Time Diffuse Global Illumination Using Radiance Hints}},
author = {
Papaioannou, Georgios
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018326}
}
@inproceedings{
10.1145:2018323.2018331,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Active Thread Compaction for GPU Path Tracing}},
author = {
Wald, Ingo
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018331}
}
@inproceedings{
10.1145:2018323.2018330,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Improving SIMD Efficiency for Parallel Monte Carlo Light Transport on the GPU}},
author = {
Antwerpen, Dietger van
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018330}
}
@inproceedings{
10.1145:2018323.2018338,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Randomized Selection on the GPU}},
author = {
Monroe, Laura
 and
Wendelberger, Joanne
 and
Michalak, Sarah
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018338}
}
@inproceedings{
10.1145:2018323.2018334,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur}},
author = {
Gruenschloß, Leonhard
 and
Stich, Martin
 and
Nawaz, Sehera
 and
Keller, Alexander
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018334}
}
@inproceedings{
10.1145:2018323.2018329,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Voxelized Shadow Volumes}},
author = {
Wyman, Chris
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018329}
}
@inproceedings{
10.1145:2018323.2018337,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
High-Performance Software Rasterization on GPUs}},
author = {
Laine, Samuli
 and
Karras, Tero
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018337}
}
@inproceedings{
10.1145:2018323.2018335,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
SAH KD-Tree Construction on GPU}},
author = {
Wu, Zhefeng
 and
Zhao, Fukai
 and
Liu, Xinguo
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018335}
}
@inproceedings{
10.1145:2018323.2018333,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Simpler and Faster HLBVH with Work Queues}},
author = {
Garanzha, Kirill
 and
Pantaleoni, Jacopo
 and
McAllister, David
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018333}
}
@inproceedings{
10.1145:2018323.2018347,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Rapid Simplification of Multi-Attribute Meshes}},
author = {
Willmott, Andrew
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018347}
}
@inproceedings{
10.1145:2018323.2018343,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Depth Buffer Compression for Stochastic Motion Blur Rasterization}},
author = {
Andersson, Magnus
 and
Hasselgren, Jon
 and
Akenine-Moeller, Tomas
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018343}
}
@inproceedings{
10.1145:2018323.2018342,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Adaptive Transparency}},
author = {
Salvi, Marco
 and
Montgomery, Jefferson
 and
Lefohn, Aaron
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018342}
}
@inproceedings{
10.1145:2018323.2018346,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
An Inexpensive Bounding Representation for Offsets of Quadratic Curves}},
author = {
Ruf, Erik
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018346}
}
@inproceedings{
10.1145:2018323.2018349,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Precision Selection for Energy-Efficient Pixel Shaders}},
author = {
Pool, Jeff
 and
Lastra, Anselmo
 and
Singh, Montek
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018349}
}
@inproceedings{
10.1145:2018323.2018345,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Farthest-Point Optimized Point Sets with Maximized Minimum Distance}},
author = {
Schlömer, Thomas
 and
Heck, Daniel
 and
Deussen, Oliver
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018345}
}
@inproceedings{
10.1145:2018323.2018341,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Hierarchical Stochastic Motion Blur Rasterization}},
author = {
Munkberg, Jacob
 and
Clarberg, Petrik
 and
Hasselgren, Jon
 and
Toth, Robert
 and
Sugihara, Masamichi
 and
Akenine-Moeller, Tomas
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018341}
}
@inproceedings{
10.1145:2018323.2018339,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
VoxelPipe: A Programmable Pipeline for 3D Voxelization}},
author = {
Pantaleoni, Jacopo
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018339}
}
@inproceedings{
10.1145:2018323.2018351,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Lossless Compression of Already Compressed Textures}},
author = {
Ström, Jacob
 and
Wennersten, Per
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018351}
}
@inproceedings{
10.1145:2018323.2018350,
booktitle = {
Eurographics/ ACM SIGGRAPH Symposium on High Performance Graphics},
editor = {
Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
}, title = {{
Primitive Processing and Advanced Shading Architecture for Embedded Space}},
author = {
Kazakov, Max
 and
Ohbuchi, Eisaku
}, year = {
2011},
publisher = {
ACM},
ISSN = {2079-8687},
ISBN = {978-1-4503-0896-0},
DOI = {
10.1145/2018323.2018350}
}

Recent Submissions

  • Item
    SSLPV: Subsurface Light Propagation Volumes
    (ACM, 2011) Børlum, Jesper; Christensen, Brian Bunch; Kjeldsen, Thomas Kim; Mikkelsen, Peter Trier; Noe, Karsten Østergaard; Rimestad, Jens; Mosegaard, Jesper; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    This paper presents the Subsurface Light Propagation Volume (SSLPV) method for real-time approximation of subsurface scattering effects in dynamic scenes with changing mesh topology and lighting. SSLPV extends the Light Propagation Volume (LPV) technique for indirect illumination in video games. We introduce a new consistent method for injecting flux from point light sources into an LPV grid, a new rendering method which consistently converts light intensity stored in an LPV grid into incident radiance, as well as a model for light scattering and absorption inside heterogeneous materials. Our scheme does not require any precomputation and handles arbitrarily deforming meshes. We show that SSLPV provides visually pleasing results in real-time at the expense of a few milliseconds of added rendering time.
  • Item
    The Alchemy Screen-Space Ambient Obscurance Algorithm
    (ACM, 2011) McGuire, Morgan; Osman, Brian; Bukowski, Michael; Hennessy, Padraic; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Ambient obscurance (AO) produces perceptually important illumination effects such as darkened corners, cracks, and wrinkles; proximity darkening; and contact shadows. We present the AO algorithm from the Alchemy engine used at Vicarious Visions in commercial games. It is based on a new derivation of screen-space obscurance for robustness, and the insight that a falloff function can cancel terms in a visibility integral to favor efficient operations. Alchemy creates contact shadows that conform to surfaces, captures obscurance from geometry of varying scale, and provides four intuitive appearance parameters: world-space radius and bias, and aesthetic intensity and contrast. The algorithm estimates obscurance at a pixel from sample points read from depth and normal buffers. It processes dynamic scenes at HD 720p resolution in about 4.5 ms on Xbox 360 and 3 ms on NVIDIA GeForce 580.
  • Item
    Preface and Table of Contents
    (ACM, 2011) Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
  • Item
    Real-Time Diffuse Global Illumination Using Radiance Hints
    (ACM, 2011) Papaioannou, Georgios; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    GPU-based interactive global illumination techniques are receiving increasing interest from both the research and the industrial community as real-time graphics applications strive for visually rich and realistic dynamic three-dimensional environments. This paper presents a fast new diffuse global illumination method that generates a sparse set of low-cost radiance field evaluation points (radiance hints) and computes an arbitrary number of diffuse inter-reflections within a given volume. The proposed approximate technique combines ideas from existing grid-based radiance caching techniques with reflective shadow maps as well as a stochastic scheme for visibility calculations, in order to achieve high frame rates for multiple light bounces.
  • Item
    Active Thread Compaction for GPU Path Tracing
    (ACM, 2011) Wald, Ingo; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Modern GPUs like NVIDIA's Fermi internally operate in a SIMD manner by ganging multiple (32) scalar threads together into SIMD warps; if a warp's threads diverge, the warp serially executes both branches, temporarily disabling threads that are not on that path. In this paper, we explore and thoroughly analyze the concept of active thread compaction (i.e., the process of taking multiple partially-filled warps and compacting them to fewer but fully utilized warps) in the context of a CUDA path tracer. Our results show that this technique can indeed lead to significant improvements in SIMD utilization, and corresponding savings in the amount of work performed; however, they also show that certain inadequacies of today's hardware wipe out most of the achieved gains, leaving bottom-up speed-ups of a mere 12-16%. We believe our analysis of why this is the case will provide insight to other researchers experimenting with this technique in different contexts.
  • Item
    Improving SIMD Efficiency for Parallel Monte Carlo Light Transport on the GPU
    (ACM, 2011) Antwerpen, Dietger van; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Monte Carlo Light Transport algorithms such as Path Tracing (PT), Bi-Directional Path Tracing (BDPT) and Metropolis Light Transport (MLT) make use of random walks to sample light transport paths. When parallelizing these algorithms on the GPU the stochastic termination of random walks results in an uneven workload between samples, which reduces SIMD efficiency. In this paper we propose to combine stream compaction and sample regeneration to keep SIMD efficiency high during random walk construction, in spite of stochastic termination. Furthermore, for BDPT and MLT, we propose to evaluate all bidirectional connections of a sample in parallel in order to balance the workload between GPU threads and improve SIMD efficiency during sample evaluation. We present efficient parallel GPU-only implementations for PT, BDPT, and MLT in CUDA. We show that our GPU implementations outperform similar CPU implementations by an order of magnitude.
  • Item
    Randomized Selection on the GPU
    (ACM, 2011) Monroe, Laura; Wendelberger, Joanne; Michalak, Sarah; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    We implement here a fast and memory-sparing probabilistic top k selection algorithm on the GPU. The algorithm proceeds via an iterative probabilistic guess-and-check process on pivots for a three-way partition. When the guess is correct, the problem is reduced to selection on a much smaller set. This probabilistic algorithm always gives a correct result and always terminates. Las Vegas algorithms of this kind are a form of stochastic optimization and can be well suited to more general parallel processors with limited amounts of fast memory.
  • Item
    MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur
    (ACM, 2011) Gruenschloß, Leonhard; Stich, Martin; Nawaz, Sehera; Keller, Alexander; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    When a bounding volume hierarchy is used for accelerating the intersection of rays and scene geometry, one common way to incorporate motion blur is to interpolate node bounding volumes according to the time of the ray. However, such hierarchies typically exhibit large overlap between bounding volumes, which results in an inefficient traversal. This work builds upon the concept of spatially partitioning nodes during tree construction in order to reduce overlap in the presence of moving objects. The resulting hierarchies are often significantly cheaper to traverse than those generated by classic approaches.
  • Item
    Voxelized Shadow Volumes
    (ACM, 2011) Wyman, Chris; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Efficient shadowing algorithms have been sought for decades, but most shadow research focuses on quickly identifying shadows on surfaces. This paper introduces a novel algorithm to efficiently sample light visibility at points inside a volume. These voxelized shadow volumes (VSVs) extend shadow maps to allow efficient, simultaneous queries of visibility along view rays, or can alternately be seen as a discretized shadow volume. We voxelize the scene into a binary, epipolar-space grid where we apply a fast parallel scan to identify shadowed voxels. Using a view-dependent grid, our GPU implementation looks up 128 visibility samples along any eye ray with a single texture fetch. We demonstrate our algorithm in the context of interactive shadows in homogeneous, single-scattering participating media.
  • Item
    High-Performance Software Rasterization on GPUs
    (ACM, 2011) Laine, Samuli; Karras, Tero; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    In this paper, we implement an efficient, completely software-based graphics pipeline on a GPU. Unlike previous approaches, we obey ordering constraints imposed by current graphics APIs, guarantee hole-free rasterization, and support multisample antialiasing. Our goal is to examine the performance implications of not exploiting the fixed-function graphics pipeline, and to discern which additional hardware support would benefit software-based graphics the most. We present significant improvements over previous work in terms of scalability, performance, and capabilities. Our pipeline is malleable and easy to extend, and we demonstrate that in a wide variety of test cases its performance is within a factor of 2-8x compared to the hardware graphics pipeline on a top-of-the-line GPU. Our implementation is open sourced and available at http://code.google.com/p/cudaraster/
  • Item
    SAH KD-Tree Construction on GPU
    (ACM, 2011) Wu, Zhefeng; Zhao, Fukai; Liu, Xinguo; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    The kd-tree is one of the most efficient acceleration data structures for ray tracing. In this paper, we present a kd-tree construction algorithm that is precisely SAH-optimized and runs entirely on the GPU. We construct the tree nodes in breadth-first order. In order to precisely evaluate the SAH cost, we design a parallel scheme based on the standard parallel scan primitive to count the triangle numbers for all split candidates, and a bucket-based algorithm to sort the AABBs (axis-aligned bounding boxes) of the clipped triangles of the child nodes. The proposed parallel algorithms map well to the GPU's streaming architecture. The experiments showed that our algorithm produces kd-trees of the same highest quality as off-line CPU algorithms, but runs faster than multi-core CPU algorithms and the GPU SAH BVH-Tree algorithm.
  • Item
    Simpler and Faster HLBVH with Work Queues
    (ACM, 2011) Garanzha, Kirill; Pantaleoni, Jacopo; McAllister, David; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    A recently developed algorithm called Hierarchical Linear Bounding Volume Hierarchies (HLBVH) has demonstrated the feasibility of reconstructing the spatial index needed for ray tracing in real-time, even in the presence of millions of fully dynamic triangles. In this work we present a simpler and faster variant of HLBVH, where all the complex bookkeeping of prefix sums, compaction and partial breadth-first tree traversal needed for spatial partitioning has been replaced with an elegant pipeline built on top of efficient work queues and binary search. The new algorithm is both faster and more memory efficient, removing the need for temporary storage of geometry data for intermediate computations. Finally, the same pipeline has been extended to parallelize the construction of the top-level SAH optimized tree on the GPU, eliminating round-trips to the CPU, accelerating the overall construction speed by a factor of 5 to 10x.
  • Item
    Rapid Simplification of Multi-Attribute Meshes
    (ACM, 2011) Willmott, Andrew; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    We present a rapid simplification algorithm for meshes with multiple vertex attributes, targeted at rendering acceleration for realtime applications. Such meshes potentially feature normals, tangents, one or more texture coordinate sets, and animation information, such as blend weights and indices. Simplification algorithms in the literature typically focus on position-based meshes only, though extensions to handle surface attributes have been explored for those techniques based on iterative edge contraction. We show how to achieve the same goal for the faster class of algorithms based on vertex clustering, despite the comparative lack of connectivity information available. In particular, we show how to handle attribute discontinuities, preserve thin features, and avoid animation-unfriendly contractions, all issues which prevent the base algorithm from being used in a production situation. Our application area is the generation of multiple levels of detail for player-created meshes at runtime, while the main game process continues to run. As such the robustness of the simplification algorithm employed is key; ours has been run successfully on many millions of such models, with no preprocessing required. The algorithm is of application anywhere rapid mesh simplification of standard textured and animated models is desired.
  • Item
    Depth Buffer Compression for Stochastic Motion Blur Rasterization
    (ACM, 2011) Andersson, Magnus; Hasselgren, Jon; Akenine-Moeller, Tomas; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Previous depth buffer compression schemes are tuned for compressing depth values generated when rasterizing static triangles. They provide generous bandwidth usage savings, and are of great importance to graphics processors. However, stochastic rasterization for motion blur and depth of field is becoming a reality even for real-time graphics, and previous depth buffer compression algorithms fail to compress such buffers due to the irregularity of the positions and depths of the rendered samples. Therefore, we present a new algorithm that targets compression of scenes rendered with stochastic motion blur rasterization. If possible, our algorithm fits a single time-dependent predictor function for all the samples in a tile. However, sometimes the depths are localized in more than one layer, and we therefore apply a clustering algorithm to split the tile of samples into two layers. One time-dependent predictor function is then created per layer. The residuals between the predictor and the actual depths are then stored as delta corrections. For scenes with moderate motion, our algorithm can compress down to 65% compared to 75% for the previously best algorithm for stochastic buffers.
  • Item
    Adaptive Transparency
    (ACM, 2011) Salvi, Marco; Montgomery, Jefferson; Lefohn, Aaron; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Adaptive transparency is a new solution to order-independent transparency that closely approximates the ground-truth results obtained with A-buffer compositing but, like a Z-buffer, operates in bounded memory and exhibits consistent performance. The key contribution of our method is an adaptively compressed visibility representation that can be efficiently constructed and queried while rendering. The algorithm supports a wide range and combination of transparent geometry (e.g., foliage, windows, hair, and smoke). We demonstrate that adaptive transparency is five to forty times faster than real-time A-buffer implementations, closely matches the image quality, and is both higher quality and faster than other approximate order-independent transparency techniques: stochastic transparency, uniform opacity shadow maps, and Fourier opacity mapping.
  • Item
    An Inexpensive Bounding Representation for Offsets of Quadratic Curves
    (ACM, 2011) Ruf, Erik; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    We describe a simple mechanism for bounding the portion of the plane lying between a quadratic Bezier curve segment and its offset curve at distance d. Instead of comprising one or more partial bounding polygons, our representation consists of only a single approximate offset curve segment, also in quadratic Bezier form. Evaluated on a corpus of real-world curves, this technique avoids 68-99% of antialias-distance queries and 41-96% of brush-parameter queries. A proof of correctness is provided.
  • Item
    Precision Selection for Energy-Efficient Pixel Shaders
    (ACM, 2011) Pool, Jeff; Lastra, Anselmo; Singh, Montek; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    In this work, we seek to realize energy savings in modern pixel shaders by reducing the precision of their arithmetic. We explore three schemes for controlling this reduction. The first is a static analysis technique, which analyzes shader programs to choose precision with guaranteed error bounds. This approach may be too conservative in practice since it cannot take advantage of run-time information, so we also examine two methods that take the actual data values into account - a programmer-directed approach and a closed-loop error-tracking approach, both of which can lead to higher savings. To use this last method, we developed several heuristics to control how the precisions will change over time. We simulate several series of frames from commercial applications to evaluate the performance of these different schemes. The average savings found by the static and dynamic approaches are 31%, 70%, and 62% in the pixel shader's arithmetic, respectively, which could result in as much as a 10-20% savings of the GPU's energy as a whole.
  • Item
    Farthest-Point Optimized Point Sets with Maximized Minimum Distance
    (ACM, 2011) Schlömer, Thomas; Heck, Daniel; Deussen, Oliver; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Efficient sampling often relies on irregular point sets that uniformly cover the sample space. We present a flexible and simple optimization strategy for such point sets. It is based on the idea of increasing the mutual distances by successively moving each point to the farthest point, i.e., the location that has the maximum distance from the rest of the point set. We present two iterative algorithms based on this strategy. The first is our main algorithm which distributes points in the plane. Our experimental results show that the resulting distributions have almost optimal blue noise properties and are highly suitable for image plane sampling. The second is a variant of the main algorithm that partitions any point set into equally sized subsets, each with large mutual distances; the resulting partitionings yield improved results in more general integration problems such as those occurring in physically based rendering.
  • Item
    Hierarchical Stochastic Motion Blur Rasterization
    (ACM, 2011) Munkberg, Jacob; Clarberg, Petrik; Hasselgren, Jon; Toth, Robert; Sugihara, Masamichi; Akenine-Moeller, Tomas; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    We present a hierarchical traversal algorithm for stochastic rasterization of motion blur, which efficiently reduces the number of inside tests needed to resolve spatio-temporal visibility. Our method is based on novel tile against moving primitive tests that also provide temporal bounds for the overlap. The algorithm works entirely in homogeneous coordinates, supports MSAA, facilitates efficient hierarchical spatio-temporal occlusion culling, and handles typical game workloads with widely varying triangle sizes. Furthermore, we use high-quality sampling patterns based on digital nets, and present a novel reordering that allows efficient procedural generation with good anti-aliasing properties. Finally, we evaluate a set of hierarchical motion blur rasterization algorithms in terms of depth buffer bandwidth, shading efficiency, and arithmetic complexity.
  • Item
    VoxelPipe: A Programmable Pipeline for 3D Voxelization
    (ACM, 2011) Pantaleoni, Jacopo; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    We present a highly flexible and efficient software pipeline for programmable triangle voxelization. The pipeline, entirely written in CUDA, supports both fully conservative and thin voxelizations, multiple boolean, floating point, vector-typed render targets, user-defined vertex and fragment shaders, and a bucketing mode which can be used to generate 3D A-buffers containing the entire list of fragments belonging to each voxel. For maximum efficiency, voxelization is implemented as a sort-middle tile-based rasterizer, while the A-buffer mode, essentially performing 3D binning of triangles over uniform grids, uses a sort-last pipeline. Despite its major flexibility, the performance of our tile-based rasterizer is always competitive with and sometimes more than an order of magnitude superior to that of state-of-the-art binary voxelizers, whereas our bucketing system is up to 4 times faster than previous implementations. In both cases the results have been achieved through the use of careful load-balancing and high performance sorting primitives.
  • Item
    Lossless Compression of Already Compressed Textures
    (ACM, 2011) Ström, Jacob; Wennersten, Per; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    Texture compression helps rendering by reducing the footprint in graphics memory, thus allowing for more textures, and by lowering the number of memory accesses between the graphics processor and memory, increasing performance and lowering power consumption. Compared to image compression methods like JPEG however, texture codecs are typically much less efficient, which is a problem when downloading the texture over a network or reading it from disk. Therefore, in this paper we investigate lossless compression of already compressed textures. By predicting compression parameters in the image domain instead of in the parameter domain, a more efficient representation is obtained compared to using general compression such as ZIP or LZMA. This works well also for pixel indices that have previously proved hard to compress. A 4-bit-per-pixel format can thus be compressed to around 2.3 bits per pixel (bpp), or 9.6% of the original size, compared to around 3.0 bpp when using ZIP or 2.8 bpp using LZMA. Compressing the original images with JPEG to the same quality also gives 2.3 bpp, meaning that texture compression followed by our packing is on par with JPEG in terms of compression efficiency.
  • Item
    Primitive Processing and Advanced Shading Architecture for Embedded Space
    (ACM, 2011) Kazakov, Max; Ohbuchi, Eisaku; Carsten Dachsbacher and William Mark and Jacopo Pantaleoni
    This paper presents a new graphics architecture enabling content-rich applications for the embedded space by extending hardware architecture in two main areas - geometry processing and configurable per-fragment shading. Our first contribution combines vertex cache and a programmable geometry engine that handles both fixed and variable size geometrical primitives completely on-chip. It enables subdivision surface tessellation, silhouette rendering and other geometry processing algorithms to be implemented in one pass and without external memory access. Our second contribution is in configurable per-fragment shading that is mainly a dot product + lookup table machine, versatile enough to realize Cook-Torrance shading, the Schlick anisotropy model and others. Memory storage and memory bandwidth are reduced in the proposed architecture as both compact geometry and material descriptions are possible, enabling complex shapes and sophisticated shading models in embedded space. The architecture has complete HDL and ASIC implementations and was demonstrated during the ESEC 2008 exhibition in Japan. Exposing all the features of our architecture via OpenGL ES 1.X and 2.0 API enabled extended OpenGL ES engines from Rightware Oy to run on our ASIC implementations.
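
Illustrative sketch (not part of the collection record): the guess-and-check idea summarized in the abstract of "Randomized Selection on the GPU" above can be tried out on the CPU in a few lines. The code below is a minimal sequential sketch under stated assumptions, not the authors' GPU implementation. The function name select_kth_smallest, the sample size, the pivot margin, and the max_retries fallback to a full sort are all hypothetical choices made for illustration, and the sketch selects the k-th smallest element (0-based) rather than the top k.

```python
import math
import random


def select_kth_smallest(values, k, seed=None, small=64, max_retries=8):
    """Return the k-th smallest element (0-based) by probabilistic guess-and-check.

    Hypothetical CPU sketch: sample size, margin and the sort fallback are
    assumptions for illustration, not taken from the paper.
    """
    data = list(values)
    if not 0 <= k < len(data):
        raise IndexError("k out of range")
    rng = random.Random(seed)
    retries = 0
    while len(data) > small and retries < max_retries:
        n = len(data)
        # Guess: sort a small random sample and pick two pivots that
        # should bracket the element of rank k.
        sample = sorted(rng.choices(data, k=math.isqrt(n) + 1))
        m = len(sample)
        pos = k * m // n
        margin = math.isqrt(m) + 1
        lo = sample[max(0, pos - margin)]
        hi = sample[min(m - 1, pos + margin)]
        # Check: three-way partition into (< lo), [lo, hi], (> hi).
        below = sum(1 for v in data if v < lo)
        middle = [v for v in data if lo <= v <= hi]
        if below <= k < below + len(middle):
            if lo == hi:
                return lo  # every element in the middle bucket is equal
            if len(middle) < n:
                # Correct guess: continue on the much smaller middle bucket.
                data, k = middle, k - below
                retries = 0
                continue
        retries += 1  # wrong or unhelpful guess: draw fresh pivots
    # Small input, or too many failed guesses: fall back to a plain sort,
    # so the result is always correct and the loop always ends.
    return sorted(data)[k]
```

A wrong guess costs only one extra counting pass; it never affects the result, which is what makes this a Las Vegas rather than a Monte Carlo procedure.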