High-Performance Graphics 2020

Permanent URI for this collection

https://diglib7.eg.org/handle/10.2312/2632947

July 13-16, 2020, hosted online; published as a special issue of the Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 3, No. 2

High-Performance Rendering

Generalized Light Portals

Shinji Ogaki

Efficient Adaptive Deferred Shading with Hardware Scatter Tiles

Ian Mallett, Cem Yuksel, and Larry Seiler

Post-Render Warp with Late Input Sampling Improves Aiming Under High Latency Conditions

Joohwan Kim, Pyarelal Knowles, Josef Spjut, Ben Boudaoud, and Morgan McGuire

Image-Based Computing

Neural Denoising for Path Tracing of Medical Volumetric Data

Nikolai Hofmann, Jana Martschinke, Klaus Engel, and Marc Stamminger

High-Performance Image Filters via Sparse Approximations

Kersten Schuster, Philip Trettner, and Leif Kobbelt

FLIP: A Difference Evaluator for Alternating Images

Pontus Andersson, Jim Nilsson, Tomas Akenine-Möller, Magnus Oskarsson, Kalle Åström, and Mark D. Fairchild

Rendering Thin or Transparent Objects

Quadratic Approximation of Cubic Curves

Nghia Truong, Cem Yuksel, and Larry Seiler

Using Hardware Ray Transforms to Accelerate Ray/Primitive Intersections for Long, Thin Primitive Types

Ingo Wald, Nate Morrical, Stefan Zellmann, Lei Ma, Will Usher, Tiejun Huang, and Valerio Pascucci

Sub-triangle opacity masks for faster ray tracing of transparent objects

Holger Gruen, Carsten Benthin, and Sven Woop

Hardware Architectures and Space Partitioning

Compacted CPU/GPU Data Compression via Modified Virtual Address Translation

Larry Seiler, Daqi Lin, and Cem Yuksel

Hardware-Accelerated Dual-Split Trees

Daqi Lin, Elena Vasiou, Cem Yuksel, Daniel Kopta, and Erik Brunvand

Concurrent Binary Trees (with application to longest edge bisection)

Jonathan Dupuy

BibTeX (High-Performance Graphics 2020)

@inproceedings{10.1145:3406176,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Generalized Light Portals}},
  author = {Ogaki, Shinji},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406176}
}

@inproceedings{10.1145:3406184,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Efficient Adaptive Deferred Shading with Hardware Scatter Tiles}},
  author = {Mallett, Ian and Yuksel, Cem and Seiler, Larry},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406184}
}

@inproceedings{10.1145:3406181,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Neural Denoising for Path Tracing of Medical Volumetric Data}},
  author = {Hofmann, Nikolai and Martschinke, Jana and Engel, Klaus and Stamminger, Marc},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406181}
}

@inproceedings{10.1145:3406187,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Post-Render Warp with Late Input Sampling Improves Aiming Under High Latency Conditions}},
  author = {Kim, Joohwan and Knowles, Pyarelal and Spjut, Josef and Boudaoud, Ben and McGuire, Morgan},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406187}
}

@inproceedings{10.1145:3406182,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{High-Performance Image Filters via Sparse Approximations}},
  author = {Schuster, Kersten and Trettner, Philip and Kobbelt, Leif},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406182}
}

@inproceedings{10.1145:3406178,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Quadratic Approximation of Cubic Curves}},
  author = {Truong, Nghia and Yuksel, Cem and Seiler, Larry},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406178}
}

@inproceedings{10.1145:3406183,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{FLIP: A Difference Evaluator for Alternating Images}},
  author = {Andersson, Pontus and Nilsson, Jim and Akenine-Möller, Tomas and Oskarsson, Magnus and Åström, Kalle and Fairchild, Mark D.},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406183}
}

@inproceedings{10.1145:3406177,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Compacted CPU/GPU Data Compression via Modified Virtual Address Translation}},
  author = {Seiler, Larry and Lin, Daqi and Yuksel, Cem},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406177}
}

@inproceedings{10.1145:3406179,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Using Hardware Ray Transforms to Accelerate Ray/Primitive Intersections for Long, Thin Primitive Types}},
  author = {Wald, Ingo and Morrical, Nate and Zellmann, Stefan and Ma, Lei and Usher, Will and Huang, Tiejun and Pascucci, Valerio},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406179}
}

@inproceedings{10.1145:3406180,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Sub-triangle opacity masks for faster ray tracing of transparent objects}},
  author = {Gruen, Holger and Benthin, Carsten and Woop, Sven},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406180}
}

@inproceedings{10.1145:3406185,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Hardware-Accelerated Dual-Split Trees}},
  author = {Lin, Daqi and Vasiou, Elena and Yuksel, Cem and Kopta, Daniel and Brunvand, Erik},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406185}
}

@inproceedings{10.1145:3406186,
  booktitle = {Proceedings of the ACM on Computer Graphics and Interactive Techniques},
  editor = {Yuksel, Cem and Membarth, Richard and Zordan, Victor},
  title = {{Concurrent Binary Trees (with application to longest edge bisection)}},
  author = {Dupuy, Jonathan},
  year = {2020},
  publisher = {ACM},
  ISSN = {2577-6193},
  DOI = {10.1145/3406186}
}

Generalized Light Portals
(ACM, 2020) Ogaki, Shinji; Yuksel, Cem and Membarth, Richard and Zordan, Victor
Light portals are useful for accelerating the convergence of Monte Carlo path tracing when rendering interiors. However, they are generally limited to flat polygonal shapes. In this paper, we introduce a new concept that allows existing polygon meshes with arbitrary shaders in a scene to be used as generalized light portals. We also present an efficient sampling method that takes into account the pixel values of the environment map and ray guiding two-dimensional textures that are typically opacity or transparency maps. This novel sampling strategy can be combined with other sampling techniques by using multiple importance sampling.
Efficient Adaptive Deferred Shading with Hardware Scatter Tiles
(ACM, 2020) Mallett, Ian; Yuksel, Cem; Seiler, Larry; Yuksel, Cem and Membarth, Richard and Zordan, Victor
Adaptive shading is an effective mechanism for reducing the number of shaded pixels to a subset of the image resolution with minimal impact on final rendering quality. We present a new scheduling method based on on-chip tiles that, along with relatively minor modifications to the GPU architecture, provides efficient hardware support. As compared to software implementations on current hardware using compute shaders, our approach dramatically reduces memory bandwidth requirements, thereby significantly improving performance and energy use. We also introduce the concept of a fragment pre-shader for programmatically controlling when a fragment shader is invoked, and describe advanced techniques for utilizing our approach to further reduce the number of shaded pixels via temporal filtering, or to adjust rendering quality to maintain stable framerates.
Neural Denoising for Path Tracing of Medical Volumetric Data
(ACM, 2020) Hofmann, Nikolai; Martschinke, Jana; Engel, Klaus; Stamminger, Marc; Yuksel, Cem and Membarth, Richard and Zordan, Victor
In this paper, we transfer machine learning techniques previously applied to denoising surface-only Monte Carlo renderings to path-traced visualizations of medical volumetric data. In the domain of medical imaging, path-traced videos turned out to be an efficient means to visualize and understand internal structures, in particular for less experienced viewers such as students or patients. However, the computational demands for the rendering of high-quality path-traced videos are very high due to the large number of samples necessary for each pixel. To accelerate the process, we present a learning-based technique for denoising path-traced videos of volumetric data by increasing the sample count per pixel; both through spatial (integrating neighboring samples) and temporal filtering (reusing samples over time). Our approach uses a set of additional features and a loss function both specifically designed for the volumetric case. Furthermore, we present a novel network architecture tailored for our purpose, and introduce reprojection of samples to improve temporal stability and reuse samples over frames. As a result, we achieve good image quality even from severely undersampled input images, as visible in the teaser image.
Post-Render Warp with Late Input Sampling Improves Aiming Under High Latency Conditions
(ACM, 2020) Kim, Joohwan; Knowles, Pyarelal; Spjut, Josef; Boudaoud, Ben; McGuire, Morgan; Yuksel, Cem and Membarth, Richard and Zordan, Victor
End-to-end latency in remote-rendering systems can reduce user task performance. This notably includes aiming tasks on game streaming services, which are presently below the standards of competitive first-person desktop gaming. We evaluate the latency-induced penalty on task completion time in a controlled environment and show that it can be significantly mitigated by adopting and modifying image and simulation-warping techniques from virtual reality, eliminating up to 80% of the penalty from 80 ms of added latency. This has potential to enable remote rendering for esports and increase the effectiveness of remote-rendered content creation and robotic teleoperation. We provide full experimental methodology, analysis, implementation details, and source code.
High-Performance Image Filters via Sparse Approximations
(ACM, 2020) Schuster, Kersten; Trettner, Philip; Kobbelt, Leif; Yuksel, Cem and Membarth, Richard and Zordan, Victor
We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
Quadratic Approximation of Cubic Curves
(ACM, 2020) Truong, Nghia; Yuksel, Cem; Seiler, Larry; Yuksel, Cem and Membarth, Richard and Zordan, Victor
We present a simple degree reduction technique for piecewise cubic polynomial splines, converting them into piecewise quadratic splines that maintain the parameterization and C1 continuity. Our method forms identical tangent directions at the interpolated data points of the piecewise cubic spline by replacing each cubic piece with a pair of quadratic pieces. The resulting representation can lead to substantial performance improvements for rendering geometrically complex spline models like hair and fiber-level cloth. Such models are typically represented using cubic splines that are C1-continuous, a property that is preserved with our degree reduction. Therefore, our method can also be considered a new quadratic curve construction approach for high-performance rendering. We prove that it is possible to construct a pair of quadratic curves with C1 continuity that passes through any desired point on the input cubic curve. Moreover, we prove that when the pair of quadratic pieces corresponding to a cubic piece have equal parametric lengths, they join exactly at the parametric center of the cubic piece, and the deviation in positions due to degree reduction is minimized.
FLIP: A Difference Evaluator for Alternating Images
(ACM, 2020) Andersson, Pontus; Nilsson, Jim; Akenine-Möller, Tomas; Oskarsson, Magnus; Åström, Kalle; Fairchild, Mark D.; Yuksel, Cem and Membarth, Richard and Zordan, Victor
Image quality measures are becoming increasingly important in the field of computer graphics. For example, there is currently a major focus on generating photorealistic images in real time by combining path tracing with denoising, for which such quality assessment is integral. We present FLIP, which is a difference evaluator with a particular focus on the differences between rendered images and corresponding ground truths. Our algorithm produces a map that approximates the difference perceived by humans when alternating between two images. FLIP is a combination of modified existing building blocks, and the net result is surprisingly powerful. We have compared our work against a wide range of existing image difference algorithms and we have visually inspected over a thousand image pairs that were either retrieved from image databases or generated in-house. We also present results of a user study which indicate that our method performs substantially better, on average, than the other algorithms. To facilitate the use of FLIP, we provide source code in C++, MATLAB, NumPy/SciPy, and PyTorch.
Compacted CPU/GPU Data Compression via Modified Virtual Address Translation
(ACM, 2020) Seiler, Larry; Lin, Daqi; Yuksel, Cem; Yuksel, Cem and Membarth, Richard and Zordan, Victor
We propose a method to reduce the footprint of compressed data by using modified virtual address translation to permit random access to the data. This extends our prior work on using page translation to perform automatic decompression and deswizzling upon accesses to fixed rate lossy or lossless compressed data. Our compaction method allows a virtual address space the size of the uncompressed data to be used to efficiently access variable-size blocks of compressed data. Compression and decompression take place between the first and second level caches, which allows fast access to uncompressed data in the first level cache and provides data compaction at all other levels of the memory hierarchy. This improves performance and reduces power relative to compressed but uncompacted data. An important property of our method is that compression, decompression, and reallocation are automatically managed by the new hardware without operating system intervention and without storing compression data in the page tables. As a result, although some changes are required in the page manager, it does not need to know the specific compression algorithm and can use a single memory allocation unit size. We tested our method with two sample CPU algorithms. When performing depth buffer occlusion tests, our method reduces the memory footprint by 3.1x. When rendering into textures, our method reduces the footprint by 1.69x before rendering and 1.63x after. In both cases, the power and cycle time are better than for uncompacted compressed data, and significantly better than for accessing uncompressed data.
Using Hardware Ray Transforms to Accelerate Ray/Primitive Intersections for Long, Thin Primitive Types
(ACM, 2020) Wald, Ingo; Morrical, Nate; Zellmann, Stefan; Ma, Lei; Usher, Will; Huang, Tiejun; Pascucci, Valerio; Yuksel, Cem and Membarth, Richard and Zordan, Victor
With the recent addition of hardware ray tracing capabilities, GPUs have become incredibly efficient at ray tracing both triangular geometry, and instances thereof. However, the bounding volume hierarchies that current ray tracing hardware relies on are known to struggle with long, thin primitives like cylinders and curves, because the axis-aligned bounding boxes that these hierarchies rely on cannot tightly bound such primitives. In this paper, we evaluate the use of RTX ray tracing capabilities to accelerate these primitives by tricking the GPU's instancing units into executing a hardware-accelerated oriented bounding box (OBB) rejection test before calling the user's intersection program. We show that this can be done with minimal changes to the intersection programs and demonstrate speedups of up to 5.9× on a variety of data sets.
Sub-triangle opacity masks for faster ray tracing of transparent objects
(ACM, 2020) Gruen, Holger; Benthin, Carsten; Woop, Sven; Yuksel, Cem and Membarth, Richard and Zordan, Victor
We propose an easy and simple-to-integrate approach to accelerate ray tracing of alpha-tested transparent geometry with a focus on Microsoft® DirectX® or Vulkan® ray tracing extensions. Pre-computed bit masks are used to quickly determine fully transparent and fully opaque regions of triangles, thereby skipping the more expensive alpha-test operation. These bit masks allow us to skip up to 86% of all transparency tests, yielding up to a 40% speedup in a proof-of-concept DirectX® software-only implementation.
Hardware-Accelerated Dual-Split Trees
(ACM, 2020) Lin, Daqi; Vasiou, Elena; Yuksel, Cem; Kopta, Daniel; Brunvand, Erik; Yuksel, Cem and Membarth, Richard and Zordan, Victor
Bounding volume hierarchies (BVH) are the most widely used acceleration structures for ray tracing due to their high construction and traversal performance. However, the bounding planes shared between parent and children bounding boxes are an inherent storage redundancy that limits further improvement in performance due to the memory cost of reading these redundant planes. Dual-split trees can create identical space partitioning as BVHs, but in a compact form using less memory by eliminating the redundancies of the BVH structure representation. This reduction in memory storage and data movement translates to faster ray traversal and better energy efficiency. Yet, the performance benefits of dual-split trees are undermined by the processing required to extract the necessary information from their compact representation. This involves bit manipulations and branching instructions which are inefficient in software. We introduce hardware acceleration for dual-split trees and show that the performance advantages over BVHs are emphasized in a hardware ray tracing context that can take advantage of such acceleration. We provide details on how the operations needed for decoding dual-split tree nodes can be implemented in hardware and present experiments in a number of scenes with different sizes using path tracing. In our experiments, we have observed up to 31% reduction in render time and 38% energy saving using dual-split trees as compared to binary BVHs representing identical space partitioning.
Concurrent Binary Trees (with application to longest edge bisection)
(ACM, 2020) Dupuy, Jonathan; Yuksel, Cem and Membarth, Richard and Zordan, Victor
We introduce the concurrent binary tree (CBT), a novel concurrent representation to build and update arbitrary binary trees in parallel. Fundamentally, our representation consists of a binary heap, i.e., a 1D array, that explicitly stores the sum-reduction tree of a bitfield. In this bitfield, each one-valued bit represents a leaf node of the binary tree encoded by the CBT, which we locate algorithmically using a binary-search over the sum-reduction. We show that this construction allows dispatching down to one thread per leaf node and that, in turn, these threads can safely split and/or remove nodes concurrently via simple bitwise operations over the bitfield. The practical benefit of CBTs lies in their ability to accelerate binary-tree-based algorithms with parallel processors. To support this claim, we leverage our representation to accelerate a longest-edge-bisection-based algorithm that computes and renders adaptive geometry for large-scale terrains entirely on the GPU. For this specific algorithm, the CBT accelerates processing speed linearly with the number of processors.
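
As a rough illustration of the representation described in the abstract above (not the paper's implementation), the following Python sketch stores the sum-reduction tree of a bitfield in a 1D heap array and locates the k-th one-valued bit by descending that tree. The class name `SumReductionBitfield` and its methods are hypothetical. The sketch is serial and omits both the mapping from bits to binary-tree nodes and the concurrent split/remove updates.

```python
class SumReductionBitfield:
    """Hypothetical sketch: a binary heap (1D array) holding the
    sum-reduction tree of a bitfield with 2**depth bits."""

    def __init__(self, depth):
        self.size = 1 << depth              # number of bits in the bitfield
        self.heap = [0] * (2 * self.size)   # heap[1] is the root, heap[0] is unused

    def set_bit(self, index, value):
        # The bitfield itself occupies the deepest level of the heap.
        self.heap[self.size + index] = 1 if value else 0

    def reduce(self):
        # Recompute every internal node bottom-up as the sum of its two children.
        for node in range(self.size - 1, 0, -1):
            self.heap[node] = self.heap[2 * node] + self.heap[2 * node + 1]

    def count(self):
        # The root stores the total number of one-valued bits.
        return self.heap[1]

    def find_kth_set_bit(self, k):
        # Binary search over the sum-reduction: go left if the left subtree
        # contains the k-th one-valued bit, otherwise skip it and go right.
        if not 0 <= k < self.heap[1]:
            raise IndexError("k is out of range")
        node = 1
        while node < self.size:
            left = 2 * node
            if k < self.heap[left]:
                node = left
            else:
                k -= self.heap[left]
                node = left + 1
        return node - self.size


tree = SumReductionBitfield(depth=3)        # 8-bit bitfield
for bit in (1, 4, 6):
    tree.set_bit(bit, True)
tree.reduce()
print([tree.find_kth_set_bit(k) for k in range(tree.count())])  # prints [1, 4, 6]
```

In this sketch, one unit of work per value of k corresponds to the "one thread per leaf node" dispatch mentioned in the abstract; a parallel version would need atomic bit updates and a parallel reduction, which are left out here.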
