High-Performance Graphics 2017

Permanent URI for this collection

https://diglib.eg.org/handle/10.2312/2631971

Accelerated Single Ray Tracing for Wide Vector Units
(ACM, 2017) Fuetterling, Valentin; Lojewski, Carsten; Pfreundt, Franz-Josef; Hamann, Bernd; Ebert, Achim; Vlastimil Havran and Karthik Vaiyanathan
Utilizing the vector units of current processors for ray tracing single rays through Bounding Volume Hierarchies has been accomplished by increasing the branching factor of the acceleration structure to match the vector width. A high branching factor allows vectorized bounding box tests but requires a complex control flow for the calculation of a front-to-back traversal order. We propose a novel algorithm for single rays entirely based on vector operations that performs a complete traversal iteration in constant time, ideally suited for current and future micro architectures featuring wide vector units. In addition we use our single ray technique as a building block to construct a fast packet traversal for coherent rays. We validate our algorithms with implementations utilizing the AVX2 and AVX-512 instruction sets and demonstrate significant performance gains over state-of-the-art solutions.
Dual Streaming for Hardware-Accelerated Ray Tracing
(ACM, 2017) Shkurko, Konstantin; Grant, Tim; Kopta, Daniel; Mallett, Ian; Yuksel, Cem; Brunvand, Erik; Vlastimil Havran and Karthik Vaiyanathan
Hardware acceleration for ray tracing has been a topic of great interest in computer graphics. However, even with proposed custom hardware, the inherent irregularity in the memory access pattern of ray tracing has limited its performance, compared with rasterization on commercial GPUs. We provide a different approach to hardware-accelerated ray tracing, beginning with modifying the order of rendering operations, inspired by the streaming character of rasterization. Our dual streaming approach organizes the memory access of ray tracing into two predictable data streams. The predictability of these streams allows perfect prefetching and makes the memory access pattern an excellent match for the behavior of DRAM memory systems. By reformulating ray tracing as fully predictable streams of rays and of geometry we alleviate many long-standing problems of high-performance ray tracing and expose new opportunities for future research. Therefore, we also include extensive discussions of potential avenues for future research aimed at improving the performance of hardware-accelerated ray tracing using dual streaming.
Effective Static Bin Patterns for Sort-Middle Rendering
(ACM, 2017) Kerbl, Bernhard; Kenzel, Michael; Schmalstieg, Dieter; Steinberger, Markus; Vlastimil Havran and Karthik Vaiyanathan
To effectively utilize an ever increasing number of processors during parallel rendering, hardware and software designers rely on sophisticated load balancing strategies. While dynamic load balancing is a powerful solution, it requires complex work distribution and synchronization mechanisms. Graphics hardware manufacturers have opted to employ static load balancing strategies instead. Specifically, triangle data is distributed to processors based on its overlap with screenspace tiles arranged in a fixed pattern. While the current strategy of using simple patterns for a small number of fast rasterizers achieves formidable performance, it is questionable how this approach will scale as the number of processors increases further. To address this issue, we analyze real-world rendering workloads, derive requirements for effective patterns, and present ten different pattern design strategies based on these requirements. In addition to a theoretical evaluation of these design strategies, we compare the performance of select patterns in a parallel sort-middle software rendering pipeline on an extensive set of triangle data captured from eight recent video games. As a result, we are able to identify a set of patterns that scale well and exhibit significantly improved performance over naïve approaches.
An Efficient Denoising Algorithm for Global Illumination
(ACM, 2017) Mara, Michael; McGuire, Morgan; Bitterli, Benedikt; Jarosz, Wojciech; Vlastimil Havran and Karthik Vaiyanathan
We propose a hybrid ray-tracing/rasterization strategy for real-time rendering enabled by a fast new denoising method. We factor global illumination into direct light at rasterized primary surfaces and two indirect lighting terms, each estimated with one path-traced sample per pixel. Our factorization enables efficient (biased) reconstruction by denoising light without blurring materials. We demonstrate denoising in under 10 ms per 1280 × 720 frame, compare results against the leading offline denoising methods, and include a supplement with source code, video, and data.
Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs
(ACM, 2017) Ylitie, Henri; Karras, Tero; Laine, Samuli; Vlastimil Havran and Karthik Vaiyanathan
We present a GPU-based ray traversal algorithm that operates on compressed wide BVHs and maintains the traversal stack in a compressed format. Our method reduces the amount of memory traffic significantly, which translates to a 1.9-2.1× improvement in incoherent ray traversal performance compared to the current state of the art. Furthermore, the memory consumption of our hierarchy is 35-60% of a typical uncompressed BVH. In addition, we present an algorithmically efficient method for converting a binary BVH into a wide BVH in a SAH-optimal fashion, and an improved method for ordering the child nodes at build time for the purposes of octant-aware fixed-order traversal.
Exploiting Budan-Fourier and Vincent's Theorems for Ray Tracing 3D Bézier Curves
(ACM, 2017) Reshetov, Alexander; Vlastimil Havran and Karthik Vaiyanathan
We present a new approach to finding ray-cubic Bézier curve intersections by leveraging recent achievements in polynomial studies. Compared with the state-of-the-art adaptive linearization, it increases performance by 5-50 times, while also improving the accuracy by 1000X. Our algorithm quickly eliminates parts of the curve for which the distance to the given ray is guaranteed to be bigger than a model-specific threshold (maximum curve's half-width). We then reduce the interval with the isolated distance minimum even further and apply a single iteration of a non-linear root-finding technique (Ridders' method).
Extended Morton Codes for High Performance Bounding Volume Hierarchy Construction
(ACM, 2017) Vinkler, Marek; Bittner, Jiří; Havran, Vlastimil; Vlastimil Havran and Karthik Vaiyanathan
We propose an extension to the Morton codes used for spatial sorting of scene primitives. The extended Morton codes increase the coherency of the clusters resulting from the object sorting and work better for non-uniform distribution of scene primitives. In particular, our codes are enhanced by encoding the size of the objects, applying adaptive ordering of the code bits, and using variable bit counts for different dimensions. We use these codes for constructing Bounding Volume Hierarchies (BVH) and show that the extended Morton code leads to higher quality BVH, particularly for the fastest available BVH build algorithms that heavily rely on spatial coherence of Morton code sorting. In turn, our method achieves up to 54% improvement in BVH quality, especially for scenes with a non-uniform spatial extent and varying object sizes. Our method is easy to integrate into any Morton code based build algorithm as it involves only a modification of the Morton code computation step.
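For orientation, the following is a minimal sketch of the conventional 3D Morton code that such build algorithms sort by. It is not the extended code proposed in the paper: it omits the object-size bits, adaptive bit ordering, and per-dimension bit counts. The function names, the 10-bits-per-axis default, and the assumption that centroids are already normalized to [0, 1] are illustrative choices, not taken from the paper.

```python
def quantize(coord, bits=10):
    """Map a coordinate in [0, 1] to an integer grid cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    return min(max(int(coord * cells), 0), cells - 1)


def morton3d(x, y, z, bits=10):
    """Interleave the quantized bits of x, y, z (x lands in the highest bit of each triple)."""
    xi, yi, zi = quantize(x, bits), quantize(y, bits), quantize(z, bits)
    code = 0
    for i in range(bits):
        code |= ((xi >> i) & 1) << (3 * i + 2)
        code |= ((yi >> i) & 1) << (3 * i + 1)
        code |= ((zi >> i) & 1) << (3 * i)
    return code


def sort_by_morton(centroids):
    """Return primitive indices ordered along the Morton curve (hypothetical helper)."""
    return sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))
```

Because only the code computation differs, a variant code could in principle be swapped in for `morton3d` without touching the sorting step.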
Fast Maximal Poisson-Disk Sampling by Randomized Tiling
(ACM, 2017) Wang, Tong; Suda, Reiji; Vlastimil Havran and Karthik Vaiyanathan
It is generally accepted that Poisson disk sampling provides great properties in various applications in computer graphics. We present KD-tree based randomized tiling (KDRT), an efficient method to generate maximal Poisson-disk samples by replicating and conquering tiles clipped from a pattern of very small size. Our method is a two-step process: first, randomly clipping tiles from an MPS (Maximal Poisson-disk Sample) pattern, and second, conquering these tiles together to form the whole sample plane. The results showed that this method can efficiently generate maximal Poisson-disk samples with a very small trade-off in bias error. There are two main contributions of this paper: first, a fast and robust Poisson-disk sample generation method is presented; second, this method can be used to combine several groups of independently generated sample patterns to form a larger one, and thus can be applied as a general parallelization scheme for any MPS method.
A Hardware-Friendly Bilateral Solver Accelerator for Real-Time Virtual Reality Video
(ACM, 2017) Mazumdar, Amrita; Alaghi, Armin; Barron, Jonathan T.; Gallup, David; Ceze, Luis; Oskin, Mark; Seitz, Steven M.; Vlastimil Havran and Karthik Vaiyanathan
Rendering 3D-360° VR video from a camera rig is computation-intensive and typically performed offline. In this paper, we target the most time-consuming step of the VR video creation process, high-quality flow estimation with the bilateral solver. We propose a new algorithm, the hardware-friendly bilateral solver, that enables faster runtimes than existing algorithms of similar quality. Our algorithm is easily parallelized, achieving a 4× speedup on CPU and 32× speedup on GPU over a baseline CPU implementation. We also design an FPGA-based hardware accelerator that utilizes reduced-precision computation and the parallelism inherent in our algorithm to achieve further speedups over our CPU and GPU implementations while consuming an order of magnitude less power. The FPGA design's power efficiency enables practical real-time VR video processing at the camera rig or in the cloud.
Hierarchical Multi-Layer Screen-Space Ray Tracing
(ACM, 2017) Hofmann, Nikolai; Bogendörfer, Phillip; Stamminger, Marc; Selgrad, Kai; Vlastimil Havran and Karthik Vaiyanathan
In this paper we present a method for fast screen-space ray tracing. Single-layer screen-space ray marching is an established tool in high-performance applications, such as games, where plausible and appealing results are more important than strictly correct ones. However, even in such tightly controlled environments, missing scene information can cause visible artifacts. This can be tackled by keeping multiple layers of screen-space information, but might not be affordable on severely limited time-budgets. Traversal speed of single-layer ray marching is commonly improved by multi-resolution schemes, from sub-sampling to stepping through mip-maps to achieve faster frame rates. We show that by combining these approaches, keeping multiple layers and tracing on multiple resolutions, images of higher quality can be computed rapidly. Figure 1 shows this for two scenes with multi-bounce reflections that would show strong artifacts when using only a single layer.
Improved Two-Level BVHs Using Partial Re-Braiding
(ACM, 2017) Benthin, Carsten; Woop, Sven; Afra, Attila T.; Wald, Ingo; Vlastimil Havran and Karthik Vaiyanathan
We propose a novel approach for improving the quality of two-level BVHs (i.e., a two-level data structure that uses a top-level BVH built over second-level object BVHs). After building an individual, high-quality BVH for each object, our new top-level BVH build approach selectively re-braids (opens and merges) object BVHs during the build process to reduce overlap and improve SAH quality. We demonstrate that, compared to the two main state-of-the-art techniques (brute-force reconstruction of a single, flat BVH, and building a traditional two-level BVH over objects), the proposed approach achieves build times significantly faster than the former, while simultaneously yielding traversal performance that is much higher than the latter.
Interactive Stable Ray Tracing
(ACM, 2017) Corso, Alessandro Dal; Salvi, Marco; Kolb, Craig; Frisvad, Jeppe Revall; Lefohn, Aaron; Luebke, David; Vlastimil Havran and Karthik Vaiyanathan
Interactive ray tracing applications running on commodity hardware can suffer from objectionable temporal artifacts due to a low sample count. We introduce stable ray tracing, a technique that improves temporal stability without the over-blurring and ghosting artifacts typical of temporal post-processing filters. Our technique is based on sample reprojection and explicit hole filling, rather than relying on hole-filling heuristics that can compromise image quality. We make reprojection practical in an interactive ray tracing context through the use of a super-resolution bitmask to estimate screen space sample density. We show significantly improved temporal stability as compared with supersampling and an existing reprojection technique. We also investigate the performance and image quality differences between our technique and temporal antialiasing, which typically incurs a significant amount of blur. Finally, we demonstrate the benefits of stable ray tracing by combining it with progressive path tracing of indirect illumination.
Mesh Color Textures
(ACM, 2017) Yuksel, Cem; Vlastimil Havran and Karthik Vaiyanathan
The fundamental limitations of texture mapping have been a long-standing problem in computer graphics. The cost of defining and maintaining texture coordinates and the seams that introduce various filtering inconsistencies lead some graphics applications to adopt alternative techniques that directly address these problems, such as mesh colors. However, alternatives to texture mapping introduce run-time cost that conflicts with the performance constraints of real-time graphics applications. In this paper we introduce mesh color textures that offer all benefits of mesh colors to real-time graphics applications with strict performance constraints. Mesh color textures convert the mesh color data to a format that can be efficiently used by the texture filtering hardware on current GPUs. Utilizing a novel 4D texture coordinate formulation, mesh color textures can provide correct filtering for all mipmap levels and eliminate artifacts due to seams. We show that mesh color textures introduce negligible run-time cost with no discontinuity in texture filtering. We also discuss potential future modifications to graphics hardware and API that would further simplify the use of mesh color textures in real-time graphics applications.
Non-Linearly Quantized Moment Shadow Maps
(ACM, 2017) Peters, Christoph; Vlastimil Havran and Karthik Vaiyanathan
Moment shadow maps enable direct filtering to accomplish proper antialiasing of dynamic hard shadows. For each texel, the moment shadow map stores four powers of the depth in either 64 or 128 bits. After filtering, this information enables a heuristic reconstruction. However, the rounding errors introduced at 64 bits per texel necessitate a bias that strengthens light leaking artifacts noticeably. In this paper, we propose a non-linear transform which maps the four moments to four quantities describing the depth distribution more directly. These quantities can then be quantized to a total of 32 or 64 bits. At 64 bits, the results are virtually indistinguishable from moment shadow mapping at 128 bits per texel. Even at 32 bits, there is hardly any additional light leaking but banding artifacts may occur. At the same time, the computational overhead for the reconstruction is reduced. As a prerequisite for the use of these quantization schemes, we propose a compute shader that applies a resolve for a multisampled shadow map and a 9² two-pass Gaussian filter in shared memory. The quantized moments are written back to device memory only once at the very end. This approach makes our technique roughly as fast as variance shadow mapping with 32 bits per texel. Since hardware-accelerated bilinear filtering is incompatible with non-linear quantization, we employ blue noise dithering as an inexpensive alternative to manual bilinear filtering.
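As a rough illustration of why the four stored powers of depth can be filtered directly, the sketch below builds a moment vector per texel and averages several of them with filter weights. It covers only moment generation and linear filtering; the heuristic reconstruction and the non-linear quantization described in the abstract are not shown. The function names and the plain Python tuples are illustrative assumptions, not the paper's shader code.

```python
def depth_moments(z):
    """Return the four powers (z, z^2, z^3, z^4) of a depth value."""
    return (z, z ** 2, z ** 3, z ** 4)


def filter_moments(samples, weights):
    """Weighted average of per-texel moment vectors (weights are assumed to sum to 1)."""
    return tuple(
        sum(w * m[k] for m, w in zip(samples, weights))
        for k in range(4)
    )
```

After filtering, the first component is the mean depth in the filter footprint and the second minus the squared first is its variance, which is the kind of distribution information a reconstruction can work from.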
Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path Traced Global Illumination
(ACM, 2017) Schied, Christoph; Kaplanyan, Anton; Wyman, Chris; Patney, Anjul; Chaitanya, Chakravarty Reddy Alla; Burgess, John; Liu, Shiqiu; Dachsbacher, Carsten; Lefohn, Aaron; Salvi, Marco; Vlastimil Havran and Karthik Vaiyanathan
We introduce a reconstruction algorithm that generates a temporally stable sequence of images from one path-per-pixel global illumination. To handle such noisy input, we use temporal accumulation to increase the effective sample count and spatiotemporal luminance variance estimates to drive a hierarchical, image-space wavelet filter [Dammertz et al. 2010]. This hierarchy allows us to distinguish between noise and detail at multiple scales using local luminance variance. Physically based light transport is a long-standing goal for real-time computer graphics. While modern games use limited forms of ray tracing, physically based Monte Carlo global illumination does not meet their 30 Hz minimal performance requirement. Looking ahead to fully dynamic real-time path tracing, we expect this to only be feasible using a small number of paths per pixel. As such, image reconstruction using low sample counts is key to bringing path tracing to real-time. When compared to prior interactive reconstruction filters, our work gives approximately 10× more temporally stable results, matches reference images 5-47% better (according to SSIM), and runs in just 10 ms (±15%) on modern graphics hardware at 1920×1080 resolution.
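The two ingredients named above, temporal accumulation and a luminance variance estimate, can be sketched for a single pixel with a generic exponential moving average over the first and second luminance moments. This is a simplified illustration under stated assumptions, not the paper's implementation: reprojection, history rejection, and the wavelet filter are omitted, and the function names and the default blend factor `alpha` are hypothetical.

```python
def accumulate(history, sample, alpha=0.2):
    """Blend a new sample into the history with an exponential moving average."""
    return (1.0 - alpha) * history + alpha * sample


def update_pixel(state, luminance, alpha=0.2):
    """Update a pixel's (mean, second moment) pair and return its variance estimate."""
    mean, second = state
    mean = accumulate(mean, luminance, alpha)
    second = accumulate(second, luminance * luminance, alpha)
    # Variance from moments; clamp to zero to absorb rounding error.
    variance = max(0.0, second - mean * mean)
    return (mean, second), variance
```

A smaller `alpha` weights history more heavily, which raises the effective sample count at the cost of slower response to change.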
STBVH: A Spatial-Temporal BVH for Efficient Multi-Segment Motion Blur
(ACM, 2017) Woop, Sven; Afra, Attila T.; Benthin, Carsten; Vlastimil Havran and Karthik Vaiyanathan
We present the STBVH, a new approach for rendering multi-segment motion blur using a bounding volume hierarchy (BVH) that stores both spatial linearly interpolated bounds and temporal bounds. The approach is designed for different numbers of time steps per mesh or object. While separating the individual meshes using standard partitioning techniques, it performs temporal splits for locations with large or curved motion inside the meshes. Our approach uses a modified motion blur surface area heuristic (MBSAH) that calculates probabilities in the presence of spatial-temporal bounds and works on linear motion segments of primitives rather than on full motion curves. We show that our approach is able to handle challenging scenes with varying degrees of motion blur per mesh, using significantly less memory and having competitive rendering performance compared to building separate linear motion blur BVHs per global time segment.
Timeline Scheduling for Out-of-Core Ray Batching
(ACM, 2017) Son, Myungbae; Yoon, Sung-Eui; Vlastimil Havran and Karthik Vaiyanathan
We present a timeline-based scheduling method for Monte Carlo ray tracing of out-of-core models on distributed memory clusters. We abstract different setups of various compute and memory devices into a graph-based representation, and estimate the time for job execution and data transfer in a simple timing model. Our scheduler allocates not only jobs to processors, but also data transfers to memory channels. This approach allows us to control the I/O overload, which is the principal bottleneck in rendering massive-scale scenes. To manage dependencies of data transfers and data-intensive jobs, each job and data transfer is arranged on the timeline with dependency relations. Based on this model, our scheduler aims to increase data locality by allocating a job that takes the least time to fetch required data on a given compute device. This goal is achieved by optimizing the data transfer path to maximize latency hiding effects. We have implemented a path tracer on our framework and tested massive models up to 500M triangles. Compared to prior state-of-the-art scheduling techniques, our renderer achieved higher horizontal scalability on flexible device configurations.
Vectorized Production Path Tracing
(ACM, 2017) Lee, Mark; Green, Brian; Xie, Feng; Tabellion, Eric; Vlastimil Havran and Karthik Vaiyanathan
This paper presents MoonRay, a high performance production rendering architecture using Monte Carlo path tracing developed at DreamWorks Animation. MoonRay is the first production path tracer, to our knowledge, designed to fully leverage Single Instruction/Multiple Data (SIMD) vector units throughout. To achieve high SIMD efficiency, we employ Embree for tracing rays and vectorize the remaining compute intensive components of the renderer: the integrator, the shading system and shaders, and the texturing engine. Queuing is used to help keep all vector lanes full and improve data coherency. We use the ISPC programming language [Intel 2011; Pharr and Mark 2012] to achieve improved performance across SSE, AVX/AVX2 and AVX512 instruction sets. Our system includes two functionally equivalent uni-directional CPU path tracing implementations: a C++ scalar depth-first version and an ISPC vectorized breadth-first wavefront version. Using side by side performance comparisons on complex production scenes and assets we show our vectorized architecture, running on AVX2, delivers a 1.3× to 2.3× speed-up in overall render time, and up to 3×, 6×, and 4× speed-ups within the integration, shading, and texturing components, respectively.
