Browsing by Author "Kerbl, Bernhard"
Now showing 1 - 12 of 12
Item Conservative Meshlet Bounds for Robust Culling of Skinned Meshes (The Eurographics Association and John Wiley & Sons Ltd., 2021) Unterguggenberger, Johannes; Kerbl, Bernhard; Pernsteiner, Jakob; Wimmer, Michael; Zhang, Fang-Lue and Eisemann, Elmar and Singh, Karan
Following recent advances in GPU hardware development and newly introduced rendering pipeline extensions, the segmentation of input geometry into small geometry clusters (so-called meshlets) has emerged as an important practice for efficient rendering of complex 3D models. Meshlets can be processed efficiently using mesh shaders on modern graphics processing units, in order to achieve streamlined geometry processing in just two tightly coupled shader stages that allow for dynamic workload manipulation in between. The additional granularity layer between entire models and individual triangles enables new opportunities for fine-grained visibility culling methods. However, in contrast to static models, view frustum and backface culling on a per-meshlet basis for skinned, animated models are difficult to achieve while respecting the conservative spatio-temporal bounds that are required for robust rendering results. In this paper, we describe a solution for computing and exploiting relevant conservative bounds for culling meshlets of models that are animated using linear blend skinning. By enabling visibility culling for animated meshlets, our approach can help to improve rendering performance and alleviate bottlenecks in the notoriously performance- and memory-intensive skeletal animation pipelines of modern real-time graphics applications.

Item CUDA and Applications to Task-based Programming (The Eurographics Association, 2022) Kerbl, Bernhard; Kenzel, Michael; Winter, Martin; Steinberger, Markus; Hahmann, Stefanie; Patow, Gustavo A.
Since its inception, the CUDA programming model has been continuously evolving.
Because the CUDA toolkit aims to consistently expose cutting-edge capabilities for general-purpose compute jobs to its users, the added features in each new version reflect the rapid changes that we observe in GPU architectures. Over the years, changes in hardware, the growing scope of built-in functions and libraries, and advancing compliance with the C++ standard have expanded the design choices when coding for CUDA and significantly altered the guidelines for achieving peak performance. In this tutorial, we give a thorough introduction to the CUDA toolkit, demonstrate how a contemporary application can benefit from recently introduced features and how they can be applied to task-based GPU scheduling in particular. For instance, we will provide detailed examples of use cases for independent thread scheduling, cooperative groups, and the CUDA standard library, libcu++, which are certain to become an integral part of clean coding for CUDA in the near future.

Item CUDA and Applications to Task-based Programming (The Eurographics Association, 2021) Kenzel, Michael; Kerbl, Bernhard; Winter, Martin; Steinberger, Markus; O'Sullivan, Carol and Schmalstieg, Dieter
Since its inception, the CUDA programming model has been continuously evolving. Because the CUDA toolkit aims to consistently expose cutting-edge capabilities for general-purpose compute jobs to its users, the added features in each new version reflect the rapid changes that we observe in GPU architectures. Over the years, changes in hardware, the growing scope of built-in functions and libraries, and advancing compliance with the C++ standard have expanded the design choices when coding for CUDA and significantly altered the guidelines for achieving peak performance. In this tutorial, we give a thorough introduction to the CUDA toolkit, demonstrate how a contemporary application can benefit from recently introduced features and how they can be applied to task-based GPU scheduling in particular.
For instance, we will provide detailed examples of use cases for independent thread scheduling, cooperative groups, and the CUDA standard library, libcu++, which are certain to become an integral part of clean coding for CUDA in the near future.
https://cuda-tutorial.github.io/

Item Fast Multi-View Rendering for Real-Time Applications (The Eurographics Association, 2020) Unterguggenberger, Johannes; Kerbl, Bernhard; Steinberger, Markus; Schmalstieg, Dieter; Wimmer, Michael; Frey, Steffen and Huang, Jian and Sadlo, Filip
Efficient rendering of multiple views can be a critical performance factor for real-time rendering applications. Generating more than one view multiplies the amount of rendered geometry, which can severely impact performance. Minimizing that impact has been a target of previous research as well as of GPU manufacturers, who have started to equip devices with dedicated acceleration units. However, vendor-specific acceleration is not the only option to increase multi-view rendering (MVR) performance. Available graphics API features, shader stages and optimizations can be exploited for improved MVR performance, while generally offering more versatile pipeline configurations, including the preservation of custom tessellation and geometry shaders. In this paper, we present an exhaustive evaluation of MVR pipelines available on modern GPUs. We provide a detailed analysis of previous techniques and hardware-accelerated MVR, and propose a novel method, leading to the creation of an MVR catalogue. Our analyses cover three distinct applications to help gain clarity on overall MVR performance characteristics.
Our interpretation of the observed results provides a guideline for selecting the most appropriate MVR pipeline for various use cases on different GPU architectures.

Item GPU-Accelerated LOD Generation for Point Clouds (The Eurographics Association and John Wiley & Sons Ltd., 2023) Schütz, Markus; Kerbl, Bernhard; Klaus, Philip; Wimmer, Michael; Bikker, Jacco; Gribble, Christiaan
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art. Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time. Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090, an improvement by a factor of 80 to 400 over the CPU-based state of the art (12 million points per second). Because construction is in-core, model sizes are limited to about 500 million points per 24 GB of memory. Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU.
Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect the method to be suitable as a component of bottom-up out-of-core LOD construction schemes.

Item Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU (© 2017 The Eurographics Association and John Wiley & Sons Ltd., 2017) Kerbl, Bernhard; Kenzel, Michael; Schmalstieg, Dieter; Seidel, Hans-Peter; Steinberger, Markus; Chen, Min and Zhang, Hao (Richard)
While the modern graphics processing unit (GPU) offers massive parallel compute power, the ability to influence the scheduling of these immense resources is severely limited. Therefore, the GPU is widely considered to be suitable only as an externally controlled co-processor for homogeneous workloads, which greatly restricts the potential applications of GPU computing. To address this issue, we present a new method to achieve fine-grained priority scheduling on the GPU: hierarchical bucket queuing. By carefully distributing the workload among multiple queues and efficiently deciding which queue to draw work from next, we enable a variety of scheduling strategies. These strategies include fair scheduling, earliest-deadline-first scheduling and user-defined dynamic priority scheduling. In a comparison with a sorting-based approach, we reveal the advantages of hierarchical bucket queuing over previous work. Finally, we demonstrate the benefits of using priority scheduling in real-world applications, using path tracing and foveated micropolygon rendering as examples.

Item An Improved Triangle Encoding Scheme for Cached Tessellation (The Eurographics Association, 2022) Kerbl, Bernhard; Horváth, Linus; Cornel, Daniel; Wimmer, Michael; Pelechano, Nuria; Vanderhaeghe, David
With the recent advances in real-time rendering that were achieved by embracing software rasterization, interest in alternative solutions for other fixed-function pipeline stages is rising. In this paper, we revisit a recently presented software approach for cached tessellation, which compactly encodes and stores triangles in GPU memory. While the proposed technique is both efficient and versatile, we show that the original encoding is suboptimal and provide an alternative scheme that acts as a drop-in replacement. As shown in our evaluation, the proposed modifications can yield performance gains of 40% or more.

Item Rendering Point Clouds with Compute Shaders and Vertex Order Optimization (The Eurographics Association and John Wiley & Sons Ltd., 2021) Schütz, Markus; Kerbl, Bernhard; Wimmer, Michael; Bousseau, Adrien and McGuire, Morgan
In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing.
We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. During our experiments, the observed peak performance was reached rendering 796 million points (12.7 GB) at rates of 62 to 64 frames per second (50 billion points per second, 802 GB/s) on an RTX 3090 without the use of level-of-detail structures. We further introduce an optimized vertex order for point clouds that boosts the efficiency of GL_POINTS by a factor of 5 in cases where hardware rendering is compulsory. We compare different orderings and show that Morton-sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster for others. Combining both approaches, by first sorting according to Morton code and then shuffling the resulting sequence in batches of 128 points, leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.

Item The Road to Vulkan: Teaching Modern Low-Level APIs in Introductory Graphics Courses (The Eurographics Association, 2022) Unterguggenberger, Johannes; Kerbl, Bernhard; Wimmer, Michael; Bourdin, Jean-Jacques; Paquette, Eric
For over two decades, the OpenGL API has provided users with the means for implementing versatile, feature-rich, and portable real-time graphics applications. Consequently, it has been widely adopted by practitioners and educators alike and is deeply ingrained in many curricula that teach real-time graphics in higher education. Over the years, the architecture of graphics processing units (GPUs) has incrementally diverged from OpenGL's conceptual design. The more recently introduced Vulkan API provides a more modern, fine-grained approach for interfacing with the GPU. Various properties of this API and overall trends suggest that Vulkan could soon replace OpenGL in many areas.
Hence, it stands to reason that educators who have their students' best interests at heart should provide them with corresponding lecture material. However, Vulkan is notoriously verbose and rather challenging for first-time users, so transitioning to this new API bears a considerable risk of failing to achieve expected teaching goals. In this paper, we document our experiences after teaching Vulkan in an introductory graphics course side by side with conventional OpenGL. A final survey enables us to draw conclusions about perceived workload, difficulty, and students' acceptance of either approach, and to identify suitable conditions and recommendations for teaching Vulkan to undergraduate students.

Item Software Rasterization of 2 Billion Points in Real Time (ACM Association for Computing Machinery, 2022) Schütz, Markus; Kerbl, Bernhard; Wimmer, Michael; Spjut, Josef; Stamminger, Marc; Zordan, Victor
The accelerated collection of detailed real-world 3D data in the form of ever-larger point clouds is sparking a demand for novel visualization techniques that are capable of rendering billions of point primitives in real time. We propose a software rasterization pipeline for point clouds that is capable of rendering up to two billion points in real time (60 FPS) on commodity hardware. Improvements over the state of the art are achieved by batching points, enabling a number of batch-level optimizations before rasterizing them within the same rendering pass. These optimizations include frustum culling, level-of-detail (LOD) rendering, and choosing the appropriate coordinate precision for a given batch of points directly within a compute workgroup. Adaptive coordinate precision, in conjunction with visibility buffers, reduces the required data for the majority of points to just four bytes, making our approach several times faster than the bandwidth-limited state of the art.
Furthermore, support for LOD rendering makes our software rasterization approach suitable for rendering arbitrarily large point clouds and for meeting the elevated performance demands of virtual reality applications.

Item Temporally Stable Content-Adaptive and Spatio-Temporal Shading Rate Assignment for Real-Time Applications (The Eurographics Association, 2021) Stappen, Stefan; Unterguggenberger, Johannes; Kerbl, Bernhard; Wimmer, Michael; Lee, Sung-Hee and Zollmann, Stefanie and Okabe, Makoto and Wünsche, Burkhard
We propose two novel methods to improve the efficiency and quality of real-time rendering applications: Texel differential-based content-adaptive shading (TDCAS) and spatio-temporally filtered adaptive shading (STeFAS). Utilizing Variable Rate Shading (VRS), a hardware feature introduced with NVIDIA's Turing micro-architecture, and properties derived during rendering or Temporal Anti-Aliasing (TAA), our techniques adapt the shading resolution to improve the performance and quality of real-time applications. VRS enables different shading rates for different regions of the screen during a single render pass. In contrast to other techniques, TDCAS and STeFAS have very little overhead for computing the shading rate. STeFAS enables up to 4× higher rendering resolutions for similar frame rates, or a performance increase of 4× at the same resolution.

Item View-Dependent Impostors for Architectural Shape Grammars (The Eurographics Association, 2021) Jia, Chao; Roth, Moritz; Kerbl, Bernhard; Wimmer, Michael; Lee, Sung-Hee and Zollmann, Stefanie and Okabe, Makoto and Wünsche, Burkhard
Procedural generation has become a key component in satisfying a growing demand for ever-larger, highly detailed geometry in realistic, open-world games and simulations. In this paper, we present our work towards a new level-of-detail mechanism for procedural geometry shape grammars. Our approach automatically identifies and adds suitable surrogate rules to a shape grammar's derivation tree.
Opportunities for surrogates are detected in a dedicated pre-processing stage. Where suitable, textured impostors are then used for rendering at runtime, based on the current viewpoint. Our proposed methods generate simplified geometry with superior visual quality compared to the state of the art and roughly the same rendering performance.
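The "Conservative Meshlet Bounds for Robust Culling of Skinned Meshes" entry relies on a property of linear blend skinning that is easy to check in code. A skinned vertex is a convex combination of that vertex transformed by each influencing bone. Consequently, a box that encloses every bone-transformed copy of a meshlet's rest-pose bounding box also encloses the skinned meshlet. The Python sketch below illustrates only this property and is not the paper's method. It assumes row-major 4×4 affine bone matrices given as nested lists, non-negative skinning weights that sum to one, and a list of sampled poses. All function names are hypothetical.

```python
from itertools import product


def transform_point(m, p):
    """Apply a row-major 4x4 affine matrix (nested lists) to a 3D point."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def skin_vertex(v, bones, weights):
    """Linear blend skinning: weighted sum of the vertex transformed by each bone."""
    acc = [0.0, 0.0, 0.0]
    for m, w in zip(bones, weights):
        q = transform_point(m, v)
        for k in range(3):
            acc[k] += w * q[k]
    return tuple(acc)


def conservative_meshlet_aabb(rest_min, rest_max, poses):
    """AABB enclosing the meshlet at every sampled pose (hypothetical helper).

    `poses` is a list of poses; each pose is the list of bone matrices that
    influence any vertex of the meshlet. Because the skinning weights are
    convex, every skinned vertex lies inside the box spanned by all
    bone-transformed corners of the rest-pose AABB.
    """
    corners = list(product(*zip(rest_min, rest_max)))  # the 8 box corners
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    for bones in poses:
        for m in bones:
            for c in corners:
                q = transform_point(m, c)
                for k in range(3):
                    lo[k] = min(lo[k], q[k])
                    hi[k] = max(hi[k], q[k])
    return tuple(lo), tuple(hi)
```

A box computed this way is conservative only at the sampled poses. Poses between samples are not covered, whereas the entry's approach targets robust spatio-temporal bounds.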
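A sequential Python bucket queue can illustrate the scheduling idea in the "Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU" entry. Work items are filed into a fixed number of priority buckets, and a two-level occupancy bitmask locates the most urgent non-empty bucket without scanning all of them. This is a single-threaded illustration under stated assumptions: 4,096 integer priority levels, lower values meaning more urgent, and FIFO order within a bucket. The paper targets concurrent queues on the GPU, and the class and method names here are hypothetical.

```python
from collections import deque


class HierarchicalBucketQueue:
    """Single-threaded sketch: up to 64 * 64 buckets, lower index = more urgent."""

    WORD = 64

    def __init__(self, num_buckets=4096):
        assert num_buckets <= self.WORD * self.WORD
        self.buckets = [deque() for _ in range(num_buckets)]
        # leaf[w] bit b is set while bucket w * WORD + b is non-empty.
        self.leaf = [0] * ((num_buckets + self.WORD - 1) // self.WORD)
        # top bit w is set while leaf[w] is non-zero.
        self.top = 0

    def push(self, priority, item):
        self.buckets[priority].append(item)
        w, b = divmod(priority, self.WORD)
        self.leaf[w] |= 1 << b
        self.top |= 1 << w

    def pop(self):
        """Remove and return (priority, item) from the most urgent bucket."""
        if self.top == 0:
            raise IndexError("pop from empty queue")
        w = (self.top & -self.top).bit_length() - 1  # lowest set bit in top
        b = (self.leaf[w] & -self.leaf[w]).bit_length() - 1
        priority = w * self.WORD + b
        bucket = self.buckets[priority]
        item = bucket.popleft()
        if not bucket:  # clear occupancy bits once the bucket drains
            self.leaf[w] &= ~(1 << b)
            if self.leaf[w] == 0:
                self.top &= ~(1 << w)
        return priority, item
```

For earliest-deadline-first scheduling, one option is to map a deadline to a bucket index, for example by quantizing the time remaining. `pop` then always returns work from the most urgent occupied bucket.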
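The vertex order described in the "Rendering Point Clouds with Compute Shaders and Vertex Order Optimization" entry sorts points by Morton code and then shuffles the sorted sequence in batches of 128 points. It can be sketched on the CPU as follows. This sketch reads "shuffling in batches" as randomly permuting whole 128-point batches while keeping each batch's internal Morton order. That reading, the non-negative 10-bit integer grid coordinates, and the helper names are all assumptions, not details given in the entry.

```python
import random


def part1by2(n):
    """Spread the low 10 bits of n so that two zero bits separate each bit."""
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n


def morton3d(x, y, z):
    """Interleave three 10-bit grid coordinates into a 30-bit Morton code."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def shuffled_morton_order(points, batch_size=128, seed=0):
    """Sort points by Morton code, then randomly permute whole batches."""
    ordered = sorted(points, key=lambda p: morton3d(*p))
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)  # batch contents keep Morton order
    return [p for batch in batches for p in batch]
```

The batch size is a parameter. The default of 128 matches the value reported in the entry.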
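The "Software Rasterization of 2 Billion Points in Real Time" entry reduces most points to four bytes by choosing coordinate precision per batch. The sketch below shows how worst-case error scales with batch extent when points are quantized relative to the batch's bounding box. It assumes 10 bits per axis packed into one 32-bit word. That layout and the function names are assumptions made for illustration, not taken from the entry.

```python
def quantize_point(p, box_min, box_max, bits=10):
    """Pack a point into one 32-bit word, `bits` bits per axis, relative to the batch box."""
    assert 3 * bits <= 32
    levels = (1 << bits) - 1
    word = 0
    for axis in range(3):
        extent = box_max[axis] - box_min[axis]
        t = 0.0 if extent == 0 else (p[axis] - box_min[axis]) / extent
        q = min(levels, max(0, round(t * levels)))  # nearest grid step, clamped
        word |= q << (axis * bits)
    return word


def dequantize_point(word, box_min, box_max, bits=10):
    """Recover an approximate position from the packed word."""
    levels = (1 << bits) - 1
    out = []
    for axis in range(3):
        q = (word >> (axis * bits)) & levels
        out.append(box_min[axis] + q / levels * (box_max[axis] - box_min[axis]))
    return tuple(out)
```

Under this assumed layout, a batch spanning 10 m along an axis has a worst-case rounding error of 10 / (2 × 1023) ≈ 4.9 mm on that axis. Whether that error is acceptable depends on how large the batch appears on screen.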