EGGH06: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2006
High Quality Normal Map Compression (The Eurographics Association, 2006) Munkberg, Jacob; Akenine-Möller, Tomas; Ström, Jacob; Marc Olano and Philipp Slusallek
Normal mapping is a widely used technique in real-time graphics, but so far little research has focused on compressing normal maps. Therefore, we present several simple techniques that improve the quality of ATI's 3Dc normal map compression algorithm. We use varying point distributions, rotation, and differential encoding. On average, this improves the peak signal-to-noise ratio by 3 dB, which is clearly visible in rendered images. Our algorithm also allows us to better handle slowly varying normals, which often occur in real-world normal maps. We also describe the decoding process in detail.

Minimum Triangle Separation for Correct Z-Buffer Occlusion (The Eurographics Association, 2006) Akeley, Kurt; Su, Jonathan; Marc Olano and Philipp Slusallek
We show that, and how, window coordinate precision (the representations of xwin and ywin), field of view, and error accumulated by single-precision mapping arithmetic contribute to, and sometimes dominate, effective z-buffer resolution. Our results are developed analytically, then verified through simulation. Using our approach, system designers can allocate numeric precision more efficiently, and programmers can more confidently predict the minimum triangle-to-triangle separation required to ensure correct z-buffer occlusion.

Pseudorandom Number Generation on the GPU (The Eurographics Association, 2006) Sussman, Myles; Crutchfield, William; Papakipos, Matthew; Marc Olano and Philipp Slusallek
Statistical algorithms such as Monte Carlo integration are good candidates to run on graphics processing units. The heart of these algorithms is random number generation, which generally has been done on the CPU.
In this paper we present GPU implementations of three random number generators. We show how to overcome limitations of GPU hardware that affect the feasibility and efficiency of employing a GPU-based RNG. We also present a data flow model for managing and updating substream state for each of the parallel substreams of random numbers. We show that GPU random number generators will greatly benefit from having more outputs from each thread. We discuss other hardware modifications that will be beneficial to the implementation of GPU-RNG, and we present performance measurements of our implementations.

Non-interleaved Deferred Shading of Interleaved Sample Patterns (The Eurographics Association, 2006) Segovia, Benjamin; Iehl, Jean Claude; Mitanchey, Richard; Péroche, Bernard; Marc Olano and Philipp Slusallek
This paper presents a novel and fast technique to combine interleaved sampling and deferred shading on a GPU. The core idea of this paper is quite simple. Interleaved sample patterns are computed in a non-interleaved deferred shading process. The geometric buffer (G-buffer), which contains all of the pixel information, is split into several separate and distinct sub-buffers. To achieve this quickly, a massive two-pass swizzling copy is used to convert between these two buffer organizations. Once split, the sub-buffers can then be accessed to perform any fragment operation, as is done with a standard deferred shading rendering pipeline. By combining interleaved sampling and deferred shading, real-time rendering of global illumination effects can therefore be easily achieved. Instead of evaluating each light contribution on the whole geometric buffer, each shading computation is coherently restricted to a smaller subset of fragments using the sub-buffers. Therefore, each screen pixel in a regular n×m pattern will have its own small set of light contributions.
In doing so, the consumed fill rate is considerably decreased and the rendering quality remains close to the quality obtained with a non-interleaved approach. Finally, the implementation of this rendering pipeline is straightforward, and it can be easily integrated in any existing real-time rendering package already using deferred shading.

Efficient Video Decoding on GPUs by Point Based Rendering (The Eurographics Association, 2006) Han, Bo; Zhou, Bingfeng; Marc Olano and Philipp Slusallek
To accelerate computation-intensive video decoding tasks, we present a novel framework to offload most decoding operations to current GPUs. Our method is based on rendering graphics points and is suitable for block-based video standards. By representing video blocks as graphics points, we achieve great flexibility and high parallelism to utilize the GPU's pipelined stream processing architecture. The computational resources within texture units and blending units are also exploited to facilitate computations. We propose a high-performance implementation of IDCT on GPUs, which efficiently excludes most zero-value coefficients to save bandwidth and computation. Compared with the existing quad-based representation, our point-based implementation of MC greatly reduces data transfer and redundancy. We have demonstrated the efficiency of our proposed framework with an MPEG-2 decoder. Our results indicate a significant improvement over prior CPU and GPU solutions.

Realistic Soft Shadows by Penumbra-Wedges Blending (The Eurographics Association, 2006) Forest, Vincent; Barthe, Loïc; Paulin, Mathias; Marc Olano and Philipp Slusallek
Recent real-time shadow generation techniques try to provide shadows with realistic penumbrae. However, most techniques are either non-physically based or too simplified to produce convincing results. The penumbra-wedges algorithm is a physical approach based on the assumption that penumbrae are non-overlapping.
In this paper, we propose an algorithm that retains the advantages of the penumbra-wedges method but removes the "non-overlapping" limitation. We first compute the light occlusion regions per fragment. Then we use this information to detect the areas where penumbrae overlap, and we perform a realistic penumbra blending.

Distributed Texture Memory in a Multi-GPU Environment (The Eurographics Association, 2006) Moerschell, Adam; Owens, John D.; Marc Olano and Philipp Slusallek
In this paper we present a consistent, distributed, shared memory system for GPU texture memory. This model enables the virtualization of texture memory and the transparent, scalable sharing of texture data across multiple GPUs. Textures are stored as pages, and as textures are read or written, our system satisfies requests for pages on demand while maintaining memory consistency. Our system implements a directory-based distributed shared memory abstraction and is hidden from the programmer in order to ease programming in a multi-GPU environment. Our primary contributions are the identification of the core mechanisms that enable the abstraction and the future support that will enable them to be efficient.

Quadtree Relief Mapping (The Eurographics Association, 2006) Schroders, Marc F. A.; Gulik, Rob van; Marc Olano and Philipp Slusallek
Relief mapping is an image-based technique for rendering surface details. It simulates depth on a polygonal model using a texture that encodes surface height. The presented method incorporates a quadtree structure to achieve a theoretically proven performance between Ω(log(p)) and O(√p) for computing the first intersection of a ray with the encoded surface, where p is the number of pixels in the texture used. In practice, the performance was found to be close to log(p) in most cases.
Due to the hierarchical nature of our technique, the algorithm scales better than previous comparable techniques and is therefore better suited to future games and graphics hardware. As the experimental results show, quadtree relief mapping is more efficient than previous techniques when textures larger than 512×512 are used. The method correctly handles self-occlusions, shadows, and irregular surfaces.

GPU-Accelerated Deep Shadow Maps for Direct Volume Rendering (The Eurographics Association, 2006) Hadwiger, Markus; Kratz, Andrea; Sigg, Christian; Bühler, Katja; Marc Olano and Philipp Slusallek
Deep shadow maps unify the computation of volumetric and geometric shadows. For each pixel in the shadow map, a fractional visibility function is sampled, pre-filtered, and compressed as a piecewise linear function. However, the original implementation targets software-based off-line rendering. Similar previous algorithms on GPUs focus on geometric shadows and lose many important benefits of the original concept. We focus on shadows for interactive direct volume rendering, where shadow algorithms currently either compute additional per-voxel shadow data, or employ half-angle slicing to generate shadows during rendering. We adapt the original concept of deep shadow maps to volume ray-casting on GPUs, and show that it can provide anti-aliased high-quality shadows at interactive rates. Ray-casting is used for both generation of the shadow map data structure and actual rendering. High frequencies in the visibility function are captured by a pre-computed lookup table for piecewise linear segments. Direct volume rendering is performed with an additional deep shadow map lookup for each sample. Overall, we achieve interactive high-quality volume ray-casting with accurate shadows. To conclude, we briefly describe how semi-transparent geometry such as hair could be integrated as well, provided that rasterization can write to arbitrary locations in a texture.
This would be a major step toward full deep shadow map functionality.

Efficient Depth Buffer Compression (The Eurographics Association, 2006) Hasselgren, Jon; Akenine-Möller, Tomas; Marc Olano and Philipp Slusallek
Depth buffer performance is crucial to modern graphics hardware. This has led to a large number of algorithms for reducing the depth buffer bandwidth. Unfortunately, these have mostly remained documented only in the form of patents. Therefore, we present a survey on the design space of efficient depth buffer implementations. In addition, we describe our novel depth buffer compression algorithm, which gives very high compression ratios.

Compressed Lossless Texture Representation and Caching (The Eurographics Association, 2006) Inada, Tetsugo; McCool, Michael D.; Marc Olano and Philipp Slusallek
A number of texture compression algorithms have been proposed to reduce texture storage size and bandwidth requirements. To deal with the requirement for random access, these algorithms usually divide the texture into tiles and apply a fixed rate compression scheme to each tile. Fixed rate schemes are by nature lossy, and cannot adapt to local changes in image complexity. Multiresolution schemes, a form of variable-rate coding, can adapt to varying image complexity but suffer from fragmentation and can only compress a limited class of images. On the other hand, several lossless image compression standards have been established. Lossless compression requires variable-rate coding, and more efficient lossy algorithms also use variable-rate coding. Unfortunately, these standards cannot be used directly as texture compression schemes since they do not allow random access. We present a block-oriented lossless texture compression algorithm based on a simple variable-bitrate differencing scheme. A B-tree index enables both random access and efficient O(1) memory allocation without external fragmentation.
Textures in our test suite compressed to between 6% and 95% of their original sizes. We propose a cache architecture designed to support our compression scheme. Cycle-accurate simulation shows that this cache architecture consistently reduces the external bandwidth requirements as well as the storage size without significantly affecting latency.

A Digital Rights Enabled Graphics Processing System (The Eurographics Association, 2006) Shi, Weidong; Lee, Hsien-Hsin S.; Yoo, Richard M.; Boldyreva, Alexandra; Marc Olano and Philipp Slusallek
With the emergence of commerce in 3D graphics and art assets on the Internet, protecting their intellectual property and restricting their usage has become a new design challenge. This paper presents a novel protection model for commercial graphics data by integrating digital rights management into the graphics processing unit and creating a digital rights enabled graphics processing system to defend against piracy of entertainment software and copyrighted graphics arts. In accordance with the presented model, graphics content providers distribute encrypted 3D graphics data along with their certified licenses. During rendering, when encrypted graphics data, e.g., geometry or textures, are fetched by a digital rights enabled graphics processing system, they are decrypted. The graphics processing system also ensures that graphics data such as geometry, textures, or shaders are bound only in accordance with the binding constraints designated in the licenses. Special API extensions for media/software developers are also proposed to enable our protection model.
We evaluated the proposed hardware system using a cycle-based GPU simulator, with a configuration in line with a realistic implementation, and the open source video game Quake 3D.

The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware (The Eurographics Association, 2006) Sheaffer, Jeremy W.; Luebke, David P.; Skadron, Kevin; Marc Olano and Philipp Slusallek
With shrinking process technology, the primary cause of transient faults in semiconductors shifts away from high-energy cosmic particle strikes and toward more mundane and pervasive causes: power fluctuations, crosstalk, and other random noise. Smaller transistor features require a lower critical charge to hold and change bits, which leads to faster microprocessors, but which also leads to higher transient fault rates. Current trends, expected to continue, show soft error rates increasing exponentially at a rate of 8% per technology generation. Existing transient fault research in general-purpose architecture, like the well-established architectural vulnerability factor (AVF), assumes that all computations are equally important and all errors equally intolerable. However, we observe that the effect of transient faults in graphics processing can range from imperceptible, to bothersome visual artifacts, to critical loss of function. We therefore extend and generalize the AVF by introducing the Visual Vulnerability Spectrum (VVS). We apply the VVS to analyze the effect of increased transient error rate on graphics processors. With this analysis in hand, we suggest several targeted, inexpensive solutions that can mitigate the most egregious of soft error consequences.

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes (The Eurographics Association, 2006) Woop, Sven; Marmitt, Gerd; Slusallek, Philipp; Marc Olano and Philipp Slusallek
This paper introduces a new spatial index structure, called Bounded KD tree (B-KD tree), for realtime ray tracing of dynamic scenes.
By presenting hardware units for all time-critical B-KD tree algorithms in the context of a custom realtime ray tracing chip, we show that this spatial index structure is well suited for hardware implementation. B-KD trees are a hybrid spatial index structure that combines the advantages of KD trees and Bounding Volume Hierarchies into a single, simple-to-handle spatial index structure. Similar to KD trees, B-KD trees are binary trees where each node considers only a single spatial dimension. However, instead of a single splitting plane that divides space into two disjoint sub-spaces, each node in B-KD trees contains two pairs of axis-aligned planes that bound the geometry of its two child nodes. As a bounding volume approach, B-KD trees allow for simple and efficient updates when changing geometry while maintaining the fast traversal operations and simple hardware implementation known from KD trees. This enables the support for dynamic scenes with constant mesh topology and coherent dynamic changes, like typical skinned meshes. Our hardware architecture contains several fixed-function units that completely handle skinning, updating, and ray tracing of dynamic scenes using B-KD trees. An FPGA prototype of this architecture already delivers realtime performance of up to 35 frames per second even when clocked at only 66 MHz.
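
The node layout described in the B-KD abstract (one axis per node, plus a pair of bounding planes for each of the two children) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the names `Leaf`, `Node`, `refit`, `clip`, and `traverse` are hypothetical, leaves are assumed to hold vertex indices, and the traversal returns candidate leaves without ordering them near to far or intersecting primitives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Vec3 = Tuple[float, float, float]
Interval = Tuple[float, float]

@dataclass
class Leaf:
    vertex_ids: List[int]  # indices into a shared vertex array (assumed layout)

@dataclass
class Node:
    axis: int                       # the single dimension this node considers
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]
    left_bounds: Interval = (0.0, 0.0)   # plane pair bounding the left child
    right_bounds: Interval = (0.0, 0.0)  # plane pair bounding the right child

def refit(tree: Union[Node, Leaf], vertices: List[Vec3]) -> Tuple[Vec3, Vec3]:
    """Recompute plane pairs bottom-up after vertices move (topology unchanged).

    Returns the full 3D box of the subtree so each parent can read off
    the extent along its own axis.
    """
    if isinstance(tree, Leaf):
        pts = [vertices[i] for i in tree.vertex_ids]
        lo = tuple(min(p[a] for p in pts) for a in range(3))
        hi = tuple(max(p[a] for p in pts) for a in range(3))
        return lo, hi
    llo, lhi = refit(tree.left, vertices)
    rlo, rhi = refit(tree.right, vertices)
    a = tree.axis
    tree.left_bounds = (llo[a], lhi[a])
    tree.right_bounds = (rlo[a], rhi[a])
    lo = tuple(min(x, y) for x, y in zip(llo, rlo))
    hi = tuple(max(x, y) for x, y in zip(lhi, rhi))
    return lo, hi

def clip(origin: Vec3, direction: Vec3, axis: int, bounds: Interval,
         t_near: float, t_far: float) -> Optional[Interval]:
    """Clip the ray interval [t_near, t_far] against one pair of planes."""
    o, d = origin[axis], direction[axis]
    lo, hi = bounds
    if d == 0.0:
        # Ray is parallel to the planes: it is either inside the slab or not.
        return (t_near, t_far) if lo <= o <= hi else None
    t0, t1 = (lo - o) / d, (hi - o) / d
    if t0 > t1:
        t0, t1 = t1, t0
    t_near, t_far = max(t_near, t0), min(t_far, t1)
    return (t_near, t_far) if t_near <= t_far else None

def traverse(tree: Union[Node, Leaf], origin: Vec3, direction: Vec3,
             t_near: float = 0.0, t_far: float = float("inf")) -> List[Leaf]:
    """Collect the leaves whose accumulated bounds the ray may intersect."""
    if isinstance(tree, Leaf):
        return [tree]
    hits: List[Leaf] = []
    for child, bounds in ((tree.left, tree.left_bounds),
                          (tree.right, tree.right_bounds)):
        seg = clip(origin, direction, tree.axis, bounds, t_near, t_far)
        if seg is not None:
            hits += traverse(child, origin, direction, *seg)
    return hits
```

Under these assumptions, an animated mesh would call `refit` once per frame after its vertices move and then call `traverse` per ray. Because the tree shape never changes, only the plane pairs are rewritten, which is the property the abstract credits for cheap updates.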