EGGH05: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2005
Item: Hardware-Compatible Vertex Compression Using Quantization and Simplification (The Eurographics Association, 2005)
Purnomo, Budirijanto; Bilodeau, Jonathan; Cohen, Jonathan D.; Kumar, Subodh. Editors: Michael Meissner and Bengt-Olaf Schneider.
We present a vertex compression technique suitable for efficient decompression on graphics hardware. Given a user-specified number of bits per vertex, we automatically allocate bits to vertex attributes for quantization to maximize quality, guided by an image-space error metric. This allocation accounts for the constraints of graphics hardware by packing the quantized attributes into bins associated with the hardware's vectorized vertex data elements. We show that this general approach is also applicable if the user specifies a total desired model size. We present an algorithm that integrally combines vertex decimation and attribute quantization to produce the best quality model for a user-specified data size. Such models have an appropriate balance between the number of vertices and the number of bits per vertex. Vertex data is transmitted to and optionally stored in video memory in the compressed form. The vertices are decompressed on-the-fly using a vertex program at rendering time. Our algorithms not only work well within the constraints of current graphics hardware but also generalize to a setting where these constraints are relaxed. They apply to models with a wide variety of vertex attributes, providing new tools for optimizing space and bandwidth constraints of interactive graphics applications.

Item: A Fast, Energy-Efficient Z-Comparator (The Eurographics Association, 2005)
Hensley, Justin; Singh, Montek; Lastra, Anselmo. Editors: Michael Meissner and Bengt-Olaf Schneider.
We present a fast and energy-efficient z-comparator that takes advantage of the fact that the result of most depth comparisons can be determined by examining just a few bits. This feature is made possible by the use of asynchronous logic, which enables the comparator to rapidly compare bits until the result is clear and then stop. Using depth data from well-known computer games, SPICE simulations indicate that our comparator consumes only 25% of the energy and operates 1.67 times faster, on average, compared to an equivalent synchronous design. The comparator design is used to illustrate a more general design principle, compute on demand, which can potentially enable graphics hardware to be faster and more energy-efficient.
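
The early-termination idea behind this comparator can be illustrated apart from the hardware: scan the two depth values from the most significant bit downward and stop at the first bit where they differ. The sketch below is a minimal software analogue of that behaviour, not the paper's asynchronous circuit; the function name and the 24-bit depth width are illustrative assumptions.

```cpp
#include <cstdint>
#include <cstdio>

// Minimal software analogue of an MSB-first, early-terminating depth compare.
// Returns true if 'incoming' passes a "less-than" depth test against 'stored'.
// 'bitsExamined' reports how many bit positions were inspected before the
// result was known, mirroring the "compute on demand" idea: most comparisons
// resolve after only a few high-order bits.
bool depthLess(uint32_t incoming, uint32_t stored, int depthBits, int& bitsExamined)
{
    bitsExamined = 0;
    for (int b = depthBits - 1; b >= 0; --b) {
        ++bitsExamined;
        uint32_t ib = (incoming >> b) & 1u;
        uint32_t sb = (stored   >> b) & 1u;
        if (ib != sb)              // first differing bit decides the comparison
            return ib < sb;
    }
    return false;                  // equal depths fail a strict less-than test
}

int main()
{
    int n = 0;
    // Two 24-bit depths that differ in a high-order bit resolve almost immediately.
    bool pass = depthLess(0x3A0000u, 0x7F1234u, 24, n);
    std::printf("pass=%d after examining %d bit(s)\n", pass, n);
    return 0;
}
```
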
Item: Generic Mesh Refinement on GPU (The Eurographics Association, 2005)
Boubekeur, Tamy; Schlick, Christophe. Editors: Michael Meissner and Bengt-Olaf Schneider.
Many recent publications have shown that a large variety of computations involved in computer graphics can be moved from the CPU to the GPU by a clever use of vertex or fragment shaders. Nonetheless, one kind of algorithm remains hard to translate from CPU to GPU: mesh refinement techniques. The main reason is that vertex shaders available on current graphics hardware do not allow the generation of additional vertices on a mesh stored in graphics hardware. In this paper, we propose a general solution to generate mesh refinement on the GPU. The main idea is to define a generic refinement pattern that is used to virtually create additional inner vertices for a given polygon. These vertices are then translated according to some procedural displacement map defining the underlying geometry (similarly, the normal vectors may be transformed according to some procedural normal map). For illustration purposes, we use a tessellated triangular pattern, but many other refinement patterns may be employed. To show its flexibility, the technique has been applied to a large variety of refinement techniques: procedural displacement mapping, as well as more complex techniques such as curved PN-triangles or ST-meshes.

Item: Hexagonal Storage Scheme for Interleaved Frame Buffers and Textures (The Eurographics Association, 2005)
Bando, Yosuke; Saito, Takahiro; Fujita, Masahiro. Editors: Michael Meissner and Bengt-Olaf Schneider.
This paper presents a storage scheme which statically assigns pixel/texel coordinates to multiple memory banks in order to minimize frame buffer and texture memory access load imbalance. In this scheme, the pixels stored in a particular memory bank are placed at the center and the vertices of hexagons packed in the frame buffer. By making these hexagons close to regular, so that the pixel placement is uniform and isotropic, frame buffer and texture memory accesses are evenly distributed over the memory banks. The analysis of memory access patterns in rendering typical 3D graphics scenes shows that the hexagonal storage scheme can reduce rendering performance degradation due to bank conflicts by an average of 10% compared to the traditional rectangular storage scheme.

Item: A Hardware Architecture for Multi-Resolution Volume Rendering (The Eurographics Association, 2005)
Wetekam, G.; Staneker, D.; Kanus, U.; Wand, M. Editors: Michael Meissner and Bengt-Olaf Schneider.
In this paper we propose a hardware-accelerated ray-casting architecture for multi-resolution volumetric datasets. The architecture is targeted at rendering very large datasets with limited voxel memory resources, for both cases where the working set of a frame does or does not fit into the voxel memory. We describe the multi-resolution model used to organize the volume data, especially the wavelet-based compression scheme. An efficient hardware implementation of the wavelet decompression is presented, and the considerations for volume memory management are discussed. By incorporating the wavelet decompression in hardware, a multiple of the decompression bandwidth of a PC can be achieved. We also show that the impact of our multi-resolution scheme on the actual ray-casting pipeline is minimal.
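
The entry above does not spell out which wavelet filter the architecture decompresses, so as a rough illustration of what wavelet decompression involves, the sketch below inverts a single level of a plain 1D Haar transform (averages plus details back to samples). It is only a software stand-in under that assumption, not the paper's hardware decompression unit.

```cpp
#include <cstdio>
#include <vector>

// Illustrative single-level inverse 1D Haar transform.
// Input: 'avg' holds pairwise averages, 'det' holds the matching details
// (half-differences). Output: the reconstructed signal, twice as long.
// A multi-resolution volume scheme would apply such a step per axis and per level.
std::vector<float> inverseHaar1D(const std::vector<float>& avg,
                                 const std::vector<float>& det)
{
    std::vector<float> out(avg.size() * 2);
    for (size_t i = 0; i < avg.size(); ++i) {
        out[2 * i]     = avg[i] + det[i];   // first sample of the pair
        out[2 * i + 1] = avg[i] - det[i];   // second sample of the pair
    }
    return out;
}

int main()
{
    // Forward Haar of {10, 12, 7, 3} gives averages {11, 5} and details {-1, 2}.
    std::vector<float> avg = {11.0f, 5.0f};
    std::vector<float> det = {-1.0f, 2.0f};
    std::vector<float> x = inverseHaar1D(avg, det);
    for (float v : x) std::printf("%.1f ", v);   // prints 10.0 12.0 7.0 3.0
    std::printf("\n");
    return 0;
}
```
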
Item: A Reconfigurable Architecture for Load-Balanced Rendering (The Eurographics Association, 2005)
Chen, Jiawen; Gordon, Michael I.; Thies, William; Zwicker, Matthias; Pulli, Kari; Durand, Frédo. Editors: Michael Meissner and Bengt-Olaf Schneider.
Commodity graphics hardware has become increasingly programmable over the last few years but has been limited to fixed resource allocation. These architectures handle some workloads well and others poorly; load-balancing to maximize graphics hardware performance has become a critical issue. In this paper, we explore one solution to this problem using compile-time resource allocation. For our experiments, we implement a graphics pipeline on Raw, a tile-based multicore processor. We express both the full graphics pipeline and the shaders using StreamIt, a high-level language based on the stream programming model. The programmer specifies the number of tiles per pipeline stage, and the StreamIt compiler maps the computation to the Raw architecture. We evaluate our reconfigurable architecture using a mix of common rendering tasks with different workloads and improve throughput by 55-157% over a static allocation. Although our early prototype cannot compete in performance against commercial state-of-the-art graphics processors, we believe that this paper describes an important first step in addressing the load-balancing challenge.

Item: KD-Tree Acceleration Structures for a GPU Raytracer (The Eurographics Association, 2005)
Foley, Tim; Sugerman, Jeremy. Editors: Michael Meissner and Bengt-Olaf Schneider.
Modern graphics hardware architectures excel at compute-intensive tasks such as ray-triangle intersection, making them attractive target platforms for raytracing. To date, most GPU-based raytracers have relied upon uniform grid acceleration structures. In contrast, the kd-tree has gained widespread use in CPU-based raytracers and is regarded as the best general-purpose acceleration structure. We demonstrate two kd-tree traversal algorithms suitable for GPU implementation and integrate them into a streaming raytracer. We show that for scenes with many objects at different scales, our kd-tree algorithms are up to 8 times faster than a uniform grid. In addition, we identify load balancing and input data recirculation as two fundamental sources of inefficiency when raytracing on current graphics hardware.

Item: Modified Noise for Evaluation on Graphics Hardware (The Eurographics Association, 2005)
Olano, Marc. Editors: Michael Meissner and Bengt-Olaf Schneider.
Perlin noise is one of the primary tools responsible for the success of procedural shading in production rendering. It breaks the crisp computer-generated look by adding apparent randomness that is controllable and repeatable. Both Perlin's original noise algorithm and his later improved noise were designed to run efficiently on a CPU. These algorithms do not map as well to the design and resource limits of the typical GPU. We propose two modifications to Perlin's improved noise that make it much more suitable for GPU implementation, allowing faster direct computation. The modified noise can be evaluated entirely on the GPU without resorting to texture accesses, or "baked" into a texture with consistent appearance between textured and computed noise. However, it is most useful for 3D and 4D noise, which cannot easily be stored in reasonably sized textures. We present one implementation of our modified noise using computation or direct texturing for 1D and 2D noise, and a procedural combination of 2D textures for the 3D noise.

Item: GPU-Accelerated High-Quality Hidden Surface Removal (The Eurographics Association, 2005)
Wexler, Daniel; Gritz, Larry; Enderton, Eric; Rice, Jonathan. Editors: Michael Meissner and Bengt-Olaf Schneider.
High-quality off-line rendering requires many features not natively supported by current commodity graphics hardware: wide smooth filters, high sampling rates, order-independent transparency, spectral opacity, motion blur, and depth of field. We present a GPU-based hidden-surface algorithm that implements all these features. The algorithm is Reyes-like but uses regular sampling and multiple passes. Transparency is implemented by depth peeling, made more efficient by opacity thresholding and a new method called z batches. We discuss performance and some design trade-offs. At high spatial sampling rates, our implementation is substantially faster than a CPU-only renderer for typical scenes.
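
Depth peeling, which the entry above relies on for order-independent transparency, can be summarized independently of any GPU API: each pass keeps, per pixel, the nearest fragment that lies strictly behind the layer extracted in the previous pass. The sketch below is a small CPU-side illustration of that peeling loop under simplified assumptions (fragments given up front, no opacity thresholding or z batches); it is not the paper's renderer.

```cpp
#include <cstdio>
#include <limits>
#include <vector>

// A fragment covering one pixel, with a depth and a color placeholder.
struct Fragment { int pixel; float depth; float color; };

// CPU-side illustration of depth peeling: pass k extracts, per pixel, the
// nearest fragment strictly behind the depth peeled in pass k-1, yielding
// transparency layers in front-to-back order.
std::vector<std::vector<Fragment>> peelLayers(const std::vector<Fragment>& frags,
                                              int numPixels, int maxLayers)
{
    const float kFar = std::numeric_limits<float>::infinity();
    std::vector<float> prevDepth(numPixels, -kFar);   // depth peeled last pass
    std::vector<std::vector<Fragment>> layers;

    for (int pass = 0; pass < maxLayers; ++pass) {
        std::vector<Fragment> layer(numPixels, Fragment{-1, kFar, 0.0f});
        bool anyPeeled = false;
        for (const Fragment& f : frags) {
            // Accept only fragments strictly behind the previously peeled layer,
            // and keep the nearest such fragment per pixel.
            if (f.depth > prevDepth[f.pixel] && f.depth < layer[f.pixel].depth) {
                layer[f.pixel] = f;
                anyPeeled = true;
            }
        }
        if (!anyPeeled) break;                         // no layers remain anywhere
        for (int p = 0; p < numPixels; ++p)
            if (layer[p].pixel >= 0) prevDepth[p] = layer[p].depth;
        layers.push_back(layer);
    }
    return layers;
}

int main()
{
    // Three overlapping fragments on pixel 0 peel into three depth layers.
    std::vector<Fragment> frags = {{0, 0.8f, 1.0f}, {0, 0.2f, 2.0f}, {0, 0.5f, 3.0f}};
    auto layers = peelLayers(frags, 1, 8);
    for (size_t i = 0; i < layers.size(); ++i)
        std::printf("layer %zu: depth %.1f\n", i, layers[i][0].depth);
    return 0;
}
```
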
Item: Split-Plane Shadow Volumes (The Eurographics Association, 2005)
Laine, Samuli. Editors: Michael Meissner and Bengt-Olaf Schneider.
We present a novel method for rendering shadow volumes. The core idea of the method is to locally choose between the Z-pass and Z-fail algorithms on a per-tile basis. The choice is made by comparing the contents of the low-resolution depth buffer against an automatically constructed split plane. We show that this reduces the number of stencil updates substantially without affecting the resulting shadows. We outline a simple and efficient hardware implementation that enables the early tile culling stages to reject considerably more pixels than with the shadow volume optimizations currently available in hardware.

Item: Fully Procedural Graphics (The Eurographics Association, 2005)
Whitted, T.; Kajiya, J. Editors: Michael Meissner and Bengt-Olaf Schneider.
The growing application of user-defined programs within graphics processing units (GPUs) has transformed the fixed-function display pipeline into a largely programmable pipeline. In this paper we propose that the elements fed through the pipeline be made entirely procedural. To enable this, we present a modification of the conventional graphics processor in which all procedures are executed in a common processor array and the rasterizer is augmented with a more general sampling controller. By executing both the geometric and shading elements of a procedural graphics model in a single processor, we retain the data amplification that distinguishes procedural descriptions without a corresponding explosion of external bandwidth.

Item: Optimal Automatic Multi-pass Shader Partitioning by Dynamic Programming (The Eurographics Association, 2005)
Heirich, Alan. Editors: Michael Meissner and Bengt-Olaf Schneider.
Complex shaders must be partitioned into multiple passes to execute on GPUs with limited hardware resources. Automatic partitioning gives rise to an NP-hard scheduling problem that can be solved by any number of established techniques. One such technique, Dynamic Programming (DP), is commonly used for instruction scheduling and register allocation in the code generation phase of compilers. Since automatic partitioning occurs during the shader compilation process, it is natural to ask whether DP is useful for shader partitioning as well as for code generation. This paper demonstrates that these problems are Markovian and can be solved by DP techniques. It presents a DP algorithm for shader partitioning that can be adapted for use with any GPU architecture. Unlike solutions produced by other techniques, DP solutions are globally optimal. Experimental results on a set of test cases with a commercial prerelease compiler for a popular high-level shading language showed that a DP algorithm had an average runtime cost of O(n^1.14966), which is less than O(n log n) on the region of interest in n. This demonstrates that efficient and optimal automatic shader partitioning can be an emergent byproduct of a DP-based code generator for a very high performance GPU. (A simplified sketch of such a dynamic-programming partitioning appears after this listing.)

Item: iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones (The Eurographics Association, 2005)
Ström, Jacob; Akenine-Möller, Tomas. Editors: Michael Meissner and Bengt-Olaf Schneider.
We present a novel texture compression scheme, called iPACKMAN, targeted for hardware implementation. In terms of image quality, it outperforms the previous de facto standard texture compression algorithms in the majority of the cases that we have tested. Our new algorithm is an extension of the PACKMAN texture compression system, and while it is a bit more complex than PACKMAN, it is still very low in terms of hardware complexity.
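
To make the shader-partitioning entry above concrete in miniature, the sketch below applies dynamic programming to a deliberately simplified model: a shader is treated as a linear sequence of instructions with per-instruction resource costs, and the task is to split it into the fewest contiguous passes whose costs stay within a per-pass budget. This toy recurrence is an illustrative assumption only; the paper's algorithm addresses the full NP-hard scheduling problem on real GPU resource constraints, which this model does not capture.

```cpp
#include <climits>
#include <cstdio>
#include <vector>

// Toy dynamic program: split a linear "shader" of per-instruction costs into
// the fewest contiguous passes such that each pass's total cost fits the budget.
// best[i] = minimum number of passes needed to cover the first i instructions.
int minPasses(const std::vector<int>& cost, int budget)
{
    const int n = static_cast<int>(cost.size());
    const int INF = INT_MAX / 2;
    std::vector<int> best(n + 1, INF);
    best[0] = 0;                                  // the empty prefix needs no pass
    for (int i = 1; i <= n; ++i) {
        int passCost = 0;
        // Try every candidate pass that ends at instruction i-1 and starts at j.
        for (int j = i - 1; j >= 0; --j) {
            passCost += cost[j];
            if (passCost > budget) break;         // pass [j, i) no longer fits
            if (best[j] + 1 < best[i]) best[i] = best[j] + 1;
        }
    }
    return best[n];                               // a value >= INF means infeasible
}

int main()
{
    // Eight instructions with varying resource costs and a budget of 6 per pass.
    std::vector<int> cost = {2, 3, 1, 4, 2, 2, 5, 1};
    std::printf("minimum passes: %d\n", minPasses(cost, 6));
    return 0;
}
```
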