EGGH07: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2007
Item Accelerating Real-Time Shading with Reverse Reprojection Caching
(The Eurographics Association, 2007) Nehab, Diego; Sander, Pedro V.; Lawrence, Jason; Tatarchuk, Natalya; Isidoro, John R.; Mark Segal and Timo Aila
Evaluating pixel shaders consumes a growing share of the computational budget for real-time applications. However, the significant temporal coherence in visible surface regions, lighting conditions, and camera location allows reusing computationally-intensive shading calculations between frames to achieve significant performance improvements at little degradation in visual quality. This paper investigates a caching scheme based on reverse reprojection which allows pixel shaders to store and reuse calculations performed at visible surface points. We provide guidelines to help programmers select appropriate values to cache and present several policies for keeping cached entries up-to-date. Our results confirm this approach offers substantial performance gains for many common real-time effects, including precomputed global lighting effects, stereoscopic rendering, motion blur, depth of field, and shadow mapping.

Item ETC2: Texture Compression using Invalid Combinations
(The Eurographics Association, 2007) Stroem, Jacob; Pettersson, Martin; Mark Segal and Timo Aila
We present a novel texture compression system for improved image quality. Building on the iPACKMAN/ETC method, bit combinations that are invalid in that system are used to allow for three additional decompression modes without increasing the bit rate. These modes increase quality, especially for color edges and blocks with smoothly varying content. Due to the use of invalid bit combinations, the system, called ETC2, is backwards compatible with iPACKMAN/ETC.
It outperforms S3TC/DXTC and iPACKMAN/ETC in terms of PSNR by 0.8 dB and 1.0 dB, respectively, which is clearly visible to the human eye.

Item Exact and Error-bounded Approximate Color Buffer Compression and Decompression
(The Eurographics Association, 2007) Rasmusson, Jim; Hasselgren, Jon; Akenine-Moeller, Tomas; Mark Segal and Timo Aila
In this paper, we first present a survey of existing color buffer compression algorithms. After that, we introduce a new scheme based on an exactly reversible color transform, simple prediction, and Golomb-Rice encoding. In addition to this, we introduce an error control mechanism, which can be used for approximate (lossy) color buffer compression. In this way, the introduced error is kept under strict control. To the best of our knowledge, this has not been explored before in the literature. Our results indicate superior compression ratios compared to existing algorithms, and we believe that approximate compression can be important for mobile GPUs.

Item A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors
(The Eurographics Association, 2007) Sheaffer, Jeremy W.; Luebke, David P.; Skadron, Kevin; Mark Segal and Timo Aila
General purpose computation on graphics processors (GPGPU) has rapidly evolved since the introduction of commodity programmable graphics hardware. With the appearance of GPGPU computation-oriented APIs such as AMD's Close to the Metal (CTM) and NVIDIA's Compute Unified Device Architecture (CUDA), we begin to see GPU vendors putting financial stakes into this non-graphics, one-time niche market. Major supercomputing installations are building GPGPU clusters to take advantage of massively parallel floating point capabilities, and Folding@Home has even released a GPU port of its protein folding distributed computation client.
But in order for GPGPU to truly become important to the supercomputing community, vendors will have to address the heretofore unimportant reliability concerns of graphics processors. We present a hardware redundancy-based approach to reliability for general purpose computation on GPUs that requires minimal change to existing GPU architectures. Upon detecting an error, the system invokes an automatic recovery mechanism that only recomputes erroneous results. Our results show that our technique imposes less than a 1.5× performance penalty and saves energy for GPGPU but is completely transparent to general graphics and does not affect the performance of the games that drive the market.

Item A Hardware-Aware Debugger for the OpenGL Shading Language
(The Eurographics Association, 2007) Strengert, Magnus; Klein, Thomas; Ertl, Thomas; Mark Segal and Timo Aila
The enormous flexibility of the modern GPU rendering pipeline as well as the availability of high-level shader languages have led to an increased demand for sophisticated programming tools. As the application domain for GPU-based algorithms extends beyond traditional computer graphics, shader programs become more and more complex. The turn-around time for debugging, profiling, and optimizing GPU-based algorithms is now a critical factor in application development which is not addressed adequately by the tools available. In this paper we present a generic, minimally intrusive, and application-transparent solution for debugging OpenGL Shading Language programs, which for the first time fully supports GLSL 1.2 vertex and fragment shaders plus the recent geometry shader extension.
By transparently instrumenting the shader program we retrieve information directly from the hardware pipeline and provide data for visual debugging and program analysis.

Item A Low-Power Handheld GPU using Logarithmic Arithmetic and Triple DVFS Power Domains
(The Eurographics Association, 2007) Nam, Byeong-Gyu; Lee, Jeabin; Kim, Kwanho; Lee, Seung Jin; Yoo, Hoi-Jun; Mark Segal and Timo Aila
In this paper, a low-power GPU architecture is described for handheld systems with limited power and area budgets. The GPU is designed using logarithmic arithmetic for power and area efficiency. For this GPU, a multifunction unit is proposed based on a hybrid number system of floating-point and logarithmic numbers, and the matrix, vector, and elementary functions are unified into a single arithmetic unit. It achieves single-cycle throughput for all these functions, except for matrix-vector multiplication, which has 2-cycle throughput. The vertex shader using this function unit as its main datapath shows a 49.3% cycle count reduction compared with the latest work for the OpenGL transformation and lighting (TnL) kernel. The rendering engine also uses logarithmic arithmetic to implement the divisions in its pipeline stages. The GPU is divided into three dynamic voltage and frequency scaling (DVFS) power domains to minimize the power consumption at a given performance level. It shows a performance of 5.26 Mvertices/s at 200 MHz for the OpenGL TnL and 52.4 mW power consumption at 60 fps. It achieves a 2.47 times performance improvement while reducing power by 50.5% and area by 38.4% compared with the latest work.

Item Practical logarithmic rasterization for low-error shadow maps
(The Eurographics Association, 2007) Lloyd, D. Brandon; Govindaraju, Naga K.; Molnar, Steven E.; Manocha, Dinesh; Mark Segal and Timo Aila
Logarithmic shadow maps can deliver the same quality as competing shadow map algorithms with substantially less storage and bandwidth.
We show how current GPU architectures can be modified incrementally to support rendering of logarithmic shadow maps at current GPU fill rates. Specifically, we modify the rasterizer to support rendering to a nonuniform grid with the same watertight rasterization properties as current rasterizers. We also describe a depth compression scheme to handle the nonlinear primitives produced by logarithmic rasterization. Our proposed architecture enhancements align with current trends of decreasing cost for on-chip computation relative to off-chip bandwidth and storage. For a modest increase in computation, logarithmic rasterization can greatly reduce shadow map bandwidth and storage costs.

Item Programmable Shaders for Deformation Rendering
(The Eurographics Association, 2007) Correa, Carlos D.; Silver, Deborah; Mark Segal and Timo Aila
In this paper, we present a method for rendering deformations as part of the programmable shader pipeline of contemporary Graphical Processing Units. In our method, we allow general deformations including cuts. Previous approaches to deformation place the role of the GPU as a general purpose processor for computing vertex displacement. With the advent of vertex texture fetch in current GPUs, a number of approaches have been proposed to integrate deformation into the rendering pipeline. However, the rendering of cuts cannot be easily programmed into a vertex shader, due to the inability to change the topology of the mesh. Furthermore, rendering smooth deformed surfaces requires a fine tessellation of the mesh, in order to prevent self-intersection and meshing artifacts for large deformations. In our approach, we overcome these problems by considering deformation as a part of the pixel shader, where transformation is performed on a per-pixel basis.
We demonstrate how this approach can be efficiently implemented using contemporary graphics hardware to obtain high-quality rendering of deformation at interactive rates.

Item A Real-Time FPGA-Based Architecture for a Reinhard-like Tone Mapping Operator
(The Eurographics Association, 2007) Hassan, F.; Carletta, J. E.; Mark Segal and Timo Aila
This paper presents a field-programmable gate array-based hardware architecture for a Reinhard-like tone mapping operator. Modifications to the original Reinhard operator were made to ensure that the operator is amenable to implementation in hardware. The architecture is described in VHDL and has been synthesized using Altera Quartus tools. It achieves an operating frequency consistent with a video rate of 60 frames per second for a frame of 1024×768 pixels. The quality of the implementation is measured using peak signal-to-noise ratios on testbench images.

Item Scan Primitives for GPU Computing
(The Eurographics Association, 2007) Sengupta, Shubhabrata; Harris, Mark; Zhang, Yao; Owens, John D.; Mark Segal and Timo Aila
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API.
Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.

Item Stochastic Rasterization using Time-Continuous Triangles
(The Eurographics Association, 2007) Akenine-Möller, Tomas; Munkberg, Jacob; Hasselgren, Jon; Mark Segal and Timo Aila
We present a novel algorithm for stochastic rasterization which can rasterize triangles with attributes depending on a parameter, t, varying continuously from t

Item Tight Frame Normal Map Compression
(The Eurographics Association, 2007) Munkberg, Jacob; Olsson, Ola; Stroem, Jacob; Akenine-Moeller, Tomas; Mark Segal and Timo Aila
We present a new powerful and flexible fixed-rate normal map compression algorithm with higher quality than existing schemes on a test suite of normal maps. Our algorithm encodes a tight box with uniform normals inside the box, and in addition, a special mode is introduced for handling slowly varying normals. We also discuss several error measures needed to understand the qualities of different algorithms. We believe the high quality of our technique makes it a potential candidate for inclusion in OpenGL ES.