EGGH: SIGGRAPH/Eurographics Workshop on Graphics Hardware

Permanent URI for this community

https://diglib.eg.org/handle/10.2312/322

Adaptive Hierarchical Visibility in a Tiled Architecture
(The Eurographics Association, 1999) Xie, Feng; Shantz, Michael; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
This paper describes a method for occlusion culling in a tiled 3D graphics hardware architecture. Adaptive hierarchical visibility (AHV) is a simplified method for occlusion culling that is integrated into a tiled architecture for hardware rendering. AHV constructs a list of polygon bins for each tile where the bins are bucket sorted in order of increasing depth or Z. Polygon bins are rendered starting with the bin closest to the viewer. After some number of bins are rendered, a one-layer hierarchical Z-buffer (HZ) is constructed from the Z-buffer thus far accumulated for the rendered bins. Subsequent bins are rendered by first testing their polygons against the HZ to see if they are hidden. AHV is far simpler to implement in hardware and gives performance that matches or surpasses progressive hierarchical visibility (PHV) methods which update the HZ for each rendered pixel. Results show that AHV is superior on scenes with high depth complexity and small polygons. For tiles of widely ranging statistics, AHV competes surprisingly well with PHV. It offers dramatic performance improvement on low cost hardware for scenes of high depth complexity.
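The occlusion test described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function names, the block size, the screen-space bounding-box test, and the convention that smaller Z means closer to the viewer are all assumptions made for illustration.

```python
def build_hz(zbuf, block):
    """Build a one-layer hierarchical Z-buffer from a square Z-buffer.

    zbuf is a list of rows of depths; each HZ cell stores the FARTHEST
    depth found in its block x block region (smaller Z = closer, assumed).
    """
    n = len(zbuf)
    hz = []
    for by in range(0, n, block):
        row = []
        for bx in range(0, n, block):
            row.append(max(zbuf[y][x]
                           for y in range(by, min(by + block, n))
                           for x in range(bx, min(bx + block, n))))
        hz.append(row)
    return hz


def is_hidden(hz, block, bbox, z_near):
    """Conservatively test a polygon against the HZ.

    bbox = (x0, y0, x1, y1) are inclusive pixel bounds of the polygon and
    z_near is its nearest depth. The polygon is reported hidden only if it
    lies at or behind the farthest stored depth in every block it touches.
    """
    x0, y0, x1, y1 = bbox
    for by in range(y0 // block, y1 // block + 1):
        for bx in range(x0 // block, x1 // block + 1):
            if z_near < hz[by][bx]:
                return False  # may be visible somewhere in this block
    return True
```

A polygon from a later (deeper) bin that is reported hidden can be skipped without rasterization; the test is conservative, so a polygon reported as not hidden may still turn out to be fully occluded at pixel level.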
Antialiased Parameterized Solid Texturing Simplified for Consumer-Level Hardware Implementation
(The Eurographics Association, 1999) Hart, John C.; Carr, Nate; Kameya, Masaki; Tibbitts, Stephen A.; Coleman, Terrance J.; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
Procedural solid texturing was introduced fourteen years ago, but has yet to find its way into consumer-level graphics hardware for real-time operation. To this end, a new model is introduced that yields a parameterized function capable of synthesizing the most common procedural solid textures, specifically wood, marble, clouds and fire. This model is simple enough to be implemented in hardware, and can be realized in VLSI with as little as 100,000 gates. The new model also yields a new method for antialiasing synthesized textures. An expression for the necessary box filter width is derived as a function of the texturing parameters, the texture coordinates and the rasterization variables. Given this filter width, a technique for efficiently box filtering the synthesized texture by either mip mapping the color table or using a summed area color table is presented. Examples of the antialiased results are shown.
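To illustrate the summed area color table mentioned in this abstract, here is a minimal Python sketch of constant-time box filtering over a one-dimensional table. The single scalar channel, the function names, and the half-open index range are assumptions for illustration; the paper's derivation of the filter width is not reproduced here.

```python
def build_summed_table(color_table):
    """Prefix sums of a 1D color table (one scalar channel, assumed).

    summed[i] holds the sum of color_table[0:i], so summed has one more
    entry than color_table.
    """
    summed = [0.0]
    for value in color_table:
        summed.append(summed[-1] + value)
    return summed


def box_filter(summed, lo, hi):
    """Average of color_table[lo:hi] using two table lookups.

    The cost is independent of the filter width (hi - lo), which is what
    makes a summed table attractive for wide antialiasing filters.
    """
    return (summed[hi] - summed[lo]) / (hi - lo)
```

For example, with the table `[0, 1, 2, 3, 4, 5, 6, 7]`, filtering entries 2 through 5 reads only `summed[6]` and `summed[2]`.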
Codesign Of Graphics Hardware Accelerators
(The Eurographics Association, 1997) Ewins, Jon P.; Watten, Phil L.; White, Martin; McNeill, Michael D. J.; Lister, Paul F.; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
The design of a hardware architecture for a computer graphics pipeline requires a thorough understanding of the algorithms involved at each stage, and the implications these algorithms have on the organisation of the pipeline architecture. The choice of algorithm, the flow of pixel data through the pipeline, and bit width precision issues are crucial decisions in the design of new hardware accelerators. Making these decisions correctly requires intensive investigation and experimentation. Hardware description languages such as VHDL allow for sound top-down design methodologies, but their effectiveness in such experimental work is limited. This paper discusses the use of software tools as an aid to hardware development and presents applications that demonstrate the possibilities of this approach and the benefits that can be attained from an integrated codesign environment.
Design of a Fast Voxel Processor for Parallel Volume Visualization
(The Eurographics Association, 1995) Lichtermann, Jan; W. Strasser
The basics of a parallel real-time volume visualization architecture are introduced. Volume data is divided into subcubes that are distributed among multiple image processors and stored in their private voxel memories. Rays fall into ray segments at the subcube borders. Each image processor is responsible for the ray segments within its assigned subcubes. Results of the ray segments are passed to the image processor where the ray continues. The enumeration of resampling points on the ray segments and the interpolation at resampling points is accelerated by the voxel processor. The voxel processor can additionally compute a normalized gradient vector at a resampling point used as a surface normal estimation for shading calculations. The paper focuses on the operation and hardware implementation of this pipeline processor and the organization of voxel memory. The instruction set of the voxel processor is explained. A performance of 20 images per second for a 256³ voxel volume and 16 image processors can be achieved.
Hybrid Volume and Polygon Rendering with Cube Hardware
(The Eurographics Association, 1999) Kreeger, Kevin; Kaufman, Arie; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
We present two methods which connect today's polygon graphics hardware accelerators to Cube-5 volume rendering hardware, the successor to Cube-4. The proposed methods allow mixing of both opaque and translucent polygons with volumes on PC class machines, while ensuring the correct compositing order of all objects. Both implementations connect the two hardware acceleration subsystems at the frame buffer. One shares a common DRAM buffer and one run-length encodes images of thin slabs of polygonal data and then combines them in the Cube composite buffer. In both realizations, we take advantage of the predictable ordered access to frame buffer storage that is utilized by Cube-5 and the rest of the family of volume rendering accelerators based on the Cube design.
IMEM: An Intelligent Memory for Bump- and Reflection-Mapping
(The Eurographics Association, 1998) Kugler, Anders; S. N. Spencer
Data path simplification in the context of reflection- and bump-mapping hardware opens new solutions in the design of rendering and shading circuits. We propose a novel approach to rendering bump- and reflection-mapped surfaces, where the local geometry defining bump-maps is transformed on-the-fly prior to surface shading. Applying angular encoding to normal vectors results in narrower data paths and permits hardware integration of look-up tables of acceptable size. A special-purpose logic-embedded memory architecture is presented, where bump- and reflection-mapping of textured surfaces are executed by an intelligent memory device. High-performance surface shading is achieved by making use of precomputed shading- and reflection-map coordinate generation tables, and considering cache coherence of pixel-to-pixel normal vectors. Such a dedicated memory chip can easily be interfaced to a standard rasterizer, in place of texture memory, to offer bump-, texture- and reflection-mapping hardware support.
A Low-Cost Memory Architecture For PCI-Based Interactive Ray Casting
(The Eurographics Association, 1999) Doggett, Michael; Meißner, Michael; Kanus, Urs; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
In this paper we present a low-cost memory architecture running at 100 MHz which is suited for any PCI-based volume rendering accelerator using the ray-casting approach. Current SDRAM technology, parallel access to all voxels required for trilinear interpolation, a cubic addressing scheme, and a buffering mechanism accommodating memory latency are applied to achieve high frame-rates. A total of four off-the-shelf standard DIMM modules are required, enabling up to 9 Hz (averaged over a representative set of views) for datasets of 256³ voxels, using early ray termination as the only algorithmic optimization. The presented memory architecture is a good balance of cost versus feasibility on a standard PCI card, accepting data replication, and will be used for the VIZARD II ray casting accelerator.
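Early ray termination, named in this abstract as the only algorithmic optimization, can be sketched in a few lines of Python. This is a generic front-to-back compositing loop, not the VIZARD II pipeline; the function name, the scalar color channel, and the 0.95 opacity threshold are assumptions for illustration.

```python
def composite_ray(samples, opacity_threshold=0.95):
    """Front-to-back compositing with early ray termination.

    samples is a sequence of (color, alpha) pairs ordered from the viewer
    into the volume. Compositing stops as soon as the accumulated opacity
    reaches opacity_threshold (a hypothetical value), because samples
    further along the ray can no longer contribute visibly.
    Returns (color, alpha, samples_used).
    """
    color = 0.0
    alpha = 0.0
    used = 0
    for sample_color, sample_alpha in samples:
        weight = (1.0 - alpha) * sample_alpha
        color += weight * sample_color
        alpha += weight
        used += 1
        if alpha >= opacity_threshold:
            break  # remaining samples are skipped, saving memory reads
    return color, alpha, used
```

Every skipped sample also skips the eight voxel fetches that its trilinear interpolation would need, which is why the optimization matters for a bandwidth-limited memory system.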
Memory Access Patterns of Occlusion-Compatible 3D Image Warping
(The Eurographics Association, 1997) Mark, William R.; Bishop, Gary; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
McMillan and Bishop's 3D image warp can be efficiently implemented by exploiting the coherency of its memory accesses. We analyze this coherency, and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter's algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache size than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.
Multiresolution Rendering With Displacement Mapping
(The Eurographics Association, 1999) Gumhold, Stefan; Hüttner, Tobias; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
In this paper, we present for the first time an approach for hardware accelerated displacement mapping. The displaced surface is generated from a 2D displacement map by remeshing a coarse triangle mesh according to the screen projection of the surface. The remeshing algorithm is implemented in hardware. Filtered access to the displacement map makes our approach competitive with available view dependent multiresolution techniques. The advantage of displacement mapping is the compact representation. A displacement mapped surface, together with all filter levels, consumes only a fraction of the storage space needed for a hardware compatible representation of an equivalent triangle mesh. A possible design of the displacement mapping rendering pipeline is proposed. Previously described hardware components are used as often as possible. Our approach can be smoothly integrated into all available graphics application programming interfaces. Most existing graphics applications can be extended to the new feature with marginal effort.
Neon: A Single-Chip 3D Workstation Graphics Accelerator
(The Eurographics Association, 1998) McCormack, Joel; McNamara, Robert; Gianos, Christopher; Seiler, Larry; Jouppi, Norman P.; Correll, Ken; S. N. Spencer
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are often scalable: they can increase performance by using more chips. Scalability has obvious costs: a minimal configuration needs several chips, and some configurations must replicate texture maps. A less obvious cost is the almost irresistible temptation to replicate chips to increase performance, rather than to design individual chips for higher performance in the first place. In contrast, Neon is a single chip that performs like a multichip design. Neon accelerates OpenGL [19] 3D rendering, as well as X11 [20] and Windows/NT 2D rendering. Since our pin budget limited peak memory bandwidth, we designed Neon from the memory system upward in order to reduce bandwidth requirements. Neon has no special-purpose memories; its eight independent 32-bit memory controllers can access color buffers, depth buffers, stencil buffers, and texture data. To fit our gate budget, we shared logic among different operations with similar implementation requirements, and left floating point calculations to Digital's Alpha CPUs. Neon's performance is between HP's Visualize fx4 and fx6, and is well above SGI's MXE for most operations. Neon-based boards cost much less than these competitors, due to a small part count and use of commodity SDRAMs.
Optimal Depth Buffer for Low-Cost Graphics Hardware
(The Eurographics Association, 1999) Lapidous, Eugene; Jiao, Guofang; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
3D applications using hardware depth buffers for visibility testing are confronted with multiple choices of buffer types, sizes and formats. Some of the options are not exposed through the 3D API or may be used by the driver without the application's knowledge. As a result, it becomes increasingly difficult to select the depth buffer that is optimal for the desired balance between performance and precision. In this paper we provide a comparative evaluation of depth precision for the main depth buffer types with different size and format combinations. Results indicate that integer storage is preferred for some buffer types, while others achieve maximal depth resolution with a floating-point format optimized for known scene parameters. We propose to give 3D applications full control of depth buffer optimization by supporting multiple storage formats with the same buffer size and exposing them in the 3D API. In the search for a unified depth buffer solution, we describe a new type of depth buffer and compare it with other options. The complementary floating-point Z buffer is a combination of a reversed-direction Z buffer and an optimal floating-point storage format. Non-linear mapping and storage format compensate each other's effect on the depth precision; as a result, depth errors become significantly less dependent on the eye-space distance, improving depth resolution by orders of magnitude in comparison with the standard Z buffer. Results show that the complementary Z buffer is also superior to the inverse W buffer at any storage size. At 16 and 24 bits/pixel, average depth errors of the complementary Z buffer remain 2 times larger than for a true W buffer utilizing expensive high-precision per-pixel division. However, it provides the best precision of all at 32 bits/pixel, when errors are limited by floating-point per-vertex input.
Results suggest that the complementary floating-point Z buffer can be considered as a candidate for replacement of both screen Z and inverse W buffers, at the same time making hardware investment in true W buffer support less attractive.
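The effect of reversing the Z direction before floating-point storage can be illustrated with a small Python experiment. It is a sketch under stated assumptions, not the paper's evaluation: IEEE single precision stands in for the paper's optimized storage format, the near and far planes (1 and 1000) and the sampled depth range are hypothetical, and the helper names are invented for this example.

```python
import struct


def to_float32(x):
    """Round a Python float to IEEE single precision (stand-in storage format)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def screen_z(z, near, far):
    """Standard perspective screen Z: 0 at the near plane, 1 at the far plane."""
    return far / (far - near) * (1.0 - near / z)


def complementary_z(z, near, far):
    """Reversed-direction Z, equal to 1 - screen_z: 1 at near, 0 at far.

    Computed directly rather than as 1 - screen_z so that distant values
    keep their precision near zero, where floating point is densest.
    """
    return near * (far - z) / (z * (far - near))


def distinct_depths(mapping, depths, near, far):
    """Count how many stored values remain distinguishable after rounding."""
    return len({to_float32(mapping(z, near, far)) for z in depths})


NEAR, FAR = 1.0, 1000.0  # hypothetical scene parameters
# 1000 eye-space depths, 0.01 apart, close to the far plane.
depths = [900.0 + 0.01 * i for i in range(1000)]

standard_count = distinct_depths(screen_z, depths, NEAR, FAR)
complementary_count = distinct_depths(complementary_z, depths, NEAR, FAR)
```

With these parameters the standard mapping squeezes distant depths into values just below 1, where single-precision spacing is coarse, so many of the sampled depths collapse to the same stored value; the complementary mapping places them near 0 and keeps them distinct.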
Parallel Texture Caching
(The Eurographics Association, 1999) Igehy, Homan; Eldridge, Matthew; Hanrahan, Pat; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantifies inefficiencies due to redundant work, inherent parallel load imbalance, insufficient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.
PAVLOV: A Programmable Architecture for Volume Processing
(The Eurographics Association, 1998) Kreeger, Kevin; Kaufman, Arie; S. N. Spencer
We present a parallel 2D mesh-connected architecture with SIMD processing elements. The design allows for real-time volume rendering as well as interactive 3D segmentation and 3D feature extraction. This is possible because the SIMD processing elements are programmable, a feature which also allows the use of many different rendering algorithms. We present an algorithm which, with the addition of hardware resources, provides conflict-free access to volume slices along any of the three major axes. The volume access conflict has been the main reason why previous similar architectures could not perform real-time volume rendering. We present the performance of preliminary algorithms on a software simulator of the architecture design.
Performance Issues of a Distributed Frame Buffer on a Multicomputer
(The Eurographics Association, 1998) Wei, Bin; Clark, Douglas W.; Felten, Edward W.; Li, Kai; S. N. Spencer
A multiple-port, distributed frame buffer has been recently proposed to support parallel rendering on multicomputers. This paper describes an implementation of such a distributed frame buffer for the Intel Paragon routing network, and reports its performance results. We have conducted several experiments with the system we have developed. Our results indicate that placing a multiple-port, distributed frame buffer directly on the host internal routing network can provide high throughput to eliminate the bottleneck of merging a final image from multiple processors to a frame buffer. This architectural approach can also effectively support image composition for sort-last. The synchronization algorithm we have developed requires only one-way communication and minimizes receive overhead for message passing to the frame buffer.
PixelFlow: The Realization
(The Eurographics Association, 1997) Eyles, John; Molnar, Steven; Poulton, John; Greer, Trey; Lastra, Anselmo; England, Nick; Westover, Lee; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
PixelFlow is an architecture for high-speed, highly realistic image generation, based on the techniques of object-parallelism and image composition. Its initial architecture was described in [MOLN92]. After development by the original team of researchers at the University of North Carolina, and codevelopment with industry partners, Division Ltd. and Hewlett-Packard, PixelFlow now is a much more capable system than initially conceived and its hardware and software systems have evolved considerably. This paper describes the final realization of PixelFlow, along with hardware and software enhancements heretofore unpublished.
Prefetching in a Texture Cache Architecture
(The Eurographics Association, 1998) Igehy, Homan; Eldridge, Matthew; Proudfoot, Kekoa; S. N. Spencer
Texture mapping has become so ubiquitous in real-time graphics hardware that many systems are able to perform filtered texturing without any penalty in fill rate. The computation rates available in hardware have been outpacing the memory access rates, and texture systems are becoming constrained by memory bandwidth and latency. Caching in conjunction with prefetching can be used to alleviate this problem. In this paper, we introduce a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping. The structures needed are relatively simple and are amenable to high clock rates. To quantify the robustness of our architecture, we identify a set of six scenes whose texture locality varies over nearly two orders of magnitude and a set of four memory systems with varying bandwidths and latencies. Through the use of a cycle-accurate simulation, we demonstrate that even in the presence of a high-latency memory system, our architecture can attain at least 97% of the performance of a zero-latency memory system.
Realizing OpenGL: Two Implementations of One Architecture
(The Eurographics Association, 1997) Kilgard, Mark J.; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
The OpenGL Graphics System provides a well-specified, widely accepted dataflow for 3D graphics and imaging. OpenGL is an architecture; an OpenGL-capable computer is a hardware manifestation or implementation of that architecture. The Onyx2 InfiniteReality and O2 workstations exemplify two very different implementations of OpenGL. The two designs respond to different cost, performance, and capability goals. Common practice is to describe a graphics hardware implementation based on how the hardware itself operates. However, this paper discusses two OpenGL hardware implementations based on how they embody the OpenGL architecture. An important thread throughout is how OpenGL implementations can be designed not merely based on graphics price-performance considerations, but also with consideration of larger system issues such as memory architecture, compression, and video processing. Just as OpenGL is influenced by wider system concerns, OpenGL itself can provide a clarifying influence on system capabilities not conventionally thought of as graphics-related.
Simple Models of the Impact of Overlap in Bucket Rendering
(The Eurographics Association, 1998) Chen, Milton; Stoll, Gordon; Igehy, Homan; Proudfoot, Kekoa; Hanrahan, Pat; S. N. Spencer
Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benefits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions. Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced, that is, when the triangle-related computation time is equal to the pixel-related computation time. Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently.
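The relationship between triangle size, tile size, and overlap can be sketched with a simple bounding-box model in Python. Whether the paper uses exactly this form is an assumption: the model takes a w x h bounding box dropped at a uniformly random position on a grid of tile_w x tile_h tiles, for which the expected number of tiles touched is (1 + w/tile_w)(1 + h/tile_h). The function names and the Monte Carlo check are illustrative only.

```python
import math
import random


def expected_overlap(w, h, tile_w, tile_h):
    """Expected number of tiles touched by a w x h bounding box
    placed uniformly at random on a tile_w x tile_h grid."""
    return (1.0 + w / tile_w) * (1.0 + h / tile_h)


def simulated_overlap(w, h, tile_w, tile_h, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Only the position within one tile matters, by periodicity.
        x0 = rng.uniform(0.0, tile_w)
        y0 = rng.uniform(0.0, tile_h)
        tiles_x = math.floor((x0 + w) / tile_w) - math.floor(x0 / tile_w) + 1
        tiles_y = math.floor((y0 + h) / tile_h) - math.floor(y0 / tile_h) + 1
        total += tiles_x * tiles_y
    return total / trials
```

Under this model, a box that is small relative to the tile touches barely more than one tile on average, while shrinking the tile for a fixed box size drives the factor up, which matches the trade-off the abstract describes.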
Texture Shaders
(The Eurographics Association, 1999) McCool, Michael D.; Heidrich, Wolfgang; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enables efficient implementation of a wide range of other interactive rendering algorithms. The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects.
Towards Real-Time Photorealistic Rendering: Challenges and Solutions
(The Eurographics Association, 1997) Schilling, Andreas; A. Kaufmann and W. Strasser and S. Molnar and B.-O. Schneider
A growing number of real-time applications need graphics with photorealistic quality, especially in the field of training (virtual operation, driving and flight simulation), but also in the areas of design or ergonomic research. We take a closer look at the main deficiencies of today's real-time graphics hardware and present solutions for several of the identified problems in the areas of antialiasing and texture-, bump- and reflection mapping. In the second part of the paper, a new method for antialiasing bump maps is explained in more detail.

Browsing EGGH: SIGGRAPH/Eurographics Workshop on Graphics Hardware by Subject "I.3.1 [Computer Graphics]"