EGPGV08: Eurographics Symposium on Parallel Graphics and Visualization
Editors: Jean M. Favre and Kwan-Liu Ma

Multi-GPU Sort-Last Volume Visualization
Marchesin, Stéphane; Mongenet, Catherine; Dischler, Jean-Michel (The Eurographics Association, 2008)
In this paper, we propose an experimental study of an inexpensive off-the-shelf sort-last volume visualization architecture based upon multiple GPUs and a single CPU. We show how to make efficient use of this architecture to achieve high-performance sort-last volume visualization of large datasets, and we analyze its bottlenecks. We compare this architecture to a classical sort-last visualization system using a cluster of commodity machines interconnected by a gigabit Ethernet network. Based on extensive experiments, we show that this solution competes very well with a mid-sized PC cluster, while significantly improving performance compared to a single standard PC.

CUDASA: Compute Unified Device and Systems Architecture
Strengert, Magnus; Müller, Christoph; Dachsbacher, Carsten; Ertl, Thomas (The Eurographics Association, 2008)
We present an extension to the CUDA programming language which extends parallelism to multi-GPU systems and GPU-cluster environments. Following the existing model, which exposes the internal parallelism of GPUs, our extended programming language provides a consistent development interface for additional, higher levels of parallel abstraction over the bus and network interconnects. The newly introduced layers provide the key features specific to the architecture and programmability of current graphics hardware, while the underlying communication and scheduling mechanisms are completely hidden from the user. All extensions to the original programming language are handled by a self-contained compiler which is easily embedded into the CUDA compile process. We evaluate our system using two different sample applications and discuss scaling behavior and performance on different system architectures.

Parallel Volume Rendering on the IBM Blue Gene/P
Peterka, Tom; Yu, Hongfeng; Ross, Robert; Ma, Kwan-Liu (The Eurographics Association, 2008)
Parallel volume rendering is implemented and tested on an IBM Blue Gene distributed-memory parallel architecture. The goal of studying the cost of parallel rendering on a new class of supercomputers such as the Blue Gene/P is not necessarily to achieve real-time rendering rates. Rather, it is to identify and understand the bottlenecks and interactions between the various components that affect the design of future visualization solutions on these machines, solutions that may offer alternatives to hardware-accelerated volume rendering, for example, when large volumes, large image sizes, and very high quality results are dictated by peta- and exascale data. As a step in that direction, this study presents data from experiments under a number of conditions, including dataset size, number of processors, low- and high-quality rendering, offline storage of results, and streaming of images for remote display. Performance is divided into three main sections of the algorithm: disk I/O, rendering, and compositing. The dynamic balance among these tasks varies with the number of processors and other conditions. Lessons learned from the work include understanding the balance between parallel I/O, computation, and communication within the context of visualization on supercomputers; recommendations for tuning and optimization; and opportunities for further scaling. Extrapolating these results to very large data and image sizes suggests that a distributed-memory high-performance computing architecture such as the Blue Gene is a viable platform for some types of visualization at very large scales.

A Scalable Parallel Force-Directed Graph Layout Algorithm
Tikhonova, Anna; Ma, Kwan-Liu (The Eurographics Association, 2008)
Understanding the structure, dynamics, and evolution of large graphs is becoming increasingly important in a variety of fields, and the demand for visual tools to aid in this process is rising accordingly. Yet many algorithms that create good representations of small and medium-sized graphs do not scale to larger graph sizes. Exploiting the massive computational power provided by parallel and distributed computing is a natural progression for handling important problems such as large graph layout. In this paper, we present a scalable parallel graph layout algorithm based on the force-directed model. Our algorithm requires minimal pre-processing and achieves scalability by conducting the layout computation in stages, a portion of the graph at a time, and by decreasing the amount of inter-processor communication as the layout computation progresses. We provide the implementation details of our algorithm, evaluate its performance and scalability, and compare the visual quality of the resulting drawings against some of the classic and the fastest algorithms for the layout of general graphs.

Time-Critical Distributed Visualization with Fault Tolerance
Gao, Jinzhu; Liu, Huadong; Huang, Jian; Beck, Micah; Wu, Qishi; Moore, Terry; Kohl, James (The Eurographics Association, 2008)
It is often desirable or necessary to perform scientific visualization in geographically remote locations, away from the centralized data storage systems that hold massive amounts of scientific results. The larger such scientific datasets are, the less practical it is to move them to remote locations for collaborators. In such scenarios, efficient remote visualization solutions can be crucial. Yet the use of distributed or heterogeneous computing resources raises several challenges for large-scale data visualization: algorithms must be robust and incorporate advanced load balancing and scheduling techniques. In this paper, we propose a time-critical remote visualization system that can be deployed over distributed and heterogeneous computing resources. We introduce an "importance" metric to measure the need for processing each data partition based on its degree of contribution to the final visual image. Factors contributing to this metric include specific application requirements, value distributions inside the data partition, and viewing parameters. We incorporate "visibility" in our measurement as well, so that empty or invisible blocks are not processed. Guided by the data blocks' importance values, our dynamic scheduling scheme determines the rendering priority for each visible block: more important blocks are rendered first. In time-critical scenarios, our scheduling algorithm also dynamically reduces the level of detail for the less important regions so that visualization can be finished within a user-specified time limit at the highest possible image quality. This system enables interactive sharing of visualization results. To evaluate the performance of this system, we present a case study using a 250-gigabyte dataset on 170 distributed processors.

Parallel Longest Common Subsequence using Graphics Hardware
Kloetzli, John; Strege, Brian; Decker, Jonathan; Olano, Marc (The Eurographics Association, 2008)
We present an algorithm for solving the Longest Common Subsequence problem using graphics hardware acceleration. We identify a parallel memory access pattern which enables us to run efficiently on multiple layers of parallel hardware by matching each layer to the best sub-algorithm, determined using a mix of theoretical and experimental data, including knowledge of the specific hardware and memory structure of each layer. We implement a linear-space, cache-coherent algorithm on the CPU, using a two-level algorithm on the GPU to compute subproblems quickly. The combination of all three running on a CPU/GPU pair is a fast, flexible, and scalable solution to the Longest Common Subsequence problem. Our design method is applicable to other algorithms in the Gaussian Elimination Paradigm, and can be generalized to more levels of parallel computation such as GPU clusters.

Acceleration of Opacity Correction Mechanisms for Over-sampled Volume Ray Casting
Lee, Jong Kwan; Newman, Timothy S. (The Eurographics Association, 2008)
Techniques for accelerated opacity correction for over-sampled volume ray casting on commodity hardware are described. The techniques exploit the processing capabilities of programmable GPUs and cluster computers. The GPU-based technique follows a fine-grained parallel approach that exposes to the GPU the inherent parallelism in the opacity correction process. The cluster computation techniques follow less finely granular data-parallel approaches that allow exploitation of computational resources with minimal inter-CPU communication. The performance improvements offered by the accelerated approaches over opacity correction on a single CPU are demonstrated on real volumetric datasets.

Parallel Simplification of Large Meshes on PC Clusters
Xiong, Hua; Jiang, Xiaohong; Zhang, Yaping; Shi, Jiaoying (The Eurographics Association, 2008)
Large meshes are becoming commonplace with the advance of 3D scanning, scientific simulation, and CAD technology. While many algorithms have been proposed to simplify these large meshes, the simplification process is usually very time-consuming, especially for algorithms based on iterative edge collapse. To address this problem, we propose two parallel schemes to speed up the simplification of large meshes on a PC cluster. The first scheme partitions a large mesh into small sub-meshes, simplifies these sub-meshes in parallel in an in-core way, and finally stitches the simplified versions together. The second scheme generates multiple mesh streams, applies stream simplification to them in parallel in an out-of-core way, and composes the final simplified mesh streams. We have implemented both parallel simplification schemes, and the experimental results show that our methods are able to speed up the iterative simplification of large meshes by a factor of 8 to 19 on a cluster of 24 PCs.

High-Fidelity Rendering of Animations on the Grid: A Case Study
Aggarwal, Vibhor; Chalmers, Alan; Debattista, Kurt (The Eurographics Association, 2008)
Generation of physically based rendered animations is a computationally expensive process, often taking many hours to complete. Parallel rendering, on shared-memory machines and small to medium clusters, is often employed to improve overall rendering times. Massive parallelism is possible using Grid computing. However, since the Grid is a multi-user environment with a large number of nodes potentially separated by substantial network distances, communication should be kept to a minimum. While for some rendering algorithms running animations on the Grid may be a simple task of assigning an individual frame to each processor, certain acceleration data structures, such as irradiance caching, require different approaches. The irradiance cache, which caches indirect diffuse samples for interpolation of indirect lighting calculations, may be used to significantly reduce the computational requirements when generating high-fidelity animations. Parallel solutions for irradiance caching using shared memory or message passing are not ideal for Grid computing due to the communication overhead, and must be adapted for this highly parallel environment. This paper presents a case study on rendering high-fidelity animations using a two-pass approach, adapting the irradiance cache algorithm for parallel rendering using Grid computing. This approach exploits the temporal coherence between animation frames to gain significant speed-up and enhance visual quality. The key feature of our approach is that it does not use any additional data structure and can thus be used with any irradiance cache or similar acceleration mechanism for rendering on the Grid.

Streaming Model Based Volume Ray Casting Implementation for Cell Broadband Engine
Kim, Jusub; JaJa, Joseph (The Eurographics Association, 2008)
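The sort-last architectures discussed above produce one partial image per GPU or node and merge them by image compositing. As a minimal illustration of that merge step (not code from any of the papers), the following sketch applies the standard "over" operator to premultiplied-alpha RGBA pixels, with each partial image represented as a plain Python list of pixel tuples:

```python
def over(front, back):
    """Composite one premultiplied RGBA pixel 'front' over 'back'."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa  # transmittance of the front pixel
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)

def composite(partials):
    """Blend per-node partial images, given in front-to-back order.

    Each partial image is a list of premultiplied RGBA tuples; all
    images share the same pixel layout, as in sort-last compositing.
    """
    result = list(partials[0])
    for image in partials[1:]:
        result = [over(result[i], image[i]) for i in range(len(result))]
    return result
```

In a real sort-last system this blend runs in parallel (e.g. direct-send or binary-swap exchanges), but the per-pixel operator is the same.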
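The Tikhonova and Ma abstract builds on the force-directed model. Their staged, parallel scheme is not reproduced here; the sketch below shows only the basic sequential force-directed iteration such methods start from (a hypothetical `layout_step` helper using the common spring-electrical forces, repulsion k²/d between all pairs and attraction d²/k along edges):

```python
import math

def layout_step(pos, edges, k=1.0, step=0.05):
    """One iteration of a basic force-directed (spring-electrical) layout.

    pos: dict mapping node -> (x, y); edges: list of (u, v) pairs.
    Returns a new dict of updated positions.
    """
    disp = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    for i, u in enumerate(nodes):          # pairwise repulsion
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k * k / d
            disp[u][0] += f * dx / d; disp[u][1] += f * dy / d
            disp[v][0] -= f * dx / d; disp[v][1] -= f * dy / d
    for u, v in edges:                     # spring attraction along edges
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= f * dx / d; disp[u][1] -= f * dy / d
        disp[v][0] += f * dx / d; disp[v][1] += f * dy / d
    return {u: (pos[u][0] + step * disp[u][0], pos[u][1] + step * disp[u][1])
            for u in pos}
```

The all-pairs repulsion loop is the O(n²) cost that motivates staged computation and reduced communication in the parallel algorithm.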
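The Kloetzli et al. abstract does not spell out its parallel memory access pattern, but a common way to expose parallelism in the LCS dynamic program is an anti-diagonal wavefront: every cell on diagonal d = i + j depends only on diagonals d-1 and d-2, so a whole diagonal can be filled concurrently. A sequential sketch organized by wavefronts:

```python
def lcs_length(a, b):
    """LCS length via the standard DP table, swept by anti-diagonals.

    Cells on diagonal d = i + j depend only on diagonals d-1 and d-2,
    so the inner loop over i is independent work (parallelizable).
    """
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):            # one wavefront at a time
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]
```

This is only the textbook quadratic-space table; the paper's linear-space, cache-coherent CPU variant and two-level GPU tiling are refinements of the same dependency structure.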
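The Lee and Newman abstract does not restate the correction itself; the standard opacity correction in volume ray casting adjusts each sample's opacity when the ray step size changes, via alpha' = 1 - (1 - alpha)^(s'/s). A minimal sketch of that formula (the papers' GPU and cluster accelerations parallelize evaluations of exactly this kind of per-sample correction):

```python
def corrected_opacity(alpha, base_step, new_step):
    """Opacity correction for a changed ray sampling distance.

    alpha' = 1 - (1 - alpha) ** (new_step / base_step).
    Over-sampling (new_step < base_step) lowers per-sample opacity so
    the accumulated opacity along the ray stays consistent.
    """
    return 1.0 - (1.0 - alpha) ** (new_step / base_step)
```

For example, halving the step size turns an opacity of 0.19 into about 0.051 per sample; compositing two such samples recovers the original 0.19.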