La VALSE: Scalable Log Visualization for Fault Characterization in Supercomputers

dc.contributor.author: Guo, Hanqi (en_US)
dc.contributor.author: Di, Sheng (en_US)
dc.contributor.author: Gupta, Rinku (en_US)
dc.contributor.author: Peterka, Tom (en_US)
dc.contributor.author: Cappello, Franck (en_US)
dc.contributor.editor: Hank Childs and Fernando Cucchietti (en_US)
dc.date.accessioned: 2018-06-02T18:02:44Z
dc.date.available: 2018-06-02T18:02:44Z
dc.date.issued: 2018
dc.description.abstract: We design and implement La VALSE, a scalable visualization tool for exploring tens of millions of reliability, availability, and serviceability (RAS) log records from IBM Blue Gene/Q systems. Our tool is designed to meet various analysis requirements, including tracing the causes of failure events and investigating correlations in redundant and noisy RAS messages. La VALSE consists of multiple linked views for visualizing RAS logs; each log message has a time stamp, physical location, network address, and multiple categorical dimensions such as severity and category. The timeline view features scalable ThemeRiver and arc diagrams that enable interactive exploration of tens of millions of log messages. The spatial view visualizes the occurrences of RAS messages on hundreds of thousands of elements of Mira (compute cards, node boards, midplanes, and racks) with view-dependent level-of-detail rendering. The multidimensional view enables interactive filtering of different categorical dimensions of RAS messages. To achieve interactivity, we develop an efficient and scalable online data cube engine that can query 55 million RAS logs in less than one second. We present several case studies on Mira, a top supercomputer at Argonne National Laboratory. The case studies demonstrate that La VALSE can help users quickly identify the sources of failure events and analyze spatiotemporal correlations of RAS messages at different scales. (en_US)
dc.description.sectionheaders: Session 4
dc.description.seriesinformation: Eurographics Symposium on Parallel Graphics and Visualization
dc.identifier.doi: 10.2312/pgv.20181099
dc.identifier.isbn: 978-3-03868-054-3
dc.identifier.issn: 1727-348X
dc.identifier.pages: 91-100
dc.identifier.uri: https://doi.org/10.2312/pgv.20181099
dc.identifier.uri: https://diglib.eg.org:443/handle/10.2312/pgv20181099
dc.publisher: The Eurographics Association (en_US)
dc.title: La VALSE: Scalable Log Visualization for Fault Characterization in Supercomputers (en_US)
Files
Original bundle
Name: 091-100.pdf
Size: 2.77 MB
Format: Adobe Portable Document Format
Name: 1040-file1.mov
Size: 23.93 MB
Format: Video Quicktime