Machine Learning Methods in Visualisation for Big Data 2020
Item: MLVis 2020: Frontmatter (The Eurographics Association, 2020)
Authors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots (The Eurographics Association, 2020)
Authors: Thrun, Michael C.
For many applications, it is crucial to decide whether a dataset possesses cluster structures. This property is called clusterability and is usually investigated using statistical testing. Here, it is proposed to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows the distributions of many variables to be investigated, with automatic sampling in the case of large datasets. Statistical testing of clusterability is compared with MDplots of the first principal component and of the distance distribution of the data. Contradicting results are evaluated with topographic maps of cluster structures derived from planar projections using the generalized U-Matrix technique. A collection of artificial and natural datasets, specially designed to cover a variety of clustering problems that any algorithm should be able to handle, is used for the comparison. The results demonstrate that the MDplot improves statistical testing, but even then, almost touching cluster structures with low intercluster distances and no predominant direction of variance remain challenging.

Item: Progressive Multidimensional Projections: A Process Model based on Vector Quantization (The Eurographics Association, 2020)
Authors: Ventocilla, Elio Alejandro; Martins, Rafael M.; Paulovich, Fernando V.; Riveiro, Maria
As large datasets become more common, so does the need for exploratory approaches that allow iterative, trial-and-error analysis. Without such solutions, hypothesis testing and exploratory data analysis may become cumbersome due to long waiting times for feedback from computationally intensive algorithms. This work presents a process model for progressive multidimensional projections (P-MDPs) that enables early feedback and user involvement in the process, complementing previous work by providing a lower level of abstraction and describing the specific elements that can provide early system feedback and those that can be enabled for user interaction. Additionally, we outline a set of design constraints that must be taken into account to ensure the usability of a solution with respect to feedback time, visual clutter, and the interactivity of the view. To address these constraints, we propose the use of incremental vector quantization (iVQ) as a core step within the process. To illustrate the feasibility of the model and the usefulness of the proposed iVQ-based solution, we present a prototype that demonstrates how the different usability constraints can be accounted for, regardless of the size of the dataset.
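As a rough illustration of the progressive idea only (not the authors' iVQ implementation), the following minimal sketch uses scikit-learn's MiniBatchKMeans with partial_fit as a stand-in for incremental vector quantization: prototypes are updated one chunk at a time, and only the prototypes are projected to 2D, so a view can be refreshed long before the full dataset has been processed.

```python
# Minimal sketch of a progressive projection loop (illustrative only).
# MiniBatchKMeans.partial_fit stands in for incremental vector quantization (iVQ);
# the paper's actual iVQ step and process model are not reproduced here.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.manifold import MDS

def progressive_projection(chunks, n_prototypes=50):
    """Consume data chunk by chunk; yield a 2D layout of prototypes after each chunk."""
    quantizer = MiniBatchKMeans(n_clusters=n_prototypes, random_state=0)
    for chunk in chunks:
        quantizer.partial_fit(chunk)             # update prototypes incrementally
        prototypes = quantizer.cluster_centers_  # compact summary of the data seen so far
        layout = MDS(n_components=2, random_state=0).fit_transform(prototypes)
        yield layout                             # early feedback: plot this layout

# Example: stream a (synthetic) dataset in chunks of 1,000 rows.
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 16))
chunks = np.array_split(data, 10)
for i, layout in enumerate(progressive_projection(chunks)):
    print(f"chunk {i}: prototype layout shape {layout.shape}")
```

Projecting only the prototypes keeps the per-update cost independent of the dataset size, which is the property the design constraints on feedback time and visual clutter hinge on.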
Item: Visual Interpretation of DNN-based Acoustic Models using Deep Autoencoders (The Eurographics Association, 2020)
Authors: Grósz, Tamás; Kurimo, Mikko
In the past few years, Deep Neural Networks (DNNs) have become the state-of-the-art solution in several areas, including automatic speech recognition (ASR); unfortunately, they are generally viewed as black boxes. Recently, this has started to change as researchers have dedicated much effort to interpreting their behavior. In this work, we concentrate on visual interpretation by depicting the hidden activation vectors of the DNN, and we propose using deep autoencoders (DAEs) to transform these hidden representations for inspection. We use multiple metrics to compare our approach with other widely used algorithms, and the results show that our approach is quite competitive. The main advantage of autoencoders over the existing methods is that, after the training phase, they apply a fixed transformation that can be used to visualize any hidden activation vector without further optimization, which is not true for the other methods.

Item: ModelSpeX: Model Specification Using Explainable Artificial Intelligence Methods (The Eurographics Association, 2020)
Authors: Schlegel, Udo; Cakmak, Eren; Keim, Daniel A.
Explainable artificial intelligence (XAI) methods aim to reveal the non-transparent decision-making mechanisms of black-box models. Evaluating the insight generated by such XAI methods remains challenging, as the applied techniques depend on many factors (e.g., parameters and human interpretation). We propose ModelSpeX, a visual analytics workflow to interactively extract human-centered rule sets and generate model specifications from black-box models (e.g., neural networks). The workflow enables analysts to reason about the underlying problem, to extract decision rule sets, and to evaluate the suitability of the model for a particular task. An exemplary usage scenario walks an analyst through the steps of the workflow to demonstrate its applicability.

Item: Visual Analysis of the Impact of Neural Network Hyper-Parameters (The Eurographics Association, 2020)
Authors: Jönsson, Daniel; Eilertsen, Gabriel; Shi, Hezi; Zheng, Jianmin; Ynnerman, Anders; Unger, Jonas
We present an analysis of the impact of hyper-parameters for an ensemble of neural networks, using tailored visualization techniques to understand the complicated relationship between hyper-parameters and model performance. The high-dimensional error surface spanned by the wide range of hyper-parameters used to specify and optimize neural networks is difficult to characterize: it is non-convex and discontinuous, and there can be complex local dependencies between hyper-parameters. To explore these dependencies, we make use of a large number of sampled relations between hyper-parameters and end performance, retrieved from thousands of individually trained convolutional neural network classifiers. We use a structured selection of visualization techniques to analyze the impact of different combinations of hyper-parameters. The results reveal how complicated dependencies between hyper-parameters influence the end performance, and demonstrate how the complete picture painted by considering a large number of training runs simultaneously can aid in understanding the impact of hyper-parameter combinations.
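As a generic illustration of relating hyper-parameter combinations to end performance (not the tailored visualization techniques described in the paper), the sketch below draws a parallel-coordinates plot over a small synthetic table of training runs; the hyper-parameter names and all values are placeholders.

```python
# Generic illustration: relate hyper-parameter combinations to end performance
# with a parallel-coordinates plot. All records below are synthetic placeholders,
# not results from the paper.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

runs = pd.DataFrame({
    "learning_rate": [0.1, 0.01, 0.001, 0.01, 0.1, 0.001],
    "batch_size":    [32, 64, 128, 32, 256, 64],
    "dropout":       [0.0, 0.2, 0.5, 0.5, 0.2, 0.0],
    "accuracy":      [0.71, 0.83, 0.79, 0.75, 0.68, 0.81],
})

# Bin the end performance so each line can be colored by a performance class.
runs["performance"] = pd.cut(runs["accuracy"], bins=[0.0, 0.75, 0.80, 1.0],
                             labels=["low", "mid", "high"])

# Normalize each hyper-parameter to [0, 1] so the axes share a comparable scale.
for col in ["learning_rate", "batch_size", "dropout"]:
    runs[col] = (runs[col] - runs[col].min()) / (runs[col].max() - runs[col].min())

ax = parallel_coordinates(runs.drop(columns="accuracy"), class_column="performance",
                          colormap="viridis")
ax.set_title("Hyper-parameters vs. performance class (synthetic data)")
plt.show()
```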