A Visual Approach for Text Analysis Using Multiword Topics

Date
2017
Publisher
The Eurographics Association
Abstract
Topics in a text corpus carry features and information about that corpus; visualizing these topics can improve a user's understanding of it. Topics can be broadly divided into two categories: those whose meaning can be described in one word and those whose meaning is expressed through a combination of words. The latter type can be described as multiword expressions. However, extracting accurate results for multiword topics requires systematic analysis. Therefore, we propose a visual system that accurately extracts topic results with multiple word combinations. For this study, we use the text of 957 speeches from 43 U.S. presidents (from George Washington to Barack Obama) as corpus data. Our visual system is divided into two parts. In the first part, the system refines the database by topic, including multiword topics; through data processing, we systematically analyze how accurately multiword topics are extracted. In the second part, users can confirm the details of this result with a word cloud and simultaneously verify the result against the raw corpus. These two parts are synchronized, and the value of N in the N-gram model, the topics, and the presidents examined can all be altered. In this case study of U.S. presidential speech data, we verify the effectiveness and usability of our system.
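As an illustration of the N-gram idea mentioned in the abstract, the following is a minimal sketch of counting contiguous N-word sequences in a text. It is not the authors' implementation: the function name `ngram_counts`, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count contiguous n-word sequences in a text.

    Hypothetical helper for illustration only: tokens are lowercased
    runs of letters and apostrophes, and n-grams may span sentence
    boundaries because no sentence splitting is done.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Zip n staggered views of the token list to form the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text, not taken from the speech corpus.
sample = "The United States of America. The United States must act."
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Changing `n` corresponds to changing the value of N described above: `n=1` yields single-word candidates, while larger values yield multiword candidates such as "united states".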
Description

        
@inproceedings{10.2312:eurp.20171168,
  booktitle = {EuroVis 2017 - Posters},
  editor = {Anna Puig Puig and Tobias Isenberg},
  title = {{A Visual Approach for Text Analysis Using Multiword Topics}},
  author = {Mun, Seongmin and Desagulier, Guillaume and Lee, Kyungwon},
  year = {2017},
  publisher = {The Eurographics Association},
  ISBN = {978-3-03868-044-4},
  DOI = {10.2312/eurp.20171168}
}