A Visual Approach for Text Analysis Using Multiword Topics

dc.contributor.author: Mun, Seongmin [en_US]
dc.contributor.author: Desagulier, Guillaume [en_US]
dc.contributor.author: Lee, Kyungwon [en_US]
dc.contributor.editor: Anna Puig Puig and Tobias Isenberg [en_US]
dc.date.accessioned: 2017-06-12T05:17:57Z
dc.date.available: 2017-06-12T05:17:57Z
dc.date.issued: 2017
dc.description.abstract: Topics in a text corpus carry its key features and information, so visualizing them can improve a user's understanding of the corpus. Topics fall broadly into two categories: those whose meaning can be expressed in a single word and those whose meaning is expressed only through a combination of words. The latter are multiword expressions. Extracting multiword topics accurately requires systematic analysis, so we propose a visual system that accurately extracts topics made up of multiple word combinations. As corpus data, we use the text of 957 speeches by 43 U.S. presidents, from George Washington to Barack Obama. The system has two parts. First, it refines the database by topic, including multiword topics; this data processing step is where multiword topics are systematically extracted. Second, users can inspect the details of the result in a word cloud and simultaneously verify it against the raw corpus. The two parts are synchronized, and users can change the value of N in the N-gram model, the topics, and the presidents examined. Through this case study of U.S. presidential speech data, we verify the effectiveness and usability of our system. [en_US]
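The abstract names the N-gram model as the control for multiword topics but does not spell out the extraction procedure. The Python sketch below illustrates one plausible pipeline under stated assumptions; it is not the authors' method. It counts candidate N-gram topics (optionally for a single president), whose counts could serve as word-cloud weights, and pulls keyword-in-context snippets from the raw text for verification. The stopword list, the rule that discards N-grams starting or ending with a stopword, the function names `topic_counts` and `concordance`, and the toy speeches are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "our", "we", "is", "for", "that", "this"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def topic_counts(speeches, n, president=None):
    """Count candidate n-gram topics, optionally restricted to one president.

    Assumed heuristic: skip n-grams that start or end with a stopword, so
    'of the union' is dropped while 'state of the union' survives for n=4.
    """
    counts = Counter()
    for name, texts in speeches.items():
        if president is not None and name != president:
            continue
        for text in texts:
            for gram in ngrams(tokenize(text), n):
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts


def concordance(speeches, phrase, width=30, president=None):
    """Return (president, snippet) pairs showing each occurrence of phrase in the raw text."""
    # Allow any run of non-word characters (spaces, commas, line breaks) between words.
    pattern = re.compile(
        r"\b" + r"\W+".join(map(re.escape, phrase.split())) + r"\b", re.IGNORECASE
    )
    hits = []
    for name, texts in speeches.items():
        if president is not None and name != president:
            continue
        for text in texts:
            for m in pattern.finditer(text):
                start = max(0, m.start() - width)
                hits.append((name, text[start:m.end() + width]))
    return hits


# Toy, invented data; these are not quotations from real speeches.
speeches = {
    "President A": ["The health care system needs reform. We will fix health care."],
    "President B": ["Health care costs are rising, and the state of the union is strong."],
}

print(topic_counts(speeches, 2).most_common(3))        # word-cloud weights for N = 2
print(concordance(speeches, "health care", president="President B"))
```

Changing `n` or `president` and re-running both functions mimics, in a much simplified way, the synchronized controls the abstract describes. Filtering on the first and last token is a cheap heuristic for multiword expressions; statistical association measures such as pointwise mutual information are a common alternative.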
dc.description.sectionheaders: Posters
dc.description.seriesinformation: EuroVis 2017 - Posters
dc.identifier.doi: 10.2312/eurp.20171168
dc.identifier.isbn: 978-3-03868-044-4
dc.identifier.pages: 57-59
dc.identifier.uri: https://doi.org/10.2312/eurp.20171168
dc.identifier.uri: https://diglib.eg.org:443/handle/10.2312/eurp20171168
dc.publisher: The Eurographics Association [en_US]
dc.subject: I.7.0 [Document And Text Processing]
dc.subject: General
dc.subject: Data Processing
dc.subject: H.5.2 [Information interfaces and presentation (e.g., HCI)]
dc.subject: User Interfaces
dc.subject: Web-based Interaction
dc.title: A Visual Approach for Text Analysis Using Multiword Topics [en_US]
Files
Original bundle
057-059.pdf (896.86 KB, Adobe Portable Document Format)
eurovis2017-posters0127-file4.mp4 (61.4 MB, Unknown data format)