Visual Analytics for Fine-grained Text Classification Models and Datasets

DC Field | Value | Language
dc.contributor.author | Battogtokh, Munkhtulga | en_US
dc.contributor.author | Xing, Yiwen | en_US
dc.contributor.author | Davidescu, Cosmin | en_US
dc.contributor.author | Abdul-Rahman, Alfie | en_US
dc.contributor.author | Luck, Michael | en_US
dc.contributor.author | Borgo, Rita | en_US
dc.contributor.editor | Aigner, Wolfgang | en_US
dc.contributor.editor | Archambault, Daniel | en_US
dc.contributor.editor | Bujack, Roxana | en_US
dc.date.accessioned | 2024-05-21T08:18:57Z
dc.date.available | 2024-05-21T08:18:57Z
dc.date.issued | 2024
dc.description.abstract | In natural language processing (NLP), text classification tasks are increasingly fine-grained, as datasets are fragmented into a larger number of classes that are more difficult to differentiate from one another. As a consequence, the semantic structures of datasets have become more complex, and model decisions more difficult to explain. Existing tools, suited for coarse-grained classification, falter under these additional challenges. To address this gap, we worked closely with NLP domain experts in an iterative design-and-evaluation process to characterize and tackle the growing requirements in their workflow of developing fine-grained text classification models. The result of this collaboration is SemLa, a novel Visual Analytics system tailored for 1) dissecting complex semantic structures in a dataset when it is spatialized in model embedding space, and 2) visualizing fine-grained nuances in the meaning of text samples to faithfully explain model reasoning. This paper details the iterative design study and the resulting innovations featured in SemLa. The final design allows contrastive analysis at different levels by unearthing lexical and conceptual patterns, including biases and artifacts in data. Expert feedback on our final design and case studies confirm that SemLa is a useful tool for supporting model validation and debugging as well as data annotation. | en_US
dc.description.number | 3
dc.description.sectionheaders | Text and Speech
dc.description.seriesinformation | Computer Graphics Forum
dc.description.volume | 43
dc.identifier.doi | 10.1111/cgf.15098
dc.identifier.issn | 1467-8659
dc.identifier.pages | 12 pages
dc.identifier.uri | https://doi.org/10.1111/cgf.15098
dc.identifier.uri | https://diglib.eg.org/handle/10.1111/cgf15098
dc.publisher | The Eurographics Association and John Wiley & Sons Ltd. | en_US
dc.rights | Attribution 4.0 International License
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.subject | CCS Concepts: Computing methodologies → Natural language processing; Human-centered computing → Visual analytics
dc.subject | Computing methodologies → Natural language processing
dc.subject | Human-centered computing → Visual analytics
dc.title | Visual Analytics for Fine-grained Text Classification Models and Datasets | en_US
Files
Original bundle
Name | Size | Format
v43i3_23_cgf15098.pdf | 4.42 MB | Adobe Portable Document Format
1074-i7.pdf | 254.58 KB | Adobe Portable Document Format
1074-i8.pdf | 105.51 KB | Adobe Portable Document Format
1074-i9.mp4 | 69.9 MB | Video MP4