Cluster Validity for Fuzzy Text Segmentation
Document Type
Conference Proceeding
Publication Date
11-9-2023
Department
College of Computing; Department of Computer Science
Abstract
Topical text segmentation is an unsupervised learning process of separating documents, transcripts, and other text streams into segments (i.e., clusters), where the text in each segment is considered to be topically similar and distinct from that of other segments. In this paper, we consider the task of fuzzy text segmentation, where words, or utterances, have shared membership in all segments. This is especially relevant for text sources like transcripts, where multiple topics are often discussed simultaneously, e.g., cost and deliverables in a sales meeting. One challenge in segmentation and clustering is how to choose the hyperparameters of the algorithm, e.g., the number of clusters. Hence, we propose a fuzzy cluster validity metric, a modified Davies-Bouldin index, and demonstrate how this index can be used to tune a fuzzy text segmentation algorithm. We also show how fuzzy clustering can be used as a form of text segmentation and present some applications on benchmark data.
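The abstract does not state how the Davies-Bouldin index is modified, so the following is only an illustrative sketch of one common membership-weighted variant, not the authors' metric. The function name `fuzzy_davies_bouldin`, its parameters, the fuzzifier `m`, and the toy data are all assumptions made for this example. Each cluster's scatter is the membership-weighted mean distance to its center, and the index averages, over clusters, the worst ratio of summed scatters to center separation; lower values suggest more compact, better-separated segments.

```python
import math


def fuzzy_davies_bouldin(points, memberships, centers, m=2.0):
    """Membership-weighted Davies-Bouldin index (lower is better).

    Illustrative variant only; the paper's exact modification is not
    given in the abstract.

    points:      n feature vectors (e.g. utterance embeddings)
    memberships: n rows, each holding c membership degrees in [0, 1]
    centers:     c segment prototypes, same dimension as the points
    m:           fuzzifier applied to the memberships (assumed parameter)
    """
    c = len(centers)
    if c < 2:
        raise ValueError("the index needs at least two clusters")

    # Scatter of cluster i: weighted mean distance of all points to center i.
    # Assumes every cluster has nonzero total membership.
    scatter = []
    for i in range(c):
        weights = [row[i] ** m for row in memberships]
        weighted = sum(w * math.dist(x, centers[i])
                       for w, x in zip(weights, points))
        scatter.append(weighted / sum(weights))

    # For each cluster keep the largest (scatter_i + scatter_j) / separation.
    # Assumes no two centers coincide.
    worst = []
    for i in range(c):
        ratios = [(scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                  for j in range(c) if j != i]
        worst.append(max(ratios))
    return sum(worst) / c


# Toy one-dimensional data: two tight groups far apart (hypothetical values).
points = [[0.0], [2.0], [10.0], [12.0]]
centers = [[1.0], [11.0]]
crisp = [[1, 0], [1, 0], [0, 1], [0, 1]]
soft = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

print(fuzzy_davies_bouldin(points, crisp, centers))
print(fuzzy_davies_bouldin(points, soft, centers))
```

To tune a hyperparameter such as the number of clusters with an index of this kind, one would run the fuzzy segmentation for several candidate values and keep the setting with the lowest score.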
Publication Title
IEEE International Conference on Fuzzy Systems
ISBN
9798350332285
Recommended Citation
Lucas, E., & Havens, T. C. (2023). Cluster Validity for Fuzzy Text Segmentation. IEEE International Conference on Fuzzy Systems.
http://doi.org/10.1109/FUZZ52849.2023.10309734
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/358