Michigan Tech Publications

Scalable single linkage hierarchical clustering for big data

Timothy C. Havens, Michigan Technological University
James C. Bezdek, University of Melbourne
Marimuthu Palaniswami, University of Melbourne

Document Type

Conference Proceeding

Publication Date

8-9-2013

Abstract

Personal computing technologies are everywhere; hence, there is an abundance of staggeringly large data sets: the Library of Congress has stored over 160 terabytes of web data, and it is estimated that Facebook alone logs nearly a petabyte of data per day. Thus, there is a pertinent need for systems by which one can elucidate the similarity and dissimilarity among and between groups in these big data sets. Clustering is one way to find these groups. In this paper, we extend the scalable Visual Assessment of Tendency (sVAT) algorithm to return single-linkage partitions of big data sets. The sVAT algorithm is designed to provide visual evidence of the number of clusters in unloadable (big) data sets. The extension we describe enables sVAT to also efficiently return the data partition indicated by that visual evidence. The computational complexity and storage requirements of sVAT are (usually) significantly less than the O(n^2) requirement of the classic single-linkage hierarchical algorithm. We show that sVAT is a scalable instantiation of single-linkage clustering for data sets that contain c compact-separated clusters, where c ≪ n and n is the number of objects. For data sets that do not contain compact-separated clusters, we show that sVAT produces a good approximation of single-linkage partitions. Experimental results are presented for both synthetic and real data sets. © 2013 IEEE.
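The general idea in the abstract (cluster a small sample with single linkage, then extend the result to every object) can be illustrated with a minimal sketch. This is not the authors' sVAT implementation: the function names, the greedy maximin sampling rule, the sample size m, and the nearest-sample extension step are all assumptions chosen for illustration, and the record above does not specify the actual procedure.

```python
import math


def single_linkage_labels(points, c):
    """Classic single linkage: build a Euclidean minimum spanning tree with
    Prim's algorithm (O(n^2) distance evaluations), then drop the c-1
    heaviest edges so that c connected components remain."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []              # MST edges as (weight, u, v)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Keep the n-c lightest edges and merge their endpoints with union-find.
    edges.sort()
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in edges[: n - c]:
        root[find(u)] = find(v)

    # Relabel components as 0, 1, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n)]


def maximin_sample(points, m):
    """Hypothetical sampling rule: greedily pick m mutually distant points,
    starting from index 0."""
    chosen = [0]
    dmin = [math.dist(points[0], p) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: dmin[i])
        chosen.append(nxt)
        dmin = [min(dmin[i], math.dist(points[nxt], p))
                for i, p in enumerate(points)]
    return chosen


def approx_single_linkage(points, c, m):
    """Cluster only m sampled points, then give every object the label of
    its nearest sampled point (an assumed extension step)."""
    sample = [points[i] for i in maximin_sample(points, m)]
    sample_labels = single_linkage_labels(sample, c)
    labels = []
    for p in points:
        nearest = min(range(m), key=lambda k: math.dist(p, sample[k]))
        labels.append(sample_labels[nearest])
    return labels
```

In this sketch the quadratic work is confined to the m sampled points, while the extension step costs on the order of n times m distance evaluations. For well-separated clusters and a sample that touches each of them, the extended labels would be expected to agree with the full single-linkage partition.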

Publication Title

Proceedings of the 2013 IEEE 8th International Conference on Intelligent Sensors, Sensor Networks and Information Processing: Sensing the Future, ISSNIP 2013

Recommended Citation

Havens, T., Bezdek, J., & Palaniswami, M. (2013). Scalable single linkage hierarchical clustering for big data. Proceedings of the 2013 IEEE 8th International Conference on Intelligent Sensors, Sensor Networks and Information Processing: Sensing the Future, ISSNIP 2013, 1, 396-401. http://doi.org/10.1109/ISSNIP.2013.6529823
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p/10693
