Scalable approximation of kernel fuzzy c-means

Document Type

Conference Proceeding

Publication Date

12-1-2013

Abstract

Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are highly desired to deduce the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer's memory (the threshold depends on the computer used or available). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose the Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which significantly reduces both computational complexity and space complexity. The proposed stKFCM requires only O(n²) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes the algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n² elements of the full N × N kernel matrix (where N ≫ n) need to be calculated at each time step, thus reducing both the computation time in producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with very small n, can provide clustering performance as accurate as that of kernel fuzzy c-means run on the entire data set while achieving a significant speedup. © 2013 IEEE.
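
To make the scaling claim concrete, the following is a minimal arithmetic sketch, not the authors' implementation. It only compares the number of kernel entries in a full N × N matrix with the 2n² entries per time step stated in the abstract. The function name `kernel_element_counts` and the example values of N and n are illustrative assumptions, as is the assumption that the stream is split into ceil(N / n) equal-sized chunks.

```python
import math


def kernel_element_counts(N, n):
    """Compare kernel-entry counts: full N x N matrix vs. chunked processing.

    Assumes (for illustration only) that the data arrive in ceil(N / n)
    chunks and that each time step computes 2 * n**2 kernel entries,
    the figure quoted in the abstract.
    """
    steps = math.ceil(N / n)            # number of chunks (time steps)
    full = N * N                        # entries in the full kernel matrix
    per_step = 2 * n * n                # entries computed at each time step
    streamed_total = steps * per_step   # entries computed over the whole stream
    return {
        "steps": steps,
        "full": full,
        "per_step": per_step,
        "streamed_total": streamed_total,
        "reduction": full / streamed_total,
    }


# Hypothetical sizes chosen only to illustrate the ratio.
counts = kernel_element_counts(N=1_000_000, n=1_000)
print(counts)
```

Under these assumptions the total number of computed entries grows as 2Nn rather than N², so the reduction factor is N / (2n). If each entry were stored as an 8-byte float, the 2n² entries of one step with n = 1,000 would occupy about 16 MB, whereas the full matrix for N = 1,000,000 would need about 8 TB.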

Publication Title

Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
