TY - JOUR
T1 - ClusCTA
T2 - A Clustering technique based on Centroid Tracking for Data Streams
AU - Jaramillo-Valbuena, Sonia
AU - Londoño-Peláez, Jorge Mario
AU - Cardona, Sergio Augusto
N1 - Publisher Copyright: © 2018.
PY - 2018
Y1 - 2018
AB - Many emerging applications generate high-volume data streams, which must be processed online under limited memory and strict time constraints. Data streams therefore pose challenges that classical machine learning techniques do not address: existing techniques must be modified, or new algorithms devised, to meet these requirements. In this paper, we present ClusCTA, a new clustering algorithm for data streams based on centroid tracking. The algorithm models centroid movements and uses this model to predict their next movements. The movement model is updated as new stream samples arrive, and only in the rare event of a significant quality loss does the algorithm fall back to a standard clustering algorithm. We compare our algorithm experimentally with ClusTree, a state-of-the-art stream clustering algorithm, and assess the robustness of both in the presence of noisy data. Experiments on real-world and synthetic datasets show that the proposed approach performs well.
KW - Adaptive learning
KW - Classification
KW - Concept drift
KW - Data Stream Mining
UR - http://www.scopus.com/inward/record.url?scp=85045051084&partnerID=8YFLogxK
M3 - Article in an indexed scientific journal
AN - SCOPUS:85045051084
SN - 0798-1015
VL - 39
JO - Espacios
JF - Espacios
IS - 14
M1 - 25
ER -