ClusCTA: A Clustering technique based on Centroid Tracking for Data Streams

Sonia Jaramillo-Valbuena, Jorge Mario Londoño-Peláez, Sergio Augusto Cardona

    Research output: Contribution to journal > Article in an indexed scientific journal > peer-review

    1 Citation (Scopus)

    Abstract

    Many emerging applications generate high-volume data streams, which must be processed online under limited memory and strict time constraints. Data streams therefore pose challenges that classical machine learning techniques were not designed for: existing techniques must be adapted, or new algorithms devised, to meet these requirements. In this paper, we present a new clustering algorithm for data streams based on Centroid Tracking. The idea is to model the movements of the centroids and use this model to predict their next movements. The centroid movement model is updated with new stream samples, and only in the rare event of a significant quality loss do we fall back to a standard clustering algorithm. We compare our algorithm experimentally with ClusTree, a state-of-the-art stream clustering algorithm, and assess the robustness of both in the presence of noisy data. We conduct experiments on real-world and synthetic datasets. The results show that the proposed approach performs well.
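
The predict, update, and fall-back loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the movement model, the quality measure, or the fallback algorithm, so everything concrete here is an assumption made for illustration: a constant-velocity model smoothed exponentially, mean point-to-centroid distance as the quality proxy, plain Lloyd's k-means as the "standard clustering algorithm", and the hypothetical names `CentroidTracker`, `alpha`, and `tolerance`. It is not the authors' implementation.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def assigned_means(batch, centroids):
    """Mean of the batch points assigned to each centroid.

    A centroid that attracts no points keeps its current position.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in batch:
        i = nearest(p, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    return [
        [s / n for s in total] if n else list(c)
        for total, n, c in zip(sums, counts, centroids)
    ]


def mean_distance(batch, centroids):
    """Quality proxy (assumed): average distance from a point to its nearest centroid."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) for p in batch) / len(batch)


def kmeans(batch, k, iters=20, seed=0):
    """Plain Lloyd's k-means, standing in for the 'standard clustering algorithm'."""
    centroids = [list(p) for p in random.Random(seed).sample(batch, k)]
    for _ in range(iters):
        centroids = assigned_means(batch, centroids)
    return centroids


class CentroidTracker:
    """Hypothetical tracker: constant-velocity centroid model with k-means fallback."""

    def __init__(self, k, alpha=0.5, tolerance=2.0):
        self.k = k
        self.alpha = alpha          # smoothing factor for the velocity estimate (assumed)
        self.tolerance = tolerance  # allowed quality-loss ratio before fallback (assumed)
        self.centroids = None
        self.velocity = None
        self.baseline = None        # quality measured right after the last full clustering
        self.fallbacks = 0

    def _recluster(self, batch):
        self.centroids = kmeans(batch, self.k)
        self.velocity = [[0.0] * len(c) for c in self.centroids]
        self.baseline = mean_distance(batch, self.centroids)

    def update(self, batch):
        """Consume one batch of stream samples and return the current centroids."""
        if self.centroids is None:
            self._recluster(batch)
            return self.centroids
        # 1. Predict where each centroid has moved, using the movement model.
        predicted = [[c + v for c, v in zip(cs, vs)]
                     for cs, vs in zip(self.centroids, self.velocity)]
        # 2. Correct the prediction with the new samples.
        observed = assigned_means(batch, predicted)
        # 3. Fall back to full clustering only on a significant quality loss.
        if mean_distance(batch, observed) > self.tolerance * self.baseline:
            self.fallbacks += 1
            self._recluster(batch)
            return self.centroids
        # 4. Update the movement model by exponentially smoothing the displacement.
        self.velocity = [
            [(1 - self.alpha) * v + self.alpha * (o - c) for v, o, c in zip(vs, obs, cs)]
            for vs, obs, cs in zip(self.velocity, observed, self.centroids)
        ]
        self.centroids = observed
        return self.centroids


def make_batch(centers, n, std, rng):
    """Synthetic batch: `n` Gaussian points around each center (illustration only)."""
    return [[rng.gauss(x, std) for x in c] for c in centers for _ in range(n)]


rng = random.Random(1)
tracker = CentroidTracker(k=2)
for step in range(10):
    # Two clusters drifting slowly to the right.
    centers = [(0.2 * step, 0.0), (10.0 + 0.2 * step, 10.0)]
    tracker.update(make_batch(centers, 100, 0.3, rng))
print(tracker.centroids, tracker.fallbacks)
```

In this sketch the full k-means pass runs only on the first batch and whenever the quality proxy exceeds `tolerance` times its value at the last full clustering, so slow drift is handled by the cheap predict-and-correct step alone. A real stream setting would likely also need a policy for noisy samples and for a changing number of clusters, neither of which is modeled here.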

    Original language: English
    Article number: 25
    Journal: Espacios
    Volume: 39
    Issue number: 14
    State: Published - 2018

    Bibliographical note

    Publisher Copyright:
    © 2018.
