Abstract:
Generating an accurate machine learning model in a distributed network is challenging because of growing concern about data privacy and the high cost of gathering all raw data. This paper presents an adaptive asynchronous distributed clustering algorithm and two centralised methods that allow agents in a wireless network to learn global models while preserving privacy. Moreover, the communication cost and clustering quality can be adaptively balanced. The proposed clustering algorithms do not require the number of clusters to be pre-defined, and we propose a bounding-box-based method that uses the shape information of clusters to improve the accuracy of the global model. Furthermore, we consider different knowledge levels of agents and different requirements on the global model. In experiments on randomly generated network topologies, we demonstrate that methods which perform all clustering iterations in each cycle, and which exchange descriptions of cluster shape and density instead of just centroids and data counts, achieve higher accuracy in significantly shorter elapsed time.