Semi-supervised ensemble clustering based on selected constraint projection in Python

Abstract:

Traditional cluster ensemble approaches have several limitations. (1) Few make use of prior knowledge provided by experts. (2) They struggle to achieve good performance on high-dimensional datasets. (3) All ensemble members receive equal weights, which ignores their different contributions. (4) Not all pairwise constraints contribute to the final result. To address these limitations, we propose double weighting semi-supervised ensemble clustering based on selected constraint projection (DCECP), which applies both constraint weighting and ensemble member weighting. Specifically, DCECP first adopts the random subspace technique in combination with the constraint projection procedure to handle high-dimensional datasets. Second, it treats the prior knowledge of experts as pairwise constraints and assigns different subsets of pairwise constraints to different ensemble members. An adaptive ensemble member weighting process is designed to associate different weight values with different ensemble members. Third, the weighted normalized cut algorithm is adopted to summarize the clustering solutions and generate the final result. Finally, nonparametric statistical tests are used to compare multiple algorithms on real-world datasets. Our experiments on 15 high-dimensional datasets show that DCECP outperforms most of the clustering algorithms it is compared against.
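
The ensemble member weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the DCECP weighting scheme itself, which the abstract does not specify. It assumes, purely for illustration, that each member's weight is proportional to the fraction of must-link and cannot-link constraints its labeling satisfies, and that the weighted votes are summed into a co-association (similarity) matrix. The function names and the toy data are hypothetical. The random subspace, constraint projection, and weighted normalized cut steps are omitted.

```python
def constraint_satisfaction(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that one labeling satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def member_weights(labelings, must_link, cannot_link):
    """Assumed rule: weights proportional to constraint satisfaction, summing to 1."""
    scores = [constraint_satisfaction(l, must_link, cannot_link) for l in labelings]
    score_sum = sum(scores)
    if score_sum == 0:
        # No member satisfies any constraint: fall back to equal weights.
        return [1.0 / len(labelings)] * len(labelings)
    return [s / score_sum for s in scores]


def weighted_coassociation(labelings, weights):
    """Entry (i, j) is the total weight of members that co-cluster points i and j."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w
    return matrix


# Toy data (hypothetical): three ensemble members label four points.
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
must_link = [(0, 1)]    # points 0 and 1 should share a cluster
cannot_link = [(0, 3)]  # points 0 and 3 should be separated

weights = member_weights(labelings, must_link, cannot_link)
similarity = weighted_coassociation(labelings, weights)
```

In this toy run the third member violates both constraints and therefore receives zero weight, so only the first two members shape the similarity matrix. A graph partitioning step such as normalized cut would then be applied to `similarity` to obtain the consensus clustering.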