Abstract:

Clustering is a ubiquitous technique in machine learning and is especially useful when labeled data are not available. This study examines three of the most useful and easy-to-implement clustering algorithms: the k-means method, the greedy k-means method, and the improved k-means method. We first study the behavior of the original k-means clustering technique. We then discuss two improved versions of the k-means algorithm: the first applies a greedy method to overcome some of the limitations of k-means, whereas the second uses pre-computation to improve on traditional k-means to some extent. Comparing the greedy version with the original k-means method, our execution results suggest that the clustering quality of the greedy version is better than that of the original, but the advantage is modest. We are not yet sure whether the size of the input dataset affects the clustering quality of the greedy version. Comparing the improved version with the original k-means method, the original performs better in most cases: for k <= 15 the improved version performs better, but for k > 15 the original outperforms it. Overall, among the three algorithms, the original k-means more or less performs best.
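
For orientation, the baseline against which the two variants are compared can be sketched as a plain Lloyd-style k-means loop. This is a minimal illustration under stated assumptions, not the implementation evaluated in the study: the function name `kmeans`, the optional `init` parameter, and the use of the sum of squared errors (SSE) as a stand-in for clustering quality are all hypothetical choices made here. The greedy and improved variants are not shown because the abstract does not specify them.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0, init=None):
    """Plain Lloyd-style k-means (illustrative sketch).

    Returns (centroids, labels, sse). `init` is a hypothetical convenience
    parameter: if given, it supplies the k starting centroids; otherwise
    k distinct input points are drawn with a seeded random generator.
    """
    rng = random.Random(seed)
    if init is None:
        centroids = [tuple(p) for p in rng.sample(points, k)]
    else:
        centroids = [tuple(c) for c in init]

    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    # SSE is one common quality measure (an assumption here): lower is tighter.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Running the sketch for several values of k and comparing the resulting SSE values is one simple way to reproduce the kind of quality comparison summarized above, assuming SSE is an acceptable proxy for the quality measure used in the study.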