A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA

Ohn Mar San;Van-Nam Huynh;Yoshiteru Nakamori

Journal of Systems Science & Complexity ›› 2003, Vol. 16 ›› Issue (4) : 562-571.



Abstract

Most earlier work on clustering focused on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method for updating the "cluster centers" of objects described by mixed numeric and categorical attributes during the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the credit approval and abalone databases.
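
The abstract does not spell out the similarity measure or the center-update rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a common k-prototypes-style choice: squared Euclidean distance on the numeric attributes plus a weighted count of categorical mismatches, with centers updated by the mean (numeric) and the most frequent value (categorical). The names `mixed_distance`, `update_center`, and the weight `gamma` are hypothetical.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Dissimilarity between an object and a cluster center.

    Assumed form: squared Euclidean distance on the numeric part plus
    gamma times the number of categorical mismatches. gamma is a
    hypothetical weight balancing the two attribute types.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


def update_center(members_num, members_cat):
    """Recompute a center from the objects assigned to one cluster.

    Assumed rule: the mean of each numeric attribute and the most
    frequent value of each categorical attribute.
    """
    n = len(members_num)
    center_num = [sum(col) / n for col in zip(*members_num)]
    center_cat = [Counter(col).most_common(1)[0][0] for col in zip(*members_cat)]
    return center_num, center_cat
```

With no categorical attributes the mismatch term vanishes and the distance reduces to the squared Euclidean distance used by k-means, which is consistent with the abstract's remark that the algorithm coincides with k-means on purely numeric data.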

Key words

Cluster analysis / numeric data / categorical data / k-means algorithm

Cite this article

Ohn Mar San , Van-Nam Huynh , Yoshiteru Nakamori. A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA. Journal of Systems Science and Complexity, 2003, 16(4): 562-571
