Towards community detection in locally heterogeneous networks
Charu C. Aggarwal, Yan Xie, et al.
SDM 2011
The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification, and trend analysis. Unfortunately, existing clustering algorithms tend to break down in high-dimensional spaces because the points are inherently sparse there. In such spaces, not all dimensions may be relevant to a given cluster. One way of handling this is to pick the closely correlated dimensions and find clusters in the corresponding subspace; traditional feature selection algorithms attempt to do exactly this. The weakness of this approach is that, in typical high-dimensional data mining applications, different sets of points may cluster better for different subsets of dimensions, and the number of dimensions in each such cluster-specific subspace may also vary. Hence, it may be impossible to find a single small subset of dimensions that suits all the clusters. We therefore discuss a generalization of the clustering problem, referred to as the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves. We develop an algorithmic framework for solving the projected clustering problem and test its performance on synthetic data.
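The central idea of projected clustering, that each cluster has its own relevant subset of dimensions, can be illustrated with a short Python sketch. It assumes the cluster memberships are already known and simply ranks, for each cluster, the dimensions by how tightly the members are grouped along them (mean absolute deviation from the cluster centroid). The helper `select_cluster_dimensions`, its `dims_per_cluster` parameter, and the synthetic data are hypothetical; this is not the algorithmic framework described above, which must also discover the clusters themselves.

```python
import numpy as np


def select_cluster_dimensions(points, labels, dims_per_cluster):
    """Hypothetical helper: for each cluster, return the indices of the
    `dims_per_cluster` dimensions along which its members are most tightly
    grouped (smallest mean absolute deviation from the cluster centroid)."""
    selected = {}
    for label in np.unique(labels):
        members = points[labels == label]
        centroid = members.mean(axis=0)
        # Average spread of this cluster along each dimension separately.
        spread = np.abs(members - centroid).mean(axis=0)
        # Keep the lowest-spread dimensions, reported in index order.
        chosen = np.argsort(spread)[:dims_per_cluster]
        selected[int(label)] = sorted(chosen.tolist())
    return selected


# Synthetic example (assumed data): two clusters in 5 dimensions.
rng = np.random.default_rng(0)
n, d = 200, 5
# Cluster 0: tightly grouped in dimensions 0 and 1, uniform noise elsewhere.
c0 = rng.uniform(0, 10, size=(n, d))
c0[:, [0, 1]] = rng.normal(loc=2.0, scale=0.05, size=(n, 2))
# Cluster 1: tightly grouped in dimensions 2 and 3, uniform noise elsewhere.
c1 = rng.uniform(0, 10, size=(n, d))
c1[:, [2, 3]] = rng.normal(loc=7.0, scale=0.05, size=(n, 2))

points = np.vstack([c0, c1])
labels = np.array([0] * n + [1] * n)

print(select_cluster_dimensions(points, labels, dims_per_cluster=2))
# {0: [0, 1], 1: [2, 3]}
```

On this data no single pair of dimensions serves both clusters: cluster 0 is compact only in dimensions 0 and 1, and cluster 1 only in dimensions 2 and 3. Choosing the subspace per cluster, rather than once for the whole dataset, is what distinguishes the projected formulation from traditional feature selection.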