Discrete Mathematics and Optimization Seminar


JIMING PENG
McMasters University
Monday January 30th at 4pm
Burnside 1B39



Title. K-Means-Type Clustering, Principal Component Analysis and 0-1 SDP.

Abstract. One of the fundamental clustering problems is to assign n points into k clusters based on the minimal sum-of-squares (MSSC),
which is known to be NP-hard. In this talk, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP).
We show that our 0-1 SDP model provides an unified framework for several clustering approaches such as normalized k-cut and spectral
clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via the relaxed linear and semidefinite programming.

Secondly, we consider the issue of how to extract a feasible solution of the original 0-1 SDP model from the approximate solution of the relaxed SDP problem.
By using principal component analysis, we develop a rounding procedure to construct a feasible partitioning from a solution of the relaxed problem.
In our rounding procedure, we need to solve a K-means clustering problem in \Re^{k-1}, which can be solved in O(n^{k^2-2k+2}) time. In case of bi-clustering,
the running time of our rounding procedure can be reduced to O(n\log n). We show that our algorithm can provide a 2-approximate solution to the original problem.

Promising numerical results for bi-clustering based on our new method are reported.