User Tools

Site Tools


data_mining

Data Mining

Keywords : clustering, kernel methods, linear algebra, parallelization

Clustering aims at partitioning a p-dimensional data set in K compact and homogeneous clusters, with respect to some “similarity measure” between the elements in the given data set. Clustering has many applications in a large variety of fields : Biology, Pattern Recognition, Image Segmentations, geometric modeling…

Among the recent partitioning algorithms, kernel based methods, such as spectral clustering, offers the opportunity to cluster the data with arbitrary shape for the resulting clusters. The Spectral Clustering consists in creating, from the spectral elements of a Gaussian affinity matrix, a low-dimension space in which data are grouped into clusters. This unsupervised method is mainly based on Gaussian affinity measure, its parameter and its spectral elements. However, questions about the separability of clusters in the projection space and the spectral parameter choices remained open. The rule of the parameter of Gaussian affinity was first investigated through quality measures and two heuristics for choosing this setting were defined. Then, the method was studied by considering the spectral element of the Gaussian affinity matrix. By interpreting this matrix as the discretization of the heat kernel defined on the whole space and using finite elements, the eigenvectors of the affinity matrix are asymptotic representation of functions whose support is included in one connected component. These results helped to define the properties of clustering and conditions on the Gaussian parameter. As applications, Kinetic Spectral Clustering was applied for segmenting dynamic PET images in collaboration with INSERM, University of Tours, Universities Paris-7 and Paris-11 or in geometric modeling, kernel methods were used for detecting similarities on parametric surfaces in collaboration with IRIT-VORTEX team.

With emerging technologies in various applications, very large amount of input data needs to be incorporated and, to be efficient, clustering methods require high performance computing in general (including fine-tuned kernels for linear algebra and numerical optimization). So, parallelization strategies by decomposition into sub-domains were formulated and tested on geometrical and realistic examples.

Some clustering methods can be viewed as structural and sparse linear algebra is at the heart of innovative clustering methods to treat large scale simulation on high performance computers. Block structure identification in sparse matrices permits providing direct information on clusters in data.

data_mining.txt · Last modified: 2015/03/06 09:46 (external edit)

Page Tools