[Statlist] Next Talk: Friday, May 21, 2010 with Christian Hennig, University College London

Mon May 17 09:06:11 CEST 2010

ETH and University of Zurich

Proff. P. Buehlmann - L. Held -
H.R. Kuensch - M. Maathuis - S. van de Geer

*********************************************************
We are glad to announce the following talk

*Friday, May 21, 2010 15.15 - 17.00 HG G 19.1 *
***********************************************************
with *Christian Hennig*, University College London

/Title:/
How to merge normal mixture components for cluster analysis

/Abstract:/
Normal mixture models are often used for cluster analysis. Usually, 
every component of the mixture is interpreted as a cluster. This, 
however, is often not appropriate. A mixture of two normal components 
can be unimodal and quite homogeneous. In particular, mixtures of several 
normals may be needed to approximate homogeneous non-normal distributions.
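
As a small illustration of the unimodality remark above, the following 
sketch counts the local maxima of a one-dimensional normal mixture density 
on a grid. The function names, the grid size and the example parameters are 
assumptions made for this illustration only; they are not taken from the 
talk.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def count_modes(weights, means, sds, grid_size=4001):
    """Count local maxima of the mixture density on an evenly spaced grid.

    The grid spans 4 standard deviations beyond the outermost components
    (an arbitrary choice for this sketch).
    """
    lo = min(m - 4.0 * s for m, s in zip(means, sds))
    hi = max(m + 4.0 * s for m, s in zip(means, sds))
    xs = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [mixture_pdf(x, weights, means, sds) for x in xs]
    # A grid point is a mode if the density rises into it and does not
    # rise again immediately after it.
    return sum(
        1
        for i in range(1, grid_size - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    )


# Two equally weighted unit-variance components, 1.5 standard deviations apart.
close_pair = count_modes([0.5, 0.5], [0.0, 1.5], [1.0, 1.0])
# The same components, 4 standard deviations apart.
far_pair = count_modes([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
print(close_pair, far_pair)
```

With these parameters the first mixture has a single mode although it was 
built from two components, while the second has two modes.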

Even if there are non-normal subpopulations in the data, the normal 
mixture model is still a good tool for clustering because of its 
flexibility. This presentation is about methods to decide whether, after 
having fitted a normal mixture, several mixture components should be 
merged in order to be interpreted as a single cluster.

Note that this cannot be formulated as a statistical estimation problem, 
because the likelihood and the general fitting quality of the model do 
not depend on whether single mixture components or sets of mixture 
components are interpreted as clusters. So any method depends on a 
specification of what the user wants to regard as a "cluster". There are 
at least two different cluster concepts, namely identifying clusters 
with modes (and therefore merging unimodal mixtures) and identifying 
clusters with clear patterns in the data (which, for example, means that 
scale mixtures, though unimodal, should 
not necessarily be merged). Furthermore, it has to be specified how 
strong a separation is required between different clusters.

The methods proposed and compared in this presentation are all 
hierarchical. From an estimated mixture, pairs of components (and later 
pairs of already merged mixtures) are merged until the members of each 
remaining pair are separated enough to be interpreted as different 
clusters. Separation can be measured in many different ways, depending on 
the underlying cluster concept.
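
The general shape of such a hierarchical scheme can be sketched as follows 
for one-dimensional components. This is only a minimal sketch under stated 
assumptions: the Bhattacharyya distance is used as the separation measure, 
a merged group is summarised by a single moment-matched normal, and the 
threshold min_distance is a hypothetical tuning value. None of these 
choices is claimed to be one of the criteria compared in the talk.

```python
import math
from itertools import combinations


def bhattacharyya(a, b):
    """Bhattacharyya distance between two 1-D normals given as (weight, mean, variance)."""
    _, m1, v1 = a
    _, m2, v2 = b
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))


def moment_match(a, b):
    """Summarise two weighted normals by one normal with the same total
    weight, mean and variance (a simplification assumed for this sketch)."""
    w1, m1, v1 = a
    w2, m2, v2 = b
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merge_components(components, min_distance):
    """Repeatedly merge the closest pair until every remaining pair is at
    least min_distance apart.

    Returns a list of (summary, member_indices), one entry per cluster.
    """
    clusters = [(c, [i]) for i, c in enumerate(components)]
    while len(clusters) > 1:
        # Find the pair of current clusters with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: bhattacharyya(clusters[p[0]][0], clusters[p[1]][0]),
        )
        if bhattacharyya(clusters[i][0], clusters[j][0]) >= min_distance:
            break  # all remaining pairs count as separated
        merged = (
            moment_match(clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical fitted mixture: (weight, mean, variance) per component.
fitted = [(0.3, 0.0, 1.0), (0.3, 1.0, 1.0), (0.4, 8.0, 1.0)]
result = merge_components(fitted, min_distance=0.5)
members = sorted(sorted(idx) for _, idx in result)
print(members)
```

In this example the two overlapping components are merged into one 
cluster and the distant third component remains a cluster of its own. A 
different separation measure or threshold would encode a different cluster 
concept and could give a different answer.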

Beyond the methodology itself, the talk will address some implications 
for how to think about cluster analysis problems in general.

This abstract is also available at the following link:
http://stat.ethz.ch/talks/research_seminar

-- 
ETH Zürich
Seminar für Statistik
Cecilia Rey-Lutz, HG G10.3
Rämistrasse 101
CH-8092 Zurich
mail: rey at stat.math.ethz.ch
phone: +41 44 632 3438/fax: +41 44 632 1228