[Statlist] Statistics Seminars - Université de Neuchâtel

Thu Jun 2 09:57:44 CEST 2005

Statistics Seminars

Tuesday 21-06-2005, 11:00
Statistics Group, Espace de l'Europe 4, Neuchâtel

Prof. Alfio Marazzi 
Faculty of Biology and Medicine, Université de Lausanne, Switzerland

Robust response transformations based on optimal prediction
Response transformations have become a widely used tool for making data conform to a linear regression model; the most common example is the Box-Cox transformation. The transformed response is usually assumed to be linearly related to the covariates, with normally distributed errors of constant variance. The regression coefficients, as well as the parameter lambda defining the transformation, are generally estimated by maximum likelihood (ML). Unfortunately, near normality and homoscedasticity are hard to attain simultaneously with a single transformation. Moreover, the ML estimate is not robust, and it is not consistent when the errors are non-normal or heteroscedastic.
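The classical ML estimate of lambda can be sketched through the Gaussian profile log-likelihood: for each candidate lambda, transform the response, fit the regression by least squares, and combine the residual sum of squares with the Jacobian term, giving l(lambda) = -(n/2) log(RSS(lambda)/n) + (lambda - 1) * sum(log y_i). The sketch below is a minimal illustration under stated assumptions: a single covariate, a grid over [-2, 2], and hypothetical function names (`box_cox`, `profile_loglik`, `ml_lambda`). It shows the non-robust baseline discussed in the abstract, not the speakers' method.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case lambda = 0
    return (y ** lam - 1.0) / lam

def ols_rss(x, z):
    """Residual sum of squares of the least-squares fit z ~ a + b*x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))

def profile_loglik(x, y, lam):
    """Gaussian profile log-likelihood of lambda, up to an additive constant.

    The Jacobian term (lam - 1) * sum(log y) makes the values for different
    lambdas comparable on the original response scale.
    """
    n = len(y)
    z = [box_cox(yi, lam) for yi in y]
    rss = ols_rss(x, z)
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(yi) for yi in y)

def ml_lambda(x, y, grid=None):
    """Grid value of lambda that maximises the profile log-likelihood."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]  # assumed grid: [-2, 2]
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))
```

For data generated as y = exp(1 + 0.3x + small noise), the maximiser should land near lambda = 0, the log transform. A grid search keeps the sketch transparent; a one-dimensional optimiser would serve equally well.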

Various semiparametric and nonparametric approaches that relax the parametric structure of the response distribution have been studied. However, these procedures do not provide effective protection against heavy contamination and heteroscedasticity. Marazzi and Yohai (2003) gave a first proposal of Box-Cox transformations for simple regression that are robust and remain consistent even when the assumptions of normality and homoscedasticity do not hold.

Here, we present new estimates based on optimization of the prediction error. Our multiple regression model does not specify a parametric form for the error distribution. To develop a new nonparametric criterion, we introduce the basic concept of conditional M-expectation (CME), a robust version of the classical conditional expectation of the response given a covariate vector. The CME minimizes an M-scale in place of the classical mean squared error. We then consider the CME of the transformed response as a function of lambda, with the coefficients estimated by a robust (e.g., MM-) estimator. The optimal prediction property of the CME provides a criterion that defines the CME-estimate of lambda. Since the conditional mean of the response on the original scale is often the parameter of interest, we also provide a robust version of the well-known smearing estimate that is consistent for the CME. Monte Carlo results show that the new estimators perform better than other available methods. Applications to modeling hospital cost of stay with covariates such as length of stay and admission type are presented.
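The key idea behind the CME, replacing the mean squared error by an M-scale, can be illustrated in the simplest setting with no covariates, where it reduces to the location value minimising the M-scale of the residuals (an S-estimate of location). This is a minimal sketch under loudly stated assumptions: Tukey's bisquare rho with tuning constant 1.547645 and delta = 0.5 (a common choice giving a 50% breakdown point), the standard fixed-point iteration for the M-scale, a grid search over candidate locations, and hypothetical names `m_scale` and `cme_location`. It is not the authors' implementation and covers neither the transformation parameter lambda nor the robust smearing estimate.

```python
import statistics

C_BISQUARE = 1.547645  # assumed bisquare tuning constant (50% breakdown point)
DELTA = 0.5            # assumed right-hand side of the M-scale equation

def rho_bisquare(t, c=C_BISQUARE):
    """Tukey's bisquare rho function, scaled so that its maximum is 1."""
    u = t / c
    if abs(u) >= 1.0:
        return 1.0
    return 1.0 - (1.0 - u * u) ** 3

def m_scale(residuals, delta=DELTA, tol=1e-10, max_iter=500):
    """Solve mean(rho(r_i / s)) = delta for s by fixed-point iteration."""
    # Start from the normalised median absolute residual.
    s = statistics.median([abs(r) for r in residuals]) / 0.6745
    if s == 0.0:
        return 0.0  # at least half of the residuals are exactly zero
    for _ in range(max_iter):
        avg = sum(rho_bisquare(r / s) for r in residuals) / len(residuals)
        s_new = s * (avg / delta) ** 0.5
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

def cme_location(y, n_grid=1001):
    """Location mu minimising the M-scale of y - mu (covariate-free case)."""
    lo, hi = min(y), max(y)
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    return min(grid, key=lambda mu: m_scale([yi - mu for yi in y]))
```

For example, with y = [9.8, 10.1, 10.0, 9.9, 10.2, 10.05, 9.95, 50.0, 60.0] the sample mean is 20, pulled up by the two outliers, whereas `cme_location(y)` stays close to 10. In the regression setting of the abstract, the constant mu would be replaced by a fitted regression function.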

Based on joint work with Victor Yohai (Department of Mathematics, University of Buenos Aires).