[Statlist] Statistics Seminars - Université de Neuchâtel
KONDYLIS Atanassios
atanassios.kondylis at unine.ch
Thu Jun 2 09:57:44 CEST 2005
Statistics Seminars
Tuesday, 21 June 2005, 11:00
Groupe de Statistique, Espace de l'Europe 4, Neuchâtel
Prof. Alfio Marazzi
Faculté de biologie et médecine, Université de Lausanne, Switzerland
Robust response transformations based on optimal prediction
Response transformations have become a widely used tool to make data conform to a linear regression model. The most common example is the Box-Cox transformation. The transformed response is usually assumed to be linearly related to the covariates, and the errors are assumed to be normally distributed with constant variance. The regression coefficients, as well as the parameter lambda defining the transformation, are generally estimated by maximum likelihood (ML). Unfortunately, near-normality and homoscedasticity are hard to attain simultaneously with a single transformation. In addition, the ML estimate is not consistent under non-normal or heteroscedastic errors, and it is not robust.
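For readers unfamiliar with the classical procedure described above, the following is a minimal sketch of the Box-Cox transformation and of ML estimation of lambda by maximizing the profile log-likelihood over a grid. It is not taken from the talk: the function names (`box_cox`, `profile_loglik`, `ml_lambda`) and the grid from -2 to 2 are illustrative assumptions, and the response is assumed to be strictly positive.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive response y."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)              # limiting case lambda -> 0
    return (y**lam - 1.0) / lam

def profile_loglik(lam, X, y):
    """Profile log-likelihood of lambda (up to a constant) under
    normal errors with constant variance."""
    y = np.asarray(y, dtype=float)
    z = box_cox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # least-squares fit
    resid = z - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n                      # ML variance estimate
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))

def ml_lambda(X, y, grid=None):
    """Grid-search ML estimate of lambda (the grid is an assumption)."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 81)
    values = [profile_loglik(lam, X, y) for lam in grid]
    return float(grid[int(np.argmax(values))])
```

Because this criterion relies on least squares and on the normal likelihood, a few outlying responses can shift both the fitted coefficients and the selected lambda, which is the lack of robustness the abstract refers to.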
Various semiparametric and nonparametric approaches to relax the parametric structure of the response distribution have been studied. However, these procedures do not provide effective protection against heavy contamination and heteroscedasticity. A first proposal of robust Box-Cox transformations for simple regression, which remain consistent even if the assumptions of normality and homoscedasticity do not hold, was given by Marazzi and Yohai in 2003.
Here, we present new estimates based on optimization of the prediction error. Our multiple regression model does not specify a parametric form of the error distribution. In order to develop a new nonparametric criterion, we introduce the basic concept of conditional M-expectation (CME), a robust version of the classical conditional expectation of the response for a given covariate vector. The CME minimizes an M-scale in place of the classical mean squared error. We then consider the CME of the transformed response as a function of lambda, the coefficients being estimated using a robust (e.g., MM-) estimator. The optimal prediction property of the CME provides a criterion to define the CME-estimate of lambda. Since the conditional mean of the response on the original scale is often the parameter of interest, we also provide a robust version of the well-known smearing estimate, which is consistent for the CME. Monte Carlo results show that the new estimators perform better than other available methods. Applications concerning modeling of hospital cost of stay with the help of covariates such as length of stay and admission type are presented.
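Two of the building blocks named above can be illustrated with a short sketch. It does not reproduce the CME-estimate or the robust smearing estimate of the talk, whose details are not given in this abstract. It only shows (i) an M-scale of residuals, assuming a Tukey bisquare rho with the commonly used constants c = 1.547 and b = 0.5, and (ii) the classical, non-robust smearing estimate of the conditional mean on the original scale. All function names are hypothetical.

```python
import numpy as np

def rho_bisquare(u, c=1.547):
    """Tukey bisquare rho, scaled so that rho(u) = 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, b=0.5, c=1.547, tol=1e-10, max_iter=500):
    """M-scale s solving mean(rho(r / s)) = b by fixed-point iteration."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745      # normalized MAD as start value
    if s == 0.0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return float(s_new)
        s = s_new
    return float(s)

def inverse_box_cox(z, lam):
    """Inverse Box-Cox transform; assumes lam * z + 1 > 0 when lam != 0."""
    z = np.asarray(z, dtype=float)
    if abs(lam) < 1e-12:
        return np.exp(z)
    return np.power(lam * z + 1.0, 1.0 / lam)

def smearing_mean(x_new, beta, resid, lam):
    """Classical smearing estimate of E[y | x_new]: average the
    back-transformed prediction over the empirical residuals."""
    return float(np.mean(inverse_box_cox(x_new @ beta + resid, lam)))
```

Unlike the standard deviation, the M-scale is bounded in its sensitivity to a minority of gross outliers, which is why minimizing it in place of the mean squared error yields a prediction criterion that is less affected by contamination.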
Based on joint work with Victor Yohai (Department of Mathematics, University of Buenos Aires).