[Statlist] Next talk: July 2, 2013 with Rajen Shah, University of Cambridge, UK
Susanne Kaiser-Heinzmann
kaiser at stat.math.ethz.ch
Mon Jun 24 10:33:19 CEST 2013
_________________________________________________
ETH Zurich and University of Zurich
Organisers:
Proff. P. Bühlmann - L. Held - H.R. Kuensch - M. Maathuis - S. van de Geer - M. Wolf
*************************************************************************************
We are glad to announce the following talk
Tuesday, July 2, 2013 - 15.15h ETH Zurich HG G 19.1
with Rajen Shah, University of Cambridge, UK
**********************************************************************
Title:
Large-scale regression with sparse data
Abstract:
The "Big Data" era in which we are living has brought with it a combination of statistical and computational challenges that often must be met with approaches drawing on developments from both statistics and computer science. In this talk I will present a method for performing regression where the n by p design matrix may have both n and p in the millions, but where the design matrix is sparse, that is, most of its entries are zero; such sparsity is common in many large-scale applications such as text analysis.
In this setting, performing regression using the original data can be computationally infeasible. Instead, we first map the design matrix to an n by L matrix with L << p, using a modified version of a scheme known as b-bit min-wise hashing in computer science. From a statistical perspective, we study the performance of regression using this compressed data, and give finite sample bounds on the prediction error. Interestingly, despite the loss of information through the compression scheme, we will see that ordinary least squares or ridge regression applied to the reduced data can actually allow us to fit a model containing interactions in the original data.
This is joint (and ongoing) work with Nicolai Meinshausen.
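
As a rough illustration of the compression step described in the abstract, the sketch below applies plain b-bit min-wise hashing to a sparse binary design matrix and then fits ridge regression on the compressed data. It is not the speaker's method: the abstract does not specify the modification used in the talk, so the standard scheme with a one-hot expansion of the b-bit values is assumed here, which yields L * 2^b columns rather than L. The function names, the toy dimensions, and the use of explicit permutations (feasible only for small p) are all assumptions made for illustration.

```python
import numpy as np

def bbit_minhash_features(rows, p, L, b, seed=0):
    """Compress sparse binary rows into an n x (L * 2**b) one-hot matrix.

    rows: list of 1-D integer arrays holding the non-zero column indices
          of each row of the original n x p design matrix.
    """
    rng = np.random.default_rng(seed)
    # One explicit random permutation of the p columns per hash.
    # Assumption: p is small enough to store these; at real scale one
    # would use cheap hash functions instead of stored permutations.
    perms = np.stack([rng.permutation(p) for _ in range(L)])  # shape (L, p)
    width = 2 ** b
    Z = np.zeros((len(rows), L * width))
    for i, idx in enumerate(rows):
        if len(idx) == 0:
            continue  # an all-zero row stays all-zero
        mins = perms[:, idx].min(axis=1)   # min-hash value per permutation
        low_bits = mins & (width - 1)      # keep only the lowest b bits
        # One indicator per permutation, in that permutation's own block.
        Z[i, np.arange(L) * width + low_bits] = 1.0
    return Z

def ridge_fit(Z, y, lam):
    """Closed-form ridge regression: solve (Z'Z + lam * I) beta = Z'y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Toy data (hypothetical sizes): 200 rows, 1000 columns, 10 non-zeros per row.
rng = np.random.default_rng(1)
n, p, L, b = 200, 1000, 50, 2
rows = [rng.choice(p, size=10, replace=False) for _ in range(n)]
y = rng.normal(size=n)

Z = bbit_minhash_features(rows, p, L, b)
beta = ridge_fit(Z, y, lam=1.0)
print(Z.shape, beta.shape)
```

Two rows that share many non-zero columns are likely to receive the same min-hash value under a given permutation, so their compressed rows overlap in many positions; this is what lets a linear fit on the compressed data reflect similarity in the original sparse data.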
*******************************************************************************************************
This abstract can also be found at the following link: http://stat.ethz.ch/events/research_seminar