Low-rank methods for multi-source, heterogeneous and incomplete data

Thursday 23 January, 14:00-15:00 - Geneviève Robin - CERMICS

Abstract: In modern applications of statistics and machine learning, the urge to collect large data sets often leads to relaxed acquisition procedures and to the combination of diverse sources. As a result, analysts are confronted with many data imperfections. In particular, data are often heterogeneous, i.e. they combine quantitative and qualitative information; incomplete, with missing values caused by machine failures or by nonresponse; and multi-source, when the data result from the aggregation of several data sets.
In this talk, I will present a general framework, based on heterogeneous exponential family low-rank models, for analysing heterogeneous, multi-source and incomplete data sets. The theoretical results demonstrate that the method is both statistically sound, with minimax optimal estimation properties, and computationally efficient. I will illustrate the empirical behaviour of the method with the analysis of a North-African waterbirds monitoring data set.
