Title:
Analysis of some purely random forests.
Abstract:
Random forests (Breiman, 2001) are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem.
As a first step, simplified models such as purely random forests have been introduced, in order to shed light on the good performance of Breiman's random forests.
In the regression framework, the quadratic risk of a purely random forest can be written as the sum of two terms, which can be understood as an approximation error and an estimation error. Robin Genuer (2010) studied how the estimation error decreases as the number of trees increases for a specific model. In this talk, we study the approximation error (the bias) of some purely random forest models in a regression framework, focusing in particular on the influence of the size of each tree and of the number of trees in the forest.
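To make the role of the number of trees concrete, here is a short derivation in notation that is assumed for illustration and does not appear in the abstract: s is the regression function, U is a random partition drawn independently of the data, \tilde{s}_U is the best piecewise-constant approximation of s on U, and U_1, ..., U_M are independent copies of U. Since the \tilde{s}_{U_j}(x) are i.i.d., a bias-variance decomposition over the partitions gives the pointwise approximation error of a forest of M trees:

```latex
\[
\mathbb{E}\Bigl[\Bigl(s(x) - \frac{1}{M}\sum_{j=1}^{M} \tilde{s}_{U_j}(x)\Bigr)^{2}\Bigr]
= \bigl(s(x) - \mathbb{E}_{U}[\tilde{s}_{U}(x)]\bigr)^{2}
+ \frac{1}{M}\,\operatorname{Var}_{U}\bigl(\tilde{s}_{U}(x)\bigr).
\]
```

Taking M = 1 gives the approximation error of a single tree, while letting M tend to infinity leaves only the first term, which is the bias of the infinite forest.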
Under some regularity assumptions on the regression function, we show that the bias of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees.
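The gap between the two bias rates can be checked numerically on a small example. The sketch below is only an illustration under assumptions that are not taken from the abstract: a one-dimensional model in which [0, 1) is cut into cells of width 1/k with a uniformly random shift, the regression function s(x) = sin(2πx), evaluation points restricted to [0.1, 0.9] to avoid boundary cells, and an "infinite" forest approximated by averaging over a fine deterministic grid of shifts. All function names are hypothetical. For this setting, a Taylor expansion suggests a single-tree bias of order 1/k^2 and an infinite-forest bias of order 1/k^4.

```python
import numpy as np


def s(x):
    """Illustrative smooth regression function (an assumption)."""
    return np.sin(2 * np.pi * x)


def S(x):
    """Antiderivative of s, used to compute exact cell averages."""
    return -np.cos(2 * np.pi * x) / (2 * np.pi)


def tree_projection(x, shift, k):
    """Average of s over the cell containing x.

    Cell boundaries sit at (j - shift) / k for integer j, clipped to [0, 1].
    """
    j = np.floor(k * x + shift)
    a = np.clip((j - shift) / k, 0.0, 1.0)
    b = np.clip((j + 1 - shift) / k, 0.0, 1.0)
    return (S(b) - S(a)) / (b - a)


def biases(k, n_shifts=2000, n_points=401):
    """Return (single-tree bias, approximate infinite-forest bias).

    Both are squared approximation errors averaged over x in [0.1, 0.9];
    this interior restriction assumes k >= 10 so that no cell is clipped.
    """
    x = np.linspace(0.1, 0.9, n_points)
    # Midpoint grid of shifts stands in for the expectation over partitions.
    shifts = (np.arange(n_shifts) + 0.5) / n_shifts
    proj = tree_projection(x[None, :], shifts[:, None], k)
    single_tree = np.mean((s(x)[None, :] - proj) ** 2)
    forest = np.mean((s(x) - proj.mean(axis=0)) ** 2)
    return single_tree, forest


for k in (10, 20, 40):
    tree_bias, forest_bias = biases(k)
    print(f"k={k:3d}  single tree: {tree_bias:.3e}  forest: {forest_bias:.3e}")
```

Doubling k should divide the single-tree bias by roughly 4 and the forest bias by roughly 16 in this setting, which matches the qualitative claim that the forest bias decays faster in the tree size.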
This talk is based on joint work with Robin Genuer.
http://arxiv.org/abs/1407.3939
http://arxiv.org/abs/1604.01515