Gradient Boosting

Gradient boosting is an approach in ensemble learning where models are trained in succession on adaptively weighted training sets, so that each new model corrects the errors of the previous ones. At its core, gradient boosting applies the principle of gradient descent in the function space of predictive models.
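To make the function-space view concrete, the sketch below fits each new tree to the residuals of the current ensemble, which is the gradient descent step for squared-error loss. The choice of scikit-learn's DecisionTreeRegressor, the tree depth, the learning rate, and the synthetic data are illustrative assumptions, not part of the description above.

```python
# Minimal sketch of gradient boosting for squared-error loss.
# Each tree is fit to the residuals (the negative gradient of the loss),
# so adding it takes a gradient descent step in function space.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds, learning_rate = 50, 0.1           # illustrative settings
prediction = np.full_like(y, y.mean())      # start from a constant model
trees = []

for _ in range(n_rounds):
    residuals = y - prediction              # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```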

Boosting algorithms typically combine large populations of weak learners such as shallow decision trees. Many variations of boosting have been introduced with different approaches to reweighting the training data, with AdaBoost perhaps being the most popular. Consider a dataset $\{(x_1, y_1), \dots, (x_n, y_n)\}$ in which a set of weak classifiers $\{f_1, \dots, f_m\}$ are combined to output a classification label $y_i \in \{1, \dots, C\}$ for each item. The boosted ensemble makes predictions using a linear combination of the individual classifiers weighted by coefficients $\alpha_1, \dots, \alpha_m$:

$$F(x_i) = \alpha_1 f_1(x_i) + \dots + \alpha_m f_m(x_i)$$
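For a $C$-class problem, this weighted vote can be tallied per class and the argmax taken. The helper below is a sketch assuming numpy, integer labels in $\{0, \dots, C-1\}$, and already-fitted classifiers exposing a scikit-learn style predict method.

```python
import numpy as np

def ensemble_predict(classifiers, alphas, X, n_classes):
    """Weighted vote F(x) over weak classifiers f_1..f_m with weights alpha_1..alpha_m."""
    votes = np.zeros((X.shape[0], n_classes))
    for f, alpha in zip(classifiers, alphas):
        preds = f.predict(X)                          # predicted integer labels
        votes[np.arange(X.shape[0]), preds] += alpha  # add alpha_m to the voted class
    return votes.argmax(axis=1)                       # class with the largest weighted vote
```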

Observation weights are initialized to $w_i = \frac{1}{n}$ for each sample in the training set. Then, for each of the $m$ rounds, a classifier $f_m$ is fit to the training data using the current observation weights, and its weighted error is computed:

$$\mathrm{err}(f_m) = \frac{\sum_{i=1}^{n} w_i \, I\!\left(y_i \neq f_m(x_i)\right)}{\sum_{i=1}^{n} w_i}$$
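A direct translation of this weighted error into code, assuming numpy arrays for the labels, predictions, and weights:

```python
import numpy as np

def weighted_error(y_true, y_pred, w):
    """err(f_m): total weight of misclassified samples divided by the total weight."""
    misclassified = (y_true != y_pred).astype(float)  # I(y_i != f_m(x_i))
    return np.dot(w, misclassified) / w.sum()
```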

The classifier weight $\alpha_m$ is then computed:

$$\alpha_m = \log\frac{1 - \mathrm{err}(f_m)}{\mathrm{err}(f_m)} + \log(C - 1)$$
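This is the multi-class (SAMME-style) form of the classifier weight. A small helper, assuming the error is strictly between 0 and 1:

```python
import numpy as np

def classifier_weight(err, n_classes):
    """alpha_m = log((1 - err) / err) + log(C - 1); assumes 0 < err < 1."""
    return np.log((1.0 - err) / err) + np.log(n_classes - 1)
```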

The observation weights are then updated to assign more importance to samples the previous classifier misclassified, and then re-normalized such that $\sum_{i=1}^{n} w_i = 1$:

$$w_i \leftarrow w_i \cdot \exp\!\left(\alpha_m \, I\!\left(y_i \neq f_m(x_i)\right)\right)$$
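The update and renormalization can be written directly, again assuming numpy arrays. Iterating fit, weighted error, classifier weight, and this update, and passing the updated weights to the next weak learner (for example through a sample_weight argument), yields the full boosting loop.

```python
import numpy as np

def update_weights(w, y_true, y_pred, alpha):
    """w_i <- w_i * exp(alpha_m * I(y_i != f_m(x_i))), then renormalize to sum to 1."""
    w = w * np.exp(alpha * (y_true != y_pred))
    return w / w.sum()
```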