Gradient Boosting Regression Example in Python
Gradient boosting builds a strong final model by combining many weak learners trained in sequence; decision trees are the usual base learners. At each step, the current ensemble's predictions are compared to the actual values, and the next weak learner is fitted to the negative gradient of the loss function (for squared error, simply the residuals), so each new learner works to reduce the error that remains.
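To make the residual-fitting idea concrete, here is a minimal from-scratch sketch for squared loss, using shallow scikit-learn trees as the weak learners. The boost_sketch function and its parameter choices are purely illustrative, not part of any library API.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_sketch(x, y, n_trees=100, lr=0.1):
    pred = np.full(len(y), y.mean())   # start from the mean prediction
    trees = []
    for _ in range(n_trees):
        residuals = y - pred           # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(x, residuals)
        pred += lr * tree.predict(x)   # shrink each tree's contribution
        trees.append(tree)
    return trees
Each tree only has to model what the ensemble so far gets wrong, and the learning rate shrinks every contribution so that no single tree dominates.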
In this post, I’ll predict regression data with the GradientBoostingRegressor class (from the sklearn.ensemble module) in Python.
Loading the Library
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
Preparing data
I have used the Boston house-price dataset as the regression dataset. After loading it, we’ll first separate the data into x (features) and y (target) parts.
boston = load_boston()
x, y = boston.data, boston.target
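Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the line above fails on recent versions. If that is your case, the California housing dataset is a drop-in regression substitute (the error values later in this post will differ, of course):
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
x, y = housing.data, housing.target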
Then we’ll split it into train and test parts, holding out 15 percent of the data as the test set.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12,
                                                test_size=0.15)
Defining the model
We can define the model with its default parameters or set new parameter values.
# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600,
                                max_depth=5,
                                learning_rate=0.01,
                                min_samples_split=3)
# with default parameters (note: this line overwrites the tuned model
# above, so keep only the definition you want to use)
gbr = GradientBoostingRegressor()
print(gbr)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=100, presort='auto', random_state=None,
             subsample=1.0, verbose=0, warm_start=False)
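If you would rather search for reasonable parameter values than guess them, a small cross-validated grid search is one option. This is just a sketch; the grid values below are arbitrary starting points, not recommendations.
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300, 600],
              "learning_rate": [0.01, 0.05, 0.1],
              "max_depth": [3, 5]}
search = GridSearchCV(GradientBoostingRegressor(), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(xtrain, ytrain)   # fit the search on the training split only
print(search.best_params_)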
Next, we’ll fit the model with the training data.
gbr.fit(xtrain, ytrain)
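Since the number of trees is fixed up front, it is worth checking whether all of them actually help. The fitted model's staged_predict method yields the ensemble's prediction after each boosting iteration, so we can plot the test error against the iteration count:
test_mse = [mean_squared_error(ytest, yp) for yp in gbr.staged_predict(xtest)]
plt.plot(range(1, len(test_mse) + 1), test_mse)
plt.xlabel("boosting iterations")
plt.ylabel("test MSE")
plt.show()
If the curve flattens early, a smaller n_estimators (or early stopping via the n_iter_no_change parameter) gives the same accuracy for less compute.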
Predicting test data and visualizing the result
We can predict the test data and check the error as follows.
ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest,ypred)
print("MSE: %.2f" % mse)
MSE: 10.41
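MSE is in squared target units, so it can help to report the RMSE (same units as the target) and the R² score alongside it:
import numpy as np
from sklearn.metrics import r2_score

rmse = np.sqrt(mse)
print("RMSE: %.2f, R2: %.2f" % (rmse, r2_score(ytest, ypred)))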
Finally, we’ll visualize the original and predicted values in a plot.
x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
(Figure: the original test values appear as blue dots, the predictions as a red line.)
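The fitted model also exposes impurity-based feature importances, which can hint at which inputs drive the predictions. A quick bar-chart sketch:
import numpy as np

order = np.argsort(gbr.feature_importances_)
plt.barh(np.array(boston.feature_names)[order],
         gbr.feature_importances_[order])
plt.xlabel("importance")
plt.show()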
Source code listing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12,
                                                test_size=0.15)
# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600,
                                max_depth=5,
                                learning_rate=0.01,
                                min_samples_split=3)
# with default parameters (overwrites the tuned model above)
gbr = GradientBoostingRegressor()
gbr.fit(xtrain, ytrain)
ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest,ypred)
print("MSE: %.2f" % mse)
x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()