XGBoost | Machine Learning

Written by Aionlinecourse | Machine Learning Tutorials

XGBoost in Python Step 1: First of all, we have to install XGBoost (for example, with pip install xgboost). We will use it on a classification problem: classifying bank customers into two classes, those who will leave the bank and those who will not. We import the libraries and the dataset, Churn_Modelling.csv, and then preprocess the data for this churn modeling problem. Because XGBoost is a gradient boosting model built on decision trees, we do not need to apply feature scaling. Much of XGBoost's popularity comes from its high performance and its fast execution speed. Finally, we split the dataset into a training set and a test set. The Python code is also available in Google Colab.
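Why can a tree-based model skip feature scaling? A decision tree splits on a threshold of one feature at a time, and a strictly increasing transformation such as standardization keeps the values in the same order, so the best split sends exactly the same rows to each side. The sketch below checks this on made-up toy data with a hypothetical best_split helper. It scores splits with Gini impurity for simplicity; XGBoost uses its own gradient-based gain, but the ordering argument is the same.

```python
import numpy as np

def gini(labels):
    # Gini impurity of a 0/1 label array
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """Return the boolean mask of rows sent left by the best threshold split on x."""
    best_score, best_mask = np.inf, None
    for t in np.unique(x)[:-1]:          # skip the max value: it would leave the right side empty
        left = x <= t
        score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
        if score < best_score:
            best_score, best_mask = score, left
    return best_mask

# Made-up toy data: a balance-like feature with a large numeric range
rng = np.random.default_rng(0)
balance = rng.uniform(0, 250_000, size=200)
exited = (balance > 120_000).astype(int)

# Standardize the feature, as a scaler such as StandardScaler would
scaled = (balance - balance.mean()) / balance.std()

same_split = np.array_equal(best_split(balance, exited), best_split(scaled, exited))
print(same_split)  # True: scaling does not change which rows go left
```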

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
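The slice 3:13 and the index 13 only make sense against the CSV's column order. Assuming the commonly distributed layout of Churn_Modelling.csv (the column names below are that assumption; check list(dataset.columns) on your own copy), the selection works out as follows.

```python
# Assumed column order of Churn_Modelling.csv; verify with list(dataset.columns)
columns = ["RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
           "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
           "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited"]

features = columns[3:13]   # what dataset.iloc[:, 3:13] keeps: the ID columns and Surname are dropped
target = columns[13]       # what dataset.iloc[:, 13] keeps

print(features)
print(target)                                                   # Exited
# Inside X, Geography sits at index 1 and Gender at index 2,
# which is why the encoders target columns [1] and [2].
print(features.index("Geography"), features.index("Gender"))    # 1 2
```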

# Encoding categorical data

from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.compose import ColumnTransformer
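# Geography (column 1) is one-hot encoded and Gender (column 2) is ordinal-encoded;
# the remaining columns are passed through unchanged
ct = ColumnTransformer([("Country", OneHotEncoder(), [1]), ("Gender", OrdinalEncoder(), [2])], remainder = 'passthrough')
X = ct.fit_transform(X).astype(float)  # convert the mixed object array to floats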
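To see what ct.fit_transform produces, here is a plain NumPy imitation on three made-up rows. It follows the documented behaviour of ColumnTransformer (transformer outputs first, in the order listed, then the passthrough remainder on the right) and of OneHotEncoder and OrdinalEncoder, which sort the categories they find. The encode_like_ct helper is hypothetical and only meant for illustration; the real pipeline should keep using scikit-learn.

```python
import numpy as np

# Three made-up rows holding the first four features of X:
# CreditScore, Geography, Gender, Age (the real X has ten features)
X_toy = np.array([[650, "France",  "Male",   35],
                  [700, "Spain",   "Female", 28],
                  [580, "Germany", "Male",   51]], dtype=object)

def encode_like_ct(X):
    """Hypothetical helper that mimics the ColumnTransformer above with plain NumPy."""
    geo = X[:, 1].astype(str)
    geo_cats = np.unique(geo)                                  # sorted: France, Germany, Spain
    one_hot = (geo[:, None] == geo_cats[None, :]).astype(float)

    gender = X[:, 2].astype(str)
    gender_cats = np.unique(gender)                            # sorted: Female, Male
    ordinal = np.searchsorted(gender_cats, gender).astype(float)[:, None]  # Female -> 0, Male -> 1

    rest = np.delete(X, [1, 2], axis=1).astype(float)          # remainder='passthrough'
    return np.hstack([one_hot, ordinal, rest])                 # encoded columns first, remainder last

X_toy_encoded = encode_like_ct(X_toy)
print(X_toy_encoded)
```

On the full dataset the same layout gives 3 one-hot columns, 1 ordinal column, and the 8 untouched numeric columns, so X ends up with 12 columns.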

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
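train_test_split shuffles the rows and holds out 20% of them for testing. The minimal NumPy sketch below (a hypothetical shuffle_split helper; it does not reproduce scikit-learn's exact shuffling, so the chosen rows differ) shows where the 2,000-row test set comes from, assuming the file has 10,000 rows.

```python
import numpy as np

def shuffle_split(n_rows, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = int(np.ceil(test_size * n_rows))  # scikit-learn also rounds the test size up
    return idx[n_test:], idx[:n_test]          # train indices, test indices

train_idx, test_idx = shuffle_split(10_000)    # 10,000 rows assumed for Churn_Modelling.csv
print(len(train_idx), len(test_idx))           # 8000 2000
```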


XGBoost in Python Step 2: In this step, we fit XGBoost to the training set. We import the XGBClassifier class from the xgboost library, create a classifier object from it, and call its fit method on the training data. Next, we predict the test set results, build the confusion matrix, and apply k-fold cross validation. In this run, the confusion matrix shows 1521 + 208 = 1729 correct predictions and 197 + 74 = 271 incorrect predictions out of 2000 test observations, an accuracy of about 86%. The mean of the 10 cross-validation accuracies is also about 86%.
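The 86% figure can be checked directly from the reported counts: accuracy is the sum of the diagonal (correct predictions) divided by the total. Which off-diagonal cell holds 197 and which holds 74 is not stated, so their placement below is an assumption; it does not affect accuracy.

```python
import numpy as np

# Confusion matrix from the run described above. Rows are true classes and columns
# are predicted classes, as in sklearn.metrics.confusion_matrix. The placement of
# 74 and 197 off the diagonal is an assumption; accuracy only uses the diagonal.
cm = np.array([[1521,   74],
               [ 197,  208]])

correct = np.trace(cm)   # 1521 + 208 = 1729
total = cm.sum()         # 2000 test rows
accuracy = correct / total
print(accuracy)          # 0.8645
```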

# Fitting XGBoost to the Training set
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
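XGBClassifier builds its model by gradient boosting: trees are added one at a time, and each new tree is fitted to the gradients of the log loss of the ensemble so far. The sketch below is a heavily simplified, from-scratch illustration of that idea using depth-1 trees (stumps), XGBoost-style leaf weights w = -G / (H + λ) and the matching split gain. It is not XGBoost's implementation, which adds deeper trees, pruning, subsampling and much more; all function names are hypothetical. The defaults eta=0.3 and lam=1.0 mirror XGBoost's default learning_rate and reg_lambda.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, g, h, lam=1.0):
    """Pick the single-feature threshold split with the highest gradient-based gain."""
    G, H = g.sum(), h.sum()
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            GL, HL = g[left].sum(), h[left].sum()
            GR, HR = G - GL, H - HL
            # Structure-score gain (XGBoost also halves this and subtracts gamma)
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if best is None or gain > best[0]:
                # Optimal leaf weights w = -G / (H + lambda) on each side
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    return best[1:]

def boost(X, y, n_rounds=20, eta=0.3, lam=1.0):
    margin = np.zeros(len(y))                 # raw scores; 0 means probability 0.5
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(margin)
        g, h = p - y, p * (1.0 - p)           # gradient and hessian of the log loss
        j, t, wl, wr = fit_stump(X, g, h, lam)
        margin += eta * np.where(X[:, j] <= t, wl, wr)  # shrunken update
        stumps.append((j, t, wl, wr))
    return stumps

def predict(stumps, X, eta=0.3):
    margin = np.zeros(len(X))
    for j, t, wl, wr in stumps:
        margin += eta * np.where(X[:, j] <= t, wl, wr)
    return (sigmoid(margin) > 0.5).astype(int)  # threshold the probability at 0.5

# Made-up toy data with a linear class boundary
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

stumps = boost(X_demo, y_demo)
train_acc = (predict(stumps, X_demo) == y_demo).mean()
print(train_acc)
```

Each round only nudges the scores by eta times the leaf weight, which is why boosting needs many trees but is less prone to overshooting than taking full steps.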


# Predicting the Test set results

y_pred = classifier.predict(X_test)


# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
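For 0/1 labels, sklearn.metrics.confusion_matrix counts each (true, predicted) pair, with true classes in the rows and predicted classes in the columns, so correct predictions sit on the diagonal. A minimal NumPy sketch of the same counting, using a hypothetical confusion_matrix_2x2 helper and made-up labels:

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Count (true, predicted) pairs for 0/1 labels; rows = true class, columns = predicted class."""
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # add 1 at cm[true, pred] for every observation
    return cm

y_true_demo = np.array([0, 0, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 1, 0, 1, 0])
cm_demo = confusion_matrix_2x2(y_true_demo, y_pred_demo)
print(cm_demo)
# [[2 1]
#  [1 2]]
```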


# Applying k-Fold Cross Validation

from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Mean accuracy:", accuracies.mean())
print("Standard deviation:", accuracies.std())
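cross_val_score with cv = 10 trains and scores the model ten times, each time holding out a different tenth of the training set, and returns the ten accuracies whose mean and standard deviation the last lines compute. For a classifier with an integer cv, scikit-learn uses stratified folds (StratifiedKFold); the sketch below shows plain, unstratified and unshuffled k-fold index generation only, to make the mechanics visible. kfold_indices is a hypothetical helper.

```python
import numpy as np

def kfold_indices(n_rows, k=10):
    """Yield (train_idx, val_idx) pairs for plain, unshuffled k-fold splitting."""
    folds = np.array_split(np.arange(n_rows), k)   # k nearly equal, contiguous folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# 8000 = size of X_train under an 80/20 split of 10,000 rows (assumed)
splits = list(kfold_indices(8000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 10 7200 800

# Every training row is used for validation exactly once
all_val = np.concatenate([val for _, val in splits])
print(np.array_equal(np.sort(all_val), np.arange(8000)))   # True
```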