
Implementing Bagging with Decision Trees: A Python Adventure

Ensemble Methods from Scratch

Bagging (bootstrap aggregating) with decision trees forms the core of our Python adventure today. We’ll implement this powerful ensemble learning technique from scratch and see how it reduces model variance. This post guides you through the process, showcasing Python code and explaining key concepts along the way.

Understanding the Bagging Technique

Bagging, short for bootstrap aggregating, creates multiple subsets of the original dataset by sampling with replacement. Each subset trains a separate model, and the final prediction aggregates the individual outputs: a majority vote for classification, an average for regression. Because every model sees a slightly different sample of the data, their individual errors partially cancel out, which is why the technique effectively reduces variance.
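
Before building the full ensemble, it helps to see both halves of the idea in isolation. The short sketch below is an illustration only, separate from the implementation that follows: it draws one bootstrap sample from a toy label array and takes a majority vote over some hand-made predictions (the keepdims argument to stats.mode requires SciPy 1.9 or newer):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 2, 2])  # toy labels

# Bootstrap sampling: draw n indices with replacement, so some
# observations repeat while others are left out entirely
idxs = rng.choice(len(y), size=len(y), replace=True)
print("bootstrap indices:", idxs)

# Aggregation: three models' predictions for four samples,
# combined by a per-column majority vote
preds = np.array([[0, 1, 2, 2],
                  [0, 1, 1, 2],
                  [1, 1, 2, 2]])
print("majority vote:", stats.mode(preds, axis=0, keepdims=False).mode)  # [0 1 2 2]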

Decision Trees: Our Base Models

We’ll use decision trees as our base models for this implementation. A decision tree handles both categorical and continuous input variables, applying a sequence of hierarchical decision rules to reach a final prediction. Unconstrained trees also tend to overfit, and that high variance is exactly what bagging counteracts, making them an excellent choice for our bagging adventure.
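
To make those hierarchical rules concrete, the sketch below (a side demonstration, not part of the main implementation) fits a shallow tree on the iris data and prints the learned if/else splits using sklearn’s export_text helper:

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned splits as readable if/else rules
print(export_text(tree, feature_names=iris.feature_names))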

Python Implementation: Setting the Stage

Let’s dive into the Python code that brings our bagging implementation to life. We’ll start by importing necessary libraries and preparing our data:

import numpy as np
from scipy import stats
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Set the number of models
n_models = 100
random_states = list(range(n_models))

This code sets up our environment, loads the iris dataset, and prepares our training and test sets. We’ve chosen to create 100 decision tree models for our bagging ensemble.

Implementing the Bagging Algorithm

Now, let’s implement the core of our bagging algorithm:

def bootstrapping(X, y):
    # Draw n_samples indices with replacement to form one bootstrap sample
    n_samples = X.shape[0]
    idxs = np.random.choice(n_samples, n_samples, replace=True)
    return X[idxs], y[idxs]

def predict(X, models):
    # Stack every model's predictions, then take a majority vote per sample
    predictions = np.array([model.predict(X) for model in models])
    # keepdims=False returns a flat (n_samples,) array; requires SciPy >= 1.9
    return stats.mode(predictions, axis=0, keepdims=False).mode

tree_models = []

for i in range(n_models):
    # Train each tree on its own bootstrap sample of the training data
    X_, y_ = bootstrapping(X_train, y_train)
    tree = DecisionTreeClassifier(max_depth=2, random_state=random_states[i])
    tree.fit(X_, y_)
    tree_models.append(tree)

y_pred = predict(X_test, tree_models)
print("Accuracy: ", accuracy_score(y_test, y_pred))

This code defines our bootstrapping and prediction functions, creates and trains our decision tree models, and finally makes predictions on the test set. The accuracy score gives us a measure of our model’s performance.

Evaluating Our Bagging Model

The accuracy score, calculated using sklearn’s accuracy_score() function, tells us how well our bagging model performs. This metric represents the ratio of correct predictions to the total number of predictions, providing a clear measure of our model’s effectiveness.
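
To make the variance-reduction claim concrete, you can compare the ensemble against a single tree of the same depth trained on the full training set. A minimal sketch, assuming the variables from the code above are still in scope:

# Baseline: one depth-2 tree on the full (non-bootstrapped) training set
single_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
single_tree.fit(X_train, y_train)
single_pred = single_tree.predict(X_test)

print("Single tree accuracy:", accuracy_score(y_test, single_pred))
print("Bagged ensemble accuracy:", accuracy_score(y_test, y_pred))

# accuracy_score is just the fraction of matching labels:
print("Manual check:", np.mean(y_pred == y_test))

On a dataset as small and clean as iris, the two scores may be close; bagging’s advantage shows up more clearly on noisier data or with deeper, higher-variance base trees.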

Wrapping Up Our Bagging Adventure

Congratulations! You’ve successfully implemented bagging with decision trees in Python. This powerful ensemble technique can significantly improve model performance by reducing variance. As you continue your machine learning journey, remember that bagging is just one of many ensemble methods you can explore.

For more information on ensemble learning techniques, check out this comprehensive guide on ensemble methods.

