Evaluating model performance is crucial in machine learning, and cross-validation stands out as a powerful technique for assessing Gradient Boosting models. In this comprehensive guide, we’ll explore data preparation, feature engineering, and the implementation of K-Fold Cross-Validation. We’ll also cover standardizing features, calculating the Mean Absolute Error (MAE), and visualizing model predictions to ensure a robust evaluation.
The Importance of Data Preparation
Before diving into cross-validation, it’s essential to properly prepare your data. First, let’s load our dataset and perform some initial preprocessing:
from datasets import load_dataset
import pandas as pd
# Load dataset
tesla = load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])
# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])
This code snippet loads the Tesla stock price dataset and converts the ‘Date’ column to the appropriate datetime format. Before moving on to feature engineering, it’s worth quickly confirming that the conversion worked.
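A minimal check, assuming the tesla_df DataFrame created above:
# Confirm the Date column now has a datetime dtype and inspect the data
print(tesla_df['Date'].dtype)
print(tesla_df.shape)
print(tesla_df.head())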
Feature Engineering: Enhancing Your Dataset
Feature engineering plays a crucial role in improving model performance. Let’s add some technical indicators to our dataset:
# Feature Engineering
tesla_df['Target'] = tesla_df['Adj Close'].shift(-1) - tesla_df['Adj Close']
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()
# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)
In this step, we’ve created the prediction target (the next day’s change in adjusted closing price) and added Simple Moving Averages (SMA) and Exponential Moving Averages (EMA) as technical indicators. These features can help capture trends in the stock price data, and dropping the NaN rows removes the incomplete values introduced by the rolling windows and the shifted target.
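To see what the new columns look like, you can print the first few remaining rows (a quick check using the tesla_df DataFrame from the snippet above):
# Inspect the engineered features alongside the target
print(tesla_df[['Adj Close', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10', 'Target']].head())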
Standardizing Features: Leveling the Playing Field
Feature standardization is a critical preprocessing step. It ensures all features are on the same scale, which can significantly improve model performance. Here’s how we standardize our features:
from sklearn.preprocessing import StandardScaler
# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Target'].values
# Standardizing features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
By using StandardScaler, we transform our features to have a mean of 0 and a standard deviation of 1. This step helps prevent features with larger magnitudes from dominating the model training process.
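As a quick verification, assuming the features_scaled array produced above, each column should now have a mean close to 0 and a standard deviation close to 1:
# Column-wise mean and standard deviation after scaling
print(features_scaled.mean(axis=0).round(3))
print(features_scaled.std(axis=0).round(3))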
Implementing K-Fold Cross-Validation
Now that our data is prepared, let’s implement K-Fold Cross-Validation to evaluate our Gradient Boosting model:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
# Instantiate model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
# Perform cross-validation
scores = cross_val_score(model, features_scaled, target, cv=5, scoring='neg_mean_absolute_error')
# Convert negative mean absolute error to positive for easier interpretation
mean_score = -scores.mean()
print("Mean cross-validation score (Mean Absolute Error): ", mean_score)
In this code, we use 5-fold cross-validation to evaluate our Gradient Boosting model. The Mean Absolute Error (MAE) is used as the evaluation metric, providing a clear measure of the model’s prediction accuracy.
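It is also worth looking at the individual fold scores rather than only their mean. A small follow-up using the scores array from the code above shows the spread across folds; a large spread suggests a less stable estimate:
# Per-fold MAE (sign flipped back to positive) and its spread across folds
fold_mae = -scores
print("Per-fold MAE:", fold_mae.round(3))
print("Std of MAE across folds:", round(fold_mae.std(), 3))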
Interpreting the Mean Absolute Error
The Mean Absolute Error tells us the average absolute difference between predicted and actual values. A lower MAE indicates better predictive accuracy. For instance, if the MAE is 0.211, it suggests that, on average, the model’s predictions deviate from the actual values by approximately 0.211 units.
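To make the metric concrete, here is a tiny worked example with made-up numbers, purely for illustration, showing that MAE is simply the mean of the absolute errors:
import numpy as np
from sklearn.metrics import mean_absolute_error
# Illustrative values only
actual = np.array([1.0, -0.5, 2.0])
predicted = np.array([1.2, -0.2, 1.5])
# Mean of |actual - predicted| = (0.2 + 0.3 + 0.5) / 3 ≈ 0.333
print(np.abs(actual - predicted).mean())
print(mean_absolute_error(actual, predicted))  # same result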
Visualizing Model Predictions
To gain deeper insights into our model’s performance, let’s visualize its predictions against the actual values:
import matplotlib.pyplot as plt
# Fit model to visualize predictions
model.fit(features_scaled, target)
predictions = model.predict(features_scaled)
# Plotting predictions vs actual values
plt.figure(figsize=(10, 6))
plt.scatter(range(len(target)), target, label='Actual', alpha=0.7)
plt.scatter(range(len(target)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values with Cross-Validation')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
This visualization allows us to compare the model’s predictions with the actual target values, providing a clear picture of where the model performs well and where it might need improvement.
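Note that the scatter plot above uses in-sample predictions, since the model was refit on the full dataset. If you would rather plot out-of-fold predictions that match the cross-validation setup, one option (a sketch using scikit-learn’s cross_val_predict, not part of the original walkthrough) is:
from sklearn.model_selection import cross_val_predict
# Each prediction comes from a model that did not see that sample during fitting
cv_predictions = cross_val_predict(model, features_scaled, target, cv=5)
plt.figure(figsize=(10, 6))
plt.scatter(range(len(target)), target, label='Actual', alpha=0.7)
plt.scatter(range(len(target)), cv_predictions, label='Out-of-fold Predicted', alpha=0.7)
plt.title('Actual vs Out-of-Fold Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()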

Conclusion: The Power of Cross-Validation
In conclusion, cross-validation is an indispensable tool for evaluating Gradient Boosting models. By following the steps outlined in this guide – from data preparation and feature engineering to implementing K-Fold Cross-Validation and visualizing results – you can ensure a robust evaluation of your model’s performance. Remember, the key to successful model evaluation lies in thorough preparation, careful implementation, and insightful interpretation of results.
For more information on cross-validation techniques, check out the scikit-learn documentation on cross-validation.