Hyperparameter tuning is crucial for maximizing machine learning model performance. In this post, we'll explore how to use GridSearchCV to optimize a Gradient Boosting model that predicts Tesla stock prices. By tuning hyperparameters instead of relying on defaults, we can often improve the model's accuracy and produce more reliable forecasts in the dynamic world of stock trading.
Understanding the Importance of Hyperparameter Optimization
First and foremost, let’s delve into why hyperparameter tuning is essential. In machine learning, hyperparameters are the settings that control how a model learns from data. Consequently, finding the right combination of these parameters can dramatically enhance model performance. GridSearchCV, a powerful tool in scikit-learn, automates this process by systematically working through multiple combinations of parameter values, cross-validating as it goes to determine which combination gives the best performance.
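To build intuition, here is a minimal sketch of what an exhaustive grid search does conceptually: enumerate every combination of values and evaluate each one (GridSearchCV does this with cross-validation). The grid below uses made-up values purely for illustration.
from itertools import product
# A toy grid: every combination of these values gets evaluated
grid = {'learning_rate': [0.01, 0.1], 'n_estimators': [100, 200]}
combinations = list(product(*grid.values()))
print(len(combinations), "candidate combinations")  # 4
for combo in combinations:
    params = dict(zip(grid.keys(), combo))
    print(params)  # GridSearchCV would cross-validate a model with each of these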
Preparing Your Tesla Stock Data
Before we dive into the tuning process, we need to prepare our dataset. Let’s start by loading and preprocessing the Tesla stock price data. Here’s how you can do it:
import pandas as pd
from datasets import load_dataset
# Load dataset
tesla = load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])
# Feature Engineering
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()
# Drop NaN values and prepare features
tesla_df.dropna(inplace=True)
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
# Target is the next day's adjusted close; drop the last feature row to keep lengths aligned
target = tesla_df['Adj Close'].shift(-1).dropna().values
features = features[:-1]
# Split the dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)
This code snippet loads the Tesla stock data, adds technical indicators (simple and exponential moving averages), and prepares the features and target, where the target is the next day's adjusted closing price. Subsequently, it splits the data into training and testing sets. Note that train_test_split shuffles rows by default, so this setup treats each day as an independent sample rather than preserving the temporal order of the series.
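Before tuning, it can help to sanity-check the prepared arrays. A quick inspection, assuming the code above has run, might look like this:
# Quick sanity checks on the prepared data
print("Feature matrix shape:", features.shape)   # (n_samples, 9)
print("Target vector shape:", target.shape)      # (n_samples,)
print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])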
Setting Up the Hyperparameter Grid
Now that our data is ready, let’s set up the hyperparameter grid for our Gradient Boosting model. We’ll focus on tuning three key parameters: learning_rate, n_estimators, and max_depth. Here’s how to define the grid:
param_grid = {
'learning_rate': [0.01, 0.1],
'n_estimators': [100, 200],
'max_depth': [3, 4]
}
This grid will test different combinations of these parameters to find the optimal setup for our model.
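With two values for each of three parameters, the grid contains 2 × 2 × 2 = 8 candidate combinations; with the 3-fold cross-validation used below, GridSearchCV will perform 24 model fits in total. A quick way to confirm the grid size:
from itertools import product
n_candidates = len(list(product(*param_grid.values())))
print("Candidate combinations:", n_candidates)   # 8
print("Total fits with cv=3:", n_candidates * 3) # 24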
Implementing GridSearchCV for Hyperparameter Tuning
With our grid defined, we can now implement GridSearchCV to find the best hyperparameters. Here’s the code to do so:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
model = GridSearchCV(GradientBoostingRegressor(random_state=42), param_grid, cv=3)
model.fit(X_train, y_train)
print("Best parameters found:", model.best_params_)
This code creates a GridSearchCV object, fits it to our training data, and prints the best parameters found. The cv=3 parameter specifies that we’ll use 3-fold cross-validation during the search process.
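If you want to see how every combination performed, not just the winner, the fitted search object exposes its full cross-validation results. A short sketch:
import pandas as pd
# Inspect mean cross-validated scores for every parameter combination
results = pd.DataFrame(model.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score'))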
Evaluating Model Performance
After finding the optimal hyperparameters, it’s time to evaluate our model’s performance. We’ll use the best estimator to make predictions and calculate the Mean Squared Error:
best_model = model.best_estimator_
predictions = best_model.predict(X_test)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error with best params:", mse)
This code uses the best model found by GridSearchCV to make predictions on our test set and calculates the Mean Squared Error to assess the model’s accuracy.
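Since squared error is in squared price units, it can also help to report the root mean squared error and R² alongside it. A small addition, assuming the variables above are still in scope:
import numpy as np
from sklearn.metrics import r2_score
rmse = np.sqrt(mse)                  # error in the same units as the stock price
r2 = r2_score(y_test, predictions)   # proportion of variance explained
print("RMSE:", rmse)
print("R^2:", r2)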
Visualizing the Results
Finally, let’s visualize our predictions against the actual values to get a better understanding of our model’s performance:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Tesla Stock Prices')
plt.xlabel('Sample Index')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
This visualization will help us see how closely our predictions match the actual Tesla stock prices.
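As a complementary view, you could also plot the prediction errors themselves; a simple residual scatter, reusing the same predictions and test targets, makes systematic over- or under-prediction easier to spot:
# Residuals: positive values mean the model under-predicted the actual price
residuals = y_test - predictions
plt.figure(figsize=(10, 4))
plt.scatter(range(len(residuals)), residuals, alpha=0.7)
plt.axhline(0, color='red', linestyle='--')
plt.title('Prediction Residuals')
plt.xlabel('Sample Index')
plt.ylabel('Actual - Predicted')
plt.show()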
Conclusion
In conclusion, hyperparameter tuning using GridSearchCV is a powerful technique for optimizing machine learning models. By applying this method to our Gradient Boosting model for Tesla stock price prediction, we can systematically find settings that perform better than the defaults. Remember, the key to successful hyperparameter tuning lies in understanding your data, choosing relevant parameters to tune, and interpreting the results effectively.
For more information on hyperparameter tuning and machine learning techniques, check out this comprehensive guide on GridSearchCV.