Linear Regression with SGD in Machine Learning
What is Linear Regression?
Linear Regression is a basic supervised learning algorithm in machine learning. Its job is to predict a continuous output (the target value) from input features (variables). In simple terms, it fits a straight line through the data points and uses that line to predict values for new inputs.
In mathematical form, the model is:
y = w * x + b
- y: Predicted value (target)
- x: Input feature
- w: Weight (slope of the line)
- b: Bias (intercept)
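The goal is to find the values of w and b that make the predicted y as close as possible to the actual y. To measure how close they are, we use a loss function such as Mean Squared Error (MSE):
MSE = (1/n) * Σ (y_actual - y_predicted)^2
where the sum runs over all n training samples. Training means finding the w and b that minimize this value.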
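To make the formulas concrete, here is a small NumPy sketch of the model and the MSE loss. The data are the house sizes and prices used in the example later in this article; the helper names predict and mse and the guess w = 0.018, b = 1 are illustrative choices, not part of any library.

```python
import numpy as np

# House sizes (x, sq ft) and prices (y_actual, lakhs) from the example below
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([20.0, 30.0, 40.0, 50.0])

def predict(x, w, b):
    """Linear model: y = w * x + b (hypothetical helper)."""
    return w * x + b

def mse(y_actual, y_predicted):
    """Mean Squared Error: (1/n) * sum of (y_actual - y_predicted)^2."""
    return np.mean((y_actual - y_predicted) ** 2)

# An arbitrary guess: predictions 19, 28, 37, 46 give errors 1, 2, 3, 4,
# so MSE = (1 + 4 + 9 + 16) / 4 = 7.5 (up to floating-point rounding)
guess_mse = mse(prices, predict(sizes, w=0.018, b=1.0))

# The line price = 0.02 * size + 0 passes through every point, so MSE is ~0
exact_mse = mse(prices, predict(sizes, w=0.02, b=0.0))

print(guess_mse, exact_mse)
```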
What is SGD (Stochastic Gradient Descent)?
Gradient Descent is an optimization technique that minimizes the loss function by iteratively updating the parameters (w and b). In standard (batch) Gradient Descent, the gradient is computed over the entire dataset before every update, which becomes slow for large datasets.
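For comparison with SGD, here is a minimal sketch of batch gradient descent on MSE, assuming the feature has been standardized first. The gradients come from differentiating the MSE formula: d(MSE)/dw = -(2/n) * Σ (y_actual - y_predicted) * x and d(MSE)/db = -(2/n) * Σ (y_actual - y_predicted). The learning rate of 0.1 and the 500 iterations are illustrative values.

```python
import numpy as np

sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([20.0, 30.0, 40.0, 50.0])

# Standardize the feature (mean 0, std 1) so a fixed learning rate is stable
x = (sizes - sizes.mean()) / sizes.std()
y = prices

w, b = 0.0, 0.0
learning_rate = 0.1

for iteration in range(500):
    error = y - (w * x + b)             # residuals for ALL n samples
    grad_w = -2.0 * np.mean(error * x)  # d(MSE)/dw over the full batch
    grad_b = -2.0 * np.mean(error)      # d(MSE)/db over the full batch
    w -= learning_rate * grad_w         # one update per full pass
    b -= learning_rate * grad_b

print(w, b)  # slope and intercept on the standardized feature
```

Every update touches all n samples; that per-update cost is what SGD avoids.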
Stochastic Gradient Descent (SGD) is a variant in which each update uses the gradient computed from a single randomly chosen sample. Each update is therefore much cheaper, but the gradient estimates are noisy, so the loss decreases along a zig-zag path rather than smoothly. In practice, SGD still reaches good solutions and scales well to large datasets.
Update rule for SGD (applied after each sample, using the loss on that sample only):
- Weight update: w = w - learning_rate * (gradient of loss w.r.t. w)
- Bias update: b = b - learning_rate * (gradient of loss w.r.t. b)
The learning rate is a hyperparameter that controls the size of each step: too large and the updates overshoot or diverge, too small and training is slow.
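The SGD update rule can be sketched from scratch in NumPy. This sketch assumes the per-sample loss (y_i - (w * x_i + b))^2, whose gradients are -2 * error * x_i for w and -2 * error for b, a standardized feature, and illustrative settings (learning rate 0.01, 1000 epochs, seed 0). It is an educational sketch, not how scikit-learn implements SGD internally.

```python
import numpy as np

sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([20.0, 30.0, 40.0, 50.0])

# Standardize the feature; SGD with a fixed step size is sensitive to scale
x_mean, x_std = sizes.mean(), sizes.std()
x = (sizes - x_mean) / x_std
y = prices

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
w, b = 0.0, 0.0
learning_rate = 0.01

for epoch in range(1000):
    for i in rng.permutation(len(x)):   # visit the samples in random order
        error = y[i] - (w * x[i] + b)   # residual for ONE sample
        grad_w = -2.0 * error * x[i]    # d/dw of (y_i - (w * x_i + b))^2
        grad_b = -2.0 * error           # d/db of the same per-sample loss
        w -= learning_rate * grad_w     # update after every single sample
        b -= learning_rate * grad_b

# Convert slope and intercept back to original units (sq ft -> lakhs)
w_orig = w / x_std
b_orig = b - w * x_mean / x_std
print(w_orig, b_orig, w_orig * 1800 + b_orig)
```

Unlike the batch version, the parameters change after each of the 4 samples in every epoch. The last two lines expand w * (size - mean) / std + b to recover the slope and intercept in terms of the raw size.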
How Does Linear Regression with SGD Work?
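To train Linear Regression with SGD, we start from initial values of w and b (for example, zero), then repeatedly pick a sample, compute the gradient of its squared error, and update w and b. Unlike batch GD, SGD updates the parameters after every sample, so each step is cheap and the model starts improving before it has seen the whole dataset. In scikit-learn, SGDRegressor trains a linear model this way; by default it minimizes the squared error plus a small L2 penalty (penalty='l2', alpha=0.0001).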
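SGDRegressor also provides partial_fit, which runs one epoch over whatever rows it receives. Feeding it a single row per call is a minimal sketch of the one-sample-at-a-time updates described above, as you might do when data arrive as a stream. The settings here (eta0=0.01, 500 passes, seed 0) are illustrative assumptions; all other hyperparameters are scikit-learn defaults, including the small L2 penalty.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

X = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])  # size in sq ft
y = np.array([20.0, 30.0, 40.0, 50.0])                   # price in lakhs

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
rng = np.random.default_rng(seed=0)

# Each partial_fit call runs one epoch over the rows it receives.
# Passing a single row per call gives one SGD update per sample.
for epoch in range(500):
    for i in rng.permutation(len(X_scaled)):
        model.partial_fit(X_scaled[i : i + 1], y[i : i + 1])

predicted = model.predict(scaler.transform([[1800.0]]))[0]
print(f"Predicted price for 1800 sq ft: {predicted:.2f} lakhs")
```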
Example
Suppose we have data of house size (in sq ft) and price (in lakhs):
- Size: 1000 sq ft, Price: 20 lakhs
- Size: 1500 sq ft, Price: 30 lakhs
- Size: 2000 sq ft, Price: 40 lakhs
- Size: 2500 sq ft, Price: 50 lakhs
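We train the model so that it can predict the price for a new size. In each step, SGD picks one sample at random, computes the gradient of the loss on that sample, and updates w and b. After several epochs (one epoch is a full pass over the training data), the line approaches price = 0.02 * size + 0 in the original units. Every data point lies exactly on this line, so it is the best possible fit.
Prediction: for a size of 1800 sq ft, price ≈ 0.02 * 1800 = 36 lakhs.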
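As a sanity check on these numbers, the exact least-squares line can be computed in closed form with NumPy's np.polyfit. Note that this is ordinary least squares, not SGD; it only serves as the reference that an SGD-trained model should approach.

```python
import numpy as np

sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([20.0, 30.0, 40.0, 50.0])

# Closed-form least-squares fit of a degree-1 polynomial: [slope, intercept]
w, b = np.polyfit(sizes, prices, deg=1)
price_1800 = w * 1800 + b

print(w, b, price_1800)  # about 0.02, 0 and 36, up to floating-point error
```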
Python Code Example
Here is a simple Python example using scikit-learn (a machine learning library). We create the data with NumPy and train an SGDRegressor.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
# Sample data: House size (features) and price (target)
X = np.array([[1000], [1500], [2000], [2500]]) # Features (size in sq ft)
y = np.array([20, 30, 40, 50]) # Target (price in lakhs)
# Standardize features to mean 0 and std 1 (SGD is sensitive to feature scale)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Create SGD Regressor model
model = SGDRegressor(max_iter=1000, tol=1e-3, learning_rate='constant', eta0=0.01, random_state=42)  # eta0 is the learning rate; random_state makes the shuffling repeatable
# Train the model
model.fit(X_scaled, y)
# Predict on new data
new_size = np.array([[1800]]) # New house size
new_size_scaled = scaler.transform(new_size)
predicted_price = model.predict(new_size_scaled)
print(f"Predicted price for 1800 sq ft: {predicted_price[0]:.2f} lakhs")
# Check MSE on training data
y_pred = model.predict(X_scaled)
mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
How to Run It?
- Install scikit-learn if it is not already installed:
  pip install scikit-learn
- Run the script. The output should show a predicted price of about 36 lakhs and a low MSE on the training data.
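The coefficients that SGDRegressor stores (coef_ and intercept_) refer to the standardized feature, so they will not look like 0.02 and 0. The sketch below, which assumes the same data and scaler as the script above, converts them back to the original units using the scaler's mean_ and scale_ attributes. It sets tol=None so that all 1000 epochs run, and random_state=0 for repeatability; both are choices made for this illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

X = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])  # size in sq ft
y = np.array([20.0, 30.0, 40.0, 50.0])                   # price in lakhs

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# tol=None disables early stopping, so all max_iter epochs run
model = SGDRegressor(max_iter=1000, tol=None, learning_rate="constant",
                     eta0=0.01, random_state=0)
model.fit(X_scaled, y)

# The model learned: price = coef * (size - mean) / scale + intercept.
# Expanding that expression gives the slope and intercept for raw sizes.
w = model.coef_[0] / scaler.scale_[0]
b = model.intercept_[0] - model.coef_[0] * scaler.mean_[0] / scaler.scale_[0]
print(f"price ≈ {w:.4f} * size + {b:.2f}")
```

Because of the default L2 penalty, the slope can come out marginally below 0.02.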
In this code, we:
- Prepared the data.
- Scaled the features (SGD is sensitive to feature scale).
- Trained the model with SGD.
- Made predictions and checked the error on the training data.
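To see why scaling matters, the sketch below runs the same plain SGD update on raw and on standardized sizes with an identical learning rate of 0.01, for 10 updates each. The helper sgd is hypothetical and, for simplicity, cycles through the samples in order instead of shuffling. With raw sizes in the thousands, x^2 in each update is around a million, so every step overshoots and the weight grows by orders of magnitude; after standardization, the same learning rate keeps the updates small and stable.

```python
import numpy as np

sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([20.0, 30.0, 40.0, 50.0])

def sgd(x, y, learning_rate=0.01, updates=10):
    """Plain per-sample SGD on squared error (hypothetical helper).

    Cycles through the samples in order (no shuffling) to stay deterministic.
    """
    w, b = 0.0, 0.0
    for step in range(updates):
        i = step % len(x)
        error = y[i] - (w * x[i] + b)
        w += learning_rate * 2.0 * error * x[i]  # w - lr * (-2 * error * x_i)
        b += learning_rate * 2.0 * error         # b - lr * (-2 * error)
    return w, b

# Raw sizes in the thousands: each update overshoots and |w| explodes
w_raw, _ = sgd(sizes, prices)

# Standardized sizes: the same learning rate keeps w in a sensible range
x_scaled = (sizes - sizes.mean()) / sizes.std()
w_scaled, _ = sgd(x_scaled, prices)

print(f"raw: |w| = {abs(w_raw):.3e}, scaled: w = {w_scaled:.3f}")
```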
