Building a Deep Learning Model from Scratch in Python (Without any libraries)
This blog post explains how to build a multi-layer perceptron (MLP) from scratch in Python. The goal of the model is to classify handwritten digits from the famous MNIST dataset. The exercise walks you through building a neural network without relying on machine learning frameworks like TensorFlow or PyTorch, using only basic libraries such as NumPy. This approach gives you a deeper understanding of how neural networks operate under the hood.
1. Understanding the Problem
We aim to build a model that can classify handwritten digits (0 to 9) into one of 10 categories. The dataset consists of images of size 28x28 pixels, which will be flattened into vectors of length 784. The neural network's task is to map these input vectors to the correct digit labels.
The dataset consists of:
- Input features: 784 features (one for each pixel in the 28x28 image).
- Output: 10 classes (digits 0 to 9).
2. Importing Required Libraries
Before we begin building the model, we'll need some libraries for data handling and computations:
- NumPy: For matrix computations and operations.
- Pandas: To load and handle the dataset.
- Matplotlib: For visualizing data, especially displaying images.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from google.colab import drive
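Because the dataset in the next section is read from Google Drive inside Colab, the drive has to be mounted first. A minimal sketch (the mount point matches the path used in the next section):
# Mount Google Drive so files under /content/gdrive/My Drive/ become readable
drive.mount('/content/gdrive')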
3. Loading and Preprocessing the Data
The dataset is stored in CSV format, where each row corresponds to an image, and the first column contains the label (the digit). We load the data, shuffle it, and split it into training and testing sets. Here is how it's done:
# Load the data from Google Drive
data = pd.read_csv('/content/gdrive/My Drive/DigitDataSet/train.csv')
data = np.array(data)
# Shuffle the data for randomness
np.random.shuffle(data)
# Get the number of examples (m) and columns (n = 1 label column + 784 pixel columns)
m, n = data.shape
We then split the data into training and test sets:
# Split the data into train and test sets
m_test = 1000
m_train = m - m_test
# Test data
data_test = data[0 : m_test].transpose()    # Shape (785, m_test): each column is one example
Y_test = data_test[0]                       # The first row holds the labels
X_test = data_test / np.max(data_test)      # Scale pixel values to [0, 1]
X_test[0] = np.ones(m_test)                 # Replace the label row with a bias row of ones
# Train data
data_train = data[m_test : m].transpose()   # Shape (785, m_train)
Y_train = data_train[0]
X_train = data_train / np.max(data_train)   # Scale pixel values to [0, 1]
X_train[0] = np.ones(m_train)               # Replace the label row with a bias row of ones
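Matplotlib, imported earlier, is handy for a quick sanity check of the preprocessing. A small sketch (the example index i is arbitrary and only for illustration):
# Each column of X_train is one example: row 0 is the bias, rows 1-784 are pixels
print(X_train.shape)   # (785, m_train)
print(X_test.shape)    # (785, 1000)

i = 0  # pick any training example to inspect
plt.imshow(X_train[1:, i].reshape(28, 28), cmap='gray')
plt.title("Label: " + str(Y_train[i]))
plt.show()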
4. Defining the Neural Network Architecture
We will define the architecture of our neural network. The network consists of:
- Input layer: 784 neurons (one for each pixel in the image), plus a bias unit.
- Hidden layer: 10 neurons (chosen arbitrarily for simplicity), plus a bias unit.
- Output layer: 10 neurons (one for each digit class).
# Neural network architecture
n_hidden = 10
n_input = 784
n_output = 10
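The gradient descent loop in section 8 calls an init_param() function that the post does not show. A minimal sketch, assuming uniform random initialization in [-0.5, 0.5] (the exact scheme is an assumption):
def init_param():
    # theta1 has shape (10, 785): 784 pixels plus a bias unit mapped to 10 hidden units
    theta1 = np.random.rand(n_hidden, n_input + 1) - 0.5
    # theta2 has shape (10, 11): 10 hidden units plus a bias unit mapped to 10 outputs
    theta2 = np.random.rand(n_output, n_hidden + 1) - 0.5
    return theta1, theta2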
5. Defining Activation Functions
Activation functions are crucial for introducing non-linearity into the network. We'll use two of them:
- ReLU (Rectified Linear Unit): applied to the hidden layer to introduce non-linearity.
- Sigmoid: applied to the output layer to squash each output into the range (0, 1).
ReLU function:
\( \text{ReLU}(z) = \max(0, z) \)
def ReLu(Z):
    return np.maximum(Z, 0)
Sigmoid function:
\( \sigma(z) = \frac{1}{1 + e^{-z}} \)
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))
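Backward propagation in section 7 also uses the derivatives of these activations, deriv_ReLu and deriv_sigmoid, which the post does not define. A minimal sketch of what they could look like:
def deriv_ReLu(Z):
    # Derivative of ReLU: 1 where Z > 0, 0 elsewhere
    return (Z > 0).astype(float)

def deriv_sigmoid(Z):
    # Derivative of the sigmoid: sigma(Z) * (1 - sigma(Z))
    s = sigmoid(Z)
    return s * (1 - s)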
6. Forward Propagation
In forward propagation, we compute the activations for each layer. The calculations for each layer \( l \) are as follows:
The linear transformation for a layer is:
\( Z^{(l)} = \theta^{(l)} \cdot A^{(l-1)} \)
\( A^{(l)} = g(Z^{(l)}) \)
Where:
- \( \theta^{(l)} \) are the weights of layer \( l \),
- \( A^{(l-1)} \) is the activation from the previous layer,
- \( g(Z^{(l)}) \) is the activation function applied to the linear transformation.
def forward_prop(X, theta1, theta2):
    m = X.shape[1]                      # Number of examples in X
    Z2 = theta1.dot(X)                  # Linear transformation for the hidden layer
    A21 = ReLu(Z2)                      # Apply ReLU activation
    A2 = np.ones((n_hidden + 1, m))     # Row 0 is the bias unit of the hidden layer
    A2[1:n_hidden+1, :] = A21
    Z3 = theta2.dot(A2)                 # Linear transformation for the output layer
    A3 = sigmoid(Z3)                    # Apply sigmoid activation
    return A2, Z2, A3, Z3
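As a quick sanity check of the dimensions flowing through the network (a sketch, using the init_param() shapes assumed earlier):
theta1, theta2 = init_param()            # theta1: (10, 785), theta2: (10, 11)
A2, Z2, A3, Z3 = forward_prop(X_train, theta1, theta2)
print(Z2.shape)   # (10, m_train): hidden layer pre-activations
print(A2.shape)   # (11, m_train): hidden activations plus the bias row
print(A3.shape)   # (10, m_train): one output per digit class, per example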
7. Backward Propagation
Backward propagation calculates the gradients of the cost function with respect to the weights. We use the chain rule to calculate the error at each layer and propagate it backward:
\( \delta^{(\text{output})} = (A^{(\text{output})} - Y) \odot g'(Z^{(\text{output})}) \)
For the hidden layer:
\( \delta^{(l)} = (\theta^{(l+1)})^{T} \delta^{(l+1)} \odot g'(Z^{(l)}) \)
The gradients are used to update the weights during training. Here's the Python code for backward propagation:
def backward_prop(Z2, A2, Z3, A3, X, Y, theta2):
    Y_one_hot = Y_convert(Y)                        # One-hot encode the labels
    # Output layer error
    s_del3 = A3 - Y_one_hot
    s_del3 = np.multiply(s_del3, deriv_sigmoid(Z3))
    # Gradient for theta2 (output layer weights)
    b_del2 = s_del3.dot(A2.transpose())
    # Hidden layer error (drop the bias row before applying the ReLU derivative)
    temp1 = (theta2.transpose()).dot(s_del3)
    temp2 = temp1[1:n_hidden+1, :]
    s_del2 = np.multiply(temp2, deriv_ReLu(Z2))
    # Gradient for theta1 (hidden layer weights)
    b_del1 = s_del2.dot(X.transpose())
    return b_del1, b_del2
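The Y_convert helper used above turns the label vector into a one-hot matrix with one column per example; it is not shown in the post, but a minimal sketch could be:
def Y_convert(Y):
    # Build a (10, m) matrix with a 1 in row Y[i] of column i, zeros elsewhere
    one_hot = np.zeros((n_output, Y.size))
    one_hot[Y, np.arange(Y.size)] = 1
    return one_hot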
8. Gradient Descent
Gradient descent is used to minimize the cost function by updating the weights. The weight update rule is:
\( \theta^{(l)} = \theta^{(l)} - \alpha \cdot \frac{\partial J}{\partial \theta^{(l)}} \)
Where \( \alpha \) is the learning rate, and \( \frac{\partial J}{\partial \theta^{(l)}} \) is the gradient for layer \( l \). Here's the Python code for gradient descent:
def update_theta(theta1, theta2, b_del1, b_del2, alpha):
    theta1 = theta1 - (1/m_train) * alpha * b_del1
    theta2 = theta2 - (1/m_train) * alpha * b_del2
    return theta1, theta2
def gradient_descent(X, Y, alpha, iterations):
    theta1, theta2 = init_param()
    for i in range(iterations):
        A2, Z2, A3, Z3 = forward_prop(X, theta1, theta2)
        b_del1, b_del2 = backward_prop(Z2, A2, Z3, A3, X, Y, theta2)
        theta1, theta2 = update_theta(theta1, theta2, b_del1, b_del2, alpha)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A3)
            print(get_accuracy(predictions, Y))
    return theta1, theta2
9. Model Training
We initialize the parameters and train the model using gradient descent:
# Optimize the parameters: learning rate 0.9, 500 iterations
theta1, theta2 = gradient_descent(X_train, Y_train, 0.9, 500)
10. Evaluating the Model
Finally, we evaluate the trained model by calculating its accuracy on the test data:
def get_predictions(A3):
    # The index of the largest output activation is the predicted digit
    return np.argmax(A3, 0)

def get_accuracy(predictions, Y):
    # Fraction of examples predicted correctly
    return np.sum(predictions == Y) / Y.size
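The post defines these helpers but does not show the final test-set call itself; putting the pieces together, an evaluation sketch might look like this:
# Forward pass over the held-out test set, then measure accuracy
_, _, A3_test, _ = forward_prop(X_test, theta1, theta2)
test_predictions = get_predictions(A3_test)
print("Test accuracy:", get_accuracy(test_predictions, Y_test))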