Hands-On Guide to LeNet-5 (The Complete Info)

7 min read · Mar 13, 2023

LeNet is one of the earliest CNN models and paved the way for modern convolutional neural networks. In this guide, we’ll implement the LeNet-5 architecture in Keras and use it to classify images from the MNIST dataset.

What to find in this article

Introduction to key concepts used in deep learning and machine learning.
Understanding of components which are used to construct a convolutional neural network.
Detailed explanation and implementation of LeNet-5 using Keras.

LeNet-5 was introduced in the paper “Gradient-Based Learning Applied To Document Recognition” in 1998 by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. These authors were pioneers of modern deep learning and played a pivotal role in the field of AI.

Quick Fact: The model was originally designed for reading zip codes, sorting mail, and other automated mail handling tasks.

Convolutional Neural Networks

There are three main kinds of neural networks that form the basis for most deep learning models:

ANNs (Artificial Neural Networks) are helpful for solving complex problems.
CNNs (Convolutional Neural Networks) are best for solving Computer Vision-related problems.
RNNs (Recurrent Neural Networks) are proficient in Natural Language Processing.

Convolutional neural networks (CNNs) are specifically designed for image processing tasks, such as computer vision, and are widely used because of their ability to efficiently capture spatial information in images. In contrast, traditional artificial neural networks (ANNs) are not well-suited for image processing tasks because they treat each pixel in an image as a separate feature and do not take into account the spatial relationships between adjacent pixels. Another reason to use a CNN is that it needs far fewer trainable parameters, because the weights of each kernel are shared across all positions in the image.
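As a rough illustration of the parameter savings, the sketch below counts the trainable parameters of a fully connected layer and of a convolutional layer that produce the same output. The layer sizes are assumptions borrowed from the first LeNet-5 layer (a 32x32 grayscale input mapped to six 28x28 feature maps with 5x5 kernels).

```python
# Compare trainable parameters: fully connected vs. convolutional layer.
# Assumed setup (for illustration only): a 32x32 grayscale input mapped
# to six 28x28 feature maps, as in the first LeNet-5 layer.

in_h, in_w = 32, 32                  # input image size
out_maps, out_h, out_w = 6, 28, 28   # output feature maps
k = 5                                # kernel height and width

# Dense: every output unit has one weight per input pixel plus a bias.
dense_params = (in_h * in_w + 1) * (out_maps * out_h * out_w)

# Convolution: each feature map shares a single k x k kernel plus a bias.
conv_params = out_maps * (k * k + 1)

print(dense_params)  # 4821600
print(conv_params)   # 156
```

The convolutional layer gets by with 156 parameters because the same 26 values (25 weights and a bias) are reused at every position of a feature map.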

LeNet-5

The LeNet-5 network consists of seven layers: three convolutional layers, two pooling layers, and two fully connected layers.

The above image shows the architecture of the model as presented in the research paper. The first layer is the input layer, which is typically not counted as a network layer because nothing is learned at this level. The input layer is designed to accept 32x32 images, which are then passed on to the subsequent layer. Those who have worked with the MNIST dataset will know that its images are 28x28 pixels, so they are padded to 32x32 to meet the specification of the input layer. The Keras implementation later in this guide skips this padding and feeds the 28x28 images in directly.
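The padding step can be sketched in plain Python as follows. This is a minimal illustration that assumes zero-valued padding of two pixels on every side; `pad_image` is a hypothetical helper, not part of Keras.

```python
def pad_image(image, pad=2, fill=0):
    """Return a copy of a 2D list with `pad` rows/columns of `fill` on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]          # top border
    for row in image:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([fill] * width for _ in range(pad))      # bottom border
    return padded

# Stand-in for a 28x28 MNIST image (all ones so the border is easy to spot).
digit = [[1] * 28 for _ in range(28)]
padded = pad_image(digit)

print(len(padded), len(padded[0]))  # 32 32
```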

In the paper, the pixel values of the grayscale images were rescaled from the range 0 to 255 to the range -0.1 to 1.175. The purpose of this normalisation is to make the mean of the inputs roughly 0 and their variance roughly 1, which speeds up training. In the following example of image classification with LeNet-5, the pixel values of the images will instead be normalised to values between 0 and 1.
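Both rescalings can be written as small helper functions. The sketch below assumes a plain linear mapping of the 0 to 255 range onto the target range; the function names are made up for this illustration.

```python
def scale_unit(pixel):
    """Map a pixel intensity in 0..255 to the range 0..1."""
    return pixel / 255.0

def scale_linear(pixel, low=-0.1, high=1.175):
    """Linearly map a pixel intensity in 0..255 to the range low..high."""
    return low + (pixel / 255.0) * (high - low)

# Print a few sample intensities under both scalings.
for pixel in (0, 128, 255):
    print(pixel, round(scale_unit(pixel), 3), round(scale_linear(pixel), 3))
```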

Since this is a CNN model, convolutional layers and subsampling layers are used extensively in the LeNet-5 design.

In the paper, convolutional layers are denoted by ‘Cx’ and subsampling layers by ‘Sx’, where ‘x’ is the layer’s position in the architecture. ‘Fx’ is used to identify fully connected layers.

The first layer is a convolutional layer called “C1”, which produces six feature maps using 5x5 kernels. The kernel (or filter) is the window of weight values that slides over the input; at each position, the weights are multiplied with the input values under the window and the products are summed. The 5x5 size is also the local receptive field of each unit in the layer. With a 32x32 input, each of the six feature maps produced by C1 is 28x28 pixels.
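To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a single-channel convolution with stride 1 and no padding. As in most deep learning libraries, the kernel is not flipped, and the bias and activation are left out for brevity; the input and kernel values are made up.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both 2D lists) with stride 1 and no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the k x k window whose top-left corner is (i, j).
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [[1.0] * 32 for _ in range(32)]   # stand-in 32x32 input
kernel = [[0.04] * 5 for _ in range(5)]   # 5x5 averaging kernel (weights sum to 1)
feature_map = conv2d_valid(image, kernel)

print(len(feature_map), len(feature_map[0]))  # 28 28
```

The output size follows from the window positions that fit inside the input: 32 - 5 + 1 = 28 along each axis.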

S2 is a subsampling layer that follows C1. The ‘S2’ layer downsamples the feature maps it receives from the preceding layer, halving their height and width from 28x28 to 14x14.

The ‘S2’ layer also produces six feature maps, one for each feature map coming from the preceding layer.

Complete Architecture (credits: www.analyticsvidhya.com)

LeNet-5 TensorFlow Implementation

To implement the model, we’ll need to import the following library:

Keras: An open-source library that provides a high-level interface for building and training neural networks.

import keras
from keras.datasets import mnist
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten
from keras.models import Sequential

The MNIST dataset is then loaded using the Keras package, which provides an assortment of easily accessible datasets for machine learning. For deep learning, a dataset is usually divided into training, validation, and test sets. Note that mnist.load_data() returns only a training and a test partition. Brief descriptions of the three partitions are given below:

Training Dataset: This partition of the dataset is shown to the neural network during training and is used to update its weights.

Validation Dataset: During training, this portion of the dataset is used to evaluate the performance of the network at various iterations.

Test Dataset: This portion of the dataset is used to test the performance of our network once the training phase has concluded.

Before training our model, the pixel intensities of the images should be rescaled from the range 0–255 to the range 0–1.

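# Load the dataset (already split into training and test sets)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
# Rescale pixel intensities from 0-255 to 0-1
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255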

In the code above, we add a fourth dimension of size 1 to both the training set and the test set. This is the channel dimension: Conv2D expects each batch of images in the shape (batch, height, width, channels), and grayscale images have a single channel.

The code below implements a LeNet-5-style network. To construct the model, we first create an instance of the Sequential class. Sequential lets us build a linear stack of layers, which is a convenient way to create a neural network by adding layers one after another. Compared with the original paper, this version uses ReLU activations, max pooling, and unpadded 28x28 inputs.

model = Sequential()
model.add(Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(120, activation='relu'))
model.add(Dense(84, activation='relu'))
model.add(Dense(10, activation='softmax'))

Each call to model.add in the code above specifies one layer of the model.

The C1 layer is defined by the line "Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1))". The Conv2D layer performs a convolution operation on the input tensor using a set of learnable filters. The filters are applied to local regions of the input tensor, and the output of the convolution is a set of feature maps that represent the presence of certain features in the input. The second convolutional layer follows the same definition as C1 but with different argument values.
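To see how each layer of the Sequential model changes the tensor shape, the sketch below traces the sizes and parameter counts by hand. It assumes the Keras defaults used in the model code (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper functions are hypothetical and only do arithmetic.

```python
def conv_out(size, kernel):
    """Output height/width of a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output height/width of non-overlapping pooling."""
    return size // pool

def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output channel."""
    return out_channels * (in_channels * kernel * kernel + 1)

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return out_units * (in_units + 1)

size = 28                                   # input: 28 x 28 x 1
size = conv_out(size, 5)                    # Conv2D(6):  24 x 24 x 6
p_c1 = conv_params(1, 6, 5)                 # 156
size = pool_out(size, 2)                    # pooling:    12 x 12 x 6
size = conv_out(size, 5)                    # Conv2D(16): 8 x 8 x 16
p_c2 = conv_params(6, 16, 5)                # 2416
size = pool_out(size, 2)                    # pooling:    4 x 4 x 16
flat_units = size * size * 16               # Flatten:    256
p_d1 = dense_params(flat_units, 120)        # 30840
p_d2 = dense_params(120, 84)                # 10164
p_d3 = dense_params(84, 10)                 # 850

total_params = p_c1 + p_c2 + p_d1 + p_d2 + p_d3
print(flat_units, total_params)             # 256 44426
```

Calling model.summary() on the Keras model is the usual way to check these numbers against the real layers.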

Activation Function: Activation functions are mathematical functions that are applied to the output of a neural network layer to introduce non-linearity into the network. They are a crucial component of deep learning models and are used to determine the output of each neuron in the layer.
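As a small illustration, here are two common activation functions in plain Python: ReLU, which the model code above uses, and tanh, a classic choice in earlier networks. The sample inputs are arbitrary.

```python
import math

def relu(x):
    """Rectified linear unit: pass positive values through, clip negatives to 0."""
    return max(0.0, x)

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]

print([relu(x) for x in inputs])                 # [0.0, 0.0, 0.0, 0.5, 2.0]
print([round(math.tanh(x), 3) for x in inputs])  # [-0.964, -0.462, 0.0, 0.462, 0.964]
```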

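The next step is adding a pooling layer after each convolutional layer. Pooling layers are typically used in convolutional neural networks (CNNs) to reduce the spatial dimensions of the feature maps generated by the convolutional layers while retaining the most important features. The original LeNet-5 uses average pooling for its subsampling layers, whereas the code above uses MaxPooling2D, which keeps the maximum of each patch instead of its average.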

Average Pooling: It is a pooling operation that calculates the average value for patches of a feature map, and uses it to create a downsampled (pooled) feature map. It is usually used after a convolutional layer.
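The following sketch shows 2x2 pooling on a small feature map, in both its average and max variants. It is plain pooling without trainable parameters, and the input values are made up for the example.

```python
def pool2d(feature_map, size=2, mode="average"):
    """Non-overlapping pooling over size x size patches of a 2D list."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(patch))
            else:
                row.append(sum(patch) / len(patch))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]

print(pool2d(feature_map, mode="average"))  # [[2.5, 6.5], [10.5, 14.5]]
print(pool2d(feature_map, mode="max"))      # [[4, 8], [12, 16]]
```

In both cases a 4x4 map shrinks to 2x2, which is the halving of height and width that the subsampling layers perform.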

After these layers, we’ll be adding a flattening layer. Flattening is a process in Keras that transforms a multidimensional tensor into a one-dimensional tensor.

Flattening: The process of converting the data into a 1-dimensional array for inputting it to the next layer. We flatten the output of the convolutional layers to create a single long feature vector.
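Flattening can be sketched with nested Python lists. The 4x4x16 stack below is a stand-in with an assumed shape (height, width, channels), chosen to match the output of the second pooling layer in the model above.

```python
def flatten(tensor):
    """Recursively flatten nested lists into one flat list (row-major order)."""
    if not isinstance(tensor, list):
        return [tensor]
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat

# Stand-in for a 4 x 4 x 16 stack of feature maps.
maps = [[[0.0] * 16 for _ in range(4)] for _ in range(4)]

print(len(flatten(maps)))         # 256
print(flatten([[1, 2], [3, 4]]))  # [1, 2, 3, 4]
```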

After the flattening operation, we’ll add the hidden layers (dense layers). A dense layer, also known as a fully connected layer, is a type of neural network layer in Keras where each neuron in the layer is connected to every neuron in the previous layer.

Dense layers: They are used in neural networks for tasks such as classification and regression, where the goal is to map an input to an output by learning a non-linear function between them.
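The computation inside a dense layer is a weighted sum per unit followed by an activation. Here is a minimal sketch with made-up weights and biases; `dense` is a hypothetical helper, not the Keras layer itself.

```python
def dense(inputs, weights, biases, activation=None):
    """Forward pass of a fully connected layer.

    `weights` holds one list of input weights per output unit.
    """
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def relu(z):
    return max(0.0, z)

inputs = [1.0, 2.0, 3.0]
weights = [[0.5, 1.0, 0.0],     # unit 1
           [1.0, -1.0, -1.0]]   # unit 2
biases = [0.5, 1.0]

print(dense(inputs, weights, biases, activation=relu))  # [3.0, 0.0]
```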

The number of classes in the MNIST dataset is equal to the number of units in the last dense layer. The output layer’s activation function is a softmax activation function.

Softmax: An activation function that turns a vector of real-valued scores into a probability distribution. The output of a softmax activation function is a vector of non-negative values that sum to 1, where each value represents the probability that the input belongs to the corresponding class.
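A minimal softmax in plain Python looks like this. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result; the input scores are arbitrary.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)                          # avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```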

Now we can compile and build the model.

# The MNIST labels are integers 0-9 (not one-hot), so use the sparse loss
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

Keras provides the “compile” method on the model object that we created earlier. The compile method configures the model for training by attaching the loss function, the optimizer, and the metrics to report.

To train the network, we use a loss function that calculates the difference in predicted values vs known values.
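For classification, a common loss of this kind is categorical cross-entropy. The sketch below computes it for a single example with a one-hot target; the probabilities are made up, and the small `eps` guard against log(0) is an assumption of this illustration. A sparse variant takes the class index instead of a one-hot vector but measures the same quantity.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))

target = [0, 0, 1]                 # the true class is index 2
good_guess = [0.05, 0.05, 0.90]    # confident and correct
bad_guess = [0.70, 0.20, 0.10]     # confident and wrong

print(round(categorical_crossentropy(target, good_guess), 3))  # 0.105
print(round(categorical_crossentropy(target, bad_guess), 3))   # 2.303
```

The loss is small when the predicted probability of the true class is close to 1 and grows quickly as that probability drops.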

The optimization algorithm (Adam) uses the gradient of the loss to update the weights in the network. Adam combines momentum with a per-parameter adaptive learning rate, which helps the training converge and brings the loss values as close to zero as possible.
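To give a feel for what the optimizer does, here is a sketch of the standard Adam update rule applied to a single weight. The toy loss (w - 3)^2 and the hyperparameter values are assumptions chosen for the illustration, not settings taken from the Keras model.

```python
import math

def adam_minimize(grad_fn, w, steps=300, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a one-dimensional function with the Adam update rule."""
    m = 0.0  # running average of the gradient (momentum term)
    v = 0.0  # running average of the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
print(round(w_final, 2))
```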

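During training, we’ll also check our model on a validation set after every epoch. Since mnist.load_data() returns only training and test partitions, we hold out the last 10% of the training data with validation_split and keep the test set untouched for the final evaluation.

# Train for 20 epochs; the last 10% of the training data is used for validation
model.fit(x_train, y_train, batch_size=128,
          epochs=20, verbose=1, validation_split=0.1)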

When you’re done training, you’ll see that your model has a validation accuracy of over 90%. To see how well the model generalises, we evaluate the trained model on the test dataset, which was not used for training.

score = model.evaluate(x_test, y_test)
>> [0.04592850968674757, 0.9745]
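The second number in that output is the accuracy. As a sketch of what the metric measures, the example below picks the most probable class for each example and compares it with the true label; the predictions and labels are made up.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(prob_rows, labels):
    """Fraction of examples whose most probable class matches the label."""
    correct = sum(1 for probs, label in zip(prob_rows, labels)
                  if argmax(probs) == label)
    return correct / len(labels)

predictions = [
    [0.1, 0.8, 0.1],   # predicted class 1
    [0.6, 0.3, 0.1],   # predicted class 0
    [0.2, 0.2, 0.6],   # predicted class 2
    [0.5, 0.4, 0.1],   # predicted class 0 (true class is 1)
]
labels = [1, 0, 2, 1]

print(accuracy(predictions, labels))  # 0.75
```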

I was able to get about 97% accuracy on the test dataset after training my model (0.9745 in the output above), which is pretty good for such a simple network.