TensorFlow Image Classification¶

Importing Necessary Libraries¶

In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import datasets
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout, GlobalMaxPooling2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import Model

Loading the CIFAR-10 Dataset¶

In [3]:
(X_train, y_train) , (X_test, y_test) = datasets.cifar10.load_data()

Training Dataset¶

In [27]:
X_train.shape
Out[27]:
(50000, 32, 32, 3)
In [28]:
y_train.shape
Out[28]:
(50000, 1)
In [4]:
y_train = y_train.reshape(-1,) 

Testing Dataset¶

In [30]:
X_test.shape
Out[30]:
(10000, 32, 32, 3)
In [31]:
y_test.shape
Out[31]:
(10000, 1)

Dataset Description¶

CIFAR-10 Dataset¶

The CIFAR-10 dataset is a widely used image classification dataset. It consists of 60,000 32x32 color images in 10 different classes, with each class containing 6,000 images. Like the larger CIFAR-100, it is a labeled subset of the 80 Million Tiny Images collection, restricted here to 10 mutually exclusive classes. The CIFAR-10 dataset is often used for training and evaluating machine learning and deep learning models for image classification tasks.

Dataset Classes¶

The CIFAR-10 dataset is divided into the following 10 classes:

  1. Airplane
  2. Automobile
  3. Bird
  4. Cat
  5. Deer
  6. Dog
  7. Frog
  8. Horse
  9. Ship
  10. Truck

Dataset Split¶

The dataset is typically split into two sets:

  • Training Set: The training set consists of 50,000 images (5,000 images per class) used for training machine learning models.
  • Test Set: The test set comprises 10,000 images (1,000 images per class) used for evaluating the performance of trained models.

Dataset Characteristics¶

  • Image Size: All images in the CIFAR-10 dataset are of size 32x32 pixels.
  • Color Channels: Images are in color, containing three color channels: red, green, and blue (RGB).
  • Labeling: Each image is labeled with one of the 10 class labels.

Use Cases¶

The CIFAR-10 dataset is a standard benchmark for image classification tasks. It is often used for various machine learning and deep learning applications, including but not limited to:

  • Convolutional Neural Network (CNN) training and evaluation.
  • Image classification research and benchmarking.
  • Computer vision tasks.
  • Research and development of image recognition algorithms.

Exploring and Analyzing the Dataset¶

Creating the Plot Sample Picture Function¶

Classes from the Dataset¶

In [32]:
classes = [
    "Airplane",
    "Automobile",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Ship",
    "Truck"
]

The Plot Sample Picture Function¶

In [ ]:
def plot_sample(index, X=X_train, y=y_train):
    # Show one image from the dataset, labeled with its class name
    plt.figure(figsize=(15, 2))
    plt.imshow(X[index])
    plt.xlabel(classes[y[index]])

In this code, I defined a function that allows me to visualize a sample from a dataset along with its corresponding class label. Let me explain why I created this function:

  1. Function Purpose: The purpose of this function is to plot a single sample from a dataset and label it with its corresponding class name. It's particularly useful when working with image classification tasks, where you want to inspect individual samples.

  2. Parameters:

    • index: This parameter specifies the index of the sample I want to visualize.
    • X=X_train: The function takes an optional parameter X, which is typically a dataset containing the samples. In this case, it is set to X_train, which is the training dataset.
    • y=y_train: Similarly, the function takes an optional parameter y, which represents the labels corresponding to the samples. It is set to y_train, which contains the training labels.
  3. Plotting the Sample:

    • plt.figure(figsize=(15, 2)): This line sets the figure size, making it 15 inches wide and 2 inches tall (Matplotlib's figsize is specified in inches). This helps ensure the sample is displayed at a sensible size.
    • plt.imshow(X[index]): Here, I use plt.imshow() to display the image sample at the specified index (X[index]). This function is often used for showing images.
    • plt.xlabel(classes[y[index]]): This line sets the label for the x-axis of the plot. It labels the image with the class name corresponding to the label y[index] using the classes list.

The reason for creating this function is to simplify the process of visualizing individual samples from a dataset. It's especially helpful during the development and debugging phases of machine learning and deep learning projects, when you want to verify that the data is being loaded and processed correctly. It also provides a quick way to inspect the training data and check that each image matches its class label.

Exploring 10 Random Sample Pictures¶

In [34]:
import random

# Choose 10 random indices
random_indices = random.sample(range(len(X_train)), 10)

# Plot the random samples
for i in random_indices:
    plot_sample(i)

Normalizing the Dataset¶

In [6]:
# Reduce the pixel values
X_train, X_test = X_train / 255.0, X_test / 255.0

# Flatten the label values
y_train, y_test = y_train.flatten(), y_test.flatten()

I wrote this code to prepare the image data for training a machine learning model, specifically a neural network. Here's why I included this code and what it does:

  1. Pixel Value Normalization:

    X_train, X_test = X_train / 255.0, X_test / 255.0
    
    • I performed pixel value normalization by dividing all pixel values in the training (X_train) and testing (X_test) datasets by 255.0.
    • This operation scales down the pixel values from the original range of 0 to 255 to a new range of 0.0 to 1.0.
    • Normalizing pixel values is a common preprocessing step in deep learning. It helps improve convergence during training and ensures that the model is not overly sensitive to the scale of the input data. This is important because neural networks are sensitive to the magnitude of their input features.
  2. Label Flattening:

    y_train, y_test = y_train.flatten(), y_test.flatten()
    
    • I flattened the label arrays y_train and y_test, which load from Keras as 2-D arrays of shape (N, 1). (y_train was already reshaped in an earlier cell, so flattening it again is a harmless no-op; y_test still needs it.)
    • This flattening operation transforms the label arrays from a multi-dimensional structure into one-dimensional arrays.
    • This is typically necessary when dealing with classification tasks, as many machine learning and deep learning models expect labels to be in a one-dimensional format.

I performed these operations to ensure that both the input data and labels are in a format that can be easily used by machine learning models. Normalizing pixel values is crucial for model stability and convergence, while flattening labels simplifies the data structure, making it compatible with a wide range of machine learning algorithms. It's a standard preprocessing step to set up the data for training and testing in a consistent and machine-friendly format.
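
As a quick sanity check (not in the original notebook), the effect of both steps can be confirmed directly:

    # Optional check: value range and label shape after preprocessing
    print(X_train.dtype, X_train.min(), X_train.max())  # float64 0.0 1.0
    print(y_train.shape, y_test.shape)                  # (50000,) (10000,)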

Creating the Machine Learning Model¶

Calculating the Number of Output Units Based on the Number of Classes¶

In [7]:
# number of classes
K = len(set(y_train))

# Calculate the number of classes for output layer
print("The number of classes:", K)
The number of classes: 10

I wrote this code to determine and confirm the number of classes in the dataset. Here's why it's important:

  1. Counting the Classes:

    K = len(set(y_train))
    
    • I calculated K by finding the length of a set created from the y_train array. In other words, I counted the unique class labels present in the training data.
  2. Printing the Number of Classes:

    print("The number of classes:", K)
    
    • After determining K, I printed it out with a description for clarity.

Why I Did It:

  • Knowing the number of classes in a classification problem is fundamental. It helps me understand the scope of the task and ensures that I'm working with the correct number of output units in the neural network's output layer.
  • Different classification problems have a different number of classes. For example, a binary classification task has two classes (0 and 1), while a multiclass classification task can have more. By counting the classes, I can adapt my neural network architecture to have the appropriate number of output neurons.
  • Additionally, this information is essential for tasks like one-hot encoding the target labels and configuring the output layer's activation function and units. It's a crucial step in setting up the neural network for accurate predictions on the given classification task.

In summary, this code helps me understand the dataset's classification structure and ensures that I configure the neural network correctly for the specific number of classes in the problem.
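
An equivalent, slightly more explicit check (a sketch, not in the original notebook) uses np.unique, which also shows the label values themselves:

    # List the distinct labels and count them
    unique_labels = np.unique(y_train)
    print(unique_labels)       # [0 1 2 3 4 5 6 7 8 9]
    print(len(unique_labels))  # 10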

Building the Machine Learning Model Using the Functional API¶

Input and Convolutional Layers¶

In [8]:
i = Input(shape = X_train[0].shape)
x = Conv2D(32, (3,3), activation = 'relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)

x = Conv2D(64, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)

x = Conv2D(128, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)

x = Flatten()(x)
x = Dropout(0.2)(x)

I wrote this code to define the architecture of the convolutional neural network (CNN) model. Let me explain why I did that:

Input Layer
i = Input(shape = X_train[0].shape)
  • I created an input layer, which represents the shape of the input data. In this case, X_train[0].shape defines the shape of a single input sample. It's essential to specify the input shape for the model to know what to expect.
x = Conv2D(32, (3,3), activation = 'relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)
  • These lines define the initial convolutional layers of the model. I used Conv2D layers to perform convolutions on the input data, followed by Rectified Linear Unit (ReLU) activation functions for introducing non-linearity. I also added BatchNormalization layers to improve training stability. MaxPooling2D layers reduce spatial dimensions, helping in feature extraction.
x = Conv2D(64, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)
  • I repeated the convolutional block with more filters to capture higher-level features.
x = Conv2D(128, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3,3), activation = 'relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2,2))(x)
  • The final block further increases the depth of the model to learn more complex patterns in the data.
x = Flatten()(x)
x = Dropout(0.2)(x)
  • After the convolutional layers, I used a Flatten layer to transform the 2D feature maps into a 1D vector. Then, I added a Dropout layer with a rate of 0.2 to prevent overfitting by randomly deactivating some neurons during training.

Why I did it:

  • I created this architecture to build a deep CNN model for image classification. Convolutional layers are excellent at capturing hierarchical features in images, making them suitable for tasks like object recognition.

  • The addition of BatchNormalization and Dropout layers helps with training stability and regularization, respectively.

  • By designing the model in this way, I aimed to leverage the power of deep learning to extract intricate features from the input data, which is particularly useful for image classification tasks. This architecture is a common choice for such tasks and can be further customized depending on the specific problem and dataset.
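
Since the three convolutional blocks differ only in their filter count, the repeated pattern can be factored into a helper. This is an illustrative sketch, not part of the original notebook:

    def conv_block(x, filters):
        # Two same-padded 3x3 convolutions with batch norm, then 2x2 max pooling
        x = Conv2D(filters, (3, 3), activation='relu', padding='same')(x)
        x = BatchNormalization()(x)
        x = Conv2D(filters, (3, 3), activation='relu', padding='same')(x)
        x = BatchNormalization()(x)
        return MaxPooling2D((2, 2))(x)

    # The three blocks above are equivalent to:
    # x = conv_block(i, 32); x = conv_block(x, 64); x = conv_block(x, 128)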

Hidden Layer¶

In [9]:
x = Dense(1024, activation= 'relu')(x)
x = Dropout(0.2)(x)

I included this code to define the hidden layers of my neural network. Let me explain why I did that:

x = Dense(1024, activation= 'relu')(x)
  • I added a Dense layer with 1024 units and a ReLU (Rectified Linear Unit) activation function. This layer is crucial for learning complex patterns and representations from the features extracted by the convolutional layers.
x = Dropout(0.2)(x)
  • I introduced a Dropout layer with a rate of 0.2 immediately after the Dense layer. Dropout is a regularization technique that helps prevent overfitting. It randomly deactivates 20% of the neurons during training, which encourages the network to learn more robust and generalized features.

Why I did it:

  • The Dense layer is used to create a fully connected layer in the neural network. It allows the model to learn intricate patterns and relationships in the data.

  • I chose a ReLU activation function because it's effective in training deep neural networks and helps with the vanishing gradient problem.

  • The Dropout layer is essential for regularization. Overfitting is a common concern in deep learning, and Dropout helps mitigate it by reducing the network's reliance on any specific set of neurons, making the model more generalizable to unseen data.

  • This configuration of hidden layers is a common choice for deep neural networks, particularly in image classification tasks. It strikes a balance between complexity and regularization, helping to achieve good performance on a variety of datasets. However, it can be further adjusted and tuned depending on the specific problem and dataset characteristics.
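
To make the Dropout behavior concrete, here is a tiny demonstration of its training-versus-inference modes (illustrative, not in the original notebook):

    layer = tf.keras.layers.Dropout(0.2)
    data = np.ones((1, 10), dtype='float32')

    # training=True: roughly 20% of values are zeroed, the rest scaled by 1/0.8
    print(layer(data, training=True).numpy())
    # training=False (inference): inputs pass through unchanged
    print(layer(data, training=False).numpy())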

Output Layer¶

In [10]:
x = Dense(K, activation='softmax')(x)

I added this code to define the output layer of my neural network, and I'll explain why:

x = Dense(K, activation='softmax')(x)
  • I created a Dense layer with K units, where K represents the number of classes or categories in my classification problem. In this case, it's essential to set the number of units in the output layer to match the number of classes I want to predict.

  • I used the softmax activation function for the output layer. Softmax is commonly used in multi-class classification problems. It calculates the probability distribution over all classes, making it suitable for determining which class the input data belongs to.

Why I did it:

  • The output layer of a neural network is responsible for producing the final predictions. In a classification problem, the number of units in this layer corresponds to the number of possible classes, ensuring that the network provides a prediction for each class.

  • The softmax activation function is ideal for multi-class classification because it normalizes the output values, converting them into class probabilities. This means that the predicted values will sum to 1, and I can interpret the highest probability as the predicted class.

  • By configuring the output layer in this way, I'm preparing my neural network for a multi-class classification task, and it will generate class probabilities that I can use to make predictions and evaluate the model's performance.
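
A small numeric illustration of what softmax does (not from the original notebook):

    # Softmax turns raw scores (logits) into a probability distribution
    logits = np.array([2.0, 1.0, 0.1])
    probs = np.exp(logits) / np.sum(np.exp(logits))
    print(probs)             # [0.659 0.242 0.099] -- sums to 1.0
    print(np.argmax(probs))  # 0 -> index of the predicted class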

The Model Summary¶

In [11]:
model = Model(i, x)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d (Conv2D)             (None, 32, 32, 32)        896       
                                                                 
 batch_normalization (BatchN  (None, 32, 32, 32)       128       
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        9248      
                                                                 
 batch_normalization_1 (Batc  (None, 32, 32, 32)       128       
 hNormalization)                                                 
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 batch_normalization_2 (Batc  (None, 16, 16, 64)       256       
 hNormalization)                                                 
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 64)        36928     
                                                                 
 batch_normalization_3 (Batc  (None, 16, 16, 64)       256       
 hNormalization)                                                 
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 64)         0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 8, 8, 128)         73856     
                                                                 
 batch_normalization_4 (Batc  (None, 8, 8, 128)        512       
 hNormalization)                                                 
                                                                 
 conv2d_5 (Conv2D)           (None, 8, 8, 128)         147584    
                                                                 
 batch_normalization_5 (Batc  (None, 8, 8, 128)        512       
 hNormalization)                                                 
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 4, 4, 128)        0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 2048)              0         
                                                                 
 dropout (Dropout)           (None, 2048)              0         
                                                                 
 dense (Dense)               (None, 1024)              2098176   
                                                                 
 dropout_1 (Dropout)         (None, 1024)              0         
                                                                 
 dense_1 (Dense)             (None, 10)                10250     
                                                                 
=================================================================
Total params: 2,397,226
Trainable params: 2,396,330
Non-trainable params: 896
_________________________________________________________________

The model summary provides a detailed breakdown of the neural network architecture, its layers, and the number of parameters used in each layer.

  • Input Layer (input_1): This layer is designed to accept images with a shape of (32, 32, 3), which corresponds to 32x32-pixel images with three color channels (red, green, and blue).

  • Convolutional Layers (conv2d, conv2d_1, conv2d_2, conv2d_3, conv2d_4, conv2d_5): These layers are responsible for learning features from the input images. They employ convolution operations to detect various patterns and features in the images. The output shapes vary, and each convolutional layer has its own set of parameters.

  • Batch Normalization Layers (batch_normalization, batch_normalization_1, batch_normalization_2, batch_normalization_3, batch_normalization_4, batch_normalization_5): Batch normalization is used to normalize the activations of each layer, which helps improve training efficiency and reduces the risk of vanishing/exploding gradients.

  • Max Pooling Layers (max_pooling2d, max_pooling2d_1, max_pooling2d_2): These layers perform max-pooling operations to down-sample the feature maps and reduce the spatial dimensions. This helps in preserving important features while reducing computational complexity.

  • Flatten Layer (flatten): This layer is responsible for converting the 2D feature maps into a 1D vector, preparing the data for the fully connected layers.

  • Dropout Layers (dropout, dropout_1): These layers are used for regularization by randomly setting a fraction of input units to 0 during training, which helps prevent overfitting.

  • Dense Layers (dense, dense_1): These fully connected layers perform the final classification. The last dense layer has 10 units, matching the number of classes in the CIFAR-10 dataset. It uses the softmax activation function to generate class probabilities.

  • Total Parameters: The model contains a total of 2,397,226 parameters, which include weights and biases. These parameters are learned during training to make the model capable of recognizing and classifying images.

  • Trainable Parameters: Out of the total parameters, 2,396,330 are trainable, which means they are updated during training to optimize the model's performance.

  • Non-trainable Parameters: There are 896 non-trainable parameters: the moving means and variances tracked by the six BatchNormalization layers (2 statistics per channel across 448 channels). These are updated from batch statistics during training rather than by gradient descent.

This model summary provides a comprehensive overview of the network's architecture and parameter count, giving insights into the complexity of the machine learning model used for image classification.
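
The parameter counts in the summary can be verified by hand; a short check (illustrative, not in the original notebook):

    # Conv2D: (kernel_h * kernel_w * in_channels + 1 bias) * filters
    print((3 * 3 * 3 + 1) * 32)   # 896 -> conv2d
    print((3 * 3 * 32 + 1) * 32)  # 9248 -> conv2d_1
    # BatchNormalization: 4 per channel (gamma, beta, moving mean, moving variance)
    print(4 * 32)                 # 128 -> batch_normalization
    # Dense: (inputs + 1 bias) * units
    print((2048 + 1) * 1024)      # 2098176 -> dense
    # Non-trainable: 2 moving statistics per channel, summed over all BN layers
    print(2 * (32 + 32 + 64 + 64 + 128 + 128))  # 896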

Compiling the Machine Learning Model¶

In [12]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

I added this code to compile my neural network, and I'll explain why I did that:

  • I specified the optimizer as 'adam'. Adam (short for Adaptive Moment Estimation) is a popular optimization algorithm for training neural networks. It combines the advantages of two other methods, RMSprop and Momentum, to efficiently update the model's weights during training. It's well-suited for a wide range of deep learning tasks, and I chose it for its effectiveness.

  • For the loss function, I selected 'sparse_categorical_crossentropy'. This loss function is appropriate for multi-class classification tasks like mine, where the target labels are integers (in contrast to one-hot encoded vectors). It calculates the cross-entropy loss between the predicted class probabilities and the actual class labels. Using 'sparse_categorical_crossentropy' simplifies the handling of target labels in my dataset.

  • I added 'accuracy' as a metric to monitor during training. This metric will provide information on how well my model is performing by calculating the classification accuracy. It tells me the percentage of correctly classified examples in the training data.

Why I did it:

  • Compiling the model is a crucial step before training because it sets the configuration for how the network will learn from the data.

  • I selected 'adam' as the optimizer because it is known for its fast convergence and good performance in a wide range of scenarios. It simplifies the process of finding the optimal model weights.

  • 'sparse_categorical_crossentropy' is the appropriate loss function for my multi-class classification task because it handles integer target labels efficiently. It quantifies the dissimilarity between the predicted probabilities and the true labels.

  • By adding 'accuracy' as a metric, I can monitor the model's training progress and assess its performance based on classification accuracy. This helps me evaluate how well the model is learning to make accurate predictions.
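
For contrast, had the labels been one-hot encoded rather than left as integers, the matching loss would be 'categorical_crossentropy'. A sketch of that alternative (not used in this notebook):

    from tensorflow.keras.utils import to_categorical

    y_train_onehot = to_categorical(y_train, num_classes=K)  # shape (50000, 10)
    # model.compile(optimizer='adam',
    #               loss='categorical_crossentropy',
    #               metrics=['accuracy'])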

The First Training Run¶

In [13]:
r = model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs=10)
Epoch 1/10
1563/1563 [==============================] - 351s 221ms/step - loss: 1.2825 - accuracy: 0.5567 - val_loss: 1.0883 - val_accuracy: 0.6202
Epoch 2/10
1563/1563 [==============================] - 332s 213ms/step - loss: 0.8343 - accuracy: 0.7100 - val_loss: 0.7954 - val_accuracy: 0.7247
Epoch 3/10
1563/1563 [==============================] - 330s 211ms/step - loss: 0.6806 - accuracy: 0.7661 - val_loss: 0.7119 - val_accuracy: 0.7559
Epoch 4/10
1563/1563 [==============================] - 321s 205ms/step - loss: 0.5757 - accuracy: 0.8032 - val_loss: 0.6538 - val_accuracy: 0.7786
Epoch 5/10
1563/1563 [==============================] - 995s 637ms/step - loss: 0.4909 - accuracy: 0.8315 - val_loss: 0.6698 - val_accuracy: 0.7806
Epoch 6/10
1563/1563 [==============================] - 322s 206ms/step - loss: 0.4167 - accuracy: 0.8586 - val_loss: 0.5669 - val_accuracy: 0.8128
Epoch 7/10
1563/1563 [==============================] - 332s 213ms/step - loss: 0.3489 - accuracy: 0.8787 - val_loss: 0.6242 - val_accuracy: 0.8107
Epoch 8/10
1563/1563 [==============================] - 332s 212ms/step - loss: 0.3061 - accuracy: 0.8934 - val_loss: 0.6445 - val_accuracy: 0.8089
Epoch 9/10
1563/1563 [==============================] - 322s 206ms/step - loss: 0.2526 - accuracy: 0.9123 - val_loss: 0.6428 - val_accuracy: 0.8150
Epoch 10/10
1563/1563 [==============================] - 967s 619ms/step - loss: 0.2198 - accuracy: 0.9255 - val_loss: 0.6621 - val_accuracy: 0.8062

I trained the model using the model.fit function with the following specifications:

  • X_train and y_train were used as the training dataset.
  • X_test and y_test were used as the validation dataset.
  • The training was performed for 10 epochs.

Here are the results for each epoch:

  • Epoch 1: At the start, I observed a relatively low training accuracy of approximately 55.67% and a validation accuracy of about 62.02%. The loss on training data was relatively high at 1.2825.
  • Epoch 2: The accuracy improved with each epoch. In the second epoch, the training accuracy increased to 71% and the validation accuracy to 72.47%. The loss decreased significantly.
  • Epoch 3: The training and validation accuracies continued to improve, reaching around 76.61% and 75.59%, respectively.
  • Epoch 4: The model further improved, with training accuracy reaching 80.32% and validation accuracy at 77.86%.
  • Epoch 5: Both training and validation accuracies improved slightly, with training accuracy reaching 83.15% and validation accuracy at 78.06%.
  • Epoch 6: Training accuracy reached 85.86%, and validation accuracy was around 81.28%.
  • Epoch 7: Both training and validation accuracies remained high, with training accuracy at 87.87% and validation accuracy at 81.07%.
  • Epoch 8: Training accuracy continued to increase, reaching 89.34%, while validation accuracy was around 80.89%.
  • Epoch 9: Training accuracy kept improving, reaching 91.23%, and validation accuracy reached 81.50%.
  • Epoch 10: By the final epoch, training accuracy reached 92.55%, while validation accuracy slipped to about 80.62%, just below its epoch 9 peak.

The primary goal of this training was to develop a deep learning model that accurately classifies images from the CIFAR-10 dataset into one of its ten classes. The steady rise in training accuracy shows the model was fitting the data. Validation accuracy, monitored to check generalization to unseen data, plateaued around 81% from epoch 6 onward while training accuracy climbed past 92%; this widening gap is a classic sign of overfitting, and it motivated the data augmentation applied in the next run.

The training process involved optimizing the model's weights and biases using the Adam optimizer and minimizing the sparse categorical cross-entropy loss. The use of dropout layers aimed to prevent overfitting.

Overall, the results from this first training run indicated promising progress towards an effective image classification model, and further improvements were achieved in the subsequent run.
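
Besides the data augmentation used in the next run, a standard safeguard against this kind of overfitting is early stopping. A minimal sketch (these callbacks were not used in the original runs):

    callbacks = [
        # Stop once val_loss stops improving and roll back to the best epoch
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                         restore_best_weights=True)
    ]
    # r = model.fit(X_train, y_train, validation_data=(X_test, y_test),
    #               epochs=10, callbacks=callbacks)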

The Second Training Run After Data Augmentation¶

In [14]:
batch_size = 32
data_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=0.1, height_shift_range=0.1, horizontal_flip = True)
train_generator = data_generator.flow(X_train, y_train, batch_size)
steps_per_epoch = X_train.shape[0] // batch_size
r = model.fit(train_generator, validation_data = (X_test, y_test),
              steps_per_epoch = steps_per_epoch, epochs = 10)
Epoch 1/10
1562/1562 [==============================] - 362s 231ms/step - loss: 0.6168 - accuracy: 0.7963 - val_loss: 0.5594 - val_accuracy: 0.8184
Epoch 2/10
1562/1562 [==============================] - 336s 215ms/step - loss: 0.5332 - accuracy: 0.8212 - val_loss: 0.5079 - val_accuracy: 0.8292
Epoch 3/10
1562/1562 [==============================] - 3079s 2s/step - loss: 0.4966 - accuracy: 0.8328 - val_loss: 0.5072 - val_accuracy: 0.8302
Epoch 4/10
1562/1562 [==============================] - 377s 242ms/step - loss: 0.4743 - accuracy: 0.8392 - val_loss: 0.5033 - val_accuracy: 0.8332
Epoch 5/10
1562/1562 [==============================] - 352s 225ms/step - loss: 0.4468 - accuracy: 0.8475 - val_loss: 0.5482 - val_accuracy: 0.8228
Epoch 6/10
1562/1562 [==============================] - 368s 236ms/step - loss: 0.4206 - accuracy: 0.8560 - val_loss: 0.4538 - val_accuracy: 0.8425
Epoch 7/10
1562/1562 [==============================] - 364s 233ms/step - loss: 0.4011 - accuracy: 0.8619 - val_loss: 0.4674 - val_accuracy: 0.8430
Epoch 8/10
1562/1562 [==============================] - 354s 227ms/step - loss: 0.3871 - accuracy: 0.8677 - val_loss: 0.4689 - val_accuracy: 0.8368
Epoch 9/10
1562/1562 [==============================] - 357s 229ms/step - loss: 0.3751 - accuracy: 0.8710 - val_loss: 0.5223 - val_accuracy: 0.8289
Epoch 10/10
1562/1562 [==============================] - 358s 229ms/step - loss: 0.3576 - accuracy: 0.8770 - val_loss: 0.4771 - val_accuracy: 0.8399

In the second training run, I employed data augmentation to further improve the performance of my image classification model. Data augmentation is a crucial step for increasing the model's ability to generalize and perform better on unseen data.

I used the following data augmentation settings:

  • width_shift_range and height_shift_range set to 0.1: This allowed for random horizontal and vertical shifts in the training images, simulating variations in object position within the images.
  • horizontal_flip set to True: This introduced horizontal flipping of the images, which helps the model become more robust to variations in object orientation.

Here's a summary of the key steps and results for the second training run:

  • batch_size was set to 32 for this run.
  • A data_generator was created using tf.keras.preprocessing.image.ImageDataGenerator with the specified augmentation settings.
  • A train_generator was created using the data generator to produce augmented training data.
  • steps_per_epoch was calculated based on the size of the training dataset and the batch size.

Results for each epoch are as follows:

  • Epoch 1: Training accuracy started at approximately 79.63%, well below the 92.55% reached at the end of the first run; this is expected, since augmented images are harder to fit. Validation accuracy, however, rose to about 81.84%, already above the first run's final value.
  • Epoch 2: Training accuracy continued to rise, reaching around 82.12%, and the validation accuracy improved to 82.92%. The loss decreased further.
  • Epoch 3: The model's accuracy remained high, with training accuracy at about 83.28% and validation accuracy at 83.02%. Loss was stable.
  • Epoch 4: The model continued to perform well, with training accuracy reaching approximately 83.92% and validation accuracy at 83.32%. The loss was consistently low.
  • Epoch 5: Training accuracy improved to around 84.75%, and validation accuracy reached 82.28%.
  • Epoch 6: Training accuracy improved further to 85.60%, and the validation accuracy increased to 84.25%. A notable drop in loss occurred.
  • Epoch 7: Training accuracy improved to around 86.19%, and validation accuracy increased to 84.30%. The model remained stable with minimal loss.
  • Epoch 8: Training accuracy increased to 86.77%, and validation accuracy remained high at 83.68%.
  • Epoch 9: Training accuracy improved slightly to 87.10%, and the validation accuracy reached around 82.89%.
  • Epoch 10: Training accuracy continued to increase, reaching approximately 87.70%, and the validation accuracy was around 83.99%.

I conducted this second training run with data augmentation to enhance the model's ability to recognize patterns despite variations in object position and orientation. The results show the intended effect: although training accuracy ended lower than in the first run, validation accuracy climbed from roughly 81% to around 84% and the train/validation gap narrowed markedly, indicating better generalization. Overall, this step improved the model's accuracy and robustness on the CIFAR-10 test set.
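
To verify that the augmentation settings produce sensible inputs, one can preview a batch drawn from train_generator. A quick sketch (not part of the original notebook):

    # Draw one augmented batch and display its first 10 images
    aug_images, aug_labels = next(train_generator)
    plt.figure(figsize=(15, 2))
    for j in range(10):
        plt.subplot(1, 10, j + 1)
        plt.imshow(aug_images[j])
        plt.axis('off')
    plt.show()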

Evaluating the Machine Learning Model¶

Plot Accuracy Per Epoch¶

In [16]:
plt.plot(r.history['accuracy'], label = 'acc', color = 'blue')
plt.plot(r.history['val_accuracy'], label= 'val_acc', color = 'green')
plt.legend();

Analyzing the Plot¶

After the second training run with data augmentation, it was essential to evaluate the model's performance. One of the most effective ways to understand how well the model was learning from the data was to visualize the training and validation accuracy over the epochs.

To achieve this, I created a line plot showing the training accuracy (labeled 'acc', in blue) and the validation accuracy (labeled 'val_acc', in green) over the epochs. Note that r holds the history of the second model.fit call only, so the plot covers the ten epochs of augmented training. This plot allowed me to gain insights into how well the model was performing during each epoch of that run.

The results of the line plot provide valuable information:

  • I observed the trend in training accuracy, showing how well the model learned from the training data. A rising training accuracy indicated that the model was improving its ability to correctly classify the training images.
  • Simultaneously, the validation accuracy gave me insights into the model's performance on data it had not seen during training. A rising validation accuracy signified that the model was becoming more capable of generalizing to new, unseen images.

Analyzing the plot:

  • If the training accuracy kept increasing while the validation accuracy plateaued or started to decrease, it could indicate overfitting, a situation where the model is too tailored to the training data and doesn't generalize well.
  • On the other hand, if both training and validation accuracies increased steadily, it would suggest that the model was learning effectively and generalizing well.

In this specific plot, training accuracy rose steadily from about 79.6% to 87.7%, while validation accuracy stayed close to it, fluctuating between roughly 82% and 84%. The narrow gap between the two curves indicates that the model was learning effectively from the augmented data while continuing to generalize well to unseen images.

This line plot served as a visual indicator of the model's performance during training, helping me ensure that the machine learning model was on the right track, capable of both learning and generalizing effectively from the dataset. It played a crucial role in assessing the success of the second batch of training with data augmentation and its positive impact on model performance.
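
A complementary view, following the same pattern, plots the loss curves from the same history object:

    plt.plot(r.history['loss'], label='loss', color='blue')
    plt.plot(r.history['val_loss'], label='val_loss', color='green')
    plt.legend();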

Classification Report¶

In [23]:
from sklearn.metrics import confusion_matrix, classification_report
classes = ["Airplane", "Automobile", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck"]
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # highest-probability class for each sample

print("Classification Report: \n", classification_report(y_test, y_pred_classes, target_names=classes))
313/313 [==============================] - 11s 34ms/step
Classification Report: 
               precision    recall  f1-score   support

    Airplane       0.75      0.94      0.84      1000
  Automobile       0.87      0.96      0.92      1000
        Bird       0.81      0.78      0.80      1000
         Cat       0.75      0.69      0.72      1000
        Deer       0.83      0.82      0.83      1000
         Dog       0.80      0.77      0.78      1000
        Frog       0.79      0.94      0.86      1000
       Horse       0.94      0.84      0.89      1000
        Ship       0.97      0.78      0.87      1000
       Truck       0.94      0.88      0.91      1000

    accuracy                           0.84     10000
   macro avg       0.85      0.84      0.84     10000
weighted avg       0.85      0.84      0.84     10000

Analyzing the Classification Report¶

As part of evaluating the performance of my image classification model, I generated a classification report using the scikit-learn library. This classification report provided a comprehensive overview of how well the model was performing across different classes and various evaluation metrics. Here's a breakdown of the report and why I found it valuable:

  • Precision: I examined the precision scores for each class. Precision measures the accuracy of positive predictions. In my case, it revealed how many of the images predicted as a particular class were correct. Higher precision indicated fewer false positives.

  • Recall: I also looked at the recall scores. Recall quantifies the model's ability to correctly identify all relevant instances within a class. It told me how many of the actual instances of a class were correctly predicted by the model.

  • F1-score: The F1-score is the harmonic mean of precision and recall. It provides a balance between precision and recall, which is particularly useful when the dataset has class imbalances.

  • Support: Support represents the number of actual occurrences of each class in the dataset.

The classification report included values for each of the ten classes (in this case, airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck) and two macro-level metrics:

  • Macro Average: I observed the macro-average precision, recall, and F1-score. It calculates the average of these metrics for each class, giving equal weight to each class. This was helpful to understand the model's overall performance across all classes.

  • Weighted Average: The weighted average provided a similar evaluation to the macro-average but considered the class distribution. It was especially useful when the dataset had class imbalances. It weighted the metrics by the number of samples in each class.

Accuracy: The classification report also reported the overall accuracy of the model. Accuracy measured how many of the total predictions were correct. It gave an indication of the model's overall performance on the test set.

In this specific report, I found that the model performed reasonably well. The precision, recall, and F1-scores for most classes were relatively high, indicating that the model was effective at classifying images into their respective categories. The accuracy of 84% suggested that the model was making correct predictions for a large portion of the test dataset.

The classification report provided a comprehensive understanding of the model's strengths and areas that might need improvement. It allowed me to identify specific classes where the model excelled and others where it might require further fine-tuning. Overall, this report was a crucial tool in assessing the model's performance and guiding any necessary adjustments to enhance its accuracy and precision further.
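
These metrics follow directly from the raw prediction counts. A sketch for a single class (Airplane), which should reproduce the first row of the report above:

    y_true = np.asarray(y_test)
    y_hat = np.asarray(y_pred_classes)

    c = 0  # Airplane
    tp = np.sum((y_hat == c) & (y_true == c))  # true positives
    fp = np.sum((y_hat == c) & (y_true != c))  # false positives
    fn = np.sum((y_hat != c) & (y_true == c))  # false negatives

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, f1)  # ~0.75, ~0.94, ~0.84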

Confusion Matrix¶

In [24]:
import seaborn as sns
import matplotlib.pyplot as plt

# Create the confusion matrix; stored as `cm` so it does not shadow the
# sklearn confusion_matrix imported earlier
cm = tf.math.confusion_matrix(y_test, y_pred_classes).numpy()

# Set the class labels for the heatmap
class_labels = [
    "Airplane",
    "Automobile",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Ship",
    "Truck"
]

# Plot the confusion matrix as a heatmap (fmt='d' renders whole-number counts)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()

Analyzing the Confusion Matrix Heatmap¶

I created a heatmap plot of the confusion matrix as part of my model evaluation. The confusion matrix is a crucial tool for understanding how well my image classification model performed across different classes. In this case, I generated a heatmap using the seaborn library and matplotlib to visualize the confusion matrix results. Here's why I did this and what the heatmap revealed:

  • Confusion Matrix: The confusion matrix is a grid that compares the model's predicted labels to the actual labels in the test dataset. It's particularly useful for multi-class classification tasks like this one. Cell (i, j) counts the samples whose true class is i and whose predicted class is j, so correct predictions accumulate on the diagonal and every other cell records a specific kind of misclassification.

  • Class Labels: To make the heatmap more interpretable, I labeled the rows and columns of the heatmap with class names. This way, I could easily identify which class the model was confusing with another.

  • Heatmap Visualization: A heatmap is a graphical representation of the confusion matrix, where each cell's color intensity represents the number of samples that fall into a particular category. I used a blue color map ('Blues') to visualize the results, with darker shades indicating higher values.

By plotting the confusion matrix as a heatmap, I gained the following insights:

  • Diagonal Elements: The diagonal from the top-left to the bottom-right of the heatmap represented the true positive predictions. In other words, it showed how many samples from each class were correctly classified.

  • Off-diagonal Elements: The off-diagonal cells showed misclassifications. I could see which classes were frequently confused with each other. Darker cells indicated more significant confusion between those classes.

  • Class-Specific Performance: I could assess the model's performance for each class individually. For classes with a dark diagonal cell and light off-diagonal cells, the model performed well; classes with dark off-diagonal cells were the ones the model struggled to distinguish.

The heatmap visualization of the confusion matrix offered a clear, visual representation of the model's performance. It helped me identify specific areas where the model was excelling and areas where it needed improvement. This information was invaluable for fine-tuning the model and understanding its behavior in a multi-class classification context. It also provided a more intuitive way to grasp the overall model performance, especially when dealing with a large number of classes.

In summary, the confusion matrix heatmap was a vital component of model evaluation, giving me a visual overview of class-specific performance and highlighting potential areas for optimization.
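
The matrix can also be mined programmatically. A sketch (not in the original notebook) that ranks the five most confused class pairs, using the cm array computed above:

    cm_off = cm.copy()
    np.fill_diagonal(cm_off, 0)  # ignore correct predictions on the diagonal
    top = np.argsort(cm_off, axis=None)[::-1][:5]
    for idx in top:
        t, p = np.unravel_index(idx, cm_off.shape)
        print(f"{class_labels[t]} predicted as {class_labels[p]}: {cm_off[t, p]} times")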