from torchvision import datasets
from torchvision.transforms import ToTensor

# Ignore deprecation warnings
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

train_data = datasets.MNIST(
    root='data',
    train=True,
    transform=ToTensor(),
    download=True
)

test_data = datasets.MNIST(
    root='data',
    train=False,
    transform=ToTensor(),
    download=True
)
train_data
Dataset MNIST
    Number of datapoints: 60000
    Root location: data
    Split: Train
    StandardTransform
Transform: ToTensor()
test_data
Dataset MNIST
    Number of datapoints: 10000
    Root location: data
    Split: Test
    StandardTransform
Transform: ToTensor()
train_data.data.shape
torch.Size([60000, 28, 28])
test_data.data.shape
torch.Size([10000, 28, 28])
train_data.targets.shape
torch.Size([60000])
train_data.targets
tensor([5, 0, 4, ..., 5, 6, 8])
The MNIST (Modified National Institute of Standards and Technology) dataset is a widely used benchmark in machine learning and computer vision. It consists of 28x28 grayscale images of handwritten digits from 0 to 9, split into 60,000 training images and 10,000 test images, and it serves as a standard testbed for developing and evaluating image-classification algorithms. The dataset was created from samples of handwritten digits written by high school students and employees of the United States Census Bureau.
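As a quick look at the data (a sketch that is not part of the original cells, assuming matplotlib is installed), a few training digits can be plotted directly from train_data:

import matplotlib.pyplot as plt

# Show the first six training digits with their labels (assumes train_data from above).
fig, axes = plt.subplots(1, 6, figsize=(10, 2))
for i, ax in enumerate(axes):
    image, label = train_data[i]              # image: 1x28x28 tensor in [0, 1], label: int
    ax.imshow(image.squeeze(0).numpy(), cmap='gray')
    ax.set_title(str(label))
    ax.axis('off')
plt.show()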
from torch.utils.data import DataLoader

loaders = {
    'train': DataLoader(train_data,
                        batch_size=100,
                        shuffle=True,
                        num_workers=1),

    'test': DataLoader(test_data,
                       batch_size=100,
                       shuffle=True,
                       num_workers=1)
}
loaders
{'train': <torch.utils.data.dataloader.DataLoader at 0x22fb2b0fa10>, 'test': <torch.utils.data.dataloader.DataLoader at 0x22fb2b05fd0>}
I created the loaders dictionary in my code to manage the data loading process for this project. Here's why I did this:
Data Loading Module: First, I imported the DataLoader class from PyTorch's torch.utils.data module. This class is essential for efficiently loading and batching my training and testing datasets.
Data Split: In my project, I have two sets of data: a training dataset (train_data) and a testing dataset (test_data). I need to split these datasets into batches for processing during training and testing.
Loaders for Training and Testing: I created two data loaders within the loaders dictionary, one for the training data and one for the testing data. Here's what each loader does:
The 'train' DataLoader wraps the training dataset (train_data) and serves it in batches of 100. I set shuffle to True, which means the training data is reshuffled at the beginning of each epoch; shuffling helps prevent the model from memorizing the order of the data and aids generalization. I also set num_workers to 1, meaning a single worker process handles data loading. This is usually sufficient for most tasks; more workers can load data faster if needed, but that depends on the available system resources.
The 'test' DataLoader wraps the testing dataset (test_data), also in batches of 100, with shuffle set to True to randomize the order of the test data.
In summary, I created these data loaders to efficiently process and manage my training and testing data in batches, making it easier to train and evaluate machine learning models. The choice of batch size, shuffling, and the number of workers depends on the specific requirements of my project.
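To confirm what the loaders produce, one batch can be pulled and its shapes inspected; this is an illustrative check rather than one of the original cells, and it assumes the loaders dictionary defined above:

# Pull a single batch from the training loader to check tensor shapes.
images, labels = next(iter(loaders['train']))
print(images.shape)   # torch.Size([100, 1, 28, 28]): 100 single-channel 28x28 images
print(labels.shape)   # torch.Size([100]): one digit label per image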
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)    # 1 input channel -> 10 feature maps
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)   # 10 -> 20 feature maps
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)                   # 20 * 4 * 4 = 320 flattened features
        self.fc2 = nn.Linear(50, 10)                    # 10 output classes (digits 0-9)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.softmax(x, dim=1)
As I embark on the journey to analyze the MNIST dataset, I've chosen to work with PyTorch and employ a Convolutional Neural Network (CNN) for a few compelling reasons.
Ease of Use and Flexibility: PyTorch is known for its user-friendly API and dynamic computation graph, making it an excellent choice for deep learning research and experimentation. As I explore the MNIST dataset, PyTorch's flexibility allows me to define and modify complex neural network architectures with ease.
Rich Ecosystem: PyTorch offers a rich ecosystem of tools and libraries for machine learning and deep learning, including torchvision, which provides easy access to popular datasets like MNIST, as well as pre-processing tools and data augmentation techniques.
Community and Documentation: PyTorch has a vibrant and supportive community. Extensive documentation, tutorials, and online forums provide valuable resources as I delve into the intricacies of deep learning and image classification.
GPU Acceleration: PyTorch seamlessly integrates with GPUs, enabling me to leverage the power of accelerated computing for faster training and better model performance.
Specialized for Image Data: CNNs are designed for image-related tasks, and the MNIST dataset consists of 28x28 pixel grayscale images of handwritten digits. CNNs excel at capturing hierarchical patterns and features in images, making them a natural choice for this task.
Feature Learning: CNNs automatically learn and extract hierarchical features from the input data. This feature learning capability allows me to focus on the architecture and let the network discover the relevant patterns within the images.
Spatial Hierarchy: CNNs preserve the spatial hierarchy of the input data, which is essential for recognizing intricate patterns and structures in images. In the case of handwritten digits, preserving the spatial relationships of pixels is crucial for accurate recognition.
Weight Sharing: CNNs utilize weight sharing to reduce the number of parameters, making them more efficient for image data. This is particularly beneficial when working with relatively small networks like the one used here.
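To make the 320-unit flattened size in fc1 concrete, here is a small sketch (not from the original notebook) that traces a dummy input through the two convolution/pooling stages of the CNN defined above:

import torch
import torch.nn.functional as F

# Trace a dummy 1x1x28x28 input through the conv/pool stages to see where 320 comes from.
net = CNN()
x = torch.zeros(1, 1, 28, 28)
x = F.max_pool2d(net.conv1(x), 2)   # conv1: 28 -> 24, pool: 24 -> 12  => [1, 10, 12, 12]
print(x.shape)
x = F.max_pool2d(net.conv2(x), 2)   # conv2: 12 -> 8,  pool: 8 -> 4    => [1, 20, 4, 4]
print(x.shape)                      # 20 * 4 * 4 = 320, matching fc1's input size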
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()
The code performs the following tasks:
It imports the PyTorch library using import torch.
It checks whether a CUDA-compatible GPU is available using torch.cuda.is_available(). If a GPU is available, the code assigns the device to the GPU ('cuda'); otherwise, it assigns the device to the CPU ('cpu'). This allows the model and data to be processed on the available hardware.
It creates an instance of the CNN model using model = CNN(). This model is designed for image classification and is intended to recognize handwritten digits. The model is then moved to the specified device (GPU or CPU) using .to(device).
It sets up the optimizer for training the model. In this case, it uses the Adam optimizer (optim.Adam) with a learning rate of 0.001. The optimizer is responsible for updating the model's parameters during training.
It defines the loss function for the model. The code uses cross-entropy loss (nn.CrossEntropyLoss()), a common choice for classification tasks. This loss function quantifies the difference between the predicted class scores and the actual target labels.
These steps set up the model, the hardware it runs on, and the components needed to optimize its parameters during training. The choice of GPU or CPU depends on the availability of hardware and the speed of processing required.
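As a small illustration of how nn.CrossEntropyLoss is used (this example is not part of the original notebook), it takes a batch of raw class scores of shape [batch, classes] together with integer labels and returns a single averaged scalar:

import torch
import torch.nn as nn

# Illustrative only: cross-entropy on a dummy batch of 4 samples and 10 classes.
dummy_scores = torch.randn(4, 10)             # unnormalized class scores, one row per sample
dummy_labels = torch.tensor([3, 7, 0, 1])     # ground-truth digit for each sample
loss = nn.CrossEntropyLoss()(dummy_scores, dummy_labels)
print(loss.item())                            # a single scalar, averaged over the batch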
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(loaders['train']):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 20 == 0:
            print(f"Train Epoch: {epoch} [{batch_idx * len(data)} / {len(loaders['train'].dataset)} ({100 * batch_idx / len(loaders['train']):0f}%)]\t{loss.item():.6f}")
I implemented the train function to handle the training loop for the model. Here's what it does:
Set Model State: I start by setting the model to training mode with model.train(). This is necessary because some layers, like dropout and batch normalization, behave differently during training and testing; setting the model to training mode ensures these layers operate as expected during training.
Data Iteration: A for loop, for batch_idx, (data, target) in enumerate(loaders['train']):, iterates over the training data one batch at a time.
Data Transfer to GPU: To leverage the GPU for faster computation (if available), I transfer both the input data and target labels to the device with data, target = data.to(device), target.to(device).
Gradient Initialization: Before calculating gradients, I reset the optimizer's gradients to zero with optimizer.zero_grad(). This step is crucial because it ensures that gradients from previous iterations do not accumulate.
Forward Pass: I then perform a forward pass through the model by calling model(data), which computes the model's predictions for the input batch.
Loss Calculation: I calculate the loss between these predictions and the actual target labels using the specified loss function (loss_fn). This loss quantifies how well the model is performing on the current batch.
Backpropagation: Calling loss.backward() computes the gradients of the loss with respect to the model's parameters.
Parameter Updates: Calling optimizer.step() then adjusts the model's weights to reduce the loss.
Progress Reporting: A conditional statement prints a progress update every 20 batches, including the current epoch, the number of processed samples, the total dataset size, and the current batch's loss.
In summary, the train function encapsulates the training logic: it iterates over batches of training data, computes gradients, updates the model's parameters, and reports progress, which makes training easy to manage and monitor.
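Before launching the full training run, the same steps can be exercised on a single batch as a sanity check. This is a sketch under the same assumptions (model, optimizer, loss_fn, loaders, and device defined above), not a cell from the original notebook:

# One manual training step on a single batch (mirrors the body of train()).
data, target = next(iter(loaders['train']))
data, target = data.to(device), target.to(device)

optimizer.zero_grad()            # clear gradients left over from any previous step
output = model(data)             # forward pass: predictions for the batch
loss = loss_fn(output, target)   # compare predictions with the true labels
loss.backward()                  # backpropagate to compute gradients
optimizer.step()                 # update the weights
print(loss.item())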
def test():
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in loaders['test']:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += loss_fn(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(loaders['test'].dataset)
    print(f"\nTest set: Average loss: {test_loss: 0.4f}, Accuracy {correct}/{len(loaders['test'].dataset)} ({100 * correct / len(loaders['test'].dataset):.0f}%\n)")
I implemented the test function to evaluate the model on the test set. Here's what it does:
Set Model State: I start by setting the model to evaluation mode with model.eval(). This is essential because, during evaluation, operations like dropout that are active during training should be deactivated; evaluation mode ensures that they are.
Initialize Evaluation Variables: I initialize test_loss to accumulate the cumulative test loss and correct to count the number of correct predictions made by the model.
No Gradient Computation: Inside the with torch.no_grad(): context, no gradients are computed. There is no backpropagation during evaluation, so disabling gradient tracking improves efficiency.
Data Iteration: A for loop, for data, target in loaders['test']:, processes the test data in batches.
Data Transfer to GPU: As in the training phase, I transfer the test data and target labels to the specified device with data, target = data.to(device), target.to(device).
Forward Pass and Loss Computation: I pass the test data through the model with output = model(data) to obtain predictions, then compute the batch loss with the specified loss function (loss_fn) and add it to test_loss. This accumulates a measure of the model's performance on the test data.
Prediction and Accuracy Calculation: I take the class with the highest score for each sample using pred = output.argmax(dim=1, keepdim=True), compare these predictions to the actual target labels, and count the matches with correct += pred.eq(target.view_as(pred)).sum().item().
Final Evaluation Metrics: After processing all the test data, I divide the cumulative test loss by the total number of samples in the test dataset to obtain the average test loss, and compute accuracy as the number of correct predictions divided by the total number of test samples.
Print Evaluation Results: Finally, I print the average test loss and accuracy as a summary of how well the model performed on the test data.
In summary, the test function evaluates the model's performance on the test dataset efficiently, without unnecessary gradient computation, and reports both loss and accuracy.
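The prediction-and-accuracy step can be seen in isolation on a tiny made-up batch (purely illustrative, not part of the original code):

import torch

# Three fake score rows for a 4-class problem and their true labels.
output = torch.tensor([[0.1, 0.7, 0.1, 0.1],
                       [0.8, 0.1, 0.05, 0.05],
                       [0.2, 0.2, 0.5, 0.1]])
target = torch.tensor([1, 0, 3])

pred = output.argmax(dim=1, keepdim=True)            # predicted class per row -> [[1], [0], [2]]
correct = pred.eq(target.view_as(pred)).sum().item()
print(correct)                                       # 2 of the 3 predictions match the labels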
for epoch in range(1, 11):
    train(epoch)
    test()
(Training log, abridged: the batch loss is printed every 20 batches for each of the 10 epochs, starting at about 2.30 at the beginning of epoch 1 and levelling off near 1.5 thereafter, followed by the test-set summary after each epoch. The per-epoch test results are listed below.)
Here is a summary of the results after each epoch:
Epoch 1: Test set average loss 0.0153, accuracy 9334/10000 (93%).
Epoch 2: Test set average loss 0.0151, accuracy 9553/10000 (96%).
Epoch 3: Test set average loss 0.0150, accuracy 9624/10000 (96%).
Epoch 4: Test set average loss 0.0149, accuracy 9675/10000 (97%).
Epoch 5: Test set average loss 0.0149, accuracy 9685/10000 (97%).
Epoch 6: Test set average loss 0.0149, accuracy 9730/10000 (97%).
Epoch 7: Test set average loss 0.0149, accuracy 9721/10000 (97%).
Epoch 8: Test set average loss 0.0149, accuracy 9738/10000 (97%).
Epoch 9: Test set average loss 0.0149, accuracy 9767/10000 (98%).
Epoch 10: Test set average loss 0.0149, accuracy 9761/10000 (98%).
In summary, I trained the model for 10 epochs. Test accuracy climbed from 93% after the first epoch to roughly 98% by the end, with the largest gains in the early epochs, while the average test loss edged down from 0.0153 to 0.0149. This indicates that the model steadily learned the classification task.
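The per-epoch accuracies above can also be plotted to visualize the trend; this small sketch simply re-uses the numbers reported by test() and is not one of the original cells:

import matplotlib.pyplot as plt

# Test-set accuracy (%) after each epoch, taken from the run above.
accuracy = [93.34, 95.53, 96.24, 96.75, 96.85, 97.30, 97.21, 97.38, 97.67, 97.61]
plt.plot(range(1, 11), accuracy, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Test accuracy (%)')
plt.show()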
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[0]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 7
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[1]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 2
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[2]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 1
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[3]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 0
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[4]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 4
import matplotlib.pyplot as plt
model.eval()
data, target = test_data[15]
data = data.unsqueeze(0).to(device)
output = model(data)
prediction = output.argmax(dim=1, keepdim=True).item()
print(f"Prediction: {prediction}")
image = data.squeeze(0).squeeze(0).cpu().numpy()
plt.imshow(image);
Prediction: 5
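The repeated single-image cells above can be folded into one loop. Here is a sketch under the same assumptions (model, device, and test_data already defined), using the same indices as above:

import matplotlib.pyplot as plt
import torch

# Predict and display several test digits in one pass.
model.eval()
indices = [0, 1, 2, 3, 4, 15]
fig, axes = plt.subplots(1, len(indices), figsize=(12, 2))
with torch.no_grad():
    for ax, idx in zip(axes, indices):
        image, label = test_data[idx]
        output = model(image.unsqueeze(0).to(device))
        prediction = output.argmax(dim=1).item()
        ax.imshow(image.squeeze(0).numpy(), cmap='gray')
        ax.set_title(f"pred {prediction} / true {label}")
        ax.axis('off')
plt.show()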