In this tutorial I am going to explain the following:
- How to load data into a PyTorch tensor for training
- How to define a model
- How to train the model
- How to save and load a model
- How to make predictions using the model
Note: The aim of this tutorial is to show you the whole workflow. Since this is something fundamental, we won't focus on bringing in real-life data or building a complex model architecture; the focus will be solely on the end-to-end workflow.
Let's create a toy dataset
import torch
# Check if CUDA is available and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Create a 100x1 tensor with random values and move it to the appropriate device
X = torch.randn(100, 1, device=device)
# Compute y = 3X + 5
y = 3 * X + 5
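A quick sanity check never hurts: both tensors should be 100x1 and live on the same device.
print(X.shape, y.shape)    # torch.Size([100, 1]) torch.Size([100, 1])
print(X.device, y.device)  # both on the selected device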
PyTorch imports
# the torch.nn module contains building blocks (layers, losses) for neural networks
import torch.nn as nn
from torch.optim import SGD
from torch.utils.data import Dataset, DataLoader
import torch
Create a PyTorch Dataset in the format the DataLoader expects
class ToyDataset(Dataset):
    def __init__(self, x, y):
        # x and y are already tensors, so avoid wrapping them in torch.tensor()
        # (that copies the data and raises a warning); just cast and move to the device
        self.x = x.float().to(device)
        self.y = y.float().to(device)
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
    def __len__(self):
        return len(self.x)
ds = ToyDataset(X, y)
dl = DataLoader(ds, batch_size=2, shuffle=True)  # batches of 2 samples, reshuffled every epoch
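To verify the DataLoader works as expected, you can pull one batch and inspect its shape; with batch_size=2 each batch should hold two samples:
xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([2, 1]) torch.Size([2, 1])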
Create model
model = nn.Sequential(
    nn.Linear(1, 6),   # input layer: 1 feature in, 6 hidden units out
    nn.ReLU(),         # non-linearity
    nn.Linear(6, 1)    # output layer: 6 hidden units in, 1 prediction out
).to(device)
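As a quick sanity check, you can count the trainable parameters: the first layer has 1*6 weights + 6 biases = 12, the second has 6*1 weights + 1 bias = 7, so 19 in total.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # 19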
Define the loss function and train the model
# define the loss function that we need to minimize during training
loss_func = nn.MSELoss()
opt = SGD(model.parameters(), lr=0.001)
loss_history = []
for _ in range(50):  # 50 epochs
    for ix, iy in dl:
        opt.zero_grad()                        # reset gradients from the previous step
        loss_value = loss_func(model(ix), iy)  # forward pass + loss
        loss_value.backward()                  # compute gradients
        opt.step()                             # update the weights
        loss_history.append(loss_value.item())
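To confirm training actually converged, you can plot the recorded per-batch losses (a minimal sketch, assuming matplotlib is available):
import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.xlabel('batch update')
plt.ylabel('MSE loss')
plt.show()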
To get a model summary (torchsummary is a separate package, installed with pip install torchsummary; depending on your version you may need to pass device='cpu' when the model is on CPU)
from torchsummary import summary
summary(model, input_size=(1,))
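If you prefer not to add a dependency, PyTorch's built-in repr also prints the layer structure:
print(model)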
Make a prediction
val = torch.tensor([[1]]).float()
model(val.to(device))
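A single forward pass like the one above still tracks gradients; for inference it is better to switch to eval mode and disable autograd (a minimal sketch, reusing val from above):
model.eval()
with torch.no_grad():
    pred = model(val.to(device))
print(pred.item())  # should be close to 3*1 + 5 = 8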
Save model to disk (weights only)
It is good practice to save the model after moving it to the CPU. Why?
Not everyone has access to a GPU. In fact, the GPU plays a less crucial role during inference, and there are use cases where CPU inference is fast enough even in production settings. So if you save the model after porting it to the CPU, both kinds of users can load it later.
torch.save(model.to('cpu').state_dict(), '/content/drive/MyDrive/mymodel.pth')
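Note that .to('cpu') moves the model in place, so if you want to keep using the GPU afterwards, move it back:
model.to(device)  # move the model back to the original device after saving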
Save model architecture + weights
# Save the entire model
torch.save(model.to('cpu'), '/content/drive/MyDrive/mymodel_full.pth')
Loading the entire model in eval mode for inference
import torch
# Load the entire model (architecture + weights)
model = torch.load('/content/drive/MyDrive/mymodel_full.pth')
model = model.to(device)  # the model was saved from CPU, so move it to the current device
model.eval()  # set the model to evaluation mode for inference
val = torch.tensor([[1]]).float()
model(val.to(device))
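On newer PyTorch versions (2.6+), torch.load defaults to weights_only=True and will refuse to unpickle a full model object. In that case, and only for checkpoints you trust, load it with:
model = torch.load('/content/drive/MyDrive/mymodel_full.pth', weights_only=False)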
Loading model weights only
# this assumes the model architecture is already defined in the model variable in your code
state_dict = torch.load('/content/drive/MyDrive/mymodel.pth')
model.load_state_dict(state_dict)
model.eval()
val = torch.tensor([[1]]).float()
model(val.to(device))
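In a fresh script you must recreate the architecture before loading the weights; a minimal sketch (the layers must match the saved model exactly):
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# redefine the exact same architecture the weights were trained with
model = nn.Sequential(
    nn.Linear(1, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
).to(device)

state_dict = torch.load('/content/drive/MyDrive/mymodel.pth', map_location=device)
model.load_state_dict(state_dict)
model.eval()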
Warning: saving the weights + architecture can cause problems when the PyTorch/Python version changes at load time, because unpickling the full model depends on the exact class definitions being available; saving only the state_dict is the more robust option.