In a sense, PyTorch was the more accomplished and better-loved younger sibling (as the trend in papers and repositories built on each framework suggests).
The praise on Reddit and Stack Overflow was never-ending: it had better libraries, better CPU and GPU support, and more approachable documentation. Of course, I wanted a slice of that PyTorch pie, so I dove into it again with another project.
I started a project on detecting AI-generated sneakers using PyTorch. Boy, oh boy. PyTorch felt very different from TensorFlow. I felt like I was chasing a hand to hold rather than having my hand held. I wanted that oh-so-gentle touch back, but I had to push forward.
What I noticed right away, though, was the convenience of having CUDA in my arsenal out of the box, despite being on a Windows machine. With TensorFlow, I had to downgrade my TensorFlow version and use WSL (Windows Subsystem for Linux) just to enable GPU training. That process alone took around 2–3 hours of trial and error, so I was glad to have it out of the way from the start.
Creating the neural network was also fairly intuitive. What I found difficult was having to calculate output shapes manually and define my own forward pass.
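The shape comments in the model below follow PyTorch’s output-size formula for `nn.Conv2d` and `nn.MaxPool2d`: out = floor((in + 2·padding − dilation·(kernel − 1) − 1) / stride) + 1. Here is a minimal pure-Python sketch that reproduces those comments, assuming the 240 x 240 RGB input they imply (an unpadded 3 x 3 convolution trims 2 pixels, and the first comment reads 238). The helper `out_size` and the `layers` list are hypothetical names used only for illustration.

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    # Output height/width of nn.Conv2d or nn.MaxPool2d (default floor mode)
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Layer sequence of the CNN: ("conv", out_channels) for a 3x3 conv, ("pool", None) for 2x2 max pooling
layers = [("conv", 32), ("conv", 32), ("pool", None),
          ("conv", 64), ("conv", 64), ("pool", None),
          ("conv", 128), ("conv", 128), ("pool", None)]

size, channels = 240, 3  # assumed 240 x 240 RGB input
sizes = []
for kind, out_ch in layers:
    if kind == "conv":
        size, channels = out_size(size, kernel=3), out_ch
    else:
        size = out_size(size, kernel=2, stride=2)
    sizes.append(size)

print(sizes)                   # [238, 236, 118, 116, 114, 57, 55, 53, 26]
print(size * size * channels)  # 86528, i.e. 26 * 26 * 128, the in_features of the first Linear layer
```

Note how the last pooling layer rounds 53 / 2 down to 26, which is exactly the kind of detail that is easy to get wrong by hand.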
```python
import torch
import torch.nn as nn

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class CNN(nn.Module):
    # Input: 3 x 240 x 240 RGB images (implied by the shape comments).
    # Each comment gives the layer's output as height x width x channels.
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3),  # 238 x 238 x 32
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 32, kernel_size=3),  # 236 x 236 x 32
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 118 x 118 x 32
            nn.Conv2d(32, 64, kernel_size=3),  # 116 x 116 x 64
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, kernel_size=3),  # 114 x 114 x 64
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 57 x 57 x 64
            nn.Conv2d(64, 128, kernel_size=3),  # 55 x 55 x 128
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 128, kernel_size=3),  # 53 x 53 x 128
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 26 x 26 x 128
            nn.Flatten(),
            nn.Linear(in_features=26*26*128, out_features=512),
            nn.ReLU(),
            nn.Linear(512, 2),  # 2 output logits, one per class
        )

    def forward(self, x):
        x = self.layers(x)
        return x


model = CNN()
model.to(device)
```
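One way to avoid doing that arithmetic by hand is to let PyTorch compute it: build the convolutional part on its own, push a dummy tensor through it, and read off the flattened size. This is a sketch, not the author’s code; `conv_stage` and `features` are hypothetical names, and it rebuilds the same three-stage stack assuming 240 x 240 RGB inputs.

```python
import torch
import torch.nn as nn


def conv_stage(in_ch, out_ch):
    # One stage of the CNN: two unpadded 3x3 convs (each with ReLU + BatchNorm), then 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3), nn.ReLU(), nn.BatchNorm2d(out_ch),
        nn.Conv2d(out_ch, out_ch, kernel_size=3), nn.ReLU(), nn.BatchNorm2d(out_ch),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )


features = nn.Sequential(conv_stage(3, 32), conv_stage(32, 64), conv_stage(64, 128), nn.Flatten())

# Let PyTorch do the shape arithmetic: run one dummy 240 x 240 RGB image
# through the convolutional part and read the flattened length
features.eval()  # BatchNorm uses running statistics, so a batch of one is fine
with torch.no_grad():
    n_features = features(torch.zeros(1, 3, 240, 240)).shape[1]

model = nn.Sequential(features, nn.Linear(n_features, 512), nn.ReLU(), nn.Linear(512, 2))

print(n_features)  # 86528, the same as 26*26*128
```

PyTorch also provides `nn.LazyLinear(out_features)`, which infers `in_features` from the first batch it sees. The dummy-pass approach simply makes the number explicit, which can be handy when checking it against hand-written comments.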
The hard part was being slapped by the reality of writing my own training, validation, and testing loops. Having gotten used to `model.compile()` and `model.fit()`, I now realized what a luxury they were, and I had to build the equivalent myself. Not going to lie, ChatGPT helped me a lot with this process. Nothing like using AI to create AI, amirite?
```python
import numpy as np
import torch
import torch.nn as nn

# Assumes `model`, `device`, `train_loader`, and `val_loader` were defined earlier in the notebook


def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    # Set the model to training mode. This matters here: BatchNorm2d normalizes with
    # per-batch statistics (and updates its running averages) only in training mode
    model.train()
    train_loss, correct = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.float().to(device), y.long().to(device)

        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        train_loss += loss.item()
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 10 == 0:
            # `current` = number of samples processed so far in this epoch
            loss, current = loss.item(), batch * dataloader.batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

    average_train_loss = train_loss / num_batches
    train_accuracy = correct / size
    print(
        f"Training Error: \n Accuracy: {(100*train_accuracy):>0.1f}%, Avg loss: {average_train_loss:>8f} \n"
    )
    return average_train_loss, train_accuracy


def val_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode so BatchNorm2d uses its running statistics
    # instead of the current batch's
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    val_loss, correct = 0, 0
    # torch.no_grad() disables gradient tracking during evaluation,
    # which avoids unnecessary computation and memory use
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.float().to(device), y.long().to(device)
            pred = model(X)
            val_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    average_val_loss = val_loss / num_batches
    val_accuracy = correct / size
    print(
        f"Validation Error: \n Accuracy: {(100*val_accuracy):>0.1f}%, Validation loss: {average_val_loss:>8f} \n"
    )
    return average_val_loss, val_accuracy


def evaluate_model(loader, model, loss_fn):
    # Collects labels, predictions, loss, accuracy, and per-example results
    model.eval()
    y_true = []
    y_pred = []
    total_loss = 0
    correct_examples = []
    incorrect_examples = []
    with torch.no_grad():
        for X, y in loader:
            X, y = X.float().to(device), y.long().to(device)
            outputs = model(X)
            loss = loss_fn(outputs, y)
            total_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            y_true.extend(y.tolist())
            y_pred.extend(predicted.tolist())

            matches = predicted == y
            for i in range(len(matches)):
                # Store the image as a CPU tensor and the labels as Python ints
                example = (X[i].cpu(), y[i].item(), predicted[i].item())
                if matches[i]:
                    correct_examples.append(example)
                else:
                    incorrect_examples.append(example)

    average_loss = total_loss / len(loader)
    accuracy = (np.array(y_true) == np.array(y_pred)).mean()
    return y_true, y_pred, average_loss, accuracy, correct_examples, incorrect_examples


learning_rate = 0.001
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

train_losses = []
test_losses = []
val_losses = []
train_accs = []
test_accs = []
val_accs = []

epochs = 10
max_acc = 0
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loss, train_acc = train_loop(train_loader, model, loss_fn, optimizer)
    val_loss, val_acc = val_loop(val_loader, model, loss_fn)
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    val_losses.append(val_loss)
    val_accs.append(val_acc)

    # Save a checkpoint only when validation accuracy improves
    if max_acc < val_acc:
        print(
            f"[SAVING] Validation Accuracy Increased({(100*max_acc):>0.1f}% ---> {(100*val_acc):>0.1f}%)"
        )
        max_acc = val_acc
        # Saving State Dict
        torch.save(model.state_dict(), "/kaggle/working/saved_model.pth")

print("Done!")
```
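The loop above keeps the weights from the epoch with the best validation accuracy. To use them afterwards (for example with `evaluate_model` on a test loader), they have to be loaded back into a fresh instance of the same architecture. Here is a minimal sketch of that round trip; to keep it self-contained it assumes a tiny hypothetical stand-in model (`make_model`) and a temporary file instead of `CNN()` and the Kaggle path.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for CNN(); includes BatchNorm so eval() matters
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.BatchNorm1d(8), nn.Linear(8, 2))


trained = make_model()
path = os.path.join(tempfile.mkdtemp(), "saved_model.pth")  # stands in for /kaggle/working/saved_model.pth
torch.save(trained.state_dict(), path)

# Later: rebuild the same architecture and load the saved weights and buffers into it
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # BatchNorm switches to its running statistics for inference

trained.eval()
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
print(same)  # True
```

Because a state dict stores only parameters and buffers, the class definition has to be available (and unchanged) wherever the checkpoint is loaded.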
That’s a lot of lines, but I could see where seasoned engineers were coming from. Writing your own training, validation, and testing loops in PyTorch gives you more flexibility and control over the training process: it allows for custom behaviors, more detailed debugging, and fine-tuning that aren’t as easily achieved with the more abstracted `model.compile()` and `model.fit()` methods in TensorFlow. I’m still a long way off, but I’m getting there.
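As one illustration of the kind of per-step customization a hand-written loop makes easy, the sketch below logs the gradient norm and clips it between `backward()` and `step()`. Gradient clipping is not part of the author’s training; it is here only as an example, and the tiny model, random batch, and `train_step` helper are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny classifier and one random batch
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
X = torch.randn(32, 8)
y = torch.randint(0, 2, (32,))


def train_step(X, y, max_norm=1.0):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    # Custom step between backward() and step(): clip the gradients and keep the
    # total norm measured before clipping so it can be logged for debugging
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()


history = []
for step in range(3):
    loss, grad_norm = train_step(X, y)
    history.append((loss, grad_norm))
    print(f"step {step}: loss={loss:.4f}, grad norm before clipping={grad_norm:.4f}")
```

Any other per-batch logic (logging, custom schedules, inspecting activations) slots into the same place in the loop.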
In the end, the model achieved 96.11% validation accuracy and a validation loss of 0.3328 after 10 epochs of training with a batch size of 32.