
DL workshop on PyTorch

  • 16th April - 19th April, 8:30 - 10:30 PM
  • Recordings and Colab notebooks will be provided
  • A from-home OPPE will be held before the next term starts
  • Attendance is not mandatory
  • Instructor: Manojkumar Khara
  • Drive Folder

Topics

  • Tensors, popular PyTorch functions
  • ANN, training pipeline, DataLoader
  • CNN - image data
  • RNN - text data

LSTM, GRU, Encoder-Decoder, Transformers

  • PyTorch (Meta) -> NumPy-like API; dominant in research papers, ChatGPT (built with PyTorch), Hugging Face
  • TensorFlow (Google) -> more common in industrial/production settings


Both NumPy arrays and PyTorch tensors are multi-dimensional arrays.

What can a tensor do (that a NumPy array cannot)? - GPU acceleration - Automatic differentiation

PyTorch - Tensor computations - GPU acceleration - Dynamic computation graph - Automatic differentiation - Distributed training

0 dim tensor: Scalar - e.g. a loss value

1 dim tensor : vector - Word Embedding

2 dim tensor : Matrix - Gray scale image

[[256, 0, 128],
[30, 64, 50]]

3 dim tensor : - RGB color image - (channels, height, width) in PyTorch's channels-first convention

4 dim tensor : - A batch of RGB color images - (batch, channels, height, width)

5 dim tensor : - Video data - a sequence of RGB frames - (batch, frames, channels, height, width)

6 dim tensor : - Multiple video clips - (batch, clips, frames, channels, height, width)
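
A minimal sketch (shapes chosen arbitrarily) of tensors of increasing dimensionality and their .ndim and .shape:

import torch

loss = torch.tensor(0.25)                 # 0-dim: scalar (e.g. a loss value)
embedding = torch.rand(300)               # 1-dim: vector (e.g. a word embedding)
gray_img = torch.rand(28, 28)             # 2-dim: matrix (e.g. a grayscale image)
rgb_img = torch.rand(3, 224, 224)         # 3-dim: RGB image (channels, height, width)
batch = torch.rand(32, 3, 224, 224)       # 4-dim: a batch of RGB images

for t in (loss, embedding, gray_img, rgb_img, batch):
    print(t.ndim, tuple(t.shape))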

Where tensors are required in DL: - Training data storage and representation - Parameters (weights, biases) - Gradients (backpropagation) - Mathematical operations (add, matmul, etc.)

References:

  • pin memory (pinned host memory for faster CPU-to-GPU transfers; see the sketch below)
  • CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
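
A minimal sketch of where pin_memory typically appears (the TensorDataset below is just a placeholder): batches allocated in pinned (page-locked) memory can be copied to the GPU faster, and asynchronously with non_blocking=True.

import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.rand(1000, 10), torch.rand(1000, 1))   # placeholder dataset
loader = DataLoader(data, batch_size=32, pin_memory=True)         # batches live in pinned memory

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for x, y in loader:
    x = x.to(device, non_blocking=True)   # asynchronous copy from pinned memory to the GPU
    y = y.to(device, non_blocking=True)
    break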
# Check if GPU is available
import torch
# print(torch.cuda.is_available())
# torch.cuda.get_device_name(0)
torch.__version__
'2.6.0+cu124'
Important PyTorch modules:

  • torch: The core module providing multidimensional arrays (tensors) and mathematical operations on them.
  • torch.autograd: Automatic differentiation engine that records operations on tensors to compute gradients for optimization.
  • torch.nn: Neural networks library, including layers, activations, loss functions, and utilities to build deep learning models.
  • torch.optim: Optimization algorithms (optimizers) like SGD, Adam, and RMSprop used for training neural networks.
  • torch.utils.data: Utilities for data handling, including the Dataset and DataLoader classes for managing and loading datasets efficiently.
  • torch.distributed: Tools for distributed training across multiple GPUs and machines, facilitating parallel computation.
  • torch.cuda: Interfaces with NVIDIA CUDA to enable GPU acceleration for tensor computations and model training.
  • torch.multiprocessing: Utilities for parallelism using multiprocessing, similar to Python's multiprocessing module but with support for CUDA tensors.
  • torch.quantization: Tools for model quantization to reduce model size and improve inference speed, especially on edge devices.
  • torch.onnx: Supports exporting PyTorch models to the ONNX (Open Neural Network Exchange) format for interoperability with other frameworks and deployment.

Domain libraries:

  • torchvision - datasets, models, and transforms for computer vision
  • torchtext - text/NLP datasets and utilities
  • torchaudio - audio datasets and signal-processing utilities

#@title Creating tensor
first_var = torch.tensor(2) # scalar tensor
first_var
tensor(2)
# .ndim
first_var.ndim
0
a = torch.tensor([2,3,4,5])
a
tensor([2, 3, 4, 5])
a.ndim
1
a.shape
torch.Size([4])
b = torch.tensor([[1,2,3],[3,4,5]])
b
tensor([[1, 2, 3],
        [3, 4, 5]])
b.ndim
2
b.shape
torch.Size([2, 3])
b.dtype
torch.int64
#check dtype
# dtype families: int, float, double, short, long
# bit widths: 8, 16, 32, 64 (e.g. torch.int32, torch.int64/long, torch.float32, torch.float64/double)
a = torch.tensor([1.3,2.4,3], dtype = torch.float32)
a
tensor([1.3000, 2.4000, 3.0000])
a.dtype
torch.float32
# prompt: change the dtype of a to int32

b = torch.tensor(a, dtype = torch.int32)
b
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  b = torch.tensor(a, dtype = torch.int32)
tensor([1, 2, 3], dtype=torch.int32)
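As the warning suggests, the idiomatic way to change a tensor's dtype is .to() (or the shorthand .int()), which avoids copy-constructing from an existing tensor; a minimal sketch:

b = a.to(torch.int32)   # returns a new tensor with the requested dtype
# b = a.int()           # equivalent shorthand
b                       # -> tensor([1, 2, 3], dtype=torch.int32)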
# empty
c= torch.empty((2,3), dtype=torch.int64)
c
tensor([[137191861376448,        10759048, 137187316199856],
        [             65,       171150544, 137191881579792]])
id(c)
137187256662352
#zeros
torch.zeros(3,4)
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
#ones
torch.ones(4,4)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
#manual_seed
#rand
torch.manual_seed(42) # seed the global RNG so the random values below are reproducible
a = torch.rand(3,4) # initialize the weights
b = torch.rand(3,4) # initialize the weights
c = torch.rand(3,4) # initialize the weights

print(a)
print(b)
print(c)
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8694, 0.5677, 0.7411, 0.4294],
        [0.8854, 0.5739, 0.2666, 0.6274],
        [0.2696, 0.4414, 0.2969, 0.8317]])
tensor([[0.1053, 0.2695, 0.3588, 0.1994],
        [0.5472, 0.0062, 0.9516, 0.0753],
        [0.8860, 0.5832, 0.3376, 0.8090]])
torch.manual_seed(43) # reseeding changes the sequence of random values
a = torch.rand(3,4) # initialize the weights
print(a)
tensor([[0.4540, 0.1965, 0.9210, 0.3462],
        [0.1481, 0.0858, 0.5909, 0.0659],
        [0.7476, 0.6253, 0.9392, 0.1338]])
# torch.manual_seed(42)
torch.rand(3,4) # initialize the weights
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
#arange
torch.arange(0,10,2)
tensor([0, 2, 4, 6, 8])
#linspace
torch.linspace(5, 20, 4)
tensor([ 5., 10., 15., 20.])
#eye (identity matrix)
torch.eye(5)
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
#diag
torch.diag(torch.tensor([2,5,7,8]))
tensor([[2, 0, 0, 0],
        [0, 5, 0, 0],
        [0, 0, 7, 0],
        [0, 0, 0, 8]])
#full
torch.full((2,3), 3)
tensor([[3, 3, 3],
        [3, 3, 3]])

#dtypes

#int32
#int64 or long
#float32
#float64

Shape

#shape
t1 = torch.tensor([[2,3],[1,4],[5,6], [0,9]])
t1 # 4, 2
tensor([[2, 3],
        [1, 4],
        [5, 6],
        [0, 9]])
#dim
t1.dim()
2
#reshape
t1.reshape(2,2,2) #
tensor([[[2, 3],
         [1, 4]],

        [[5, 6],
         [0, 9]]])
#flatten
t1.flatten() #
tensor([2, 3, 1, 4, 5, 6, 0, 9])
mat = torch.rand(3,2,3,2)
mat
tensor([[[[0.4290, 0.9433],
          [0.3099, 0.7041],
          [0.5546, 0.9049]],

         [[0.8079, 0.1751],
          [0.4293, 0.4821],
          [0.3575, 0.8633]]],


        [[[0.9868, 0.0859],
          [0.3022, 0.9769],
          [0.5725, 0.3565]],

         [[0.8549, 0.4412],
          [0.3164, 0.0559],
          [0.8365, 0.3794]]],


        [[[0.1790, 0.1516],
          [0.1446, 0.6527],
          [0.6668, 0.5907]],

         [[0.1872, 0.0458],
          [0.2137, 0.1709],
          [0.6042, 0.6840]]]])
mat.flatten()
tensor([0.4290, 0.9433, 0.3099, 0.7041, 0.5546, 0.9049, 0.8079, 0.1751, 0.4293,
        0.4821, 0.3575, 0.8633, 0.9868, 0.0859, 0.3022, 0.9769, 0.5725, 0.3565,
        0.8549, 0.4412, 0.3164, 0.0559, 0.8365, 0.3794, 0.1790, 0.1516, 0.1446,
        0.6527, 0.6668, 0.5907, 0.1872, 0.0458, 0.2137, 0.1709, 0.6042, 0.6840])
#permute (2 x 3 x 4) --> (3 X 2 X 4)

#unsqueeze
#squeeze
mat = torch.rand(3,2,4) # 3 X 2 X 4
mat
tensor([[[0.3125, 0.0432, 0.8238, 0.7128],
         [0.3013, 0.3151, 0.4871, 0.2836]],

        [[0.7665, 0.5883, 0.2020, 0.0844],
         [0.3058, 0.1009, 0.4051, 0.3965]],

        [[0.2865, 0.5662, 0.2777, 0.7841],
         [0.7903, 0.3681, 0.2918, 0.0352]]])
torch.permute(mat, (2,0,1)) # 4 X 3 X 2
tensor([[[0.3125, 0.3013],
         [0.7665, 0.3058],
         [0.2865, 0.7903]],

        [[0.0432, 0.3151],
         [0.5883, 0.1009],
         [0.5662, 0.3681]],

        [[0.8238, 0.4871],
         [0.2020, 0.4051],
         [0.2777, 0.2918]],

        [[0.7128, 0.2836],
         [0.0844, 0.3965],
         [0.7841, 0.0352]]])
torch.unsqueeze(mat, dim = 2)
tensor([[[[0.3125, 0.0432, 0.8238, 0.7128]],

         [[0.3013, 0.3151, 0.4871, 0.2836]]],


        [[[0.7665, 0.5883, 0.2020, 0.0844]],

         [[0.3058, 0.1009, 0.4051, 0.3965]]],


        [[[0.2865, 0.5662, 0.2777, 0.7841]],

         [[0.7903, 0.3681, 0.2918, 0.0352]]]])
b = torch.rand(1,5,3,1)
b
tensor([[[[0.6473],
          [0.0379],
          [0.8159]],

         [[0.8852],
          [0.4247],
          [0.0985]],

         [[0.2929],
          [0.6726],
          [0.1386]],

         [[0.3796],
          [0.3385],
          [0.2939]],

         [[0.8167],
          [0.1573],
          [0.6731]]]])
c = torch.squeeze(b)
c
tensor([[0.6473, 0.0379, 0.8159],
        [0.8852, 0.4247, 0.0985],
        [0.2929, 0.6726, 0.1386],
        [0.3796, 0.3385, 0.2939],
        [0.8167, 0.1573, 0.6731]])
torch.unsqueeze(c, dim=0) # add a new leading dimension -> shape (1, 5, 3)
tensor([[[0.6473, 0.0379, 0.8159],
         [0.8852, 0.4247, 0.0985],
         [0.2929, 0.6726, 0.1386],
         [0.3796, 0.3385, 0.2939],
         [0.8167, 0.1573, 0.6731]]])

similar shape tensors

#empty_like
t1 = torch.rand(2,3)
t1
tensor([[0.2141, 0.5516, 0.9147],
        [0.0856, 0.1362, 0.8033]])
t2 = torch.empty_like(t1)
t2
tensor([[8.1619e-33, 0.0000e+00, 3.0098e-32],
        [0.0000e+00, 8.9683e-44, 0.0000e+00]])
#zeros_like
t2 = torch.zeros_like(t1)
t2
tensor([[0., 0., 0.],
        [0., 0., 0.]])
#ones_like
#rand_like
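# For completeness, a quick sketch of the remaining *_like constructors
# (same shape and dtype as the source tensor):
torch.ones_like(t1)   # tensor of ones with t1's shape/dtype
torch.rand_like(t1)   # uniform random values with t1's shape/dtype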

Mathematical Operations

#scalar addition
a= torch.tensor(2)
b = torch.tensor(3)
a+b
tensor(5)
b + 5
tensor(8)
#scalar subtraction
b - 1
tensor(2)
#scalar multiplication
b*2
tensor(6)
#scalar division /
b / 2
tensor(1.5000)
#scalar division //
b // 2
tensor(1)
# mod remainder
b % 2
tensor(1)
# power
b ** 2
tensor(9)

#elementwise operations

#add
#multiply
#abs
#neg
#round
#ceil
#floor
#clamp
t1 = torch.tensor([[-1,2,3], [4,5,-6]])
t2 = torch.tensor( [[1,1,1], [2,2,2]])
torch.add(t1,t2)
tensor([[ 0,  3,  4],
        [ 6,  7, -4]])
t1 + t2
tensor([[ 0,  3,  4],
        [ 6,  7, -4]])
t1 * t2
tensor([[ -1,   2,   3],
        [  8,  10, -12]])
torch.abs(t1)
tensor([[1, 2, 3],
        [4, 5, 6]])
torch.neg(t1)
tensor([[ 1, -2, -3],
        [-4, -5,  6]])
t3= torch.tensor([-5,-2, 0 ,1,2,3,4])
t3
tensor([-5, -2,  0,  1,  2,  3,  4])
torch.clamp(t3, min=0)
tensor([0, 0, 0, 1, 2, 3, 4])

stats

t1
tensor([[-1,  2,  3],
        [ 4,  5, -6]])
#sum
t1.sum()
tensor(7)
#sum along axis
t1.sum(axis= 0)
tensor([ 3,  7, -3])
#sum along axis
t1.sum(axis= 1)
tensor([4, 3])
#mean
t2 = torch.tensor([[1,2,4], [2,5,6]], dtype= torch.float)
t2.mean()
tensor(3.3333)
t2.mean(axis=1)
tensor([2.3333, 4.3333])
#median
t2.median()
tensor(2.)
#max
t2.max()
tensor(6.)
#min
t2.min()
tensor(1.)
#std
t2.std()
tensor(1.9664)
#argmax, argmin (return the index into the flattened tensor)
t2.argmax()
tensor(5)
t2.argmin()
tensor(0)

matrix operations

 #matmul
 t1 = torch.tensor([[1,2,3], [4,5,6]]) #2X3
 t2 = torch.tensor([[1,2], [3,4], [5,6]]) # 3 x 2
torch.matmul(t1,t2)
tensor([[22, 28],
        [49, 64]])
#dot
t1 = torch.tensor([1,2,3])
t2 = torch.tensor([4,5,6])
torch.dot(t1,t2)
tensor(32)
#transpose
t1 = torch.tensor([[1,2,3], [4,5,6]])
torch.transpose(t1,0,1)
tensor([[1, 4],
        [2, 5],
        [3, 6]])
# determinant
# inverse
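# Determinant and inverse need a floating-point (and square) matrix;
# a minimal sketch using torch.linalg:
m = torch.tensor([[4., 7.], [2., 6.]])
torch.linalg.det(m)    # -> tensor(10.)
torch.linalg.inv(m)    # -> tensor([[ 0.6, -0.7], [-0.2,  0.4]])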

Comparison

#greater than
t1 = torch.tensor([[1,2], [3,4]])
t2 = torch.tensor([[5,1], [0,7]])
t1 > t2
tensor([[False,  True],
        [ True, False]])
t1 <  t2
tensor([[ True, False],
        [False,  True]])
#less than
#equal to

t1 == t2
tensor([[False, False],
        [False, False]])

important functions

#log
torch.log(t1)
tensor([[0.0000, 0.6931],
        [1.0986, 1.3863]])
#exp
torch.exp(t1)
tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5981]])
#square
torch.square(t1)
tensor([[ 1,  4],
        [ 9, 16]])
#sqrt
torch.sqrt(t1)
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]])
#sigmoid # 1 / (1 + e^(-z))
torch.sigmoid(t1)
tensor([[0.7311, 0.8808],
        [0.9526, 0.9820]])
def find_sigmoid(t):
  return 1 / (1 + torch.exp(-t))

find_sigmoid(t1)
tensor([[0.7311, 0.8808],
        [0.9526, 0.9820]])
#softmax
t1 = torch.tensor([[1,2], [3,4]], dtype= torch.float32)
torch.softmax(t1, axis= 1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
torch.softmax(t1, dim= 1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
#relu
t1 = torch.tensor([[-1,2], [3,-4]], dtype= torch.float32)
torch.relu(t1)
tensor([[0., 2.],
        [3., 0.]])

In-place operations (methods ending with an underscore, e.g. add_, modify the tensor in place instead of returning a new one)

a = torch.tensor([1,2,3])
b = torch.tensor([4,5,6])
c = a+b
a.add_(b)
w = torch.tensor([4,5,6])
grad = torch.tensor([2,1,3])

w.subtract_(grad)
w
tensor([2, 4, 3])
w.zero_()
tensor([0, 0, 0])

clone

#clone
a = torch.tensor([1,2,3])
b = a
b
tensor([1, 2, 3])
id(a)
137187252920080
id(b)
137187252920080
b.add_(1)
tensor([2, 3, 4])
a
tensor([2, 3, 4])
c= torch.clone(a)
c
tensor([2, 3, 4])
id(c)
137187252921424
c.add_(1)
tensor([3, 4, 5])
a
tensor([2, 3, 4])

Tensor operations on GPU

import torch
torch.cuda.device_count()
1
torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15095MB, multi_processor_count=40, uuid=ee6f7a89-4693-5b16-c869-5fe163a44485, L2_cache_size=4MB)
# Check if CUDA is available
if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"Number of GPUs available: {gpu_count}\n")

    # Iterate through all GPUs
    for i in range(gpu_count):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f"  - Total Memory: {props.total_memory / 1e9:.2f} GB")
        print(f"  - Compute Capability: {props.major}.{props.minor}")
        print(f"  - Multi-Processor Count: {props.multi_processor_count}\n")
else:
    print("No GPUs available.")
Number of GPUs available: 1

GPU 0: Tesla T4
  - Total Memory: 15.83 GB
  - Compute Capability: 7.5
  - Multi-Processor Count: 40
torch.device(0)
device(type='cuda', index=0)
torch.device(type='cuda')
device(type='cuda')
torch.device(type='cuda',index=0)
device(type='cuda', index=0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
# create matrix in GPU
torch.tensor([1,2,3], device = device)
tensor([1, 2, 3], device='cuda:0')
torch.tensor([1,2,3])
tensor([1, 2, 3])
# move a matrix from CPU to GPU
mat1 = torch.tensor([1,2,3])
mat1
tensor([1, 2, 3])
mat1.to(device)
tensor([1, 2, 3], device='cuda:0')
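Note that Tensor.to() is not in-place: it returns a new tensor, so the result has to be reassigned (nn.Module.to(), by contrast, moves the module's parameters in place):

mat1 = mat1.to(device)   # reassign; mat1 itself was still on the CPU before this line
mat1.device              # -> device(type='cuda', index=0) when a GPU is available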
# move model to GPU
model = torch.nn.Linear(10,1)
model.to(device)
Linear(in_features=10, out_features=1, bias=True)

From numpy to torch
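
A minimal sketch: torch.from_numpy() shares memory with the NumPy array (changing one changes the other), while torch.tensor() copies; .numpy() goes the other way for CPU tensors.

import numpy as np
import torch

arr = np.array([1, 2, 3])
t_shared = torch.from_numpy(arr)   # shares memory with arr
t_copy = torch.tensor(arr)         # independent copy

back = t_shared.numpy()            # CPU tensor -> NumPy array (also shares memory)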

From pandas to torch
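
Pandas has no direct bridge, so the usual route (a sketch, assuming a numeric placeholder DataFrame df) is DataFrame -> NumPy -> tensor:

import pandas as pd
import torch

df = pd.DataFrame({'x1': [1.0, 2.0], 'x2': [3.0, 4.0]})    # placeholder DataFrame
features = torch.tensor(df.values, dtype=torch.float32)     # or torch.from_numpy(df.to_numpy())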

AutoGrad


\[
q_1(z) = z^2 \\
q_2(z) = z^3 \\
q_3(z) = e^z \\
p(z) = \dfrac{q_1}{q_2} + q_3 = \dfrac{z^2}{z^3} + e^z = \dfrac{1}{z} + e^z \\
\dfrac{\partial p(z)}{\partial z} = \dfrac{-1}{z^2} + e^z
\]

Derivative along multiple paths (sum the contribution of each path through the computation graph):

\(\dfrac{\partial p(z)}{\partial z} = \dfrac{\partial p}{\partial q1}\dfrac{\partial q1}{\partial z}+ \dfrac{\partial p}{\partial q2}\dfrac{\partial q2}{\partial z}+\dfrac{\partial p}{\partial q3}\dfrac{\partial q3}{\partial z}\)

= \((\dfrac{1}{q_2}) * 2z + (\dfrac{-q_1}{q_2^2}) * 3z^2 + (1) * e^z\)

= \(\dfrac{2}{z^2} - \dfrac{3}{z^2} + e^z\)

= \(\dfrac{-1}{z^2} + e^z\)
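
A minimal sketch verifying this derivative with autograd (z = 2.0 chosen arbitrarily):

import torch

z = torch.tensor(2.0, requires_grad=True)
q1, q2, q3 = z**2, z**3, torch.exp(z)
p = q1 / q2 + q3
p.backward()                                          # populates z.grad with dp/dz

print(z.grad)                                         # autograd result
print(-1 / z.detach()**2 + torch.exp(z.detach()))     # analytic: -1/z^2 + e^z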

from IPython.display import IFrame

IFrame(src="https://iitm-pod.slides.com/arunprakash_ai/cs6910-lecture-4/fullscreen#/0/43", width=800, height=600)

Gradient Accumulation and zero_()
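
A minimal sketch of why .grad.zero_() (or optimizer.zero_grad()) is needed: backward() accumulates into .grad rather than overwriting it.

import torch

w = torch.tensor(2.0, requires_grad=True)

for step in range(2):
    loss = (w * 3) ** 2          # toy loss; d(loss)/dw = 18*w = 36 at w=2
    loss.backward()
    print(w.grad)                # step 0: 36, step 1: 72 -> gradients accumulate

w.grad.zero_()                   # reset before the next backward pass
print(w.grad)                    # tensor(0.)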

Disable gradient tracking

  • while updating the parameters (the update step itself should not be recorded in the computation graph)
  • when predicting (inference) after the model is trained - see the sketch under Example below

Example
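
A minimal sketch of both cases, using torch.no_grad() to suppress graph building (the tiny linear model is just a placeholder):

import torch

model = torch.nn.Linear(10, 1)       # placeholder model
x = torch.rand(4, 10)
y = torch.rand(4, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

with torch.no_grad():                # 1) manual parameter update, not tracked
    for p in model.parameters():
        p -= 0.01 * p.grad

with torch.no_grad():                # 2) inference after training
    preds = model(x)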

Up Next

  • Training Pipeline (a sketch of the full loop appears after this list)

    • Define model
    • for epoch in range(epochs):
      • Forward pass
      • Loss calculation
      • Backward pass
      • Parameter update
  • Model Evaluation

  • Improve the training pipeline using nn.Module and torch.optim

    • nn.Linear
    • Activation functions (nn.ReLU, nn.Sigmoid, nn.Softmax)
    • nn.Sequential container
    • Loss functions (nn.BCELoss, nn.CrossEntropyLoss, etc.)
    • torch.optim (SGD, Adam, etc.)

  • Improve the training pipeline using torch.utils.data (Dataset and DataLoader)

    • Data loading
    • Batching
    • Shuffling, sampling
    • Parallelization (num_workers)
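
As a preview, a minimal sketch of the training pipeline outlined above (layer sizes, loss, and learning rate are placeholder choices):

import torch
import torch.nn as nn

X = torch.rand(100, 10)                       # placeholder data
y = torch.rand(100, 1)

model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 1))   # define model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    y_pred = model(X)                         # forward pass
    loss = loss_fn(y_pred, y)                 # loss calculation
    optimizer.zero_grad()                     # clear accumulated gradients
    loss.backward()                           # backward pass
    optimizer.step()                          # parameter update

with torch.no_grad():                         # model evaluation
    print(loss_fn(model(X), y))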