
DL workshop on PyTorch

  • 16th April - 19th April, 8:30 - 10:30 PM
  • Recordings and Colab notebooks will be provided
  • A from-home OPPE will be held before the next term starts
  • Attendance is not mandatory
  • Instructor: Manojkumar Khara
  • Drive Folder

Topics

  • Tensors, popular PyTorch functions
  • ANN, training pipeline, DataLoader
  • CNN - image data
  • RNN - text data

LSTM, GRU, Encoder-Decoder, Transformers

  • PyTorch (Meta) -> NumPy-like API; dominant in research papers, ChatGPT (built with PyTorch), Hugging Face
  • TensorFlow (Google) -> more common in industrial/production settings


Both NumPy arrays and PyTorch tensors are multi-dimensional arrays.

What can a tensor do (that a NumPy array cannot)? - GPU acceleration - Automatic differentiation

PyTorch - Tensor computations - GPU acceleration - Dynamic computation graph - Automatic differentiation - Distributed training

0 dim tensor: Scalar - e.g. a loss value

1 dim tensor : vector - Word Embedding

2 dim tensor : Matrix - Gray scale image

[[256, 0, 128],
[30, 64, 50]]

3 dim tensor : - RGB color image - (channels, height, width) in PyTorch's channels-first convention

4 dim tensor : - A batch of RGB color images - (batch, channels, height, width)

5 dim tensor : - Video data - a sequence of RGB frames - (batch, frames, channels, height, width)

6 dim tensor : - Multiple video clips - (batch, clips, frames, channels, height, width)
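
A minimal sketch (shapes chosen arbitrarily) of tensors of increasing dimensionality and their .ndim and .shape:

import torch

loss = torch.tensor(0.25)                 # 0-dim: scalar (e.g. a loss value)
embedding = torch.rand(300)               # 1-dim: vector (e.g. a word embedding)
gray_img = torch.rand(28, 28)             # 2-dim: matrix (e.g. a grayscale image)
rgb_img = torch.rand(3, 224, 224)         # 3-dim: RGB image (channels, height, width)
batch = torch.rand(32, 3, 224, 224)       # 4-dim: a batch of RGB images

for t in (loss, embedding, gray_img, rgb_img, batch):
    print(t.ndim, tuple(t.shape))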

Where tensors are required in DL: - Training data storage and representation - Parameters (weights, biases) - Gradients (backpropagation) - Mathematical operations (add, matmul, etc.)

References:

  • pin memory (pinned host memory for faster CPU-to-GPU transfers; see the sketch below)
  • CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
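
A minimal sketch of where pin_memory typically appears (the TensorDataset below is just a placeholder): batches allocated in pinned (page-locked) memory can be copied to the GPU faster, and asynchronously with non_blocking=True.

import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.rand(1000, 10), torch.rand(1000, 1))   # placeholder dataset
loader = DataLoader(data, batch_size=32, pin_memory=True)         # batches live in pinned memory

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for x, y in loader:
    x = x.to(device, non_blocking=True)   # asynchronous copy from pinned memory to the GPU
    y = y.to(device, non_blocking=True)
    break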
# Check if GPU is available
import torch
# print(torch.cuda.is_available())
# torch.cuda.get_device_name(0)
torch.__version__
'2.6.0+cu124'
Important PyTorch modules:

  • torch: The core module providing multidimensional arrays (tensors) and mathematical operations on them.
  • torch.autograd: Automatic differentiation engine that records operations on tensors to compute gradients for optimization.
  • torch.nn: Neural networks library, including layers, activations, loss functions, and utilities to build deep learning models.
  • torch.optim: Optimization algorithms (optimizers) like SGD, Adam, and RMSprop used for training neural networks.
  • torch.utils.data: Utilities for data handling, including the Dataset and DataLoader classes for managing and loading datasets efficiently.
  • torch.distributed: Tools for distributed training across multiple GPUs and machines, facilitating parallel computation.
  • torch.cuda: Interfaces with NVIDIA CUDA to enable GPU acceleration for tensor computations and model training.
  • torch.multiprocessing: Utilities for parallelism using multiprocessing, similar to Python's multiprocessing module but with support for CUDA tensors.
  • torch.quantization: Tools for model quantization to reduce model size and improve inference speed, especially on edge devices.
  • torch.onnx: Supports exporting PyTorch models to the ONNX (Open Neural Network Exchange) format for interoperability with other frameworks and deployment.

Domain libraries:

  • torchvision - datasets, models, and transforms for computer vision
  • torchtext - text/NLP datasets and utilities
  • torchaudio - audio datasets and signal-processing utilities

#@title Creating tensor
first_var = torch.tensor(2) # scalar tensor
first_var
tensor(2)
# .ndim
first_var.ndim
0
a = torch.tensor([2,3,4,5])
a
tensor([2, 3, 4, 5])
a.ndim
1
a.shape
torch.Size([4])
b = torch.tensor([[1,2,3],[3,4,5]])
b
tensor([[1, 2, 3],
        [3, 4, 5]])
b.ndim
2
b.shape
torch.Size([2, 3])
b.dtype
torch.int64
#check dtype
# dtype families: int, float, double, short, long
# bit widths: 8, 16, 32, 64 (e.g. torch.int32, torch.int64/long, torch.float32, torch.float64/double)
a = torch.tensor([1.3,2.4,3], dtype = torch.float32)
a
tensor([1.3000, 2.4000, 3.0000])
a.dtype
torch.float32
# prompt: change the dtype of a to int32

b = torch.tensor(a, dtype = torch.int32)
b
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  b = torch.tensor(a, dtype = torch.int32)
tensor([1, 2, 3], dtype=torch.int32)
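As the warning suggests, the idiomatic way to change a tensor's dtype is .to() (or the shorthand .int()), which avoids copy-constructing from an existing tensor; a minimal sketch:

b = a.to(torch.int32)   # returns a new tensor with the requested dtype
# b = a.int()           # equivalent shorthand
b                       # -> tensor([1, 2, 3], dtype=torch.int32)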
# empty
c= torch.empty((2,3), dtype=torch.int64)
c
tensor([[137191861376448,        10759048, 137187316199856],
        [             65,       171150544, 137191881579792]])
id(c)
137187256662352
#zeros
torch.zeros(3,4)
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
#ones
torch.ones(4,4)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
#manual_seed
#rand
torch.manual_seed(42) # seed the global RNG so the random values below are reproducible
a = torch.rand(3,4) # initialize the weights
b = torch.rand(3,4) # initialize the weights
c = torch.rand(3,4) # initialize the weights

print(a)
print(b)
print(c)
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8694, 0.5677, 0.7411, 0.4294],
        [0.8854, 0.5739, 0.2666, 0.6274],
        [0.2696, 0.4414, 0.2969, 0.8317]])
tensor([[0.1053, 0.2695, 0.3588, 0.1994],
        [0.5472, 0.0062, 0.9516, 0.0753],
        [0.8860, 0.5832, 0.3376, 0.8090]])
torch.manual_seed(43) # reseeding changes the sequence of random values
a = torch.rand(3,4) # initialize the weights
print(a)
tensor([[0.4540, 0.1965, 0.9210, 0.3462],
        [0.1481, 0.0858, 0.5909, 0.0659],
        [0.7476, 0.6253, 0.9392, 0.1338]])
# torch.manual_seed(42)
torch.rand(3,4) # initialize the weights
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
#arange
torch.arange(0,10,2)
tensor([0, 2, 4, 6, 8])
#linspace
torch.linspace(5, 20, 4)
tensor([ 5., 10., 15., 20.])
#eye (identity matrix)
torch.eye(5)
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
#diag
torch.diag(torch.tensor([2,5,7,8]))
tensor([[2, 0, 0, 0],
        [0, 5, 0, 0],
        [0, 0, 7, 0],
        [0, 0, 0, 8]])
#full
torch.full((2,3), 3)
tensor([[3, 3, 3],
        [3, 3, 3]])

#dtypes

#int32
#int64 or long
#float32
#float64

Shape

#shape
t1 = torch.tensor([[2,3],[1,4],[5,6], [0,9]])
t1 # 4, 2
tensor([[2, 3],
        [1, 4],
        [5, 6],
        [0, 9]])
#dim
t1.dim()
2
#reshape
t1.reshape(2,2,2) #
tensor([[[2, 3],
         [1, 4]],

        [[5, 6],
         [0, 9]]])
#flatten
t1.flatten() #
tensor([2, 3, 1, 4, 5, 6, 0, 9])
mat = torch.rand(3,2,3,2)
mat
tensor([[[[0.4290, 0.9433],
          [0.3099, 0.7041],
          [0.5546, 0.9049]],

         [[0.8079, 0.1751],
          [0.4293, 0.4821],
          [0.3575, 0.8633]]],


        [[[0.9868, 0.0859],
          [0.3022, 0.9769],
          [0.5725, 0.3565]],

         [[0.8549, 0.4412],
          [0.3164, 0.0559],
          [0.8365, 0.3794]]],


        [[[0.1790, 0.1516],
          [0.1446, 0.6527],
          [0.6668, 0.5907]],

         [[0.1872, 0.0458],
          [0.2137, 0.1709],
          [0.6042, 0.6840]]]])
mat.flatten()
tensor([0.4290, 0.9433, 0.3099, 0.7041, 0.5546, 0.9049, 0.8079, 0.1751, 0.4293,
        0.4821, 0.3575, 0.8633, 0.9868, 0.0859, 0.3022, 0.9769, 0.5725, 0.3565,
        0.8549, 0.4412, 0.3164, 0.0559, 0.8365, 0.3794, 0.1790, 0.1516, 0.1446,
        0.6527, 0.6668, 0.5907, 0.1872, 0.0458, 0.2137, 0.1709, 0.6042, 0.6840])
#permute (2 x 3 x 4) --> (3 X 2 X 4)

#unsqueeze
#squeeze
mat = torch.rand(3,2,4) # 3 X 2 X 4
mat
tensor([[[0.3125, 0.0432, 0.8238, 0.7128],
         [0.3013, 0.3151, 0.4871, 0.2836]],

        [[0.7665, 0.5883, 0.2020, 0.0844],
         [0.3058, 0.1009, 0.4051, 0.3965]],

        [[0.2865, 0.5662, 0.2777, 0.7841],
         [0.7903, 0.3681, 0.2918, 0.0352]]])
torch.permute(mat, (2,0,1)) # 4 X 3 X 2
tensor([[[0.3125, 0.3013],
         [0.7665, 0.3058],
         [0.2865, 0.7903]],

        [[0.0432, 0.3151],
         [0.5883, 0.1009],
         [0.5662, 0.3681]],

        [[0.8238, 0.4871],
         [0.2020, 0.4051],
         [0.2777, 0.2918]],

        [[0.7128, 0.2836],
         [0.0844, 0.3965],
         [0.7841, 0.0352]]])
torch.unsqueeze(mat, dim = 2)
tensor([[[[0.3125, 0.0432, 0.8238, 0.7128]],

         [[0.3013, 0.3151, 0.4871, 0.2836]]],


        [[[0.7665, 0.5883, 0.2020, 0.0844]],

         [[0.3058, 0.1009, 0.4051, 0.3965]]],


        [[[0.2865, 0.5662, 0.2777, 0.7841]],

         [[0.7903, 0.3681, 0.2918, 0.0352]]]])
b = torch.rand(1,5,3,1)
b
tensor([[[[0.6473],
          [0.0379],
          [0.8159]],

         [[0.8852],
          [0.4247],
          [0.0985]],

         [[0.2929],
          [0.6726],
          [0.1386]],

         [[0.3796],
          [0.3385],
          [0.2939]],

         [[0.8167],
          [0.1573],
          [0.6731]]]])
c = torch.squeeze(b)
c
tensor([[0.6473, 0.0379, 0.8159],
        [0.8852, 0.4247, 0.0985],
        [0.2929, 0.6726, 0.1386],
        [0.3796, 0.3385, 0.2939],
        [0.8167, 0.1573, 0.6731]])
torch.unsqueeze(c, dim=0) # add a new leading dimension -> shape (1, 5, 3)
tensor([[[0.6473, 0.0379, 0.8159],
         [0.8852, 0.4247, 0.0985],
         [0.2929, 0.6726, 0.1386],
         [0.3796, 0.3385, 0.2939],
         [0.8167, 0.1573, 0.6731]]])

similar shape tensors

#empty_like
t1 = torch.rand(2,3)
t1
tensor([[0.2141, 0.5516, 0.9147],
        [0.0856, 0.1362, 0.8033]])
t2 = torch.empty_like(t1)
t2
tensor([[8.1619e-33, 0.0000e+00, 3.0098e-32],
        [0.0000e+00, 8.9683e-44, 0.0000e+00]])
#zeros_like
t2 = torch.zeros_like(t1)
t2
tensor([[0., 0., 0.],
        [0., 0., 0.]])
#ones_like
#rand_like
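# For completeness, a quick sketch of the remaining *_like constructors
# (same shape and dtype as the source tensor):
torch.ones_like(t1)   # tensor of ones with t1's shape/dtype
torch.rand_like(t1)   # uniform random values with t1's shape/dtype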

Mathematical Operations

#scalar addition
a= torch.tensor(2)
b = torch.tensor(3)
a+b
tensor(5)
b + 5
tensor(8)
#scalar subtraction
b - 1
tensor(2)
#scalar multiplication
b*2
tensor(6)
#scalar division /
b / 2
tensor(1.5000)
#scalar division //
b // 2
tensor(1)
# mod remainder
b % 2
tensor(1)
# power
b ** 2
tensor(9)

#elementwise operations

#add
#multiply
#abs
#neg
#round
#ceil
#floor
#clamp
t1 = torch.tensor([[-1,2,3], [4,5,-6]])
t2 = torch.tensor( [[1,1,1], [2,2,2]])
torch.add(t1,t2)
tensor([[ 0,  3,  4],
        [ 6,  7, -4]])
t1 + t2
tensor([[ 0,  3,  4],
        [ 6,  7, -4]])
t1 * t2
tensor([[ -1,   2,   3],
        [  8,  10, -12]])
torch.abs(t1)
tensor([[1, 2, 3],
        [4, 5, 6]])
torch.neg(t1)
tensor([[ 1, -2, -3],
        [-4, -5,  6]])
t3= torch.tensor([-5,-2, 0 ,1,2,3,4])
t3
tensor([-5, -2,  0,  1,  2,  3,  4])
torch.clamp(t3, min=0)
tensor([0, 0, 0, 1, 2, 3, 4])

stats

t1
tensor([[-1,  2,  3],
        [ 4,  5, -6]])
#sum
t1.sum()
tensor(7)
#sum along axis
t1.sum(axis= 0)
tensor([ 3,  7, -3])
#sum along axis
t1.sum(axis= 1)
tensor([4, 3])
#mean
t2 = torch.tensor([[1,2,4], [2,5,6]], dtype= torch.float)
t2.mean()
tensor(3.3333)
t2.mean(axis=1)
tensor([2.3333, 4.3333])
#median
t2.median()
tensor(2.)
#max
t2.max()
tensor(6.)
#min
t2.min()
tensor(1.)
#std
t2.std()
tensor(1.9664)
#argmax, argmin (return the index into the flattened tensor)
t2.argmax()
tensor(5)
t2.argmin()
tensor(0)

matrix operations

 #matmul
 t1 = torch.tensor([[1,2,3], [4,5,6]]) #2X3
 t2 = torch.tensor([[1,2], [3,4], [5,6]]) # 3 x 2
torch.matmul(t1,t2)
tensor([[22, 28],
        [49, 64]])
#dot
t1 = torch.tensor([1,2,3])
t2 = torch.tensor([4,5,6])
torch.dot(t1,t2)
tensor(32)
#transpose
t1 = torch.tensor([[1,2,3], [4,5,6]])
torch.transpose(t1,0,1)
tensor([[1, 4],
        [2, 5],
        [3, 6]])
# determinant
# inverse
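# Determinant and inverse need a floating-point (and square) matrix;
# a minimal sketch using torch.linalg:
m = torch.tensor([[4., 7.], [2., 6.]])
torch.linalg.det(m)    # -> tensor(10.)
torch.linalg.inv(m)    # -> tensor([[ 0.6, -0.7], [-0.2,  0.4]])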

Comparison

#greater than
t1 = torch.tensor([[1,2], [3,4]])
t2 = torch.tensor([[5,1], [0,7]])
t1 > t2
tensor([[False,  True],
        [ True, False]])
t1 <  t2
tensor([[ True, False],
        [False,  True]])
#less than
#equal to

t1 == t2
tensor([[False, False],
        [False, False]])

important functions

#log
torch.log(t1)
tensor([[0.0000, 0.6931],
        [1.0986, 1.3863]])
#exp
torch.exp(t1)
tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5981]])
#square
torch.square(t1)
tensor([[ 1,  4],
        [ 9, 16]])
#sqrt
torch.sqrt(t1)
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]])
#sigmoid # 1 / (1 + e^(-z))
torch.sigmoid(t1)
tensor([[0.7311, 0.8808],
        [0.9526, 0.9820]])
def find_sigmoid(t):
  return 1 / (1 + torch.exp(-t))

find_sigmoid(t1)
tensor([[0.7311, 0.8808],
        [0.9526, 0.9820]])
#softmax
t1 = torch.tensor([[1,2], [3,4]], dtype= torch.float32)
torch.softmax(t1, axis= 1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
torch.softmax(t1, dim= 1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
#relu
t1 = torch.tensor([[-1,2], [3,-4]], dtype= torch.float32)
torch.relu(t1)
tensor([[0., 2.],
        [3., 0.]])

In-place operations (methods ending with an underscore, e.g. add_, modify the tensor in place instead of returning a new one)

a = torch.tensor([1,2,3])
b = torch.tensor([4,5,6])
c = a+b
a.add_(b)
w = torch.tensor([4,5,6])
grad = torch.tensor([2,1,3])

w.subtract_(grad)
w
tensor([2, 4, 3])
w.zero_()
tensor([0, 0, 0])

clone

#clone
a = torch.tensor([1,2,3])
b = a
b
tensor([1, 2, 3])
id(a)
137187252920080
id(b)
137187252920080
b.add_(1)
tensor([2, 3, 4])
a
tensor([2, 3, 4])
c= torch.clone(a)
c
tensor([2, 3, 4])
id(c)
137187252921424
c.add_(1)
tensor([3, 4, 5])
a
tensor([2, 3, 4])

Tensor operations on GPU

import torch
torch.cuda.device_count()
1
torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15095MB, multi_processor_count=40, uuid=ee6f7a89-4693-5b16-c869-5fe163a44485, L2_cache_size=4MB)
# Check if CUDA is available
if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"Number of GPUs available: {gpu_count}\n")

    # Iterate through all GPUs
    for i in range(gpu_count):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f"  - Total Memory: {props.total_memory / 1e9:.2f} GB")
        print(f"  - Compute Capability: {props.major}.{props.minor}")
        print(f"  - Multi-Processor Count: {props.multi_processor_count}\n")
else:
    print("No GPUs available.")
Number of GPUs available: 1

GPU 0: Tesla T4
  - Total Memory: 15.83 GB
  - Compute Capability: 7.5
  - Multi-Processor Count: 40
torch.device(0)
device(type='cuda', index=0)
torch.device(type='cuda')
device(type='cuda')
torch.device(type='cuda',index=0)
device(type='cuda', index=0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
# create matrix in GPU
torch.tensor([1,2,3], device = device)
tensor([1, 2, 3], device='cuda:0')
torch.tensor([1,2,3])
tensor([1, 2, 3])
# move a matrix from CPU to GPU
mat1 = torch.tensor([1,2,3])
mat1
tensor([1, 2, 3])
mat1.to(device)
tensor([1, 2, 3], device='cuda:0')
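Note that Tensor.to() is not in-place: it returns a new tensor, so the result has to be reassigned (nn.Module.to(), by contrast, moves the module's parameters in place):

mat1 = mat1.to(device)   # reassign; mat1 itself was still on the CPU before this line
mat1.device              # -> device(type='cuda', index=0) when a GPU is available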
# move model to GPU
model = torch.nn.Linear(10,1)
model.to(device)
Linear(in_features=10, out_features=1, bias=True)

From numpy to torch
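
A minimal sketch: torch.from_numpy() shares memory with the NumPy array (changing one changes the other), while torch.tensor() copies; .numpy() goes the other way for CPU tensors.

import numpy as np
import torch

arr = np.array([1, 2, 3])
t_shared = torch.from_numpy(arr)   # shares memory with arr
t_copy = torch.tensor(arr)         # independent copy

back = t_shared.numpy()            # CPU tensor -> NumPy array (also shares memory)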

From pandas to torch
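
Pandas has no direct bridge, so the usual route (a sketch, assuming a numeric placeholder DataFrame df) is DataFrame -> NumPy -> tensor:

import pandas as pd
import torch

df = pd.DataFrame({'x1': [1.0, 2.0], 'x2': [3.0, 4.0]})    # placeholder DataFrame
features = torch.tensor(df.values, dtype=torch.float32)     # or torch.from_numpy(df.to_numpy())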

AutoGrad


\[
q_1(z) = z^2 \\
q_2(z) = z^3 \\
q_3(z) = e^z \\
p(z) = \dfrac{q_1}{q_2} + q_3 = \dfrac{z^2}{z^3} + e^z = \dfrac{1}{z} + e^z \\
\dfrac{\partial p(z)}{\partial z} = \dfrac{-1}{z^2} + e^z
\]

Derivative along multiple paths (sum the contribution of each path through the computation graph):

\(\dfrac{\partial p(z)}{\partial z} = \dfrac{\partial p}{\partial q1}\dfrac{\partial q1}{\partial z}+ \dfrac{\partial p}{\partial q2}\dfrac{\partial q2}{\partial z}+\dfrac{\partial p}{\partial q3}\dfrac{\partial q3}{\partial z}\)

= \((\dfrac{1}{q_2}) * 2z + (\dfrac{-q_1}{q_2^2}) * 3z^2 + (1) * e^z\)

= \(\dfrac{2}{z^2} - \dfrac{3}{z^2} + e^z\)

= \(\dfrac{-1}{z^2} + e^z\)
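
A minimal sketch verifying this derivative with autograd (z = 2.0 chosen arbitrarily):

import torch

z = torch.tensor(2.0, requires_grad=True)
q1, q2, q3 = z**2, z**3, torch.exp(z)
p = q1 / q2 + q3
p.backward()                                          # populates z.grad with dp/dz

print(z.grad)                                         # autograd result
print(-1 / z.detach()**2 + torch.exp(z.detach()))     # analytic: -1/z^2 + e^z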

from IPython.display import IFrame

IFrame(src="https://iitm-pod.slides.com/arunprakash_ai/cs6910-lecture-4/fullscreen#/0/43", width=800, height=600)

Gradient Accumulation and zero_()
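
A minimal sketch of why .grad.zero_() (or optimizer.zero_grad()) is needed: backward() accumulates into .grad rather than overwriting it.

import torch

w = torch.tensor(2.0, requires_grad=True)

for step in range(2):
    loss = (w * 3) ** 2          # toy loss; d(loss)/dw = 18*w = 36 at w=2
    loss.backward()
    print(w.grad)                # step 0: 36, step 1: 72 -> gradients accumulate

w.grad.zero_()                   # reset before the next backward pass
print(w.grad)                    # tensor(0.)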

Disable gradient tracking

  • while updating the parameters (the update step itself should not be recorded in the computation graph)
  • when predicting (inference) after the model is trained - see the sketch under Example below

Example
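
A minimal sketch of both cases, using torch.no_grad() to suppress graph building (the tiny linear model is just a placeholder):

import torch

model = torch.nn.Linear(10, 1)       # placeholder model
x = torch.rand(4, 10)
y = torch.rand(4, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

with torch.no_grad():                # 1) manual parameter update, not tracked
    for p in model.parameters():
        p -= 0.01 * p.grad

with torch.no_grad():                # 2) inference after training
    preds = model(x)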

Up Next

  • Training Pipeline (a sketch of the full loop appears after this list)

    • Define model
    • for epoch in range(epochs):
      • Forward pass
      • Loss calculation
      • Backward pass
      • Parameter update
  • Model Evaluation

  • Improve the training pipeline using nn.Module and torch.optim

    • nn.Linear
    • Activation functions (nn.ReLU, nn.Sigmoid, nn.Softmax)
    • nn.Sequential container
    • Loss functions (nn.BCELoss, nn.CrossEntropyLoss, etc.)
    • torch.optim (SGD, Adam, etc.)

  • Improve the training pipeline using torch.utils.data (Dataset and DataLoader)

    • Data loading
    • Batching
    • Shuffling, sampling
    • Parallelization (num_workers)
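
As a preview, a minimal sketch of the training pipeline outlined above (layer sizes, loss, and learning rate are placeholder choices):

import torch
import torch.nn as nn

X = torch.rand(100, 10)                       # placeholder data
y = torch.rand(100, 1)

model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 1))   # define model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    y_pred = model(X)                         # forward pass
    loss = loss_fn(y_pred, y)                 # loss calculation
    optimizer.zero_grad()                     # clear accumulated gradients
    loss.backward()                           # backward pass
    optimizer.step()                          # parameter update

with torch.no_grad():                         # model evaluation
    print(loss_fn(model(X), y))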