DL workshop on PyTorch
- 16th April to 19th April, 8:30 to 10:30 PM
- Recordings and Colab notebooks will be provided
- A from-home OPPE will be held before the next term starts
- Attendance is not mandatory
- Instructor (Manojkumar Khara)
- Drive Folder
Topics
- Tensors, popular PyTorch functions
- ANN, training pipeline, DataLoader
- CNN - image data
- RNN - Text Data
  - LSTM, GRU, Encoder-Decoder, Transformers
- PyTorch (Meta) → NumPy-like API; research papers, ChatGPT (PyTorch), Hugging Face
- TensorFlow (Google) → industrial / production use
Both NumPy arrays and PyTorch tensors are multi-dimensional arrays.
What can a tensor do (that a plain NumPy array cannot)?
- GPU acceleration
- Automatic differentiation
PyTorch offers:
- Tensor computations
- GPU acceleration
- Dynamic computation graphs
- Automatic differentiation
- Distributed training
0-dim tensor: scalar - e.g. a loss value
1-dim tensor: vector - e.g. a word embedding
2-dim tensor: matrix - e.g. a grayscale image
[[256, 0, 128],
[30, 64, 50]]
3-dim tensor: an RGB color image - (height, width, channels)
4-dim tensor: a batch of RGB color images - (batch, height, width, channels)
5-dim tensor: video data, a sequence of RGB frames - (batch, frames, height, width, channels)
6-dim tensor: multiple video clips - (batch, clips, frames, height, width, channels)
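A minimal sketch (shapes chosen arbitrarily) of how each case above maps to a tensor shape:

import torch

loss = torch.tensor(2.5)                   # 0-dim: scalar (e.g. a loss value)
embedding = torch.rand(300)                # 1-dim: vector (e.g. a word embedding)
gray_img = torch.rand(28, 28)              # 2-dim: matrix (grayscale image)
rgb_img = torch.rand(224, 224, 3)          # 3-dim: RGB image (H, W, C)
batch = torch.rand(32, 224, 224, 3)        # 4-dim: batch of RGB images
video = torch.rand(8, 16, 224, 224, 3)     # 5-dim: batch of 16-frame videos
clips = torch.rand(4, 2, 16, 224, 224, 3)  # 6-dim: batch of multiple clips
print(loss.ndim, rgb_img.shape, video.shape)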
Where tensors are needed in DL:
- Training data storage and representation
- Parameters (weights, biases)
- Gradients (backpropagation)
- Mathematical operations (add, matmul, etc.)
References:
- pin memory
- CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
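A small sketch of where pinned memory shows up in practice: page-locked host memory makes CPU→GPU copies faster and lets them run asynchronously (the toy dataset below is made up):

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.rand(1000, 32), torch.randint(0, 2, (1000,)))

# pin_memory=True keeps each batch in page-locked host memory,
# so .to(device, non_blocking=True) can overlap the copy with compute
loader = DataLoader(ds, batch_size=64, shuffle=True, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for x, y in loader:
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    break

Core PyTorch modules: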
| Module | Description |
|---|---|
| torch | The core module providing multidimensional arrays (tensors) and mathematical operations on them. |
| torch.autograd | Automatic differentiation engine that records operations on tensors to compute gradients for optimization. |
| torch.nn | Provides a neural networks library, including layers, activations, loss functions, and utilities to build deep learning models. |
| torch.optim | Contains optimization algorithms (optimizers) like SGD, Adam, and RMSprop used for training neural networks. |
| torch.utils.data | Utilities for data handling, including the Dataset and DataLoader classes for managing and loading datasets efficiently. |
| torch.distributed | Tools for distributed training across multiple GPUs and machines, facilitating parallel computation. |
| torch.cuda | Interfaces with NVIDIA CUDA to enable GPU acceleration for tensor computations and model training. |
| torch.multiprocessing | Utilities for parallelism using multiprocessing, similar to Python’s multiprocessing module but with support for CUDA tensors. |
| torch.quantization | Tools for model quantization to reduce model size and improve inference speed, especially on edge devices. |
| torch.onnx | Supports exporting PyTorch models to the ONNX (Open Neural Network Exchange) format for interoperability with other frameworks and deployment. |
Domain libraries:
- torchvision (computer vision: datasets, transforms, models)
- torchtext (text / NLP datasets and utilities)
- torchaudio (audio datasets and transforms)
b = torch.tensor(a, dtype=torch.int32)

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

tensor([1, 2, 3], dtype=torch.int32)
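The warning appears because a is already a tensor and torch.tensor() re-copies it. A sketch of the recommended pattern:

import torch

a = torch.tensor([1, 2, 3])

b = torch.tensor(a, dtype=torch.int32)  # works, but triggers the warning above

c = a.clone().detach().to(torch.int32)               # copy, cut from the autograd graph
d = a.clone().detach().float().requires_grad_(True)  # if gradients are needed (float dtype only)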
tensor([[137191861376448, 10759048, 137187316199856],
[ 65, 171150544, 137191881579792]])
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
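A guess at the two cells that produced the outputs above: torch.empty allocates memory without initializing it (hence the arbitrary values), while torch.ones fills the tensor with 1s.

import torch

x = torch.empty(2, 3, dtype=torch.int64)  # uninitialized memory -> arbitrary values
y = torch.ones(4, 4)                      # all ones
z = torch.zeros(2, 3)                     # all zeros
print(x)
print(y)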
#manual_seed
#rand
torch.manual_seed(42) # fix the RNG seed so the random tensors below are reproducible
a = torch.rand(3,4) # initialize the weights
b = torch.rand(3,4) # initialize the weights
c = torch.rand(3,4) # initialize the weights
print(a)
print(b)
print(c)

tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8694, 0.5677, 0.7411, 0.4294],
[0.8854, 0.5739, 0.2666, 0.6274],
[0.2696, 0.4414, 0.2969, 0.8317]])
tensor([[0.1053, 0.2695, 0.3588, 0.1994],
[0.5472, 0.0062, 0.9516, 0.0753],
[0.8860, 0.5832, 0.3376, 0.8090]])
tensor([[0.4540, 0.1965, 0.9210, 0.3462],
[0.1481, 0.0858, 0.5909, 0.0659],
[0.7476, 0.6253, 0.9392, 0.1338]])
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
[0.3904, 0.6009, 0.2566, 0.7936],
[0.9408, 0.1332, 0.9346, 0.5936]])
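The fourth output continues the same RNG stream (new values), while the last output matches a again, presumably after re-seeding. A sketch, continuing from the cell above:

d = torch.rand(3, 4)      # same RNG stream continues -> new values

torch.manual_seed(42)     # reset the seed
e = torch.rand(3, 4)      # identical to a
print(torch.equal(a, e))  # True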
tensor([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
tensor([[2, 0, 0, 0],
[0, 5, 0, 0],
[0, 0, 7, 0],
[0, 0, 0, 8]])
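The two outputs above look like an identity matrix and a diagonal matrix; a sketch of the calls that likely produced them:

import torch

i = torch.eye(5)                            # 5x5 identity matrix
d = torch.diag(torch.tensor([2, 5, 7, 8]))  # diagonal matrix from a 1-dim tensor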
#dtypes
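A small sketch of the usual dtype operations (values are arbitrary):

import torch

x = torch.tensor([1, 2, 3])                       # int64 by default
y = torch.tensor([1.0, 2.0, 3.0])                 # float32 by default
z = torch.tensor([1, 2, 3], dtype=torch.float16)  # explicit dtype

print(x.dtype, y.dtype, z.dtype)
w = x.to(torch.float32)   # dtype conversion (also x.float(), x.int(), x.bool())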
Shape
tensor([[2, 3],
[1, 4],
[5, 6],
[0, 9]])
tensor([[[[0.4290, 0.9433],
[0.3099, 0.7041],
[0.5546, 0.9049]],
[[0.8079, 0.1751],
[0.4293, 0.4821],
[0.3575, 0.8633]]],
[[[0.9868, 0.0859],
[0.3022, 0.9769],
[0.5725, 0.3565]],
[[0.8549, 0.4412],
[0.3164, 0.0559],
[0.8365, 0.3794]]],
[[[0.1790, 0.1516],
[0.1446, 0.6527],
[0.6668, 0.5907]],
[[0.1872, 0.0458],
[0.2137, 0.1709],
[0.6042, 0.6840]]]])
tensor([0.4290, 0.9433, 0.3099, 0.7041, 0.5546, 0.9049, 0.8079, 0.1751, 0.4293,
0.4821, 0.3575, 0.8633, 0.9868, 0.0859, 0.3022, 0.9769, 0.5725, 0.3565,
0.8549, 0.4412, 0.3164, 0.0559, 0.8365, 0.3794, 0.1790, 0.1516, 0.1446,
0.6527, 0.6668, 0.5907, 0.1872, 0.0458, 0.2137, 0.1709, 0.6042, 0.6840])
tensor([[[0.3125, 0.0432, 0.8238, 0.7128],
[0.3013, 0.3151, 0.4871, 0.2836]],
[[0.7665, 0.5883, 0.2020, 0.0844],
[0.3058, 0.1009, 0.4051, 0.3965]],
[[0.2865, 0.5662, 0.2777, 0.7841],
[0.7903, 0.3681, 0.2918, 0.0352]]])
tensor([[[0.3125, 0.3013],
[0.7665, 0.3058],
[0.2865, 0.7903]],
[[0.0432, 0.3151],
[0.5883, 0.1009],
[0.5662, 0.3681]],
[[0.8238, 0.4871],
[0.2020, 0.4051],
[0.2777, 0.2918]],
[[0.7128, 0.2836],
[0.0844, 0.3965],
[0.7841, 0.0352]]])
tensor([[[[0.3125, 0.0432, 0.8238, 0.7128]],
[[0.3013, 0.3151, 0.4871, 0.2836]]],
[[[0.7665, 0.5883, 0.2020, 0.0844]],
[[0.3058, 0.1009, 0.4051, 0.3965]]],
[[[0.2865, 0.5662, 0.2777, 0.7841]],
[[0.7903, 0.3681, 0.2918, 0.0352]]]])
tensor([[[[0.6473],
[0.0379],
[0.8159]],
[[0.8852],
[0.4247],
[0.0985]],
[[0.2929],
[0.6726],
[0.1386]],
[[0.3796],
[0.3385],
[0.2939]],
[[0.8167],
[0.1573],
[0.6731]]]])
tensor([[0.6473, 0.0379, 0.8159],
[0.8852, 0.4247, 0.0985],
[0.2929, 0.6726, 0.1386],
[0.3796, 0.3385, 0.2939],
[0.8167, 0.1573, 0.6731]])
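The outputs above walk through the common shape operations (reshape, flatten, permute, unsqueeze, squeeze); a sketch with an illustrative tensor, not the exact ones used in class:

import torch

x = torch.rand(3, 2, 4)

print(x.shape)                   # torch.Size([3, 2, 4])
print(x.reshape(6, 4).shape)     # reshape to any compatible shape
print(x.flatten().shape)         # 1-dim tensor with 24 elements
print(x.permute(2, 0, 1).shape)  # reorder dimensions -> (4, 3, 2)
print(x.unsqueeze(2).shape)      # insert a size-1 dimension -> (3, 2, 1, 4)

y = torch.rand(1, 5, 3, 1)
print(y.squeeze().shape)         # drop all size-1 dimensions -> (5, 3)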
Similar shape tensors
tensor([[8.1619e-33, 0.0000e+00, 3.0098e-32],
[0.0000e+00, 8.9683e-44, 0.0000e+00]])
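The arbitrary values above suggest torch.empty_like; the *_like family creates a new tensor with the same shape (and dtype) as an existing one:

import torch

x = torch.rand(2, 3)

a = torch.zeros_like(x)  # same shape/dtype, all zeros
b = torch.ones_like(x)   # all ones
c = torch.rand_like(x)   # uniform random values
d = torch.empty_like(x)  # uninitialized -> arbitrary values, as shown above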
Mathematical Operations
#elementwise operations
stats
matrix operation
tensor([[1, 4],
[2, 5],
[3, 6]])
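A small sketch covering elementwise operations, reductions ("stats"), and matrix operations (the output above matches the transpose of a 2x3 matrix):

import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# elementwise
print(x + 10, x * 2, x ** 2)

# stats / reductions
print(x.sum(), x.float().mean(), x.max(), x.argmax())

# matrix operations
print(x.T)                      # transpose -> the output shown above
print(x.float() @ x.float().T)  # matrix multiplication (also torch.matmul)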
Comparison
tensor([[False, True],
[ True, False]])
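Comparison operators work elementwise and return boolean tensors; a sketch with made-up values:

import torch

a = torch.tensor([[1, 5], [7, 2]])
b = torch.tensor([[3, 4], [6, 9]])

print(a > b)                         # elementwise -> bool tensor
print(a == b)
print((a > b).any(), (a > b).all())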
important functions
tensor([[0.7311, 0.8808],
[0.9526, 0.9820]])
tensor([[0.2689, 0.7311],
[0.2689, 0.7311]])
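The two outputs above look like torch.sigmoid and torch.softmax; a sketch of those and a few other commonly used functions:

import torch

x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

print(torch.sigmoid(x))         # squashes each element into (0, 1)
print(torch.softmax(x, dim=1))  # each row sums to 1
print(torch.relu(x - 2), torch.log(x), torch.exp(x))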
Inplace operation
clone
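A sketch of the difference between in-place operations (trailing underscore, modify the tensor itself) and clone (an independent copy):

import torch

x = torch.ones(2, 2)

x.add_(5)       # in-place: x itself becomes all 6s (no new tensor allocated)
y = x.add(5)    # out-of-place: x unchanged, y is all 11s

c = x.clone()   # independent copy; changing c does not touch x
c.zero_()
print(x)        # still all 6s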
Tensor operations on GPU
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15095MB, multi_processor_count=40, uuid=ee6f7a89-4693-5b16-c869-5fe163a44485, L2_cache_size=4MB)
# Check if CUDA is available
if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"Number of GPUs available: {gpu_count}\n")

    # Iterate through all GPUs
    for i in range(gpu_count):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f" - Total Memory: {props.total_memory / 1e9:.2f} GB")
        print(f" - Compute Capability: {props.major}.{props.minor}")
        print(f" - Multi-Processor Count: {props.multi_processor_count}\n")
else:
    print("No GPUs available.")

Number of GPUs available: 1

GPU 0: Tesla T4
 - Total Memory: 15.83 GB
 - Compute Capability: 7.5
 - Multi-Processor Count: 40
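A minimal sketch of moving tensors to the GPU and back:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 4)                     # created on the CPU
x_gpu = x.to(device)                     # copy to the GPU (if available)
y_gpu = torch.rand(3, 4, device=device)  # create directly on the GPU

z = x_gpu + y_gpu                        # computed on the GPU
print(z.device)
z_cpu = z.cpu()                          # back to CPU (needed e.g. for NumPy)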
From numpy to torch
From pandas to torch
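A sketch of both conversions, assuming a NumPy array and a pandas DataFrame (names made up):

import numpy as np
import pandas as pd
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t1 = torch.from_numpy(arr)    # shares memory with the NumPy array
t2 = torch.tensor(arr)        # independent copy

df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})
t3 = torch.tensor(df.values)  # DataFrame -> NumPy -> tensor
# or: torch.from_numpy(df.to_numpy())

back = t1.numpy()             # tensor -> NumPy (CPU tensors only)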
AutoGrad
\[
q_1(z) = z^2 \\
q_2(z) = z^3 \\
q_3(z) = e^z \\
p(z) = \dfrac{q_1}{q_2} + q_3 = \dfrac{z^2}{z^3} + e^z = \dfrac{1}{z} + e^z \\
\dfrac{\partial p(z)}{\partial z} = \dfrac{-1}{z^2} + e^z
\]
Derivative along multiple paths (chain rule)
\(\dfrac{\partial p(z)}{\partial z} = \dfrac{\partial p}{\partial q_1}\dfrac{\partial q_1}{\partial z} + \dfrac{\partial p}{\partial q_2}\dfrac{\partial q_2}{\partial z} + \dfrac{\partial p}{\partial q_3}\dfrac{\partial q_3}{\partial z}\)
= \(\left(\dfrac{1}{q_2}\right) \cdot 2z + \left(\dfrac{-q_1}{q_2^2}\right) \cdot 3z^2 + (1) \cdot e^z\)
= \(\dfrac{2}{z^2} - \dfrac{3}{z^2} + e^z\)
= \(\dfrac{-1}{z^2} + e^z\)
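A sketch of the same computation with autograd: backward() applies the chain rule over every path from z to p, and the result matches -1/z^2 + e^z:

import torch

z = torch.tensor(2.0, requires_grad=True)

q1 = z ** 2
q2 = z ** 3
q3 = torch.exp(z)
p = q1 / q2 + q3

p.backward()     # chain rule over all paths from z to p
print(z.grad)    # dp/dz at z = 2

expected = -1 / (2.0 ** 2) + torch.exp(torch.tensor(2.0))
print(expected)  # same value: -0.25 + e^2 ≈ 7.139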
Gradient Accumulation and zero_()
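A sketch showing that .grad accumulates across backward() calls until it is explicitly zeroed:

import torch

w = torch.tensor(3.0, requires_grad=True)

for step in range(2):
    loss = (w * 2) ** 2   # d(loss)/dw = 8w = 24
    loss.backward()
    print(w.grad)         # 24, then 48: gradients accumulate!

w.grad.zero_()            # reset before the next backward pass
# in a real training loop: optimizer.zero_grad() every iteration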
Disable gradient tracking (see the sketch below):
- while updating the parameters (the update step itself must not be recorded in the graph)
- when predicting after the model is trained (inference)
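A sketch of both situations, using a single scalar weight w for illustration:

import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (w - 5) ** 2
loss.backward()

# 1) parameter update: the update itself must not be tracked
with torch.no_grad():
    w -= 0.1 * w.grad

# 2) inference after training: no graph needed, saves memory and time
with torch.no_grad():
    pred = w * 10

# alternatives: w.detach(), or torch.inference_mode()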
Up Next
Training Pipeline
- Define the model
- for epoch in range(epochs):
  - Forward pass
  - Loss calculation
  - Backward pass
  - Parameter update
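A minimal sketch of that loop for a made-up linear-regression problem (data and hyperparameters invented for illustration):

import torch

# toy data: y = 2x + 1 plus a little noise
X = torch.rand(100, 1)
y = 2 * X + 1 + 0.01 * torch.randn(100, 1)

w = torch.zeros(1, 1, requires_grad=True)  # weight
b = torch.zeros(1, requires_grad=True)     # bias
lr, epochs = 0.1, 200

for epoch in range(epochs):
    y_pred = X @ w + b                 # forward pass
    loss = ((y_pred - y) ** 2).mean()  # loss calculation (MSE)
    loss.backward()                    # backward pass
    with torch.no_grad():              # parameter update
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())              # should approach 2 and 1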
Model Evaluation
Improve Training Pipeline using nn.Module and torch.optim
- nn.Linear
- Activation functions (nn.ReLU, nn.Sigmoid, nn.Softmax)
- nn.Sequential container
- Loss functions (nn.BCELoss, nn.CrossEntropyLoss, etc.)
- torch.optim (SGD, Adam, etc.)
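A sketch of the same idea using these building blocks (architecture and hyperparameters are made up):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),   # binary-classification output in (0, 1)
)

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

X = torch.rand(64, 10)                     # toy batch
y = torch.randint(0, 2, (64, 1)).float()

for epoch in range(10):
    y_pred = model(X)            # forward pass
    loss = criterion(y_pred, y)  # loss calculation
    optimizer.zero_grad()        # clear old gradients
    loss.backward()              # backward pass
    optimizer.step()             # parameter update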
Improve training pipeline using torch.utils.data Dataset and DataLoader
- Data loading
- Batching
- Shuffling and sampling
- Parallelization (num_workers)
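A sketch of a custom Dataset wrapped in a DataLoader (the data are random placeholders):

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n=1000):
        self.X = torch.rand(n, 10)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

loader = DataLoader(
    ToyDataset(),
    batch_size=32,   # batching
    shuffle=True,    # shuffling each epoch
    num_workers=2,   # parallel data-loading workers
)

for xb, yb in loader:
    print(xb.shape, yb.shape)  # torch.Size([32, 10]) torch.Size([32])
    break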