Preliminaries

Problems

We will discuss two types of problems:

  • Regression
  • Multiclass classification

We won't be discussing binary classification explicitly. Most of the discussion about neural networks applies to both classes of problems. There are a handful of places where the network has to be specified differently for regression and classification; watch out for these instances.

Notation

Scalars will be represented using normal font, lower-case letters. Vectors will be represented using bold font, lower-case letters. Matrices will be represented using bold font, upper-case letters. When indexing elements of a vector or a matrix, normal font will be used, but the case will be inherited from the object that is being indexed.

  • \(a\): scalar

  • \(\boldsymbol{a}\): vector

  • \(\boldsymbol{A}\): matrix

\(a_i\) denotes the \(i^{\text{th}}\) component of the vector \(\boldsymbol{a}\), and \(A_{ij}\) the element in the \(i^{\text{th}}\) row and \(j^{\text{th}}\) column of the matrix \(\boldsymbol{A}\). Indices are used minimally, as most of the equations are vectorized. This is a convention that we will largely stick to, but we might have to override it on a few occasions. In such situations, the nature of the object should be inferred from the context.
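For instance, if \(\boldsymbol{a} = (5, 6, 7)\) then \(a_2 = 6\); and if \(\boldsymbol{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\) then \(A_{12} = 2\) and \(A_{21} = 3\).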

Data

The data-matrix is \(\boldsymbol{X}\):

  • size: \(n \times m\)
  • \(n\) data-points
  • \(m\) features
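A minimal sketch of such a data-matrix in NumPy, assuming synthetic data and arbitrary sizes \(n = 100\), \(m = 5\):

```python
import numpy as np

# Hypothetical sizes: n data-points, m features.
n, m = 100, 5

# Synthetic data-matrix X of shape (n, m); each row is one data-point.
X = np.random.default_rng(0).normal(size=(n, m))

print(X.shape)  # (100, 5)
```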

The data-matrix is common to both problems. Labels are represented differently for regression and for classification:

Regression

The labels for regression are collected in \(\boldsymbol{y}\), a vector of real numbers:

  • size: \(n\)
  • \(n\) data-points
  • single target corresponding to each point
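A minimal sketch of such a label vector, again with synthetic values and an arbitrary \(n\):

```python
import numpy as np

n = 100  # hypothetical number of data-points

# Label vector y of shape (n,): one real-valued target per data-point.
y = np.random.default_rng(1).normal(size=n)

print(y.shape)  # (100,)
```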

Multiclass Classification

The one-hot matrix of labels for a multiclass classification problem is \(\boldsymbol{Y}\):

  • size: \(n \times k\)
  • \(n\) data-points
  • \(k\) classes

Each row of the matrix is a one-hot vector.
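A minimal sketch of building such a one-hot matrix, assuming integer class labels and arbitrary values for \(n\) and \(k\):

```python
import numpy as np

# Hypothetical integer class labels for n = 6 data-points and k = 3 classes.
labels = np.array([0, 2, 1, 2, 0, 1])
k = 3

# One-hot matrix Y of shape (n, k): row i has a 1 in column labels[i], 0 elsewhere.
Y = np.eye(k)[labels]

print(Y.shape)  # (6, 3)
print(Y[1])     # [0. 0. 1.] -> data-point 1 belongs to class 2
```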