Preliminaries

Problems

We will discuss two types of problems:

  • Regression
  • Multiclass classification

We won't be discussing binary classification explicitly. Most of the discussion about neural networks applies to both classes of problems. There are a handful of places where the network has to be specified differently for regression and classification; watch out for these instances.

Notation

Scalars will be represented using normal font, lower-case letters. Vectors will be represented using bold font, lower-case letters. Matrices will be represented using bold font, upper-case letters. When indexing elements of a vector or a matrix, normal font will be used, but the case will be inherited from the object that is being indexed.

  • \(a\): scalar

  • \(\boldsymbol{a}\): vector

  • \(\boldsymbol{A}\): matrix

\(a_i\) denotes the \(i^{\text{th}}\) component of the vector \(\boldsymbol{a}\), and \(A_{ij}\) the element in the \(i^{\text{th}}\) row and \(j^{\text{th}}\) column of the matrix \(\boldsymbol{A}\). Indices are used minimally, as most of the equations are vectorized. This is a convention that we will largely stick to, but we might have to override it on a few occasions. In such situations, the nature of the object should be inferred from the context.
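For instance, if \(\boldsymbol{a} = (5, 6, 7)\) then \(a_2 = 6\); and if \(\boldsymbol{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\) then \(A_{12} = 2\) and \(A_{21} = 3\).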

Data

The data-matrix is \(\boldsymbol{X}\):

  • size: \(n \times m\)
  • \(n\) data-points
  • \(m\) features
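A minimal sketch of such a data-matrix in NumPy, assuming synthetic data and arbitrary sizes \(n = 100\), \(m = 5\):

```python
import numpy as np

# Hypothetical sizes: n data-points, m features.
n, m = 100, 5

# Synthetic data-matrix X of shape (n, m); each row is one data-point.
X = np.random.default_rng(0).normal(size=(n, m))

print(X.shape)  # (100, 5)
```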

The data-matrix is common to both problems. Labels are represented differently for regression and for classification:

Regression

The labels for regression are collected in \(\boldsymbol{y}\), a vector of real numbers:

  • size: \(n\)
  • \(n\) data-points
  • single target corresponding to each point
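A minimal sketch of such a label vector, again with synthetic values and an arbitrary \(n\):

```python
import numpy as np

n = 100  # hypothetical number of data-points

# Label vector y of shape (n,): one real-valued target per data-point.
y = np.random.default_rng(1).normal(size=n)

print(y.shape)  # (100,)
```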

Multiclass Classification

The one-hot matrix of labels for a multiclass classification problem is \(\boldsymbol{Y}\):

  • size: \(n \times k\)
  • \(n\) data-points
  • \(k\) classes

Each row of the matrix is a one-hot vector.
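A minimal sketch of building such a one-hot matrix, assuming integer class labels and arbitrary values for \(n\) and \(k\):

```python
import numpy as np

# Hypothetical integer class labels for n = 6 data-points and k = 3 classes.
labels = np.array([0, 2, 1, 2, 0, 1])
k = 3

# One-hot matrix Y of shape (n, k): row i has a 1 in column labels[i], 0 elsewhere.
Y = np.eye(k)[labels]

print(Y.shape)  # (6, 3)
print(Y[1])     # [0. 0. 1.] -> data-point 1 belongs to class 2
```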