Ordinal Classification
Ordinal Measurement
Let Y^* be the ordinal variable (with J classes) we want to predict. We define Y^* in terms of a latent continuous variable Y and cutpoints \alpha_0, \alpha_1, \ldots, \alpha_J as follows:

\begin{align*} Y^* &= j \quad \text{if} \quad \alpha_{j-1} \le Y < \alpha_j \\ &\qquad (j = 1, \ldots, J) \end{align*}
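As a quick illustration of this mapping, here is a minimal sketch using np.digitize; the cutpoint values are the ones we use later for the synthetic data:

```python
import numpy as np

# Inner cutpoints alpha_1, ..., alpha_{J-1}; alpha_0 = -inf and alpha_J = +inf are implicit
inner_cutpoints = np.array([-2, -1, 2])  # J = 4 classes

latent = np.array([-3.0, -1.5, 0.5, 2.5])
# np.digitize counts cutpoints <= Y, so adding 1 gives the class j with alpha_{j-1} <= Y < alpha_j
classes = np.digitize(latent, inner_cutpoints) + 1
print(classes)  # [1 2 3 4]
```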
Regression Assumption
We model the latent variable Y as follows:

Y = \beta X + \epsilon

where \beta is our slope parameter to be estimated, and \epsilon is a random error, uncorrelated with X, with mean 0 and variance 1.
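It helps to spell out one consequence of this setup: the class probabilities follow from the CDF F of the error \epsilon. For instance, with a logistic error (which is what logistic regression implicitly assumes) this is the classic ordered-logit model:

\begin{align*}
\Pr(Y^* \le j \mid X) &= \Pr(\beta X + \epsilon < \alpha_j) = F(\alpha_j - \beta X) \\
\Pr(Y^* = j \mid X) &= F(\alpha_j - \beta X) - F(\alpha_{j-1} - \beta X)
\end{align*}

This is why, later on, the coefficients of binary classifiers for \Pr(Y^* > j) can be read off as estimates of \beta and \alpha_j.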
Synthetic Data Creation
We will consider an ordinal classification problem with 4 classes. We randomly sample 1000 points from a standard normal distribution, fix \beta = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, and set the cutpoints \alpha_0 = -\infty, \alpha_1 = -2, \alpha_2 = -1, \alpha_3 = 2, \alpha_4 = \infty.
```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(69)

# 1000 two-dimensional points from a standard normal distribution
X = np.random.normal(scale=1, size=(1000, 2))
beta = np.array([-1, 1])
cutpoints = np.array([-np.inf, -2, -1, 2, np.inf])

# Latent continuous variable
Y = X @ beta
```
```python
def ordify(cutpoints):
    # Map a latent value x to its ordinal class: class i+1 if cutpoints[i] <= x < cutpoints[i+1]
    def hlo(x):
        for i in range(len(cutpoints) - 1):
            if cutpoints[i] <= x < cutpoints[i + 1]:
                return i + 1
    return hlo

ordinate = ordify(cutpoints)
Y_ord = np.array([ordinate(i) for i in Y])

# X_train, X_test, y_train, y_test = train_test_split(X, Y_ord, test_size=0.33, random_state=42)
```
OvR approach
We will fit an LR (logistic regression) model using the one-vs-rest approach, which simply treats our target as a nominal variable and trains 4 binary classifiers.
```python
from sklearn.linear_model import LogisticRegression

OvR_LR = LogisticRegression(multi_class='ovr').fit(X, Y_ord)
OvR_LR.score(X, Y_ord)
```
```
0.797
```
```python
print('Beta estimates and biases')
for i in range(len(OvR_LR.coef_)):
    # Rescale each classifier's coefficients so the first one equals -1 (the true beta_1)
    x = -1 * OvR_LR.coef_[i][0]
    print(OvR_LR.coef_[i] / x, OvR_LR.intercept_[i] / x)
```
```
Beta estimates and biases
[-1. 1.02454387] 2.1044752205291597
[-1. 1.07017873] 2.088226413125316
[-1. 1.33242037] 1.5887705536936128
[-1. 0.97466832] -2.050381104615722
```
Can we do better?
We see that by converting to a nominal approach we are throwing away some information, namely the ordering present within the target classes themselves.
Is there any way to capture this ordering, while not impacting the underlying learning scheme (LR in this case)?
Ordinal Classification Scheme
We can construct K-1 new binary variables (writing K for the number of classes) for an observed datapoint x_i (with corresponding class y_i) in the following manner (example with K = 4):
| Condition | N_1 | N_2 | N_3 |
|---|---|---|---|
| \text{if } y_i = 1 | 0 | 0 | 0 |
| \text{if } y_i = 2 | 1 | 0 | 0 |
| \text{if } y_i = 3 | 1 | 1 | 0 |
| \text{if } y_i = 4 | 1 | 1 | 1 |
In other words, for j = 1, 2, \cdots, K-1, generate N_j = \mathbb{1}(y_i > j).
We then construct K-1 binary classification problems, where the target of the jth problem is N_j, and train K-1 classifiers using our preferred learning scheme, as in the sketch below.
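A minimal sketch of this target construction, with toy labels (the names y and N here are illustrative):

```python
import numpy as np

y = np.array([1, 3, 2, 4, 1])  # toy ordinal labels, K = 4
K = 4

# One binary target per problem: N_j = 1(y > j), j = 1, ..., K-1
N = {j: (y > j).astype(int) for j in range(1, K)}
print(N[1])  # [0 1 1 1 0]
print(N[2])  # [0 1 0 1 0]
print(N[3])  # [0 0 0 1 0]
```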
How to predict?
For a validation datapoint x, use the trained models to compute estimates \hat{N_j} \text{ for } j = 1, 2, \cdots, K-1. We then use these to calculate the probability of each class as follows:
| Class | Probability |
|---|---|
| 1 | 1 - \hat{N_1} |
| 2 | \hat{N_1} - \hat{N_2} |
| j \; (1 < j < K) | \hat{N_{j-1}} - \hat{N_j} |
| K | \hat{N_{K-1}} |
The first and last class probabilities each come from a single classifier, whereas the others are differences of the outputs of a pair of consecutive (w.r.t. j) classifiers.
Note that \hat{N_j} is the model's estimate of \Pr(y > j).
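To make the bookkeeping concrete, here is a toy example with K = 4 and made-up estimates \hat{N_1}, \hat{N_2}, \hat{N_3}:

```python
import numpy as np

N_hat = np.array([0.9, 0.6, 0.2])  # made-up estimates of Pr(y > j) for j = 1, 2, 3

probs = np.concatenate((
    [1 - N_hat[0]],          # Pr(class 1)
    N_hat[:-1] - N_hat[1:],  # Pr(class j) for 1 < j < K
    [N_hat[-1]],             # Pr(class K)
))
print(probs)  # [0.1 0.3 0.4 0.2]
```

Note that nothing forces \hat{N_{j-1}} \ge \hat{N_j}, so the middle differences can occasionally come out negative; taking the argmax over classes still yields a usable label.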
```python
from sklearn.base import clone, BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_is_fitted, check_array
from sklearn.utils.multiclass import check_classification_targets
import numpy as np

class OrdinalClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, learner):
        self.learner = learner
        self.ordered_learners = dict()
        self.classes = []

    def fit(self, X, y):
        self.classes = np.sort(np.unique(y))
        assert self.classes.shape[0] >= 3, f'OrdinalClassifier needs at least 3 classes, only {self.classes.shape[0]} found'
        # Train K-1 binary classifiers; the ith one predicts Pr(y > classes[i])
        for i in range(self.classes.shape[0] - 1):
            N_i = np.vectorize(int)(y > self.classes[i])
            learner = clone(self.learner).fit(X, N_i)
            self.ordered_learners[i] = learner
        return self

    def predict(self, X):
        return np.vectorize(lambda i: self.classes[i])(np.argmax(self.predict_proba(X), axis=1))

    def predict_proba(self, X):
        # Column vector of \hat{N_j} = Pr(y > classes[j]) from each binary classifier
        predicted = [self.ordered_learners[k].predict_proba(X)[:, 1].reshape(-1, 1) for k in self.ordered_learners]

        N_1 = 1 - predicted[0]   # Pr(class 1)
        N_K = predicted[-1]      # Pr(class K)
        # Middle classes: Pr(class j+2) = \hat{N_{j+1}} - \hat{N_{j+2}}
        N_i = [predicted[i] - predicted[i + 1] for i in range(len(predicted) - 1)]

        probs = np.hstack([N_1, *N_i, N_K])
        return probs
```
```python
from sklearn.linear_model import LogisticRegression

model = OrdinalClassifier(LogisticRegression())
model.fit(X, Y_ord)
model.score(X, Y_ord)
```
```
0.975
```
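Since OrdinalClassifier follows the usual sklearn conventions, it can be used like any other estimator; a small usage sketch (outputs omitted):

```python
# Per-class probabilities for the first five points; columns are ordered by class label
probs = model.predict_proba(X[:5])
print(probs.round(3))
print(probs.sum(axis=1))  # rows telescope to exactly 1
print(model.predict(X[:5]))
```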
One-Hot Encoding the Bias Term - Another Approach
We construct a dataset similar to the above, but one-hot encode a bias term for each cutpoint; since all K-1 binary problems then share a single \beta, our estimates of \beta and the cutpoints end up much closer to the true values (helped by the large sample size).
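To see why this recovers the parameters, consider a single datapoint x with class y and J = 4. The encoder emits J-1 = 3 rows for it, one per cutpoint:

\begin{align*}
\begin{pmatrix} -1 & 0 & 0 & x^\top \\ 0 & -1 & 0 & x^\top \\ 0 & 0 & -1 & x^\top \end{pmatrix}
\quad \text{with targets} \quad
\begin{pmatrix} \mathbb{1}(y > 1) \\ \mathbb{1}(y > 2) \\ \mathbb{1}(y > 3) \end{pmatrix}
\end{align*}

With fit_intercept=False, the logit of the ith row is \beta x - c_i, where c_i is the coefficient on the ith one-hot column; comparing with \Pr(Y^* > i) = F(\beta x - \alpha_i) from the latent model shows that c_i estimates the cutpoint \alpha_i, while a single shared \beta is fitted across all the binary problems.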
```python
def encoder(X, Y):
    segments = []
    segments_y = []
    n, J = len(X), max(Y)

    # One binary problem per cutpoint: a -1 in column i-1 one-hot encodes cutpoint alpha_i
    for i in range(1, J):
        segment = np.hstack((np.zeros((n, J - 1)), X))
        segment[:, i - 1] = -1
        segments.append(segment)
        segments_y.append(np.vectorize(int)(Y > i))

    return (np.vstack(segments), np.hstack(segments_y))

X_e, y_e = encoder(X, Y_ord)
OE_LR = LogisticRegression(fit_intercept=False).fit(X_e, y_e)
```
```python
print('Bias (first 3 terms) and beta (rest) estimates')
print(OE_LR.coef_[0] / (-1 * OE_LR.coef_[0][-2]))

from sklearn.metrics import accuracy_score

# Recover cutpoint and beta estimates, normalized so that beta_1 = -1
a1, a2, a3, b1, b2 = OE_LR.coef_[0] / (-1 * OE_LR.coef_[0][-2])

beta_pred = np.array([b1, b2])
cutpoints_pred = np.array([-np.inf, a1, a2, a3, np.inf])

# Classify by thresholding the recovered latent variable at the estimated cutpoints
ordinate = ordify(cutpoints_pred)
y_pred = np.array([ordinate(i) for i in X @ beta_pred])

print(accuracy_score(Y_ord, y_pred))
```
```
Bias (first 3 terms) and beta (rest) estimates
[-2.04249489 -1.03041969 2.02754915 -1. 1.01914656]
0.99
```