# Logistic Regression
Logistic regression predicts class probabilities for binary classification problems. Despite its name, it is a classification algorithm, not a regression method, and it serves as a standard baseline for binary classification.
## Model Definition
The probability of the positive class is computed as:
P(y=1|x) = sigmoid(w·x + b) = 1 / (1 + e^(-(w·x + b)))
The sigmoid function maps any real value to the interval (0, 1).
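For concreteness, here is a minimal NumPy sketch of the sigmoid and the resulting class probability (the weights, bias, and input values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Map any real value to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights, bias, and input (not from the text)
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.5])

p = sigmoid(np.dot(w, x) + b)  # P(y=1|x)
print(p)  # ~0.75
```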
## Decision Boundary
Prediction: class 1 if P(y=1|x) > threshold (default: 0.5), otherwise class 0.
Equivalently, when w·x + b > 0 the sigmoid output exceeds 0.5, so the model predicts class 1 at the default threshold.
The decision boundary is a hyperplane in feature space.
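A short sketch of the decision rule, reusing the `sigmoid` helper from above (the function name `predict` is illustrative):

```python
def predict(X, w, b, threshold=0.5):
    """Predict class 1 where P(y=1|x) exceeds the threshold, else class 0."""
    probs = sigmoid(X @ w + b)  # X has shape (n_samples, n_features)
    return (probs > threshold).astype(int)
```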
## Training
### Loss Function: Binary Cross-Entropy
Loss = -(1/n) * Sum[y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i)]
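A direct NumPy translation of this loss; clipping predictions away from 0 and 1 is a common numerical safeguard, not part of the formula itself:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```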
### Gradient Descent
Training logistic regression with gradient descent (a runnable sketch follows this list):
- Initialize weights w and bias b to zero
- For each iteration:
  - Compute the linear combination: z = X·w + b
  - Apply the sigmoid to get predictions: y_hat = 1 / (1 + e^(-z))
  - Compute the weight gradient: dw = X^T · (y_hat - y) / n
  - Compute the bias gradient: db = mean(y_hat - y)
  - Update the weights: w = w - learning_rate * dw
  - Update the bias: b = b - learning_rate * db
- Return the learned weights and bias
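Putting these steps together, a minimal sketch of the training loop, reusing the `sigmoid` helper from earlier (the function name and hyperparameter defaults are illustrative):

```python
def fit_logistic(X, y, learning_rate=0.1, n_iterations=1000):
    """Train logistic regression with full-batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # initialize weights to zero
    b = 0.0                   # initialize bias to zero
    for _ in range(n_iterations):
        z = X @ w + b                        # linear combination
        y_hat = sigmoid(z)                   # predicted probabilities
        dw = X.T @ (y_hat - y) / n_samples   # weight gradient
        db = np.mean(y_hat - y)              # bias gradient
        w -= learning_rate * dw              # update weights
        b -= learning_rate * db              # update bias
    return w, b
```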
## Regularization
L1 and L2 penalties apply as in linear regression:
Loss = BCE + lambda * penalty(w)
Where penalty(w) is either the L1 norm of the weights (sum of absolute values) or the squared L2 norm (sum of squared values). The bias is conventionally left unpenalized.
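If the L2 penalty is written with the common (lambda/2) * ||w||^2 scaling, its gradient is simply lambda * w, so the training loop above only needs one extra term; a sketch under that assumption (`lam` is used to avoid shadowing Python's built-in `lambda`):

```python
def fit_logistic_l2(X, y, learning_rate=0.1, n_iterations=1000, lam=0.01):
    """Gradient descent with a squared-L2 penalty on the weights (bias unpenalized)."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(n_iterations):
        y_hat = sigmoid(X @ w + b)
        dw = X.T @ (y_hat - y) / n_samples + lam * w  # extra term from the penalty
        db = np.mean(y_hat - y)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b
```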
## Multi-class Classification
### One-vs-Rest (OvR)
Train K binary classifiers, one per class, each separating that class from all the others. Predict the class whose classifier assigns the highest probability.
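If scikit-learn is available, OvR can be composed explicitly; a sketch assuming a training set `X_train`, `y_train` and test features `X_test` (all hypothetical names):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(X_train, y_train)           # fits one binary classifier per class
probs = ovr.predict_proba(X_test)   # one probability column per class
preds = probs.argmax(axis=1)        # class with the highest probability
```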
### Softmax (Multinomial)
P(y=k|x) = e^(w_k·x) / Sum_j e^(w_j·x)
This normalizes the outputs across all classes so probabilities sum to 1.
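A NumPy sketch of the softmax; subtracting the per-row maximum before exponentiating is a standard numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(scores):
    """Normalize class scores of shape (n_samples, n_classes) into probabilities."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# With a weight matrix W of shape (n_classes, n_features):
# probs = softmax(X @ W.T)  # each row sums to 1
```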
## Evaluation Metrics
| Metric | Use Case |
|---|---|
| Accuracy | Balanced classes |
| Precision | High false positive cost |
| Recall | High false negative cost |
| F1 Score | Balance precision and recall |
| AUC-ROC | Overall discrimination quality |
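All of these are available in scikit-learn's `sklearn.metrics` module; a sketch assuming arrays `y_true`, `y_pred` (hard 0/1 labels), and `y_prob` (predicted probabilities for class 1):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))  # needs probabilities, not labels
```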
## Reference
| Topic | Description |
|---|---|
| Sigmoid function purpose | Maps real numbers to (0, 1) for probability interpretation. Has convenient mathematical properties for optimization. |
| Difference from linear regression | Linear regression predicts unbounded continuous values; logistic regression predicts probabilities bounded in (0, 1). It is trained with log loss (binary cross-entropy) instead of MSE. |
| Imbalanced classes | Use class weights, threshold tuning, or resampling. Evaluate with precision-recall metrics, not accuracy. |
| Coefficient interpretation | A one-unit increase in feature X changes the log-odds by the coefficient value, holding other features fixed. Exponentiate the coefficient to get an odds ratio. |
| Failure cases | Complex non-linear decision boundaries, high-dimensional sparse data without regularization, and feature interactions that are not explicitly engineered. |