Logistic Regression

Logistic regression predicts class probabilities for binary classification problems. Despite its name, it is a classification algorithm, not a regression algorithm, and it serves as a standard baseline for binary classification.

Model Definition

The probability of the positive class is computed as:

P(y=1|x) = sigmoid(w·x + b) = 1 / (1 + e^(-(w·x + b)))

The sigmoid function maps any real value to the interval (0, 1).
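As a minimal NumPy sketch (the clipping bound is an assumed numerical safeguard, not part of the definition):

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array) to the interval (0, 1)."""
    # Clip the input so np.exp cannot overflow for very negative z
    # (the bound 500 is an assumed safeguard, not part of the formula).
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # -> approx [0.0067, 0.5, 0.9933]
```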

Decision Boundary

Prediction: class 1 if P(y=1|x) > threshold (default: 0.5)

Since sigmoid(0) = 0.5, P(y=1|x) > 0.5 exactly when w·x + b > 0, so at the default threshold the model predicts class 1 whenever w·x + b > 0.

The decision boundary is a hyperplane in feature space.
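A short sketch of the thresholding rule; the function name `predict` and the `threshold` parameter are illustrative:

```python
import numpy as np

def predict(X, w, b, threshold=0.5):
    """Return class labels: 1 where P(y=1|x) exceeds the threshold, else 0."""
    probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
    return (probs > threshold).astype(int)
```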

Training

Loss Function: Binary Cross-Entropy

Loss = -(1/n) * Sum[y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i)]
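A minimal NumPy sketch of this loss, where y_hat_i = P(y=1|x_i); the eps clipping is an assumed safeguard so the logarithms stay finite:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean BCE between labels y in {0, 1} and predicted probabilities y_hat."""
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```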

Gradient Descent

Training logistic regression with gradient descent:

  1. Initialize weights w to zero and bias b to zero
  2. For each iteration:
    • Compute linear combination: z = X * w + b
    • Apply sigmoid to get predictions: y_hat = 1 / (1 + e^(-z))
    • Calculate weight gradient: dw = X^T * (y_hat - y) / n
    • Calculate bias gradient: db = mean(y_hat - y)
    • Update weights: w = w - learning_rate * dw
    • Update bias: b = b - learning_rate * db
  3. Return learned weights and bias
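The steps above translate directly into NumPy. This is a minimal sketch; the learning rate, iteration count, and toy data are illustrative, not prescribed values:

```python
import numpy as np

def train_logistic_regression(X, y, learning_rate=0.1, n_iters=1000):
    """Fit logistic regression by batch gradient descent on the BCE loss."""
    n, d = X.shape
    w = np.zeros(d)   # step 1: zero-initialize weights
    b = 0.0           # step 1: zero-initialize bias
    for _ in range(n_iters):
        z = X @ w + b                      # linear combination
        y_hat = 1.0 / (1.0 + np.exp(-z))   # sigmoid predictions
        dw = X.T @ (y_hat - y) / n         # weight gradient
        db = np.mean(y_hat - y)            # bias gradient
        w -= learning_rate * dw            # update weights
        b -= learning_rate * db            # update bias
    return w, b

# Toy usage: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_logistic_regression(X, y)
```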

Regularization

L1 and L2 penalties apply as in linear regression:

Loss = BCE + lambda * ||w||

Where ||w|| denotes either the L1 norm (sum of absolute weight values) or the squared L2 norm (sum of squared weight values), and lambda controls the penalty strength.
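As a sketch of how the L2 variant changes training: the penalty lambda * sum(w^2) has gradient 2 * lambda * w, so the weight gradient gains an extra term (the bias is conventionally left unpenalized; names here are illustrative):

```python
import numpy as np

def l2_penalized_gradients(X, y, w, b, lam):
    """Gradients of BCE + lam * sum(w**2); the bias b is left unpenalized."""
    n = X.shape[0]
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    dw = X.T @ (y_hat - y) / n + 2.0 * lam * w  # extra 2*lam*w from the penalty
    db = np.mean(y_hat - y)
    return dw, db
```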

Multi-class Classification

One-vs-Rest (OvR)

Train K binary classifiers, one per class (class k versus all other classes). At prediction time, choose the class whose classifier assigns the highest probability.
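A minimal sketch of the OvR prediction step, assuming each trained binary classifier is a (w, b) pair:

```python
import numpy as np

def ovr_predict(X, classifiers):
    """classifiers: one trained (w, b) pair per class, in class order.
    Returns, per row of X, the index of the class with the highest probability."""
    probs = np.column_stack(
        [1.0 / (1.0 + np.exp(-(X @ w + b))) for w, b in classifiers]
    )
    return np.argmax(probs, axis=1)
```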

Softmax (Multinomial)

P(y=k|x) = e^(w_k·x + b_k) / Sum_j e^(w_j·x + b_j)

This normalizes the outputs across all classes so probabilities sum to 1.
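A minimal NumPy sketch of row-wise softmax; subtracting the row maximum is a standard stability trick that does not change the result:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax of a score matrix Z with shape (n_samples, n_classes)."""
    Z = Z - Z.max(axis=1, keepdims=True)  # stability shift; result is unchanged
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

print(softmax(np.array([[1.0, 2.0, 3.0]])))  # each row sums to 1
```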

Evaluation Metrics

Metric | Use Case
------ | --------
Accuracy | Balanced classes
Precision | High false positive cost
Recall | High false negative cost
F1 Score | Balance precision and recall
AUC-ROC | Overall discrimination quality
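All of these metrics are available in scikit-learn; the toy arrays below are purely illustrative:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1]                 # true labels
y_pred = [0, 1, 0, 0, 1]                 # thresholded predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8]       # predicted probabilities

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_prob))     # AUC uses probabilities, not hard labels
```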

Reference

Topic | Description
----- | -----------
Sigmoid function purpose | Maps real numbers to (0, 1) so outputs can be read as probabilities; its derivative has the convenient closed form sigma'(z) = sigma(z)(1 - sigma(z)) for gradient-based optimization.
Difference from linear regression | Linear regression predicts unbounded continuous values; logistic regression predicts probabilities bounded in (0, 1) and is trained with log loss instead of MSE.
Imbalanced classes | Use class weights, threshold tuning, or resampling. Evaluate with precision-recall metrics rather than accuracy.
Coefficient interpretation | A one-unit increase in feature X changes the log-odds by the coefficient value; exponentiate the coefficient to obtain the odds ratio.
Failure cases | Complex non-linear decision boundaries, high-dimensional sparse data without regularization, and feature interactions that have not been explicitly engineered.