Bias-Variance Tradeoff
For squared-error loss, expected prediction error decomposes into three components: bias, variance, and irreducible noise. Understanding this decomposition guides model selection and tuning decisions.
Error Decomposition
Total Error = Bias^2 + Variance + Irreducible Noise
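In symbols, for a target y = f(x) + ε with noise variance σ², the expected squared error of a fitted model f̂ at a point x, with the expectation taken over training sets and noise, is:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{Variance}}
  + \underbrace{\sigma^{2}}_{\text{Irreducible noise}}
```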
| Component | Definition |
|---|---|
| Bias | Error from incorrect model assumptions. Consistent errors across different training sets. |
| Variance | Error from sensitivity to training data. Model changes significantly with different samples. |
| Noise | Inherent randomness in the data. Cannot be reduced by any model. |
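The first two components can be estimated empirically: refit the same model class on many freshly drawn training sets, then compare the averaged predictions against the truth (bias) and measure their spread (variance). A minimal sketch, assuming NumPy and scikit-learn; the true function, noise level, and sample sizes are illustrative choices, not from the text:

```python
# Minimal sketch: estimate bias^2 and variance empirically by refitting the
# same model class on many freshly drawn training sets.
# The true function, noise level, and sample sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
sigma = 0.3                                    # assumed noise std
f = lambda x: np.sin(2 * np.pi * x)            # assumed true function
x_test = np.linspace(0, 1, 50).reshape(-1, 1)  # fixed evaluation grid

def predictions_from_fresh_fit(degree):
    """Draw a new 30-point training set, fit, and predict on the grid."""
    x = rng.uniform(0, 1, (30, 1))
    y = f(x).ravel() + rng.normal(0, sigma, 30)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return model.fit(x, y).predict(x_test)

preds = np.stack([predictions_from_fresh_fit(3) for _ in range(200)])
bias_sq = np.mean((preds.mean(axis=0) - f(x_test).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, noise = {sigma**2:.4f}")
```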
High Bias (Underfitting)
High bias indicates the model is too simple to capture the underlying pattern.
| Indicator | Observation |
|---|---|
| Training error | High |
| Test error | High |
| Train-test gap | Small |
Remediation:
- Add features or feature interactions
- Use a more complex model
- Reduce regularization strength (see the sketch after this list)
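A minimal sketch of the first and third remedies, assuming scikit-learn; the dataset and hyperparameter values are illustrative:

```python
# Minimal sketch: remedy underfitting by adding polynomial features and
# relaxing the regularization penalty. Data and hyperparameters are
# illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = X.ravel() ** 3 + rng.normal(0, 0.1, 200)   # cubic signal a line cannot fit
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

underfit = Ridge(alpha=100.0)                  # too simple, too penalized
remedied = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # add feature powers
    Ridge(alpha=0.1),                                  # weaker penalty
)

for name, model in [("underfit", underfit), ("remedied", remedied)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```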
High Variance (Overfitting)
High variance indicates the model memorizes the training data, noise included, and so fails to generalize.
| Indicator | Observation |
|---|---|
| Training error | Low |
| Test error | High |
| Train-test gap | Large |
Remediation:
- Increase training data
- Remove noisy features
- Add regularization (L1, L2, dropout)
- Use a simpler model
- Apply early stopping (see the sketch after this list)
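A minimal sketch of the regularization and early-stopping remedies, assuming scikit-learn; the data, polynomial degree, and penalty strength are illustrative:

```python
# Minimal sketch: tame an overfit high-degree polynomial with an L2 penalty,
# and separately with early stopping on a gradient-based learner.
# Data, degree, and penalty strength are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (60, 1))
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = X[:40], X[40:], y[:40], y[40:]

degree = 15                                         # deliberately too flexible
overfit = make_pipeline(PolynomialFeatures(degree), LinearRegression())
ridged = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1.0))
stopped = make_pipeline(
    PolynomialFeatures(degree),
    StandardScaler(),                               # SGD needs scaled inputs
    SGDRegressor(early_stopping=True, validation_fraction=0.2, random_state=0),
)

for name, model in [("overfit", overfit), ("L2", ridged), ("early stop", stopped)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```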
Tradeoff Relationship
| Model Complexity | Bias | Variance |
|---|---|---|
| Low (simple models) | High | Low |
| High (complex models) | Low | High |
| Optimal | Balanced | Balanced |
The optimal model complexity minimizes total error: it sits where adding further complexity would increase variance by more than it reduces bias squared.
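A minimal sketch of this relationship, assuming scikit-learn: sweeping polynomial degree, training error falls monotonically while cross-validated error traces the characteristic U-shape. The data-generating process is an illustrative choice.

```python
# Minimal sketch: sweep polynomial degree and watch training error fall
# while cross-validated error follows the U-shape described above.
# The data-generating process is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (80, 1))
y = np.cos(2 * X).ravel() + rng.normal(0, 0.15, 80)

for degree in (1, 3, 6, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    train_mse = np.mean((model.fit(X, y).predict(X) - y) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, CV MSE = {cv_mse:.4f}")
```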
Diagnosis with Learning Curves
Generate learning curves by training the model on increasing amounts of data (e.g., 10%, 20%, ..., 100% of training data) and measuring both training and validation scores at each point using cross-validation.
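A minimal sketch of this procedure using scikit-learn's learning_curve; the Ridge estimator and synthetic dataset are placeholder assumptions:

```python
# Minimal sketch of the procedure above, using scikit-learn's learning_curve.
# The Ridge estimator and synthetic dataset are placeholder assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),  # 10%, 20%, ..., 100%
    cv=5,                                    # 5-fold cross-validation
    scoring="neg_mean_squared_error",
)
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n = {n:3d}: train MSE = {tr:7.1f}, validation MSE = {va:7.1f}")
```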
| Curve Pattern | Diagnosis | Data Impact |
|---|---|---|
| Both curves plateau at high error | High bias | More data will not help |
| Large gap between curves | High variance | More data may help |
Reference
| Topic | Description |
|---|---|
| Bias-variance tradeoff | Simple models have consistent errors (high bias). Complex models vary with training data (high variance). Optimal model minimizes total error. |
| Overfitting detection | Training error significantly lower than test error indicates overfitting. |
| Complexity relationship | Increased complexity decreases bias and increases variance. |
| Regularization effect | Regularization increases bias slightly and reduces variance. When the model overfits, the net effect is usually lower total error. |
| Learning curve interpretation | Both curves high and flat: high bias. Large gap between curves: high variance. |