Bias-Variance Tradeoff

Under squared-error loss, prediction error decomposes into three components: bias, variance, and irreducible noise. Understanding this decomposition guides model selection and tuning decisions.

Error Decomposition

Total Error = Bias^2 + Variance + Irreducible Noise

  • Bias: Error from incorrect model assumptions. Produces consistent errors across different training sets.
  • Variance: Error from sensitivity to the training data. The model changes significantly with different samples.
  • Noise: Inherent randomness in the data. Cannot be reduced by any model.
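
The decomposition can be made concrete with a small Monte Carlo experiment: refit the same model class on many resampled training sets drawn from a known generating process, then measure bias^2 and variance of the predictions directly. The sketch below assumes a synthetic sine target with Gaussian noise; true_f, simulate, and all constants are illustrative choices, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

noise_sd = 0.3                 # assumed irreducible noise level
n_train, n_trials = 30, 500
x_test = np.linspace(0, 1, 50)

def simulate(degree):
    """Refit a polynomial of the given degree on many training sets,
    then decompose its error at the test points into bias^2 and variance."""
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                # spread across fits
    return bias_sq, variance

for degree in (1, 4, 9):
    b2, var = simulate(degree)
    print(f"degree={degree}  bias^2={b2:.3f}  variance={var:.3f}  "
          f"total={b2 + var + noise_sd**2:.3f}")
```

A degree-1 fit shows high bias^2 and low variance; a degree-9 fit shows the reverse, matching the definitions above.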

High Bias (Underfitting)

High bias indicates the model is too simple to capture the underlying pattern.

  • Training error: high
  • Test error: high
  • Train-test gap: small

Remediation:

  • Add features or feature interactions
  • Use a more complex model
  • Reduce regularization strength
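
To see the high-bias signature and one remediation in practice, here is a minimal sketch, assuming scikit-learn and a synthetic sine target (the data, model, and degree choices are illustrative). It fits a plain linear model, then the same model with added polynomial features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 300)  # nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("linear (underfits)", LinearRegression()),
    ("poly deg 5 (added features)",
     make_pipeline(PolynomialFeatures(5), LinearRegression())),
]:
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: train MSE={tr:.3f}  test MSE={te:.3f}  gap={te - tr:.3f}")
```

The linear model shows high, nearly equal train and test error; adding polynomial features lowers both.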

High Variance (Overfitting)

High variance indicates the model memorizes the training data, including its noise.

  • Training error: low
  • Test error: high
  • Train-test gap: large

Remediation:

  • Increase training data
  • Remove noisy features
  • Add regularization (L1, L2, dropout)
  • Use a simpler model
  • Apply early stopping
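
The regularization remedy can be demonstrated with a minimal sketch, again assuming scikit-learn and synthetic data (the degree-15 polynomial and alpha value are illustrative, chosen to make the overfit obvious). It compares unregularized least squares against ridge regression on the same over-parameterized feature set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (60, 1))              # small sample: easy to overfit
y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reg in [("no regularization", LinearRegression()),
                  ("ridge alpha=1.0", Ridge(alpha=1.0))]:
    # Degree-15 features give the model far more capacity than the data needs.
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), reg)
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: train MSE={tr:.3f}  test MSE={te:.3f}")
```

The unregularized fit drives training error toward zero while test error balloons; the L2 penalty trades a little training error for a much lower test error.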

Trade-off Relationship

  • Low complexity (simple models): high bias, low variance
  • High complexity (complex models): low bias, high variance
  • Optimal complexity: bias and variance balanced

The optimal model complexity minimizes total error.
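
One way to locate that optimum is to sweep complexity and watch cross-validated error trace a U shape. A minimal sketch, assuming polynomial regression on synthetic data (the degrees and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)

# Sweep complexity; cross-validated error falls, bottoms out, then rises again.
for degree in (1, 2, 3, 5, 8, 12, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  CV MSE={mse:.3f}")
```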

Diagnosis with Learning Curves

Generate learning curves by training the model on increasing amounts of data (e.g., 10%, 20%, ..., 100% of training data) and measuring both training and validation scores at each point using cross-validation.
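
scikit-learn's learning_curve utility automates this procedure. A minimal sketch (the model and synthetic data are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 500)

model = make_pipeline(PolynomialFeatures(4), Ridge(alpha=0.1))
sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10),   # 10%, 20%, ..., 100%
    scoring="neg_mean_squared_error")

# Negate the scores so both columns read as MSE.
for n, tr, va in zip(sizes, -train_scores.mean(axis=1),
                     -val_scores.mean(axis=1)):
    print(f"n={n:3d}  train MSE={tr:.3f}  val MSE={va:.3f}")
```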

  • Both curves plateau at high error: high bias. More data will not help.
  • Large gap between curves: high variance. More data may help.

Reference

  • Bias-variance tradeoff: Simple models make consistent errors (high bias); complex models vary with the training data (high variance). The optimal model minimizes total error.
  • Overfitting detection: Training error significantly lower than test error indicates overfitting.
  • Complexity relationship: Increased complexity decreases bias and increases variance.
  • Regularization effect: Regularization increases bias slightly and reduces variance; the net effect on total error is typically positive.
  • Learning curve interpretation: Both curves high and flat indicates high bias; a large gap between curves indicates high variance.