Bias-Variance Tradeoff
For squared-error loss, expected prediction error decomposes into three components: bias, variance, and irreducible noise. Understanding this decomposition guides model selection and tuning decisions.
Error Decomposition
Total Error = Bias^2 + Variance + Irreducible Noise
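In symbols, for a target y = f(x) + ε with noise variance σ², the expected squared error of a fitted model f̂ at a point x, with the expectation taken over training sets and noise, is:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{Variance}}
  + \underbrace{\sigma^{2}}_{\text{Irreducible noise}}
```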
| Component | Definition |
|---|---|
| Bias | Error from incorrect model assumptions. Consistent errors across different training sets. |
| Variance | Error from sensitivity to training data. Model changes significantly with different samples. |
| Noise | Inherent randomness in the data. Cannot be reduced by any model. |
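The first two components can be estimated empirically: refit the same model class on many freshly drawn training sets, then compare the averaged predictions against the truth (bias) and measure their spread (variance). A minimal sketch, assuming NumPy and scikit-learn; the true function, noise level, and sample sizes are illustrative choices, not from the text:

```python
# Minimal sketch: estimate bias^2 and variance empirically by refitting the
# same model class on many freshly drawn training sets.
# The true function, noise level, and sample sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
sigma = 0.3                                    # assumed noise std
f = lambda x: np.sin(2 * np.pi * x)            # assumed true function
x_test = np.linspace(0, 1, 50).reshape(-1, 1)  # fixed evaluation grid

def predictions_from_fresh_fit(degree):
    """Draw a new 30-point training set, fit, and predict on the grid."""
    x = rng.uniform(0, 1, (30, 1))
    y = f(x).ravel() + rng.normal(0, sigma, 30)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return model.fit(x, y).predict(x_test)

preds = np.stack([predictions_from_fresh_fit(3) for _ in range(200)])
bias_sq = np.mean((preds.mean(axis=0) - f(x_test).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, noise = {sigma**2:.4f}")
```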
High Bias (Underfitting)
High bias indicates the model is too simple to capture the underlying pattern.
| Indicator | Observation |
|---|---|
| Training error | High |
| Test error | High |
| Train-test gap | Small |
Remediation:
- Add features or feature interactions
- Use a more complex model
- Reduce regularization strength (see the sketch after this list)
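A minimal sketch of the first and third remedies, assuming scikit-learn; the dataset and hyperparameter values are illustrative:

```python
# Minimal sketch: remedy underfitting by adding polynomial features and
# relaxing the regularization penalty. Data and hyperparameters are
# illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = X.ravel() ** 3 + rng.normal(0, 0.1, 200)   # cubic signal a line cannot fit
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

underfit = Ridge(alpha=100.0)                  # too simple, too penalized
remedied = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # add feature powers
    Ridge(alpha=0.1),                                  # weaker penalty
)

for name, model in [("underfit", underfit), ("remedied", remedied)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```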
High Variance (Overfitting)
High variance indicates the model memorizes the training data, noise included, and so fails to generalize.
| Indicator | Observation |
|---|---|
| Training error | Low |
| Test error | High |
| Train-test gap | Large |
Remediation:
- Increase training data
- Remove noisy features
- Add regularization (L1, L2, dropout)
- Use a simpler model
- Apply early stopping (see the sketch after this list)
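A minimal sketch of the regularization and early-stopping remedies, assuming scikit-learn; the data, polynomial degree, and penalty strength are illustrative:

```python
# Minimal sketch: tame an overfit high-degree polynomial with an L2 penalty,
# and separately with early stopping on a gradient-based learner.
# Data, degree, and penalty strength are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (60, 1))
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = X[:40], X[40:], y[:40], y[40:]

degree = 15                                         # deliberately too flexible
overfit = make_pipeline(PolynomialFeatures(degree), LinearRegression())
ridged = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1.0))
stopped = make_pipeline(
    PolynomialFeatures(degree),
    StandardScaler(),                               # SGD needs scaled inputs
    SGDRegressor(early_stopping=True, validation_fraction=0.2, random_state=0),
)

for name, model in [("overfit", overfit), ("L2", ridged), ("early stop", stopped)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```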
Tradeoff Relationship
| Model Complexity | Bias | Variance |
|---|---|---|
| Low (simple models) | High | Low |
| High (complex models) | Low | High |
| Optimal | Balanced | Balanced |
The optimal model complexity minimizes total error: it sits where adding further complexity would increase variance by more than it reduces bias squared.
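A minimal sketch of this relationship, assuming scikit-learn: sweeping polynomial degree, training error falls monotonically while cross-validated error traces the characteristic U-shape. The data-generating process is an illustrative choice.

```python
# Minimal sketch: sweep polynomial degree and watch training error fall
# while cross-validated error follows the U-shape described above.
# The data-generating process is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (80, 1))
y = np.cos(2 * X).ravel() + rng.normal(0, 0.15, 80)

for degree in (1, 3, 6, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    train_mse = np.mean((model.fit(X, y).predict(X) - y) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, CV MSE = {cv_mse:.4f}")
```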
Diagnosis with Learning Curves
Generate learning curves by training the model on increasing amounts of data (e.g., 10%, 20%, ..., 100% of training data) and measuring both training and validation scores at each point using cross-validation.
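A minimal sketch of this procedure using scikit-learn's learning_curve; the Ridge estimator and synthetic dataset are placeholder assumptions:

```python
# Minimal sketch of the procedure above, using scikit-learn's learning_curve.
# The Ridge estimator and synthetic dataset are placeholder assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),  # 10%, 20%, ..., 100%
    cv=5,                                    # 5-fold cross-validation
    scoring="neg_mean_squared_error",
)
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n = {n:3d}: train MSE = {tr:7.1f}, validation MSE = {va:7.1f}")
```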
| Curve Pattern | Diagnosis | Data Impact |
|---|---|---|
| Both curves plateau at high error | High bias | More data will not help |
| Large gap between curves | High variance | More data may help |
Reference
| Topic | Description |
|---|---|
| Bias-variance tradeoff | Simple models have consistent errors (high bias). Complex models vary with training data (high variance). Optimal model minimizes total error. |
| Overfitting detection | Training error significantly lower than test error indicates overfitting. |
| Complexity relationship | Increased complexity decreases bias and increases variance. |
| Regularization effect | Regularization increases bias slightly and reduces variance. When the model overfits, the net effect is usually lower total error. |
| Learning curve interpretation | Both curves high and flat: high bias. Large gap between curves: high variance. |