Decision Trees
Decision trees partition data by recursive feature-based splitting. Ensemble methods (Random Forest, Gradient Boosting) combine multiple trees for improved performance.
Tree Structure
A trained tree has a root node at the top, internal decision nodes that each test a single feature against a threshold, and leaf nodes that hold the final prediction.
Splitting Criteria
Classification
Gini Impurity:
Gini = 1 - Sum(p_i^2), where p_i is the proportion of class i at the node
Entropy / Information Gain:
Entropy = -Sum(p_i * log2(p_i)) over all classes; information gain is the drop in entropy from the parent node to the weighted average of its children
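A minimal sketch of both classification criteria, assuming NumPy (the helper names and example labels are illustrative, not from the source):

```python
import numpy as np

def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 1, 1, 1, 2])
print(gini_impurity(labels))  # ~0.611
print(entropy(labels))        # ~1.459
```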
Regression
Variance Reduction:
Variance = (1/n) * Sum((y_i - y_mean)^2); a split is scored by the drop from the parent's variance to the weighted average variance of its children
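A minimal sketch of scoring one candidate regression split by variance reduction, assuming NumPy (the function name, mask, and sample values are illustrative):

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Parent variance minus the sample-weighted variance of the two children."""
    y_left, y_right = y[left_mask], y[~left_mask]
    n, n_l, n_r = len(y), len(y_left), len(y_right)
    weighted_child_var = (n_l / n) * np.var(y_left) + (n_r / n) * np.var(y_right)
    return np.var(y) - weighted_child_var

y = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
left_mask = np.array([True, True, True, False, False, False])
print(variance_reduction(y, left_mask))  # large reduction: the split separates the two clusters
```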
Overfitting Prevention
| Technique | Description |
|---|---|
| Max depth | Limit tree depth |
| Min samples split | Minimum samples required to split |
| Min samples leaf | Minimum samples required in leaf |
| Pruning | Remove branches post-training |
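A sketch of how these controls appear in scikit-learn's DecisionTreeClassifier, assuming scikit-learn is available (the dataset and parameter values are illustrative; ccp_alpha enables cost-complexity pruning):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=4,            # limit tree depth
    min_samples_split=10,   # minimum samples required to split a node
    min_samples_leaf=5,     # minimum samples required in each leaf
    ccp_alpha=0.01,         # cost-complexity pruning strength (post-training pruning)
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```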
Ensemble Methods
Random Forest
Bagging with feature randomization:
| Component | Description |
|---|---|
| Bootstrap sampling | Train each tree on random subset with replacement |
| Feature randomization | Each split considers random subset of features |
| Aggregation | Average predictions (regression) or majority vote (classification) |
Effect: Reduces variance by averaging trees that overfit differently.
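A minimal sketch using scikit-learn's RandomForestClassifier, assuming scikit-learn is available (the dataset and hyperparameter values are illustrative); bootstrap sampling and per-split feature randomization map directly to the components listed above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,      # number of trees whose predictions are aggregated
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # train each tree on a bootstrap sample (with replacement)
    random_state=0,
)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```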
Gradient Boosting (XGBoost, LightGBM)
Sequential training to correct errors:
| Step | Description |
|---|---|
| 1 | Fit initial model |
| 2 | Compute residuals |
| 3 | Fit new tree to residuals |
| 4 | Add to ensemble with learning rate |
| 5 | Repeat |
Effect: Reduces bias through iterative error correction.
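A from-scratch sketch of the loop above for squared-error regression, assuming NumPy and scikit-learn; shallow regression trees stand in for the weak learners, and all names and data are illustrative rather than an XGBoost/LightGBM implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1):
    f0 = y.mean()                      # step 1: initial model (constant prediction)
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred           # step 2: residuals (negative gradient for squared error)
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)         # step 3: fit a new tree to the residuals
        pred += learning_rate * tree.predict(X)  # step 4: add to ensemble with learning rate
        trees.append(tree)             # step 5: repeat
    return f0, trees

def predict(X, f0, trees, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
f0, trees = fit_gradient_boosting(X, y)
print(np.mean((predict(X, f0, trees) - y) ** 2))  # training MSE
```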
Feature Importance
| Method | Description |
|---|---|
| Gini importance | Reduction in impurity across all splits using that feature |
| Permutation importance | Performance drop when feature values are shuffled |
Permutation importance is generally more reliable (impurity-based importance is biased toward high-cardinality and continuous features) but computationally slower.
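A sketch computing both measures with scikit-learn, assuming scikit-learn is available (the dataset and settings are illustrative): Gini (impurity) importance comes from feature_importances_, while permutation importance re-scores the model after shuffling each feature on held-out data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

gini_importance = rf.feature_importances_   # impurity-based, computed from training-time splits
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
perm_importance = perm.importances_mean     # mean accuracy drop per shuffled feature

print(gini_importance.argsort()[-5:])   # top-5 features by Gini importance
print(perm_importance.argsort()[-5:])   # top-5 features by permutation importance
```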
Reference
| Topic | Description |
|---|---|
| Splitting decision | Evaluates all features and split points, selects maximum information gain (or minimum impurity). Greedy algorithm, not globally optimal. |
| Bagging vs boosting | Bagging: parallel training on bootstrap samples, average predictions, reduces variance. Boosting: sequential training to correct errors, reduces bias. |
| Random Forest overfitting reduction | Each tree overfits to different data and feature subsets. Averaging cancels noise. Trade-off: reduced interpretability. |
| XGBoost vs Random Forest | XGBoost typically achieves higher accuracy with proper tuning. Random Forest is simpler to configure. Start with Random Forest, use XGBoost for additional performance. |
| Feature importance interpretation | Gini importance: impurity reduction across splits. Permutation importance: accuracy drop when feature is shuffled. Permutation is more reliable. |