Feature Engineering

Feature engineering transforms raw data into representations suitable for machine learning models. Feature quality often determines model performance more than algorithm selection.

Numerical Features

Scaling

Standardization (z-score): X_scaled = (X - mean) / std

Centers and scales data to mean 0 and standard deviation 1.

Min-Max scaling: X_scaled = (X - min) / (max - min)

Scales data to the range [0, 1].
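A minimal sketch of both scalers using scikit-learn; the toy single-column matrix `X` is an assumption:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy single-column feature matrix (assumption)
X = np.array([[1.0], [5.0], [10.0]])

# Standardization: (X - mean) / std -> mean 0, std 1
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: (X - min) / (max - min) -> range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.ravel())     # approximately [-1.18, -0.09, 1.27]
print(X_minmax.ravel())  # [0.0, 0.444..., 1.0]
```

Note that both scalers learn their statistics in `fit`, so they should be fit on training data only and then applied to validation and test data.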

Transformations

| Transform | Use Case |
| --- | --- |
| Log transform | Skewed distributions |
| Power transform | Normalizing non-Gaussian data (Box-Cox, Yeo-Johnson) |
| Binning | Convert to categorical |
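A brief pandas/NumPy sketch of the log transform and binning; the `income` series and bin edges are made-up examples:

```python
import numpy as np
import pandas as pd

# Skewed toy data (assumption)
income = pd.Series([20_000, 35_000, 50_000, 120_000, 900_000])

# Log transform: compresses the long right tail (log1p also handles zeros)
income_log = np.log1p(income)

# Binning: convert the numeric feature into ordered categories
income_bin = pd.cut(income, bins=[0, 40_000, 100_000, np.inf],
                    labels=["low", "mid", "high"])

# Power transforms (Box-Cox, Yeo-Johnson) are available via
# sklearn.preprocessing.PowerTransformer
```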

Categorical Features

Encoding Methods

| Method | Use Case | Characteristics |
| --- | --- | --- |
| One-hot | Low cardinality | No ordering assumed |
| Label encoding | Ordinal categories | Compact representation |
| Target encoding | High cardinality | Uses target information |
| Embedding | Very high cardinality | Learned representations |

One-hot encoding: Convert each category into a binary column. A category with k values becomes k binary features.

Target encoding: Replace each category with the mean target value for that category. Requires care to prevent data leakage: compute the means on training data only, ideally out-of-fold via cross-validation.
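A minimal pandas sketch of both encodings; the `city` column and `churned` target are assumptions. The target encoding below is computed naively on the full frame for brevity; real pipelines should use out-of-fold means as noted above:

```python
import pandas as pd

# Toy data; column names are assumptions
df = pd.DataFrame({
    "city": ["NY", "SF", "NY", "LA", "SF"],
    "churned": [1, 0, 1, 0, 0],
})

# One-hot: a column with k categories becomes k binary columns
onehot = pd.get_dummies(df["city"], prefix="city")

# Target encoding: replace each category with its mean target value.
# Computed naively here; use out-of-fold means in practice to avoid
# leaking the target into the encoded feature.
means = df.groupby("city")["churned"].mean()
df["city_te"] = df["city"].map(means)
```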

Feature Creation

Aggregations

Group data by entity (e.g., user_id) and compute aggregate statistics:

| Aggregation | Example |
| --- | --- |
| Mean | Average purchase amount per user |
| Sum | Total purchase amount per user |
| Count | Number of purchases per user |
| Max | Most recent purchase date |
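A pandas sketch of these aggregations, assuming a toy purchase log with `user_id`, `amount`, and `ts` columns:

```python
import pandas as pd

# Toy purchase log; column names are assumptions
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0],
    "ts": pd.to_datetime(["2024-01-05", "2024-02-01",
                          "2024-01-10", "2024-01-20", "2024-03-03"]),
})

# One row per user, one column per aggregate statistic
user_feats = purchases.groupby("user_id").agg(
    mean_amount=("amount", "mean"),
    total_amount=("amount", "sum"),
    n_purchases=("amount", "count"),
    last_purchase=("ts", "max"),
)
```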

Interactions

Create new features by combining existing features:

| Interaction | Formula | Interpretation |
| --- | --- | --- |
| Ratio | price / sqft | Price per square foot |
| Product | age * income | Captures joint effect of age and income |
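A short pandas sketch of both interaction types, assuming toy housing columns:

```python
import pandas as pd

# Toy housing data (assumption)
df = pd.DataFrame({
    "price": [300_000, 450_000],
    "sqft": [1_000, 1_500],
    "age": [30, 45],
    "income": [60_000, 90_000],
})

df["price_per_sqft"] = df["price"] / df["sqft"]  # ratio feature
df["age_x_income"] = df["age"] * df["income"]    # product feature
```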

Time Features

Extract components from datetime fields:

| Feature | Extraction | Use Case |
| --- | --- | --- |
| hour | Hour of day (0-23) | Time-of-day patterns |
| day_of_week | Day (0=Monday to 6=Sunday) | Weekly patterns |
| is_weekend | True if Saturday or Sunday | Weekend vs. weekday behavior |
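A pandas sketch using the `.dt` accessor; the timestamps are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime(["2024-03-15 09:30",
                                         "2024-03-16 22:05"])})

df["hour"] = df["ts"].dt.hour              # 0-23
df["day_of_week"] = df["ts"].dt.dayofweek  # 0=Monday .. 6=Sunday
df["is_weekend"] = df["day_of_week"] >= 5  # Saturday (5) or Sunday (6)
```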

Feature Selection

Filter Methods

| Method | Description |
| --- | --- |
| Correlation | Relationship with target |
| Mutual information | Non-linear dependency measure |
| Chi-squared test | Categorical feature significance |
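As one filter-method example, a mutual-information sketch with scikit-learn on synthetic data; the sample and feature counts are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic data: 10 features, only 3 of them informative
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# Higher score = stronger (possibly non-linear) dependency on the target
mi = mutual_info_classif(X, y, random_state=0)
top3 = np.argsort(mi)[::-1][:3]  # indices of the three highest-scoring features
```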

Wrapper Methods

| Method | Description |
| --- | --- |
| Forward selection | Add features incrementally |
| Backward elimination | Remove features incrementally |
| Recursive feature elimination | Iteratively remove the least important features |
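A recursive-feature-elimination sketch with scikit-learn; the estimator choice and feature counts are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# Fit, drop the weakest feature, refit, until 3 features remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask over the original features
```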

Embedded Methods

| Method | Description |
| --- | --- |
| Lasso (L1) regularization | Zeros out unimportant weights |
| Tree-based importance | Built-in feature ranking |
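A Lasso sketch on synthetic regression data; the `alpha` value is an arbitrary assumption and should be tuned in practice:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=10,
                       n_informative=3, noise=10.0, random_state=0)

# L1 regularization drives unimportant coefficients to exactly zero
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of surviving features
```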

Handling Missing Values

| Strategy | Use Case |
| --- | --- |
| Drop rows | Few missing values, random missingness |
| Mean/median | Numerical features, missing completely at random (MCAR) |
| Mode | Categorical features |
| Model-based | Complex missingness patterns |
| Indicator | Missingness is informative |
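A pandas sketch combining the indicator and median strategies on a toy `income` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [50_000, np.nan, 70_000, np.nan, 65_000]})

# Indicator first: record that the value was missing before overwriting it
df["income_missing"] = df["income"].isna()

# Median imputation for the numerical column
df["income"] = df["income"].fillna(df["income"].median())
```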

Reference

| Topic | Guidance |
| --- | --- |
| High-cardinality categoricals | Use target encoding, frequency encoding, or embeddings; one-hot encoding creates excessive dimensionality. Target encoding requires leakage prevention. |
| Target encoding risks | Leakage occurs when encodings are computed on the same training rows used for model fitting. Compute encodings with cross-validation folds or a holdout set. |
| Feature selection approach | Start with domain knowledge. Use correlation with the target, tree-based importance, or regularization to identify weak features; remove them and validate. |
| Missing value handling | Depends on the missingness mechanism. Random: impute. Non-random: add an indicator. Excessive missingness: drop the feature. |
| Domain-specific features | Consider signals predictive of the outcome. E-commerce example: purchase recency, price sensitivity, category preferences, time-of-day patterns. |