Feature Engineering
Feature engineering transforms raw data into representations suitable for machine learning models. Feature quality often determines model performance more than algorithm selection.
Numerical Features
Scaling
Standardization (z-score): X_scaled = (X - mean) / std
Rescales each feature to have mean 0 and standard deviation 1.
Min-Max scaling: X_scaled = (X - min) / (max - min)
Scales data to the range [0, 1].
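A minimal sketch of both scalings with scikit-learn; the array `X` is made-up placeholder data. In practice, fit the scaler on the training split only and reuse it on validation and test splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Placeholder numerical data: 3 samples, 2 features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: each column rescaled to mean 0, std 1.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column mapped to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)
```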
Transformations
| Transform | Use Case |
|---|---|
| Log transform | Compressing right-skewed distributions |
| Power transform (Box-Cox, Yeo-Johnson) | Making distributions more Gaussian |
| Binning | Converting continuous values into discrete categories |
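A short sketch of all three transforms with pandas and scikit-learn, on a made-up right-skewed series:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Hypothetical right-skewed feature (e.g., purchase amounts).
amounts = pd.Series([1.0, 2.0, 5.0, 10.0, 100.0, 1000.0])

# Log transform: log1p handles zeros safely.
log_amounts = np.log1p(amounts)

# Power transform: Yeo-Johnson also accepts non-positive values
# (use method="box-cox" for strictly positive data).
pt = PowerTransformer(method="yeo-johnson")
yj_amounts = pt.fit_transform(amounts.to_frame())

# Binning: convert continuous values into categorical ranges.
bins = pd.cut(amounts, bins=[0, 5, 50, np.inf],
              labels=["low", "mid", "high"])
```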
Categorical Features
Encoding Methods
| Method | Use Case | Characteristics |
|---|---|---|
| One-hot | Low cardinality | No ordering assumed |
| Label encoding | Ordinal categories | Compact representation |
| Target encoding | High cardinality | Uses target information |
| Embedding | Very high cardinality | Learned representations |
One-hot encoding: Convert each category into a binary column. A feature with k distinct categories becomes k binary features (or k-1 if one redundant column is dropped).
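A minimal pandas sketch; the `color` column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One binary column per category (k columns for k categories).
onehot = pd.get_dummies(df["color"], prefix="color")

# drop_first=True keeps k-1 columns, dropping one redundant indicator.
onehot_compact = pd.get_dummies(df["color"], prefix="color", drop_first=True)
```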
Target encoding: Replace each category with the mean target value for that category. Requires care to prevent data leakage: compute the means on training data only, ideally out-of-fold.
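A leakage-aware sketch, assuming a hypothetical `city` column and binary `target`; the means come from training rows only, and unseen categories fall back to the global mean:

```python
import pandas as pd

# Hypothetical training data: a high-cardinality category and a binary target.
train = pd.DataFrame({
    "city":   ["a", "a", "b", "b", "c"],
    "target": [1,   0,   1,   1,   0],
})

# Per-category target means, computed on training rows only.
means = train.groupby("city")["target"].mean()
global_mean = train["target"].mean()

def target_encode(series, means, fallback):
    # Unseen categories at inference time map to the global mean.
    return series.map(means).fillna(fallback)

train["city_enc"] = target_encode(train["city"], means, global_mean)
```

A stricter variant computes each training row's encoding out-of-fold, so no row's own target value contributes to its encoding.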
Feature Creation
Aggregations
Group data by entity (e.g., user_id) and compute aggregate statistics:
| Aggregation | Example |
|---|---|
| Mean | Average purchase amount per user |
| Sum | Total purchase amount per user |
| Count | Number of purchases per user |
| Max | Most recent purchase date |
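A sketch of these aggregations using pandas named aggregation; the transaction log is fabricated for illustration:

```python
import pandas as pd

# Hypothetical transaction log.
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount":  [10.0, 20.0, 5.0, 5.0, 40.0],
    "date":    pd.to_datetime(["2024-01-01", "2024-02-01",
                               "2024-01-15", "2024-03-01", "2024-03-10"]),
})

# One row per user with aggregate statistics.
user_features = purchases.groupby("user_id").agg(
    mean_amount=("amount", "mean"),    # average purchase amount
    total_amount=("amount", "sum"),    # total purchase amount
    n_purchases=("amount", "count"),   # number of purchases
    last_purchase=("date", "max"),     # most recent purchase date
)
```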
Interactions
Create new features by combining existing features:
| Interaction | Formula | Interpretation |
|---|---|---|
| Ratio | price / sqft | Price per square foot |
| Product | age * income | Captures joint effect of age and income |
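Both interactions in pandas, on made-up housing-style data:

```python
import pandas as pd

# Hypothetical rows with the columns named in the table above.
df = pd.DataFrame({
    "price":  [300000.0, 450000.0],
    "sqft":   [1500.0, 2000.0],
    "age":    [30, 45],
    "income": [60000.0, 90000.0],
})

# Ratio feature: price per square foot.
df["price_per_sqft"] = df["price"] / df["sqft"]

# Product feature: joint effect of age and income.
df["age_x_income"] = df["age"] * df["income"]
```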
Time Features
Extract components from datetime fields:
| Feature | Extraction | Use Case |
|---|---|---|
| hour | Hour of day (0-23) | Time-of-day patterns |
| day_of_week | Day (0=Monday to 6=Sunday) | Weekly patterns |
| is_weekend | True if Saturday or Sunday | Weekend vs weekday behavior |
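The three extractions via the pandas `.dt` accessor, on made-up timestamps:

```python
import pandas as pd

events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-03-01 08:30", "2024-03-02 22:15"]),
})

ts = events["timestamp"].dt
events["hour"] = ts.hour                  # 0-23
events["day_of_week"] = ts.dayofweek      # 0=Monday .. 6=Sunday
events["is_weekend"] = ts.dayofweek >= 5  # Saturday or Sunday
```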
Feature Selection
Filter Methods
| Method | Description |
|---|---|
| Correlation | Strength of linear relationship with the target |
| Mutual information | Non-linear dependency measure |
| Chi-squared test | Categorical feature significance |
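A filter-method sketch using mutual information via scikit-learn's `SelectKBest`; the data is synthetic and the choice of `k=3` is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data stands in for a real feature matrix.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Score each feature by mutual information with the target; keep the top 3.
selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of kept features
```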
Wrapper Methods
| Method | Description |
|---|---|
| Forward selection | Add features incrementally |
| Backward elimination | Remove features incrementally |
| Recursive feature elimination | Iteratively remove least important |
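A wrapper-method sketch using scikit-learn's `RFE` with a logistic regression estimator; the data and feature counts are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Recursive feature elimination: repeatedly fit the model and
# drop the least important feature until 3 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=3)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of selected features
print(rfe.ranking_)   # 1 = selected; higher = eliminated earlier
```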
Embedded Methods
| Method | Description |
|---|---|
| Lasso (L1) regularization | Zeros out unimportant weights |
| Tree-based importance | Built-in feature ranking |
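Both embedded approaches in scikit-learn; the regression data and the `alpha` value are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=1.0, random_state=0)

# L1 regularization drives weak coefficients exactly to zero.
lasso = Lasso(alpha=1.0).fit(X, y)
print(np.flatnonzero(lasso.coef_))  # features with nonzero weights

# Tree ensembles expose a built-in importance ranking.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(np.argsort(forest.feature_importances_)[::-1])  # most to least important
```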
Handling Missing Values
| Strategy | Use Case |
|---|---|
| Drop rows | Few missing values, random missingness |
| Mean/median | Numerical features, MCAR (missing completely at random) |
| Mode | Categorical features |
| Model-based | Complex missingness patterns |
| Indicator | Missingness is informative |
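A sketch combining median imputation with a missingness indicator; the `income` column is made up. The indicator is created before imputation so the missingness signal survives:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"income": [50000.0, np.nan, 70000.0, np.nan]})

# Indicator first: preserves the fact that the value was missing.
df["income_missing"] = df["income"].isna().astype(int)

# Median imputation for the numerical feature.
imputer = SimpleImputer(strategy="median")
df["income"] = imputer.fit_transform(df[["income"]]).ravel()
```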
Reference
| Topic | Guidance |
|---|---|
| High-cardinality categoricals | Target encoding, frequency encoding, or embeddings. One-hot encoding creates excessive dimensionality. Target encoding requires leakage prevention. |
| Target encoding risks | Leakage if encodings are computed on the same rows used for model fitting. Compute encodings out-of-fold or on a separate holdout set. |
| Feature selection approach | Start with domain knowledge. Use correlation with target, tree-based importance, or regularization to identify weak features. Remove and validate. |
| Missing value handling | Depends on missingness mechanism. Random: impute. Non-random: create indicator. Excessive missingness: drop feature. |
| Domain-specific features | Consider signals predictive of the outcome. E-commerce example: purchase recency, price sensitivity, category preferences, time-of-day patterns. |