Analytics and A/B Testing for PMs
Analytics interviews assess data analysis skills, experiment design capabilities, and the ability to interpret results. Data informs decisions but does not replace decision-making.
SQL Fundamentals
Basic SQL proficiency eliminates dependency on data teams for simple queries.
Essential Queries
Counting unique users: Use COUNT with DISTINCT on the user_id column to count unique users who performed a specific action (such as a purchase) within a date range. Filter with WHERE clauses on event name and date.
Segmenting by dimension: Group users by a dimension (such as country) using GROUP BY, count the results, and sort with ORDER BY to identify the largest segments.
These patterns cover approximately 80% of PM data exploration needs.
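Both query patterns can be sketched as runnable SQL. To keep the example self-contained, it runs against an in-memory SQLite database from Python. The `events` table, its columns (`user_id`, `event_name`, `event_date`, `country`), and the rows are all hypothetical. A real warehouse will have its own schema and date functions.

```python
import sqlite3

# Hypothetical events table: one row per user action.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, event_name TEXT, event_date TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("u1", "purchase", "2024-01-05", "US"),
        ("u1", "purchase", "2024-01-20", "US"),  # repeat buyer: counted once
        ("u2", "purchase", "2024-01-11", "DE"),
        ("u3", "signup", "2024-01-12", "US"),
        ("u4", "purchase", "2024-02-02", "US"),  # outside the date range below
    ],
)

# Counting unique users who purchased within a date range.
unique_buyers = conn.execute(
    """
    SELECT COUNT(DISTINCT user_id)
    FROM events
    WHERE event_name = 'purchase'
      AND event_date BETWEEN '2024-01-01' AND '2024-01-31'
    """
).fetchone()[0]

# Segmenting by dimension: unique users per country, largest segment first.
by_country = conn.execute(
    """
    SELECT country, COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY country
    ORDER BY users DESC
    """
).fetchall()

print(unique_buyers)  # 2
print(by_country)     # [('US', 3), ('DE', 1)]
```

`COUNT(DISTINCT user_id)` prevents the repeat buyer from being counted twice. `COUNT(*)` would count events, not users.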
Cohort Analysis
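Aggregate metrics can obscure trends. Cohort analysis groups users by when they started (for example, signup month) and tracks each group separately. This reveals changes over time.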
Example: Overall 30-day retention is 35%.
| Cohort | 30-day retention |
|---|---|
| January | 45% |
| February | 40% |
| March | 25% |
The aggregate hides a deteriorating trend: each newer cohort retains worse than the one before it. Cohort analysis makes the pattern visible.
Cohort Table Format
| Cohort | Week 1 | Week 4 | Week 8 |
|---|---|---|---|
| January | 100% | 45% | 30% |
| February | 100% | 40% | 28% |
| March | 100% | 25% | 15% |
This format reveals retention degradation across cohorts.
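A cohort table like the one above can be built from two inputs: each user's signup date and the dates on which they were active. The sketch below makes several assumptions. Cohorts are signup months. Weeks are 7-day windows numbered from 1. Signup itself counts as activity, which is why week 1 is 100%. The function name and data are hypothetical.

```python
from collections import defaultdict
from datetime import date

def cohort_table(signups, activity, weeks):
    """Return {cohort_month: [retention for each week in `weeks`]}.

    Assumed conventions: cohorts are signup months; week 1 covers days 0-6
    after signup, week 2 days 7-13, and so on; signup counts as activity.
    """
    cohort_of = {user: d.strftime("%Y-%m") for user, d in signups.items()}
    cohort_size = defaultdict(int)
    for cohort in cohort_of.values():
        cohort_size[cohort] += 1

    active = defaultdict(set)  # (cohort, week) -> users active in that week
    for user, day in list(signups.items()) + list(activity):
        week = (day - signups[user]).days // 7 + 1
        active[(cohort_of[user], week)].add(user)

    return {
        cohort: [len(active[(cohort, w)]) / size for w in weeks]
        for cohort, size in sorted(cohort_size.items())
    }

# Hypothetical data: signup dates and later activity dates per user.
signups = {
    "u1": date(2024, 1, 3), "u2": date(2024, 1, 10),
    "u3": date(2024, 2, 7), "u4": date(2024, 2, 8),
}
activity = [
    ("u1", date(2024, 1, 25)),  # 22 days after signup -> week 4
    ("u2", date(2024, 2, 1)),   # 22 days after signup -> week 4
    ("u3", date(2024, 3, 1)),   # 23 days after signup (2024 is a leap year) -> week 4
]

for cohort, row in cohort_table(signups, activity, weeks=[1, 4]).items():
    print(cohort, [f"{r:.0%}" for r in row])
# 2024-01 ['100%', '100%']
# 2024-02 ['100%', '50%']
```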
Funnel Analysis
Funnel analysis identifies where users abandon a process.
Example: overall conversion is 25%, and the largest drop-off (40%) occurs at the "start checkout" step. That step is the primary investigation target.
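Step-level drop-off is a ratio between consecutive steps. Overall conversion compares the last step to the first. The step names and counts below are hypothetical and were chosen only so the output matches the 25% and 40% figures above. The definition of drop-off (share of the previous step's users who do not reach this step) is an assumption.

```python
def funnel_dropoff(steps):
    """Return (overall_conversion, [(step, drop_off_rate), ...]).

    `steps` is an ordered list of (step_name, users_reaching_step).
    Drop-off at a step = share of the previous step's users who did not reach it.
    """
    drops = [
        (name, 1 - n / prev_n)
        for (_, prev_n), (name, n) in zip(steps, steps[1:])
    ]
    overall = steps[-1][1] / steps[0][1]
    return overall, drops

steps = [  # hypothetical counts
    ("view product", 1000),
    ("add to cart", 750),
    ("start checkout", 450),
    ("enter payment", 300),
    ("purchase", 250),
]
overall, drops = funnel_dropoff(steps)
worst_step, worst_rate = max(drops, key=lambda d: d[1])
print(f"overall conversion: {overall:.0%}")                 # overall conversion: 25%
print(f"largest drop-off: {worst_step} ({worst_rate:.0%})")  # largest drop-off: start checkout (40%)
```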
A/B Testing
Fundamentals
A/B testing compares two versions with randomized user assignment:
- Control group: Existing experience
- Variant group: New experience
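Randomized assignment is often implemented by hashing a stable user identifier instead of drawing a random number on each request. This way, a returning user always sees the same experience. The sketch below shows one such approach, not a prescribed method. The function name, the experiment-name salt, and the 60-bit bucketing are illustrative assumptions.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Return "variant" or "control" for this user in this experiment.

    Hashing "experiment:user_id" maps each user to a stable, effectively random
    position in [0, 1). The same user always gets the same group, and a
    different experiment name reshuffles users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    position = int(digest[:15], 16) / 16**15  # first 60 bits of the hash, scaled to [0, 1)
    return "variant" if position < variant_share else "control"

# Hypothetical usage: split 10,000 users 50/50 for an experiment named "checkout_v2".
groups = [assign_variant(f"user-{i}", "checkout_v2") for i in range(10_000)]
print(groups.count("variant"), groups.count("control"))  # close to 5000 each
```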
When to Test vs. Ship Directly
| Test | Ship Directly |
|---|---|
| Genuine uncertainty about impact | Obvious improvement (bug fixes) |
| Easily reversible change | Insufficient traffic for significance |
| Sufficient traffic for significance | Urgent (security fixes) |
| High-stakes decision | Minimal downside, easy rollback |
Statistical Significance
A significance threshold of p < 0.05 (often described as 95% confidence) means: if no actual difference exists, results at least this extreme would occur less than 5% of the time.
This does not mean "95% probability the variant is better."
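For a conversion-rate test, the p-value behind that threshold can be computed with a two-proportion z-test. This is a minimal sketch that uses the normal approximation, which is reasonable when each group has many conversions and non-conversions. The function name and the counts are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: groups A and B have the same conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # estimate of the shared rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # every user converted, or none did: no evidence of a difference
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z

# Hypothetical counts: control 200/1,000 (20%), variant 250/1,000 (25%).
print(two_proportion_p_value(200, 1000, 250, 1000))  # about 0.007, well under 0.05
```

A small p-value means the observed gap would be unusual if the groups were identical. It still does not give the probability that the variant is better.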
Common Errors
| Error | Description |
|---|---|
| Peeking | Checking interim results and stopping as soon as they look significant inflates the false positive rate |
| Practical vs. statistical significance | A 1% lift may be statistically significant but not business-meaningful |
| Winner's curse | Experiments that reach significance tend to overestimate the true effect |
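The peeking problem can be shown by simulation. The sketch below runs hypothetical A/A tests, where both groups share the same true 10% conversion rate, so every significant result is a false positive. It compares two rules: declaring a winner if any of ten interim looks has p < 0.05, and testing once at the planned end. All parameters are illustrative assumptions.

```python
import random
from math import erfc, sqrt

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test p-value (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return erfc(abs(z) / sqrt(2))

def simulate_aa_tests(n_experiments=500, looks=10, batch=100, rate=0.10, seed=42):
    """Run simulated A/A tests and return (peeking_fpr, fixed_horizon_fpr).

    Both groups share the same true conversion rate, so any p < 0.05 is a
    false positive. "Peeking" declares a winner if ANY interim look is
    significant; "fixed horizon" tests only once, after the last batch.
    """
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_some_look = False
        for _ in range(looks):
            # Each batch adds `batch` users per group, converting with probability `rate`.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(conv_a, n, conv_b, n) < 0.05:
                significant_at_some_look = True
        peeking_hits += significant_at_some_look
        fixed_hits += p_value(conv_a, n, conv_b, n) < 0.05
    return peeking_hits / n_experiments, fixed_hits / n_experiments

peeking_rate, fixed_rate = simulate_aa_tests()
print(f"peeking: {peeking_rate:.1%} false positives; single final look: {fixed_rate:.1%}")
```

Any test that is significant at the final look is also significant at some look, so the peeking rate can never be lower. In runs like this it typically lands well above the roughly 5% fixed-horizon rate.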
Experiment Design Process
- Write hypothesis - "Changing X will cause Y because Z"
- Define metrics
  - Primary: Target metric to move
  - Guardrail: Metric that should not degrade
  - Secondary: Contextual information
- Calculate sample size - Determine before starting
- Run full duration - Do not stop early based on interim results
- Check for Sample Ratio Mismatch - Verify that the observed split matches the planned allocation (e.g., 50/50)
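A sample ratio mismatch check is usually a chi-square goodness-of-fit test on the assignment counts. This sketch assumes two groups and a planned variant share. The function name and counts are hypothetical. The strict p < 0.001 alert threshold in the comment is a common convention, not a fixed rule.

```python
from math import erfc, sqrt

def srm_p_value(n_control: int, n_variant: int, expected_variant_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = n_control + n_variant
    expected = (total * (1 - expected_variant_share), total * expected_variant_share)
    observed = (n_control, n_variant)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))

print(srm_p_value(5000, 5000))  # 1.0: a perfect 50/50 split
print(srm_p_value(5000, 5400))  # far below 0.001: investigate before trusting any results
```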
Experiment Design Example
Testing: New checkout flow
| Metric Type | Metric | Purpose |
|---|---|---|
| Primary | Checkout completion rate | Target improvement |
| Guardrail | Return rate | Ensure the new flow does not increase purchases that are later returned |
| Secondary | Time to purchase, payment method distribution | Context |
Sample size: Given a 40% baseline checkout completion rate, detecting a 15% relative lift (40% to 46%) with 80% power at a two-sided α = 0.05 requires roughly 1,070 users per variant.
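The per-variant sample size for a conversion-rate test can be estimated with the standard normal-approximation formula for comparing two proportions. The sketch below defaults to a two-sided test at α = 0.05 with 80% power. The function name is hypothetical. Dedicated power calculators may give slightly different numbers depending on the formula variant they use.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per variant for a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

print(sample_size_per_variant(0.40, 0.15))  # 1068
```

Halving the relative lift from 15% to 7.5% roughly quadruples the required sample, because n scales with 1 / (p2 - p1)^2.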
Interview Question Types
"Design an A/B test for X"
Structure:
- Hypothesis with causal reasoning
- Metrics (primary, guardrail, secondary)
- Audience and allocation
- Sample size calculation
- Duration determination
- Analysis plan
"The test shows 3% lift. Should we ship?"
Evaluation criteria:
- Statistical significance and confidence interval width
- Practical significance (business impact of 3%)
- Segment analysis (consistent across user types?)
- Guardrail metric status
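Confidence interval width is what separates a reliable 3% lift from a noisy one. The sketch below computes a Wald (normal-approximation) interval for the absolute difference in conversion rates. The function name and both sets of counts are hypothetical. Each set represents a 3% relative lift (20.0% vs. 20.6%).

```python
from math import sqrt
from statistics import NormalDist

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (low, high, relative_lift).

    (low, high) is a Wald interval for the absolute difference in conversion
    rates (variant minus control); relative_lift is the point estimate.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se, diff / p_a

# Hypothetical: the same 3% relative lift at two sample sizes.
for conv_a, conv_b, n in [(2_000, 2_060, 10_000), (20_000, 20_600, 100_000)]:
    low, high, rel = lift_confidence_interval(conv_a, n, conv_b, n)
    print(f"n={n:,}: lift {rel:.1%}, 95% CI for difference {low:+.2%} to {high:+.2%}")
```

With 10,000 users per group the interval includes zero. With 100,000 users per group, the same lift produces an interval that excludes it.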
"A metric dropped 10%. What do you do?"
Diagnostic process:
- Verify data - Check for tracking changes, pipeline issues
- Determine timing - Sudden vs. gradual decline
- Segment - Platform, country, user tenure, acquisition channel
- Identify changes - Releases, campaigns, competitor activity, seasonality
- Form and test hypothesis
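The segmentation step can be made concrete by splitting the total change into per-segment contributions. If one segment accounts for most of the drop, the investigation narrows quickly. The segment names and counts below are hypothetical, representing a 10% drop from 10,000 to 9,000 daily active users.

```python
def segment_contributions(before, after):
    """Map each segment to (change, share_of_total_change) between two periods.

    `before` and `after` map segment name -> metric value, e.g. daily active users.
    A segment missing from one period is treated as 0 in that period.
    """
    total_change = sum(after.values()) - sum(before.values())
    result = {}
    for segment in sorted(before.keys() | after.keys()):
        change = after.get(segment, 0) - before.get(segment, 0)
        share = change / total_change if total_change else 0.0
        result[segment] = (change, share)
    return result

# Hypothetical daily active users by platform.
before = {"ios": 4_000, "android": 4_000, "web": 2_000}
after = {"ios": 3_950, "android": 3_100, "web": 1_950}

for segment, (change, share) in sorted(
    segment_contributions(before, after).items(), key=lambda item: item[1][0]
):
    print(f"{segment}: {change:+,} ({share:.0%} of the total change)")
# android: -900 (90% of the total change)
# ios: -50 (5% of the total change)
# web: -50 (5% of the total change)
```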
Limitations of Data
Data indicates what happened, not why or what action to take.
Decisions That May Not Require Data
- Ethical decisions
- Large strategic bets with sparse data
- Design quality judgments
Common Analytical Errors
| Error | Description |
|---|---|
| Correlation assumed as causation | Users who complete onboarding retain better, but already-motivated users may simply be more likely to do both; onboarding may not cause retention |
| Optimizing toward local maxima | Individual test wins may not add up to a coherent overall product |
| Ignoring qualitative signals | Observing user behavior reveals context that metrics cannot |
Company-Specific Approaches
Booking.com
High-volume experimentation (thousands of tests annually), heavily optimized for conversion. A case study in both the benefits of experimentation and the concern that optimization can drift into manipulative design.
Netflix
Personalizes thumbnails through A/B testing. Tests visual presentation, not just features. The same content displays differently based on user preferences.
Amazon
"Two-way door" vs. "one-way door" decision framework. Reversible decisions require less analysis than irreversible ones.
Airbnb
Discovered long-term effects that short-term tests missed. Algorithm changes that improved short-term bookings harmed long-term host satisfaction. Now considers ecosystem effects beyond test windows.
Microsoft (Bing)
Identified measurement problems: ranking changes that reduced measured engagement actually improved user success (users found answers faster, clicked less). Metrics were measuring the wrong outcome.
Analytical Mindset Characteristics
| Characteristic | Description |
|---|---|
| Comfort with uncertainty | Ability to acknowledge insufficient data without paralysis |
| Self-skepticism | Actively seeking holes in own analysis |
| Pragmatism | Knowing when 80% confidence suffices |
| Communication | Connecting data to decisions and context |
| Ethical awareness | Considering consequences of optimization targets |
Analytics questions test judgment about data usage, not just statistical knowledge.