
Analytics and A/B Testing for PMs

Analytics interviews assess data analysis skills, experiment design capabilities, and the ability to interpret results. Data informs decisions but does not replace decision-making.

SQL Fundamentals

Basic SQL proficiency eliminates dependency on data teams for simple queries.

Essential Queries

Counting unique users: Use COUNT with DISTINCT on the user_id column to count unique users who performed a specific action (such as a purchase) within a date range. Filter with WHERE clauses on event name and date.
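
A minimal sketch of this pattern, assuming a hypothetical events table with user_id, event_name, and event_date columns:

    -- Unique users who purchased within a date range
    -- (table and column names are illustrative)
    SELECT COUNT(DISTINCT user_id) AS unique_purchasers
    FROM events
    WHERE event_name = 'purchase'
      AND event_date BETWEEN '2024-01-01' AND '2024-01-31';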

Segmenting by dimension: Group users by a dimension (such as country) using GROUP BY, count the results, and sort with ORDER BY to identify the largest segments.
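
A sketch of the segmentation pattern against the same hypothetical events table:

    -- Distinct users per country, largest segments first
    SELECT country,
           COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY country
    ORDER BY users DESC;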

These patterns cover approximately 80% of PM data exploration needs.

Cohort Analysis

Aggregate metrics can obscure trends. Cohort analysis reveals changes over time.

Example: Overall 30-day retention is 35%.

Cohort   | 30-day retention
January  | 45%
February | 40%
March    | 25%

The aggregate hides a deteriorating trend. Cohort analysis identifies the pattern.

Cohort Table Format

Cohort   | Week 1 | Week 4 | Week 8
January  | 100%   | 45%    | 30%
February | 100%   | 40%    | 28%
March    | 100%   | 25%    | 15%

This format reveals retention degradation across cohorts.
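
One way such a table might be produced, sketched against hypothetical users and events tables using Postgres-style date functions:

    -- Monthly signup cohorts with week-4 retention
    SELECT
        DATE_TRUNC('month', u.signup_date)  AS cohort_month,
        COUNT(DISTINCT u.user_id)           AS cohort_size,
        ROUND(
            COUNT(DISTINCT CASE
                WHEN e.event_date >= u.signup_date + INTERVAL '21 days'
                 AND e.event_date <  u.signup_date + INTERVAL '28 days'
                THEN e.user_id END) * 100.0
            / COUNT(DISTINCT u.user_id), 1) AS week_4_retention_pct
    FROM users u
    LEFT JOIN events e ON e.user_id = u.user_id
    GROUP BY 1
    ORDER BY 1;

Additional columns follow the same shape, one CASE expression per retention window.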

Funnel Analysis

Funnel analysis identifies where users abandon a process.

[Funnel diagram showing user drop-off at each step]

Overall conversion is 25%. The largest drop-off (40%) occurs at the "start checkout" step, making it the primary investigation target.
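
A simplified sketch of a funnel query, assuming hypothetical event names and the same illustrative events table:

    -- Users reaching each step of the checkout funnel on a given day
    SELECT
        COUNT(DISTINCT CASE WHEN event_name = 'view_product'   THEN user_id END) AS viewed_product,
        COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart'    THEN user_id END) AS added_to_cart,
        COUNT(DISTINCT CASE WHEN event_name = 'start_checkout' THEN user_id END) AS started_checkout,
        COUNT(DISTINCT CASE WHEN event_name = 'purchase'       THEN user_id END) AS purchased
    FROM events
    WHERE event_date = '2024-01-15';

A production funnel query would also enforce step ordering per user and a conversion window; this version only counts users who reached each step.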

A/B Testing

Fundamentals

A/B testing compares two versions with randomized user assignment:

  • Control group: Existing experience
  • Variant group: New experience

When to Test vs. Ship Directly

Test                                  | Ship Directly
Genuine uncertainty about impact      | Obvious improvement (bug fixes)
Sufficient traffic for significance   | Insufficient traffic for significance
High-stakes, hard-to-reverse decision | Minimal downside, easily reversible, easy rollback
Non-urgent change                     | Urgent (security fixes)

Statistical Significance

The 95% confidence threshold (p < 0.05) means that if no actual difference existed, results at least this extreme would occur only 5% of the time.

This does not mean "95% probability the variant is better."

Common Errors

Error                                  | Description
Peeking                                | Checking results before the planned end date increases the false positive rate
Practical vs. statistical significance | A 1% lift may be statistically significant but not business-meaningful
Winner's curse                         | Experiments that reach significance often overestimate the true effect

Experiment Design Process

  1. Write hypothesis - "Changing X will cause Y because Z"
  2. Define metrics
    • Primary: Target metric to move
    • Guardrail: Metric that should not degrade
    • Secondary: Contextual information
  3. Calculate sample size - Determine before starting
  4. Run full duration - Do not stop early based on interim results
  5. Check for Sample Ratio Mismatch - Verify expected allocation (50/50)
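
A sketch of an SRM check, assuming a hypothetical experiment_assignments table that records which variant each user received:

    -- Observed allocation per variant
    SELECT
        variant,
        COUNT(DISTINCT user_id) AS users,
        ROUND(COUNT(DISTINCT user_id) * 100.0
              / SUM(COUNT(DISTINCT user_id)) OVER (), 1) AS share_pct
    FROM experiment_assignments
    WHERE experiment_id = 'new_checkout_flow'
    GROUP BY variant;

If a 50/50 split comes back meaningfully skewed (judged with a chi-squared test), the assignment or logging pipeline is broken and the results should not be trusted.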

Experiment Design Example

Testing: New checkout flow

Metric Type | Metric                                        | Purpose
Primary     | Checkout completion rate                      | Target improvement
Guardrail   | Return rate                                   | Ensure purchase quality does not degrade
Secondary   | Time to purchase, payment method distribution | Context

Sample size: With a 40% baseline completion rate, detecting a 15% relative lift at 80% power requires roughly 3,000 users per variant.
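
A sketch of how the primary metric might be pulled per variant, assuming hypothetical experiment_assignments and events tables and treating a purchase event after assignment as checkout completion:

    -- Checkout completion rate by variant
    SELECT
        a.variant,
        COUNT(DISTINCT a.user_id) AS users,
        ROUND(COUNT(DISTINCT CASE WHEN e.event_name = 'purchase'
                                  THEN e.user_id END) * 100.0
              / COUNT(DISTINCT a.user_id), 1) AS completion_rate_pct
    FROM experiment_assignments a
    LEFT JOIN events e
           ON e.user_id = a.user_id
          AND e.event_date >= a.assigned_date
    WHERE a.experiment_id = 'new_checkout_flow'
    GROUP BY a.variant;

The query produces only point estimates; significance still has to be tested on the two proportions.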

Interview Question Types

"Design an A/B test for X"

Structure:

  1. Hypothesis with causal reasoning
  2. Metrics (primary, guardrail, secondary)
  3. Audience and allocation
  4. Sample size calculation
  5. Duration determination
  6. Analysis plan

"The test shows 3% lift. Should we ship?"

Evaluation criteria:

  • Statistical significance and confidence interval width
  • Practical significance (business impact of 3%)
  • Segment analysis (consistent across user types?)
  • Guardrail metric status

"A metric dropped 10%. What do you do?"

Diagnostic process:

  1. Verify data - Check for tracking changes, pipeline issues
  2. Determine timing - Sudden vs. gradual decline
  3. Segment - Platform, country, user tenure, acquisition channel (see the query sketch after this list)
  4. Identify changes - Releases, campaigns, competitor activity, seasonality
  5. Form and test hypothesis
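
For the segmentation step, a sketch assuming a hypothetical daily_active_users table; the same breakdown can be repeated by country, tenure, or channel:

    -- Daily active users by platform over the last four weeks
    SELECT
        activity_date,
        platform,
        COUNT(DISTINCT user_id) AS active_users
    FROM daily_active_users
    WHERE activity_date >= CURRENT_DATE - INTERVAL '28 days'
    GROUP BY activity_date, platform
    ORDER BY activity_date, platform;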

Limitations of Data

Data indicates what happened, not why or what action to take.

Decisions That May Not Require Data

  • Ethical decisions
  • Large strategic bets with sparse data
  • Design quality judgments

Common Analytical Errors

Error                            | Description
Correlation assumed as causation | Users who complete onboarding retain better; onboarding may not cause retention
Optimization to local maxima     | Individual test wins may not improve overall product coherence
Ignoring qualitative signals     | Observing user behavior reveals context that metrics cannot

Company-Specific Approaches

Booking.com

High-volume experimentation (thousands of tests annually). Optimization for conversion. Case study in both experimentation benefits and potential manipulation concerns.

Netflix

Personalizes thumbnails through A/B testing. Tests visual presentation, not just features. The same content displays differently based on user preferences.

Amazon

"Two-way door" vs. "one-way door" decision framework. Reversible decisions require less analysis than irreversible ones.

Airbnb

Discovered long-term effects that short-term tests missed. Algorithm changes that improved short-term bookings harmed long-term host satisfaction. Now considers ecosystem effects beyond test windows.

Microsoft (Bing)

Identified measurement problems: ranking changes that reduced measured engagement actually improved user success (users found answers faster, clicked less). Metrics were measuring the wrong outcome.

Analytical Mindset Characteristics

Characteristic           | Description
Comfort with uncertainty | Ability to acknowledge insufficient data without paralysis
Self-skepticism          | Actively seeking holes in one's own analysis
Pragmatism               | Knowing when 80% confidence suffices
Communication            | Connecting data to decisions and context
Ethical awareness        | Considering the consequences of optimization targets

Analytics questions test judgment about data usage, not just statistical knowledge.