
Probability for Data Science Interviews

Probability forms the mathematical foundation for reasoning about uncertainty in data science. This section covers the core concepts tested in interviews.

Fundamental Definitions

Term | Definition | Example
Sample space | Set of all possible outcomes | Die roll: {1, 2, 3, 4, 5, 6}
Event | Subset of outcomes | "Rolling even": {2, 4, 6}
Probability | Measure from 0 (impossible) to 1 (certain) | P(rolling even) = 3/6 = 0.5

Core Rules

Addition Rule

For calculating the probability of A or B occurring:

P(A or B) = P(A) + P(B) - P(A and B)
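The addition rule can be checked by brute-force enumeration. The sketch below assumes a single fair die and uses two illustrative events ("even" and "greater than 3") that are not part of the original text; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

# Sample space for one fair six-sided die (illustrative assumption).
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}             # event A
greater_than_3 = {4, 5, 6}   # event B

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Direct count of the union versus the addition rule.
p_union_direct = prob(even | greater_than_3)
p_union_rule = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

print(p_union_direct, p_union_rule)  # 2/3 2/3
```

Subtracting P(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice.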

Multiplication Rule

For calculating the probability of A and B both occurring when the events are independent:

P(A and B) = P(A) x P(B)
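As a quick check of the multiplication rule, the following sketch enumerates two rolls of a fair die. The events ("first roll is 6", "second roll is even") and the helper `prob` are illustrative assumptions, not from the original.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first, second) outcomes of two fair die rolls.
rolls = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Fraction of outcomes for which the predicate holds."""
    return Fraction(sum(1 for r in rolls if predicate(r)), len(rolls))

p_first_six = prob(lambda r: r[0] == 6)
p_second_even = prob(lambda r: r[1] % 2 == 0)
p_both = prob(lambda r: r[0] == 6 and r[1] % 2 == 0)

print(p_both, p_first_six * p_second_even)  # 1/12 1/12
```

The product matches the joint probability here because the two rolls do not influence each other.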

Conditional Probability

Conditional probability measures the probability of event A given that event B has occurred.

P(A|B) = P(A and B) / P(B)

Example: Drawing a card from a standard deck.

Given Information | Calculation
P(King and Face Card) | 4/52
P(Face Card) | 12/52
P(King | Face Card) | (4/52) / (12/52) = 1/3
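The card example can be reproduced by building the deck and counting. This is a minimal sketch; the rank and suit labels are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Conditioning on "face card" restricts the sample space to 12 cards.
face_cards = [card for card in deck if card[0] in {"J", "Q", "K"}]
kings_among_faces = [card for card in face_cards if card[0] == "K"]

p_king_given_face = Fraction(len(kings_among_faces), len(face_cards))
print(p_king_given_face)  # 1/3
```

Counting inside the restricted sample space is equivalent to dividing P(King and Face Card) by P(Face Card).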

Independence

Two events are independent when knowledge of one provides no information about the other:

P(A|B) = P(A)

Scenario | Independence
Consecutive coin flips | Independent
Drawing cards without replacement | Not independent
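To see why drawing without replacement breaks independence, the sketch below compares P(second card is an ace) with P(second card is an ace | first card is an ace). The ace example is an illustrative choice, and suits are ignored for brevity.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [rank for rank in ranks for _ in range(4)]  # 52 cards, suits ignored

# Every ordered pair of distinct card positions: 52 * 51 two-card draws.
draws = list(permutations(deck, 2))

second_ace = [d for d in draws if d[1] == "A"]
first_ace = [d for d in draws if d[0] == "A"]
both_aces = [d for d in first_ace if d[1] == "A"]

p_second_ace = Fraction(len(second_ace), len(draws))
p_second_ace_given_first_ace = Fraction(len(both_aces), len(first_ace))

print(p_second_ace, p_second_ace_given_first_ace)  # 1/13 1/17
```

Because the two values differ, P(A|B) = P(A) fails and the draws are not independent.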

Bayes' Theorem

Bayes' theorem relates conditional probabilities:

P(A|B) = P(B|A) x P(A) / P(B)

Component | Name | Description
P(A) | Prior | Initial belief before evidence
P(B|A) | Likelihood | Probability of evidence given hypothesis
P(B) | Evidence | Overall probability of the evidence
P(A|B) | Posterior | Updated belief after evidence

Medical Test Example

A disease affects 1% of the population. A test has 95% sensitivity (true positive rate) and 90% specificity (true negative rate).

Given: Positive test result. Calculate probability of having the disease.

Parameter | Value
P(Disease) | 0.01
P(Positive | Disease) | 0.95 (sensitivity)
P(Positive | No Disease) | 0.10 (1 - specificity)

Calculation:

First, calculate the probability of a positive test result: P(Positive) = (0.95 x 0.01) + (0.10 x 0.99) = 0.1085

Then, apply Bayes' theorem: P(Disease|Positive) = (0.95 x 0.01) / 0.1085 ≈ 0.088 (8.8%)

Result: Despite the positive test, the probability of having the disease is approximately 8.8%. The low base rate (1% prevalence) means most positive tests are false positives.
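The calculation above can be wrapped in a small function. This is a sketch only; the function name `posterior_given_positive` and its parameter names are hypothetical.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(Disease | Positive) via Bayes' theorem.

    prior:        P(Disease)
    sensitivity:  P(Positive | Disease)
    specificity:  P(Negative | No Disease)
    """
    false_positive_rate = 1 - specificity
    # Law of total probability: positives from the sick plus positives from the healthy.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p_disease = posterior_given_positive(prior=0.01, sensitivity=0.95, specificity=0.90)
print(round(p_disease, 3))  # 0.088
```

Re-running the function with a higher prior (for example 0.10 instead of 0.01) is a quick way to see how strongly the base rate drives the posterior.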

Probability Distributions

Discrete Distributions

Distribution | Use Case | Parameters | Mean | Variance
Bernoulli | Single binary trial | p | p | p(1-p)
Binomial | Count of successes in n trials | n, p | np | np(1-p)
Poisson | Count of events in fixed interval | λ | λ | λ

Binomial example: Probability of exactly 7 heads in 10 fair coin flips: C(10,7) x (0.5)^10 = 120/1024 ≈ 0.117.

Poisson example: Number of customer arrivals per hour when average rate is 5/hour.
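Both examples can be evaluated directly from the probability mass functions. In this sketch the function names are hypothetical, and the Poisson query "exactly 5 arrivals in an hour" is an illustrative choice, since the original example does not specify a count.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval when the average count is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_seven_heads = binomial_pmf(7, 10, 0.5)
p_five_arrivals = poisson_pmf(5, 5.0)  # k = 5 is an assumed query

print(round(p_seven_heads, 4))    # 0.1172
print(round(p_five_arrivals, 4))  # 0.1755
```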

Continuous Distributions

Distribution | Use Case | Key Properties
Normal (Gaussian) | Natural phenomena, Central Limit Theorem | 68-95-99.7 rule
Exponential | Time between events | Memoryless property
Uniform | Equal probability over interval | Constant density

Normal distribution coverage (68-95-99.7 rule, approximate values):

Range | Percentage of Data
Within 1σ of mean | 68%
Within 2σ of mean | 95%
Within 3σ of mean | 99.7%
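The rounded percentages can be recovered from the standard normal CDF. A minimal sketch using the error function from Python's standard library; the function name `coverage_within` is hypothetical.

```python
import math

def coverage_within(k):
    """P(|Z| < k) for a standard normal Z, computed with the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```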

Expected Value

Expected value represents the long-run average of a random variable:

E[X] = Sum of (value x probability) for all outcomes

Example: Game where rolling a 6 wins $10, otherwise lose $2.

E[X] = (1/6 x $10) + (5/6 x -$2) = $1.67 - $1.67 = $0

Expected value: $0 (fair game).
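The same computation, sketched in code with exact fractions to avoid rounding. The dictionary layout (payoff mapped to probability) and the function name are illustrative assumptions.

```python
from fractions import Fraction

# Payoff in dollars -> probability, for the dice game above.
game = {10: Fraction(1, 6), -2: Fraction(5, 6)}

def expected_value(distribution):
    """Sum of value x probability over all outcomes."""
    return sum(value * prob for value, prob in distribution.items())

ev = expected_value(game)
print(ev)  # 0
```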

Variance

Variance measures the spread of a distribution around its mean:

Var(X) = E[(X - mean)^2] = E[X^2] - (E[X])^2

Properties:

Property | Formula
Scaling | Var(aX) = a²Var(X)
Sum of independent variables | Var(X + Y) = Var(X) + Var(Y)
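The two variance formulas and both properties can be verified on a small discrete distribution. This sketch reuses the dice game from the Expected Value section (win $10 with probability 1/6, lose $2 otherwise); the helper names and the scale factor of 3 are illustrative assumptions.

```python
from fractions import Fraction

game = {10: Fraction(1, 6), -2: Fraction(5, 6)}  # payoff -> probability

def expectation(dist, f=lambda x: x):
    """E[f(X)] for a discrete distribution given as {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) from the definition E[(X - mean)^2]."""
    mean = expectation(dist)
    return expectation(dist, lambda x: (x - mean) ** 2)

var_definition = variance(game)
var_shortcut = expectation(game, lambda x: x**2) - expectation(game) ** 2

# Scaling: triple every payoff.
tripled = {3 * x: p for x, p in game.items()}

# Sum of two independent plays: multiply probabilities, add payoffs.
two_plays = {}
for x, px in game.items():
    for y, py in game.items():
        two_plays[x + y] = two_plays.get(x + y, 0) + px * py

print(var_definition, var_shortcut)  # 20 20
print(variance(tripled))             # 180
print(variance(two_plays))           # 40
```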

Central Limit Theorem

The Central Limit Theorem states that the distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the shape of the underlying population distribution, provided the observations are independent and identically distributed with finite variance.

For large n, the sample mean is approximately normal with mean equal to the population mean and variance equal to the population variance divided by the sample size (σ²/n).

Sample Size | Effect on Distribution
Larger n | Narrower distribution (smaller variance)
Larger n | Better approximation to normal
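A short simulation can illustrate the theorem. The sketch below draws sample means from an exponential distribution, which is strongly skewed; the distribution, sample size, number of repetitions, and seed are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50        # n, an illustrative choice
num_samples = 5000      # how many sample means to collect

# Exponential with rate 1 has population mean 1 and population variance 1.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means: {var_of_means:.4f} (theory: {1 / sample_size:.4f})")
```

Plotting a histogram of `sample_means` for several values of `sample_size` is a natural follow-up to see the shape tighten and become more symmetric as n grows.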

Common Problem Types

Complement Problems

For "at least one" problems, use the complement:

P(at least 1 head in 5 fair flips) = 1 - P(no heads) = 1 - (0.5)^5 = 0.96875 ≈ 0.97
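The complement shortcut can be compared against a full enumeration of the 32 possible flip sequences. A minimal sketch, assuming a fair coin; the function name is hypothetical.

```python
from itertools import product

def p_at_least_one(p_success, n_trials):
    """P(at least one success in n independent trials) via the complement."""
    return 1 - (1 - p_success) ** n_trials

formula = p_at_least_one(0.5, 5)

# Brute force: list every sequence of 5 flips and count those containing a head.
outcomes = list(product("HT", repeat=5))
enumerated = sum("H" in outcome for outcome in outcomes) / len(outcomes)

print(formula, enumerated)  # 0.96875 0.96875
```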

Birthday Problem

Number of people required for 50% probability of a shared birthday: 23

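Calculation uses complement: P(no match with n people) = (365/365) x (364/365) x (363/365) x ... x ((365 - n + 1)/365), a product of n factors. Then P(shared birthday) = 1 - P(no match). This assumes 365 equally likely birthdays and ignores leap years.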
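The threshold of 23 can be found by evaluating the complement product for increasing group sizes. A sketch assuming 365 equally likely birthdays; the function name is hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), via the complement."""
    p_no_match = 1.0
    for k in range(n):
        # Person k + 1 must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1 - p_no_match

# Smallest group size with at least a 50% chance of a shared birthday.
n = 1
while p_shared_birthday(n) < 0.5:
    n += 1

print(n, round(p_shared_birthday(n), 3))  # 23 0.507
```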

Monty Hall Problem

Setup: Three doors, one prize, two goats. After the initial selection, the host, who knows where the prize is, always opens one of the other two doors to reveal a goat and offers the chance to switch.

Optimal strategy: Switch doors.

Strategy | Probability of Winning
Stay | 1/3
Switch | 2/3
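A simulation is a common way to make the 1/3 versus 2/3 result convincing. The sketch below assumes the standard rules (the host knows the prize location and always opens a goat door other than the contestant's pick); the function names, trial count, and seed are illustrative.

```python
import random

def play_round(switch, rng):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining unopened door.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play_round(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

Switching wins exactly when the first pick was a goat, which happens 2/3 of the time, so the simulated rates should land near 0.333 and 0.667.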

Common Errors

Error | Description
Gambler's fallacy | Assuming past independent events affect future outcomes
Base rate neglect | Ignoring prior probabilities in conditional probability calculations
Confusing P(A|B) with P(B|A) | These are distinct conditional probabilities

Problem-Solving Approaches

Approach | Application
Draw probability trees | Multi-stage probability problems
Use complement | "At least one" problems
State assumptions | Independence, distribution type
Verify bounds | Probabilities must be between 0 and 1