Probability for Data Science Interviews
Probability forms the mathematical foundation for reasoning about uncertainty in data science. This section covers the core concepts tested in interviews.
Fundamental Definitions
| Term | Definition | Example |
|---|---|---|
| Sample space | Set of all possible outcomes | Die roll: {1, 2, 3, 4, 5, 6} |
| Event | Subset of outcomes | "Rolling even": {2, 4, 6} |
| Probability | Measure from 0 (impossible) to 1 (certain) | P(rolling even) = 3/6 = 0.5 |
Core Rules
Addition Rule
For calculating the probability of A or B occurring:
P(A or B) = P(A) + P(B) - P(A and B)
Multiplication Rule
For calculating the probability of A and B occurring. In general, P(A and B) = P(A) x P(B|A); when A and B are independent, this simplifies to:
P(A and B) = P(A) x P(B)
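A minimal sketch of both rules, using a hypothetical two-dice setup (A = "first die shows 6", B = "second die shows 6") and exact `Fraction` arithmetic to avoid floating-point noise:

```python
from fractions import Fraction

# A = "first die shows 6", B = "second die shows 6" (independent events)
p_a = p_b = Fraction(1, 6)

# Multiplication rule for independent events
p_a_and_b = p_a * p_b              # 1/36

# Addition rule: subtract the overlap so it is not counted twice
p_a_or_b = p_a + p_b - p_a_and_b   # 11/36

print(p_a_and_b, p_a_or_b)
```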
Conditional Probability
Conditional probability measures the probability of event A given that event B has occurred.
P(A|B) = P(A and B) / P(B)
Example: Drawing a card from a standard deck.
| Given Information | Calculation |
|---|---|
| P(King and Face Card) | 4/52 |
| P(Face Card) | 12/52 |
| P(King \| Face Card) | (4/52) / (12/52) = 1/3 |
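The card example above can be checked in a few lines. This sketch assumes a standard 52-card deck with 4 kings and 12 face cards (J, Q, K in each of 4 suits):

```python
from fractions import Fraction

# Every king is a face card, so P(King and Face) = P(King)
p_king_and_face = Fraction(4, 52)
p_face = Fraction(12, 52)

# Conditional probability: P(King | Face) = P(King and Face) / P(Face)
p_king_given_face = p_king_and_face / p_face
print(p_king_given_face)  # 1/3
```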
Independence
Two events are independent when knowledge of one provides no information about the other:
P(A|B) = P(A)
| Scenario | Independence |
|---|---|
| Consecutive coin flips | Independent |
| Drawing cards without replacement | Not independent |
Bayes' Theorem
Bayes' theorem relates conditional probabilities:
P(A|B) = P(B|A) x P(A) / P(B)
| Component | Name | Description |
|---|---|---|
| P(A) | Prior | Initial belief before evidence |
| P(B\|A) | Likelihood | Probability of evidence given hypothesis |
| P(B) | Evidence | Total probability of observing the evidence |
| P(A\|B) | Posterior | Updated belief after evidence |
Medical Test Example
A disease affects 1% of the population. A test has 95% sensitivity (true positive rate) and 90% specificity (true negative rate).
Given: Positive test result. Calculate probability of having the disease.
| Parameter | Value |
|---|---|
| P(Disease) | 0.01 |
| P(Positive \| Disease) | 0.95 |
| P(Positive \| No Disease) | 0.10 |
Calculation:
First, calculate the probability of a positive test result using the law of total probability: P(Positive) = (0.95 x 0.01) + (0.10 x 0.99) = 0.0095 + 0.099 = 0.1085
Then, apply Bayes' theorem: P(Disease|Positive) = (0.95 x 0.01) / 0.1085 = 0.0095 / 0.1085 = approximately 0.088 (8.8%)
Result: Despite the positive test, the probability of having the disease is approximately 8.8%. The low base rate (1% prevalence) means most positive tests are false positives.
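The medical-test calculation translates directly into code. A minimal sketch using the parameters from the table above (prevalence, sensitivity, and false positive rate):

```python
# Parameters from the worked example
p_disease = 0.01              # prevalence (prior)
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.10    # false positive rate (1 - specificity)

# Law of total probability: P(Positive)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(Disease | Positive)
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
print(round(p_disease_given_positive, 3))  # 0.088
```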
Probability Distributions
Discrete Distributions
| Distribution | Use Case | Parameters | Mean | Variance |
|---|---|---|---|---|
| Bernoulli | Single binary trial | p | p | p(1-p) |
| Binomial | Count of successes in n trials | n, p | np | np(1-p) |
| Poisson | Count of events in fixed interval | λ | λ | λ |
Binomial example: Probability of exactly 7 heads in 10 coin flips.
Poisson example: Number of customer arrivals per hour when average rate is 5/hour.
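The two examples above can be computed with the standard PMF formulas, sketched here using only the Python standard library:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(exactly k successes in n independent trials, success probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(exactly k events in an interval with average rate lam)
    return lam**k * exp(-lam) / factorial(k)

print(round(binomial_pmf(7, 10, 0.5), 4))  # P(exactly 7 heads in 10 flips)
print(round(poisson_pmf(3, 5), 4))         # P(3 arrivals when the rate is 5/hour)
```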
Continuous Distributions
| Distribution | Use Case | Key Properties |
|---|---|---|
| Normal (Gaussian) | Natural phenomena, Central Limit Theorem | 68-95-99.7 rule |
| Exponential | Time between events | Memoryless property |
| Uniform | Equal probability over interval | Constant density |
Normal distribution percentiles:
| Range | Percentage of Data |
|---|---|
| Within 1σ of mean | 68% |
| Within 2σ of mean | 95% |
| Within 3σ of mean | 99.7% |
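The 68-95-99.7 rule can be verified with `statistics.NormalDist` from the standard library, evaluating P(−kσ < X < kσ) for a standard normal:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sigma 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)  # probability mass within k sigma of the mean
    print(f"within {k} sigma: {within:.4f}")
```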
Expected Value
Expected value represents the long-run average of a random variable:
E[X] = Sum of (value x probability) for all outcomes
Example: Game where rolling a 6 wins $10, otherwise lose $2.
E[X] = (1/6 x $10) + (5/6 x -$2) = $1.67 - $1.67 = $0
Expected value: $0 (fair game).
Variance
Variance measures the spread of a distribution around its mean:
Var(X) = E[(X - mean)^2] = E[X^2] - (E[X])^2
Properties:
| Property | Formula |
|---|---|
| Scaling | Var(aX) = a²Var(X) |
| Sum of independent variables | Var(X + Y) = Var(X) + Var(Y) |
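Both the computational formula and the scaling property can be checked on a small discrete distribution. A sketch using a fair six-sided die (Var = 35/12):

```python
def var(dist):
    # dist maps value -> probability; Var(X) = E[X^2] - (E[X])^2
    mean = sum(x * p for x, p in dist.items())
    return sum(x * x * p for x, p in dist.items()) - mean**2

die = {x: 1 / 6 for x in range(1, 7)}   # fair six-sided die
print(round(var(die), 4))               # 35/12, about 2.9167

# Scaling property: Var(aX) = a^2 Var(X), here with a = 3
scaled = {3 * x: 1 / 6 for x in range(1, 7)}
print(round(var(scaled), 4))            # 9 * 35/12 = 26.25
```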
Central Limit Theorem
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the underlying population distribution.
The sample mean follows approximately a normal distribution with mean equal to the population mean and variance equal to the population variance divided by sample size.
| Sample Size | Effect on Distribution |
|---|---|
| Larger n | Narrower distribution (smaller variance) |
| Larger n | Better approximation to normal |
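Both effects can be seen in a quick simulation. This sketch draws sample means from a uniform die roll (decidedly non-normal) and compares their spread at two sample sizes; the standard deviation of the mean should shrink roughly as 1/sqrt(n):

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    # Mean of n rolls of a fair six-sided die
    return statistics.fmean(random.randint(1, 6) for _ in range(n))

means_small = [sample_mean(5) for _ in range(2000)]
means_large = [sample_mean(50) for _ in range(2000)]

# Larger n -> narrower distribution of the sample mean
print(round(statistics.stdev(means_small), 3))
print(round(statistics.stdev(means_large), 3))
```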
Common Problem Types
Complement Problems
For "at least one" problems, use the complement:
P(at least 1 head in 5 flips) = 1 - P(no heads) = 1 - (0.5)^5 = 0.96875, approximately 0.97
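The complement trick in code form:

```python
# P(at least one head in 5 fair flips) via the complement rule
p_no_heads = 0.5 ** 5          # all five flips are tails
p_at_least_one = 1 - p_no_heads
print(p_at_least_one)  # 0.96875
```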
Birthday Problem
Number of people required for 50% probability of a shared birthday: 23
Calculation uses the complement: P(no match among n people) = (365/365) x (364/365) x (363/365) x ... x ((365 - n + 1)/365), a product of n factors; the answer is 1 minus this product.
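A minimal sketch of the birthday calculation, confirming that 23 is the smallest group size where the probability crosses 50% (this ignores leap years and assumes uniformly distributed birthdays):

```python
def p_shared_birthday(n):
    # Complement: probability that all n birthdays are distinct
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(round(p_shared_birthday(22), 3))  # just under 0.5
print(round(p_shared_birthday(23), 3))  # just over 0.5
```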
Monty Hall Problem
Setup: Three doors, one prize, two goats. After the initial selection, the host, who knows where the prize is, opens one of the other doors to reveal a goat.
Optimal strategy: Switch doors.
| Strategy | Probability of Winning |
|---|---|
| Stay | 1/3 |
| Switch | 2/3 |
Common Errors
| Error | Description |
|---|---|
| Gambler's fallacy | Assuming past independent events affect future outcomes |
| Base rate neglect | Ignoring prior probabilities in conditional probability calculations |
| Confusing P(A\|B) with P(B\|A) | These are distinct conditional probabilities |
Problem-Solving Approaches
| Approach | Application |
|---|---|
| Draw probability trees | Multi-stage probability problems |
| Use complement | "At least one" problems |
| State assumptions | Independence, distribution type |
| Verify bounds | Probabilities must be between 0 and 1 |