Probability for Data Science Interviews
Probability forms the mathematical foundation for reasoning about uncertainty in data science. This section covers the core concepts tested in interviews.
Fundamental Definitions
| Term | Definition | Example |
|---|---|---|
| Sample space | Set of all possible outcomes | Die roll: {1, 2, 3, 4, 5, 6} |
| Event | Subset of outcomes | "Rolling even": {2, 4, 6} |
| Probability | Measure from 0 (impossible) to 1 (certain) | P(rolling even) = 3/6 = 0.5 |
Core Rules
Addition Rule
For calculating the probability of A or B occurring:
P(A or B) = P(A) + P(B) - P(A and B)
Multiplication Rule
For calculating the probability of A and B occurring. In general, P(A and B) = P(A) x P(B|A); when A and B are independent, this simplifies to:
P(A and B) = P(A) x P(B)
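A minimal sketch of both rules, using a hypothetical two-dice setup (A = "first die shows 6", B = "second die shows 6") and exact `Fraction` arithmetic to avoid floating-point noise:

```python
from fractions import Fraction

# A = "first die shows 6", B = "second die shows 6" (independent events)
p_a = p_b = Fraction(1, 6)

# Multiplication rule for independent events
p_a_and_b = p_a * p_b              # 1/36

# Addition rule: subtract the overlap so it is not counted twice
p_a_or_b = p_a + p_b - p_a_and_b   # 11/36

print(p_a_and_b, p_a_or_b)
```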
Conditional Probability
Conditional probability measures the probability of event A given that event B has occurred.
P(A|B) = P(A and B) / P(B)
Example: Drawing a card from a standard deck.
| Given Information | Calculation |
|---|---|
| P(King and Face Card) | 4/52 |
| P(Face Card) | 12/52 |
| P(King \| Face Card) | (4/52) / (12/52) = 1/3 |
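The card example above can be checked in a few lines. This sketch assumes a standard 52-card deck with 4 kings and 12 face cards (J, Q, K in each of 4 suits):

```python
from fractions import Fraction

# Every king is a face card, so P(King and Face) = P(King)
p_king_and_face = Fraction(4, 52)
p_face = Fraction(12, 52)

# Conditional probability: P(King | Face) = P(King and Face) / P(Face)
p_king_given_face = p_king_and_face / p_face
print(p_king_given_face)  # 1/3
```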
Independence
Two events are independent when knowledge of one provides no information about the other:
P(A|B) = P(A)
| Scenario | Independence |
|---|---|
| Consecutive coin flips | Independent |
| Drawing cards without replacement | Not independent |
Bayes' Theorem
Bayes' theorem relates conditional probabilities:
P(A|B) = P(B|A) x P(A) / P(B)
| Component | Name | Description |
|---|---|---|
| P(A) | Prior | Initial belief before evidence |
| P(B\|A) | Likelihood | Probability of evidence given hypothesis |
| P(B) | Evidence | Total probability of observing the evidence |
| P(A\|B) | Posterior | Updated belief after evidence |
Medical Test Example
A disease affects 1% of the population. A test has 95% sensitivity (true positive rate) and 90% specificity (true negative rate).
Given: Positive test result. Calculate probability of having the disease.
| Parameter | Value |
|---|---|
| P(Disease) | 0.01 |
| P(Positive \| Disease) | 0.95 |
| P(Positive \| No Disease) | 0.10 |
Calculation:
First, calculate the probability of a positive test result using the law of total probability: P(Positive) = (0.95 x 0.01) + (0.10 x 0.99) = 0.0095 + 0.099 = 0.1085
Then, apply Bayes' theorem: P(Disease|Positive) = (0.95 x 0.01) / 0.1085 = 0.0095 / 0.1085 = approximately 0.088 (8.8%)
Result: Despite the positive test, the probability of having the disease is approximately 8.8%. The low base rate (1% prevalence) means most positive tests are false positives.
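The medical-test calculation translates directly into code. A minimal sketch using the parameters from the table above (prevalence, sensitivity, and false positive rate):

```python
# Parameters from the worked example
p_disease = 0.01              # prevalence (prior)
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.10    # false positive rate (1 - specificity)

# Law of total probability: P(Positive)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(Disease | Positive)
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
print(round(p_disease_given_positive, 3))  # 0.088
```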
Probability Distributions
Discrete Distributions
| Distribution | Use Case | Parameters | Mean | Variance |
|---|---|---|---|---|
| Bernoulli | Single binary trial | p | p | p(1-p) |
| Binomial | Count of successes in n trials | n, p | np | np(1-p) |
| Poisson | Count of events in fixed interval | λ | λ | λ |
Binomial example: Probability of exactly 7 heads in 10 coin flips.
Poisson example: Number of customer arrivals per hour when average rate is 5/hour.
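The two examples above can be computed with the standard PMF formulas, sketched here using only the Python standard library:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(exactly k successes in n independent trials, success probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(exactly k events in an interval with average rate lam)
    return lam**k * exp(-lam) / factorial(k)

print(round(binomial_pmf(7, 10, 0.5), 4))  # P(exactly 7 heads in 10 flips)
print(round(poisson_pmf(3, 5), 4))         # P(3 arrivals when the rate is 5/hour)
```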
Continuous Distributions
| Distribution | Use Case | Key Properties |
|---|---|---|
| Normal (Gaussian) | Natural phenomena, Central Limit Theorem | 68-95-99.7 rule |
| Exponential | Time between events | Memoryless property |
| Uniform | Equal probability over interval | Constant density |
Normal distribution percentiles:
| Range | Percentage of Data |
|---|---|
| Within 1σ of mean | 68% |
| Within 2σ of mean | 95% |
| Within 3σ of mean | 99.7% |
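The 68-95-99.7 rule can be verified with `statistics.NormalDist` from the standard library, evaluating P(−kσ < X < kσ) for a standard normal:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sigma 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)  # probability mass within k sigma of the mean
    print(f"within {k} sigma: {within:.4f}")
```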
Expected Value
Expected value represents the long-run average of a random variable:
E[X] = Sum of (value x probability) for all outcomes
Example: Game where rolling a 6 wins $10, otherwise lose $2.
E[X] = (1/6 x $10) + (5/6 x -$2) = $1.67 - $1.67 = $0
Expected value: $0 (fair game).
Variance
Variance measures the spread of a distribution around its mean:
Var(X) = E[(X - mean)^2] = E[X^2] - (E[X])^2
Properties:
| Property | Formula |
|---|---|
| Scaling | Var(aX) = a²Var(X) |
| Sum of independent variables | Var(X + Y) = Var(X) + Var(Y) |
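Both the computational formula and the scaling property can be checked on a small discrete distribution. A sketch using a fair six-sided die (Var = 35/12):

```python
def var(dist):
    # dist maps value -> probability; Var(X) = E[X^2] - (E[X])^2
    mean = sum(x * p for x, p in dist.items())
    return sum(x * x * p for x, p in dist.items()) - mean**2

die = {x: 1 / 6 for x in range(1, 7)}   # fair six-sided die
print(round(var(die), 4))               # 35/12, about 2.9167

# Scaling property: Var(aX) = a^2 Var(X), here with a = 3
scaled = {3 * x: 1 / 6 for x in range(1, 7)}
print(round(var(scaled), 4))            # 9 * 35/12 = 26.25
```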
Central Limit Theorem
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the underlying population distribution.
The sample mean follows approximately a normal distribution with mean equal to the population mean and variance equal to the population variance divided by sample size.
| Sample Size | Effect on Distribution |
|---|---|
| Larger n | Narrower distribution (smaller variance) |
| Larger n | Better approximation to normal |
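Both effects can be seen in a quick simulation. This sketch draws sample means from a uniform die roll (decidedly non-normal) and compares their spread at two sample sizes; the standard deviation of the mean should shrink roughly as 1/sqrt(n):

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    # Mean of n rolls of a fair six-sided die
    return statistics.fmean(random.randint(1, 6) for _ in range(n))

means_small = [sample_mean(5) for _ in range(2000)]
means_large = [sample_mean(50) for _ in range(2000)]

# Larger n -> narrower distribution of the sample mean
print(round(statistics.stdev(means_small), 3))
print(round(statistics.stdev(means_large), 3))
```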
Common Problem Types
Complement Problems
For "at least one" problems, use the complement:
P(at least 1 head in 5 flips) = 1 - P(no heads) = 1 - (0.5)^5 = 0.96875, approximately 0.97
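The complement trick in code form:

```python
# P(at least one head in 5 fair flips) via the complement rule
p_no_heads = 0.5 ** 5          # all five flips are tails
p_at_least_one = 1 - p_no_heads
print(p_at_least_one)  # 0.96875
```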
Birthday Problem
Number of people required for 50% probability of a shared birthday: 23
Calculation uses the complement: P(no match among n people) = (365/365) x (364/365) x (363/365) x ... x ((365 - n + 1)/365), a product of n factors; the answer is 1 minus this product.
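A minimal sketch of the birthday calculation, confirming that 23 is the smallest group size where the probability crosses 50% (this ignores leap years and assumes uniformly distributed birthdays):

```python
def p_shared_birthday(n):
    # Complement: probability that all n birthdays are distinct
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(round(p_shared_birthday(22), 3))  # just under 0.5
print(round(p_shared_birthday(23), 3))  # just over 0.5
```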
Monty Hall Problem
Setup: Three doors, one prize, two goats. After the initial selection, the host, who knows where the prize is, opens one of the other doors to reveal a goat.
Optimal strategy: Switch doors.
| Strategy | Probability of Winning |
|---|---|
| Stay | 1/3 |
| Switch | 2/3 |
Common Errors
| Error | Description |
|---|---|
| Gambler's fallacy | Assuming past independent events affect future outcomes |
| Base rate neglect | Ignoring prior probabilities in conditional probability calculations |
| Confusing P(A\|B) with P(B\|A) | These are distinct conditional probabilities |
Problem-Solving Approaches
| Approach | Application |
|---|---|
| Draw probability trees | Multi-stage probability problems |
| Use complement | "At least one" problems |
| State assumptions | Independence, distribution type |
| Verify bounds | Probabilities must be between 0 and 1 |