Bayesian, Stats, and More Stats (Re-visited)
Date:
Mathematical Statistics (430)
Introduction
Statistics has never been the course I’d call fun, but it’s undeniably important. Not just for its own sake, but because it underpins so much: machine learning, scientific research, even the way we argue about data in the real world. If you’re not thinking statistically, you’re probably doing something wrong.
Bayes, Bayes, Bayes
Bayesian thinking was a highlight. The basic update rule: \(P(H \mid D) = \frac{P(D \mid H) \, P(H)}{P(D)}\) Where:
- H is your hypothesis,
- D is the data you observed.
It’s such a simple formula, but it flips your mindset: knowledge is updated as new evidence arrives.
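To make the update concrete, here is a minimal Python sketch of a single Bayesian update using the classic rare-condition-test setup; the prevalence and test-accuracy numbers are made up purely for illustration.

```python
# Toy Bayesian update: what does a positive test for a rare condition really tell you?
# All numbers below are illustrative assumptions, not from the course notes.

prior = 0.01           # P(H): prevalence of the condition
sensitivity = 0.95     # P(D | H): probability of a positive test if you have it
false_positive = 0.05  # P(D | not H): probability of a positive test if you don't

# P(D): total probability of observing a positive test
evidence = sensitivity * prior + false_positive * (1 - prior)

# Bayes' rule: P(H | D) = P(D | H) * P(H) / P(D)
posterior = sensitivity * prior / evidence
print(f"P(H | D) = {posterior:.3f}")  # ~0.161, far from certain despite a "95% accurate" test
```

That number is the mindset flip in action: the strong prior against H keeps the posterior modest until more evidence piles up.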
And then the definitions piled up: consistency, unbiasedness, efficiency. Each one felt like a new box to sort estimators into.
- Unbiasedness: An estimator $\hat{\theta}$ is unbiased if \(E[\hat{\theta}] = \theta\)
- Consistency: As the sample size grows, \(\hat{\theta}_n \to \theta\) in probability.
- Efficiency: Among unbiased estimators, the one with the smallest variance wins.
It felt nitpicky at the time, but those properties became the mental shorthand for whether I could trust what my math was spitting out.
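Those properties are also easy to watch in a quick simulation. The sketch below is my own toy setup (a normal population with true mean 5 and standard deviation 2, with the sample mean as the estimator), not anything from the course itself.

```python
# Unbiasedness and consistency of the sample mean, checked by simulation.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 5.0, 2.0   # true mean (the target) and true standard deviation

for n in (10, 100, 10_000):
    # 5,000 replications of the estimator: the mean of n draws
    estimates = rng.normal(theta, sigma, size=(5_000, n)).mean(axis=1)
    print(f"n={n:6d}  average estimate={estimates.mean():.3f}  spread (sd)={estimates.std():.3f}")

# The average of the estimates stays near 5 for every n (unbiasedness),
# while their spread shrinks roughly like sigma / sqrt(n) (consistency).
```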
Variances Galore
One of my favorite confusions: the different “variances” that sneak around.
- Population variance: $\sigma^2 = E[(X - \mu)^2] $
- Sample variance (unbiased): $s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 $
- Variance of the sample mean: $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$

That last identity is the engine behind the law of large numbers: as $n$ grows, the variance of the sample mean shrinks toward zero, so your average settles down around the true mean.
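The last two bullets are easy to sanity-check numerically. The setup below (normal data, $\sigma = 3$, $n = 50$) is an arbitrary choice for illustration.

```python
# Checking Var(X_bar) = sigma^2 / n, and the n-1 denominator in s^2, by simulation.
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 3.0, 50
samples = rng.normal(0.0, sigma, size=(20_000, n))  # 20,000 samples of size n

print("empirical Var(X_bar):  ", samples.mean(axis=1).var())       # ~ 0.18
print("theoretical sigma^2/n: ", sigma**2 / n)                     # 9 / 50 = 0.18

print("mean of s^2 (ddof=1):  ", samples.var(axis=1, ddof=1).mean())  # ~ 9 (unbiased)
print("mean of 1/n version:   ", samples.var(axis=1, ddof=0).mean())  # ~ 8.82 (biased low)
```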
Revisited in SLE
Later, when I took Statistical Learning (SLE), all these concepts came back—not as abstract definitions, but as practical tools.
- Linear regression brought the bias-variance tradeoff front and center. Regularization (ridge, lasso) made me realize: bias isn’t always bad; sometimes you add it deliberately to shrink variance and improve prediction (see the ridge-versus-OLS sketch after this list).
- Nearest-neighbor classification re-lit the consistency discussion. The k-NN classifier is consistent as $n \to \infty$, provided $k \to \infty$ and $k/n \to 0$. That’s the exact same “eventually it converges” idea I’d learned in 430, now dressed up in machine learning clothes (there’s a small k-NN simulation after this list, too).
- Even the idea of unbiasedness vs. efficiency cropped up again: should we prioritize models that are “correct on average,” or ones that have low variance in practice? SLE made it clear that, in real data problems, we often settle for biased but stable solutions.
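Here’s a rough sketch of that bias-variance story with ridge versus OLS on repeated simulated datasets. The design, noise level, and penalty $\lambda$ are arbitrary choices of mine, not anything from either course.

```python
# Ridge vs. OLS over many simulated datasets: a little bias, a lot less variance.
import numpy as np

rng = np.random.default_rng(42)
n, p, lam = 30, 5, 5.0
beta_true = np.ones(p)

ols_fits, ridge_fits = [], []
for _ in range(2_000):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(scale=3.0, size=n)
    # OLS: (X'X)^{-1} X'y     Ridge: (X'X + lam I)^{-1} X'y
    ols_fits.append(np.linalg.solve(X.T @ X, X.T @ y))
    ridge_fits.append(np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y))

for name, fits in [("OLS", np.array(ols_fits)), ("ridge", np.array(ridge_fits))]:
    bias = fits.mean(axis=0) - beta_true
    print(f"{name:>5}: mean |bias| = {np.abs(bias).mean():.3f}   "
          f"mean coefficient variance = {fits.var(axis=0).mean():.3f}")

# Ridge coefficients are pulled toward zero (bias) but fluctuate much less across
# datasets (variance) -- the tradeoff from the regression bullet above.
```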
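The k-NN consistency claim can also be watched numerically. The one-dimensional two-class setup below is my own toy example (it assumes scikit-learn is available); the point is only that the test error drifts toward the Bayes error as $n$ grows with $k \approx \sqrt{n}$.

```python
# Consistency of k-NN: error approaches the Bayes error as n grows with k ~ sqrt(n).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(7)

def make_data(n):
    # Class 0 ~ N(0, 1), class 1 ~ N(1.5, 1), equal priors
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=1.5 * y, scale=1.0).reshape(-1, 1)
    return X, y

X_test, y_test = make_data(20_000)
for n in (100, 1_000, 10_000):
    X_train, y_train = make_data(n)
    k = max(1, int(np.sqrt(n)))
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"n={n:6d}, k={k:3d}: test error = {1 - clf.score(X_test, y_test):.3f}")

# For this setup the Bayes error is Phi(-0.75) ~ 0.227; the k-NN error creeps toward it.
```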
So the “dry bread” of 430 became the yeast for SLE—suddenly rising into something useful.
Beyond the Class
Mathematical Statistics wasn’t the course that made me fall in love with math—but it was the course that made me respect it. It’s the grammar of data. And even when it feels like chewing sandpaper at times, I know I’ll keep needing it—Bayes, variances, and all.
