p-Values: What They Actually Mean (And What They Don't)
The most misunderstood number in statistics — a clear, honest explanation of what a p-value tells you, and the four things it absolutely does not.
The p-value is the most cited, most misunderstood, and most abused number in all of applied statistics. It appears in academic papers, A/B test reports, and clinical trial results. It is used to justify decisions worth millions of dollars. And the majority of people who use it cannot give you an accurate definition.
The Actual Definition
A p-value is the probability of observing a result at least as extreme as the one you got, assuming the null hypothesis is true. That is it. It is a conditional probability about your data, given a world where the null hypothesis holds. It is not a probability about your hypothesis.
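The definition is concrete enough to compute directly. Here is a minimal sketch for the classic coin-flip case: given k heads in n flips, sum the probabilities of every outcome at least as unlikely as the one observed, assuming the null (a fair coin) is true. The function name and the two-sided convention used are illustrative choices, not a standard API.

```python
from math import comb

def binomial_p_value(n: int, k: int, p: float = 0.5) -> float:
    """Two-sided p-value for observing k successes in n trials,
    assuming the null hypothesis that the success probability is p."""
    # Probability of exactly i successes under the null
    def pmf(i: int) -> float:
        return comb(n, i) * p**i * (1 - p)**(n - i)

    observed = pmf(k)
    # Sum over all outcomes at least as unlikely as the observed one
    # (one common two-sided convention; others exist).
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed + 1e-12)

# 60 heads in 100 flips of a supposedly fair coin
print(binomial_p_value(100, 60))
```

The result (roughly 0.057) answers exactly the question in the quote above: how often would a fair coin look at least this lopsided? It says nothing about how likely the coin is to be biased.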
"The p-value answers a question you probably were not asking: 'How surprising is this data if there were no real effect?'"
Four Things a p-Value Does Not Tell You
- It does not tell you the probability that your hypothesis is true. That would require a Bayesian framework and a prior probability.
- It does not measure the size of the effect. A tiny, practically meaningless effect can produce a very small p-value with a large enough sample.
- It does not tell you whether your result will replicate. The replication crisis in social science is partly a story about misplaced faith in p-values.
- It does not validate your experimental design. A p-value of 0.001 from a badly designed study is still worthless.
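The second point is worth seeing with numbers. The sketch below (a two-sample z-test with known unit variance; the function name and sample sizes are illustrative) shows a standardized mean difference of d = 0.01, one percent of a standard deviation, turning "highly significant" purely by adding data:

```python
from math import sqrt, erfc

def two_sample_z_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value for a standardized mean difference (Cohen's d)
    in a two-sample z-test with known unit variance."""
    z = effect_size * sqrt(n_per_group / 2)
    return erfc(z / sqrt(2))  # two-sided normal tail probability

# A practically negligible effect (d = 0.01) at three sample sizes:
for n in (1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}  p = {two_sample_z_p_value(0.01, n):.2e}")
```

At n = 1,000 per group the effect is nowhere near significant; at n = 10,000,000 the p-value is astronomically small. Nothing about the effect changed; only the sample size did.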
What to Use Instead (Or Alongside)
The American Statistical Association has explicitly cautioned against using p < 0.05 as a binary threshold for decision-making. Better practice includes reporting effect sizes (Cohen's d, odds ratios), confidence intervals, and — where appropriate — Bayesian credible intervals. For business decisions, the practical significance of an effect is almost always more important than its statistical significance.
| Metric | What It Measures | When to Use It |
|---|---|---|
| p-value | Probability of data given null hypothesis | As one signal among many, never in isolation |
| Effect size (Cohen's d) | Magnitude of the difference | Always — alongside any significance test |
| Confidence interval | Range of plausible true values | When you need to communicate uncertainty |
| Bayesian credible interval | Probability distribution over parameter values | When you have meaningful prior information |
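Reporting these together is straightforward. The sketch below bundles a p-value with Cohen's d and a 95% confidence interval for the difference in means; it uses a pooled standard deviation and a normal approximation (so treat it as a rough sketch for moderately large samples, not a substitute for a proper t-test), and the function name and sample data are made up for illustration:

```python
from math import sqrt, erfc
from statistics import mean, stdev

def summarize_difference(a: list[float], b: list[float]) -> dict:
    """Effect size, 95% CI, and p-value for a difference in means
    (pooled-SD, normal approximation)."""
    diff = mean(a) - mean(b)
    # Pooled standard deviation, used for Cohen's d
    sp = sqrt(((len(a) - 1) * stdev(a)**2 + (len(b) - 1) * stdev(b)**2)
              / (len(a) + len(b) - 2))
    se = sp * sqrt(1 / len(a) + 1 / len(b))
    return {
        "cohens_d": diff / sp,
        "ci_95": (diff - 1.96 * se, diff + 1.96 * se),
        "p_value": erfc(abs(diff / se) / sqrt(2)),
    }

# Hypothetical measurements from two groups
a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
b = [4.7, 4.9, 4.6, 4.8, 5.0, 4.7, 4.6, 4.8]
print(summarize_difference(a, b))
```

A report built this way answers three questions at once: how big is the effect, how precisely do we know it, and how surprising is the data under the null. The p-value alone answers only the last.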

