My professor was teaching about hypothesis testing in class today.
It reminded me of some blogs by Allen Downey that I bookmarked ages ago.
I read through them in class, and this is the framework to take away about hypothesis tests:
- Compute a test statistic that measures the size of the apparent effect. It could be a difference between two groups, an absolute difference in means, see more examples here. We call this test statistic 𝛿.
- Define a null hypothesis, a model of the world under the assumption that the effect is not real; e.g., if you think there is a difference between groups A and B, H0 = there is no difference between A and B.
- The model of the null hypothesis should be stochastic, that is, capable of simulating data similar to the original data.
- Goal: compute the p-value (the probability of seeing an effect as big as 𝛿 under the null hypothesis). You can estimate the p-value using simulation: generate many simulated datasets from the null model and calculate the same test statistic you used on the actual data for each one.
- Count the fraction of times the simulated test statistic equals or exceeds 𝛿. This fraction approximates the p-value. If it's sufficiently small, you can conclude that the apparent effect is unlikely to be due to chance. (A runnable sketch of the whole procedure follows this list.)
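Putting the steps together, here's a minimal sketch of the recipe as a permutation test for a difference between two groups. The data is synthetic and the statistic (absolute difference in means) is my choice for illustration, not something prescribed by the posts:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: two groups that may or may not actually differ.
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=10.5, scale=2.0, size=100)

def test_stat(a, b):
    """Test statistic 𝛿: absolute difference in group means."""
    return abs(a.mean() - b.mean())

# Step 1: test statistic on the actual data.
delta = test_stat(group_a, group_b)

# Steps 2-3: null model = "A and B come from the same distribution",
# so we can simulate data under H0 by pooling and reshuffling labels.
pooled = np.concatenate([group_a, group_b])
n = len(group_a)

num_sims = 10_000
sim_stats = np.empty(num_sims)
for i in range(num_sims):
    rng.shuffle(pooled)
    sim_stats[i] = test_stat(pooled[:n], pooled[n:])

# Step 4: p-value ≈ fraction of simulated statistics >= 𝛿.
p_value = (sim_stats >= delta).mean()
print(f"delta = {delta:.3f}, p-value ≈ {p_value:.4f}")
```

Note that the shuffle is the null model here; a different H0 (say, a parametric model of each group) would only change how the simulated datasets are generated, not the rest of the loop.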
Why simulation?
- Analytic methods were appealing back when computation was slow and expensive, but now that computation is fast and cheap they lose their edge, because they are
    - inflexible: using a standard test locks you into a particular test statistic and model, which might not be appropriate for the problem domain.
    - opaque: a real-world scenario admits many possible models, based on different assumptions. In standard tests those assumptions are implicit, so it's not easy to know whether the model is appropriate.
- Simulations, on the other hand, are
    - explicit: creating a simulation forces you to think about your modeling decisions, and the simulations themselves document those decisions.
    - arbitrarily flexible: you can try out several test statistics and models, and choose the most appropriate one for the scenario (see the helper sketch below).