My professor was teaching about hypothesis testing in class today.
It reminded me of some blogs by Allen Downey that I bookmarked ages ago.
I read through them in class, and this is the framework to take away about hypothesis tests:
- Compute a test statistic that measures the size of the apparent effect. It could be a difference between two groups, an absolute difference in means, see more examples here. We call this test statistic 𝛿.
- Define a null hypothesis, a model of the world under the assumption that the effect is not real; e.g., if you think there is a difference between groups A and B, H0 = there is no difference between A and B.
- The model of the null hypothesis should be stochastic, that is, capable of simulating data similar to the original data.
- Goal: compute the p-value (the probability of seeing an effect as big as 𝛿 under the null hypothesis). You can estimate the p-value using simulation: generate many simulated datasets from the null model and calculate the same test statistic you used on the actual data for each one.
- Count the fraction of times the simulated test statistic equals or exceeds 𝛿. This fraction approximates the p-value. If it's sufficiently small, you can conclude that the apparent effect is unlikely to be due to chance. (A runnable sketch of the whole procedure follows this list.)
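Putting the steps together, here's a minimal sketch of the recipe as a permutation test for a difference between two groups. The data is synthetic and the statistic (absolute difference in means) is my choice for illustration, not something prescribed by the posts:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: two groups that may or may not actually differ.
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=10.5, scale=2.0, size=100)

def test_stat(a, b):
    """Test statistic 𝛿: absolute difference in group means."""
    return abs(a.mean() - b.mean())

# Step 1: test statistic on the actual data.
delta = test_stat(group_a, group_b)

# Steps 2-3: null model = "A and B come from the same distribution",
# so we can simulate data under H0 by pooling and reshuffling labels.
pooled = np.concatenate([group_a, group_b])
n = len(group_a)

num_sims = 10_000
sim_stats = np.empty(num_sims)
for i in range(num_sims):
    rng.shuffle(pooled)
    sim_stats[i] = test_stat(pooled[:n], pooled[n:])

# Step 4: p-value ≈ fraction of simulated statistics >= 𝛿.
p_value = (sim_stats >= delta).mean()
print(f"delta = {delta:.3f}, p-value ≈ {p_value:.4f}")
```

Note that the shuffle is the null model here; a different H0 (say, a parametric model of each group) would only change how the simulated datasets are generated, not the rest of the loop.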
Why simulation?
- Analytic methods were appealing back when computation was slow and expensive, but now that computation is fast and cheap they lose their edge, because they are
    - inflexible: using a standard test locks you into a particular test statistic and model, which might not be appropriate for the problem domain.
    - opaque: a real-world scenario admits many possible models, based on different assumptions. In standard tests those assumptions are implicit, so it's not easy to know whether the model is appropriate.
- Simulations, on the other hand, are
    - explicit: creating a simulation forces you to think about your modeling decisions, and the simulations themselves document those decisions.
    - arbitrarily flexible: you can try out several test statistics and models, and choose the most appropriate one for the scenario (see the helper sketch below).