## What is the p-value when Hypothesis Testing?

The p-value helps us determine the significance of our results. The hypothesis test is used to test the validity of a claim being made about the overall population. The claim that is being tested we determine to be the null hypothesis. If the null hypothesis was concluded to be false this is referred to as the alternative hypothesis.

Hypothesis tests use the p-value to weight the strength of the evidence (what the data tells us about the overall population).

* A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis – and accept the alternative hypothesis – this is a statistically significant outcome.

* A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis – accept the null hypothesis, and reject the alternative hypothesis – no statistical significance

* p-values very close to the cutoff (0.05) are considered to be marginal (could go either way) – further sampling should be performed where possible.

If possible another 19 samples can be tested and hope that in the other 19 cases the p-value becomes significantly smaller than 0.05, or significantly bigger – at least then we can say in 19/20 (95% certainty) we can reject or fail to reject the null hypothesis.

Always report the p-value so your audience can draw their own conclusions.

In the case you are insisting on an alpha different than 0.05 – say 0.01 – 99% certainty then you are enforcing a different cut-off value and the rules above for 0.05 become 0.01.

So you are making it harder to find a statistical significant result – in other words you are saying that you need further proof before you will accept an alternative hypothesis, and before you will fail to reject the null hypothesis.

Alternatively if you set the alpha to be 0.1 you make it easier to find a statistically significant result and may over-optimistically reject the null hypothesis because the p-value might be 0.08.

Changing the alpha up or down like this may make it harder or easier to make Type 1 (rejecting the null hypothesis when you should have failed to reject (accept) it – a false positive) and Type 2 errors (failing to reject (accepting) the null hypothesis when you should have rejected it – a false negative)

A simple example springing to mind of why even at a 100% certainty we still might fail to reject the null hypothesis would be a guy called Thomas when he hears that Jesus has rose from the dead and appeared to the other 10 apostles. Out of a population of all possible apostles who weren’t named Thomas (Judas was dead by this stage) who could have seen a person rise from the dead – Thomas still doubted – why because his null hypothesis was people simply did not resurrect themselves from the dead (it had never happened before – and I don’t think it has happened since either) – and unless he saw it with his own eyes he would never reject his null hypothesis and no amount of talk from the other 10 would make it statistically significant. Once Jesus appeared to him – then he was able to reject his null hypothesis and accept the alternative hypothesis that this was a statistically significant event and that Jesus had in fact arisen from the dead.

Or another way to look at this was that if 10/11 apostles witnessed, giving 91% apostles who saw and 9% apostles (Thomas) who didn’t – p-value of 0.09 and that at an alpha of .05 meant all 11 would have to see for Thomas to believe – therefore 11/11.

A less tongue an cheek example from the web might look like – Apache Pizza pizza place claims their delivery times are 30 minutes or less on average but you think it’s more than that. You conduct a hypothesis test because you believe the null hypothesis, Ho, that the mean delivery time is 30 minutes max, is incorrect. Your alternative hypothesis (Ha) is that the mean time is greater than 30 minutes. You randomly sample some delivery times and run the data through the hypothesis test, and your p-value turns out to be 0.001, which is much less than 0.05. In real terms, there is a probability of 0.001 that you will mistakenly reject the pizza place’s claim that their delivery time is less than or equal to 30 minutes. Since typically we are willing to reject the null hypothesis when this probability is less than 0.05, you conclude that the pizza place is wrong; their delivery times are in fact more than 30 minutes on average, and you want to know what they’re gonna do about it! (Of course, you could be wrong by having sampled an unusually high number of late pizza deliveries just by chance.)