Sometimes basic concepts seem complicated simply because we learn them expecting them to be complicated, and this is often what happens with hypothesis testing. When we plan to dive deep into the world of ML, we need to be familiar with statistical concepts, and one such concept is hypothesis testing.
Let’s take the example of a legal case and use it to explain the concept.
What is a Hypothesis?
A hypothesis is a claim that needs to be tested.
Consider a legal case: a person ‘B’ accuses a person ‘A’, saying that ‘A’ has looted money from ‘B’, is doing the same to other people, and should therefore be arrested. This accusation is the hypothesis that ‘B’ has made about ‘A’, and it has to be tested: ‘B’ must produce evidence that ‘A’ committed the crime, while ‘A’ argues that the accusation is false. Let me introduce two more terms to move this case further.
- Null Hypothesis (H0)
- Alternative Hypothesis (Ha)
Null Hypothesis:
This is the default position: the currently accepted result of experiments done in the past (scientifically) or the currently accepted value of a parameter (statistically). It is assumed to hold unless the evidence against it is strong enough.
Bringing back the legal case, a court presumes that ‘A’ is innocent until proven guilty, so the null hypothesis is “‘A’ did not commit the crime”. ‘A’ does not have to prove anything; the burden of proof is on ‘B’.
Alternative Hypothesis:
It is also called the research hypothesis, and it is the claim being tested.
Going back to the example, the alternative hypothesis is the accusation made by ‘B’: “‘A’ committed the crime”. ‘B’ has to convince the court with evidence strong enough to overturn the presumption of innocence.
Now, let’s step out of the courtroom for a moment and take a real statistical example.
Suppose a disease spread in a country, doctors and researchers found a cure for it, and the people affected by the disease are now recovering well. (Statement 1)
After some years, a researcher claims that this medicine was only a temporary fix with many long-term side effects, and that a new medicine with an altered chemical composition would be a permanent fix that is also safe to use. (Statement 2)
Statement 1 is the null hypothesis: it already exists and is accepted.
Statement 2 is the alternative hypothesis that needs to be proven.
To prove the alternative hypothesis, the researcher needs to run some tests, and for that he selects a sample.
(A sample is a set of data, such as people, animals, or any other objects, drawn from a particular population.)
Once he starts testing on the sample, there are two possible outcomes:
- He succeeds.
- He fails.
If he succeeds in showing that his medicine is a better cure than the previous one, we say “Reject the null hypothesis”.
If he cannot show that his medicine is better than the previous one, we say “Fail to reject the null hypothesis”.
Concluding the court trial: if ‘B’ fails to produce convincing evidence, the court fails to reject the null hypothesis and ‘A’ is found not guilty. Note that “not guilty” is not the same as “proven innocent”, just as failing to reject a null hypothesis does not prove it true.
If the evidence is convincing, the court rejects the null hypothesis and ‘A’ is found guilty.
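Returning to the medicine example, here is a minimal sketch of how the decision could be made in Python. The recovery scores are made-up illustration data, and the choice of test is an assumption: a one-sided permutation test, one of several tests that would fit this setup. Under the null hypothesis the new medicine is no better than the old one, so which patient received which medicine should not matter. The test shuffles the group labels many times and counts how often chance alone produces a difference in average recovery at least as large as the observed one. That fraction is the p-value, and a p-value below a chosen threshold (here the conventional alpha = 0.05, also an assumption) is taken as evidence to reject the null hypothesis. The function name `permutation_p_value` is hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(old, new, n_permutations=10_000, seed=0):
    """One-sided permutation test (hypothetical helper).

    H0: the new medicine is no better than the old one.
    Ha: the new medicine gives higher average recovery scores.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = fmean(new) - fmean(old)
    pooled = list(old) + list(new)
    n_new = len(new)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the "medicine" labels at random
        diff = fmean(pooled[:n_new]) - fmean(pooled[n_new:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Counting the observed arrangement itself (+1) keeps the estimate above 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical recovery scores (higher is better) for two groups of patients.
old_medicine = [62, 58, 65, 60, 59, 61, 63, 57]
new_medicine = [70, 72, 68, 75, 71, 69, 74, 73]

alpha = 0.05
p = permutation_p_value(old_medicine, new_medicine)
if p < alpha:
    print(f"p = {p:.4f}: reject the null hypothesis")
else:
    print(f"p = {p:.4f}: fail to reject the null hypothesis")
```

If the two groups had similar scores, the p-value would be large and we would fail to reject the null hypothesis. That would not prove the two medicines equally good, only that this sample does not show a difference.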
So, if we organize it as an algorithm:
- We assume a null hypothesis.
- We propose an alternative hypothesis.
- We collect evidence from a sample and test how consistent it is with the null hypothesis.
- If the evidence against the null hypothesis is strong enough, we reject the null hypothesis.
- If it is not, we fail to reject the null hypothesis.
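The steps above can be sketched in Python. This is a minimal sketch under assumptions that are not in the original text: a hypothetical coin-flipping experiment, an exact one-sided binomial test, and the conventional significance level alpha = 0.05. The "test" step computes a p-value, the probability of seeing at least this many heads if the null hypothesis (a fair coin) were true; a p-value below alpha counts as strong enough evidence against the null hypothesis. The function names `binomial_p_value` and `coin_test` are illustrative.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


def coin_test(heads: int, flips: int, alpha: float = 0.05) -> tuple[float, str]:
    # Step 1: assume the null hypothesis H0: the coin is fair (p = 0.5).
    # Step 2: propose the alternative hypothesis Ha: the coin favours heads (p > 0.5).
    # Step 3: test how surprising the observed flips would be if H0 were true.
    p_value = binomial_p_value(heads, flips)
    # Steps 4 and 5: compare the evidence against the chosen threshold alpha.
    if p_value < alpha:
        return p_value, "Reject the null hypothesis"
    return p_value, "Fail to reject the null hypothesis"


print(coin_test(16, 20))  # 16 heads in 20 flips
print(coin_test(12, 20))  # 12 heads in 20 flips
```

With 16 heads in 20 flips the p-value is about 0.006, so the null hypothesis is rejected; with 12 heads it is about 0.25, so we fail to reject it. The threshold alpha is a choice made before looking at the data, not something the test computes.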
Conclusion:
Hypothesis testing is a basic step when building a model in a machine learning context: without first stating a hypothesis, we have no principled way to show that our model is wrong. To break the rules, we first need to know the rules.