Hypothesis testing is one of the most important tools in the Data Science field. It lets us check whether an observed difference, for example between a new idea and the previous one, is statistically significant or could simply be due to chance. Common tests used in Hypothesis Testing include the Z-test, T-test, Chi-square test, and ANOVA. In this article, we will learn about the T-test.
What is T-Test?
The T-test is a statistical test used in Hypothesis Testing to compare the means of two samples, or to compare a sample mean with a known or hypothesized population mean. It is based on the t-distribution and is used when the population standard deviation is unknown and has to be estimated from the sample.
Example:
Let's understand T-test with an example.
Consider an application that is available on both Android and iOS. Now, the company wants to find out whether the average time spent on the app is the same in both Android and iOS versions.
The company took a random sample of 100 users from each version and measured the average time spent. The average came out to 15 minutes for the Android version and 12 minutes for the iOS version.
Can we conclude that users spend more time on the Android version of the application simply by comparing the two sample averages? We have only looked at 100 random users per platform out of the whole user base.
This is where the T-test comes in. It helps us judge whether the difference between two sample means reflects a real difference or could plausibly be due to chance.
Types of T-Test
There are three types of T-test:
One Sample t-test
Independent two-sample t-test
Paired t-test
One Sample T-Test
The one-sample t-test is used to compare the mean of a single sample to a set value or population mean.
Assumptions for One-sample T-test:
The data should be normally distributed
The sample must be random
The observations in the sample must be independent, which means that the value of one observation should not influence the value of another.
Population standard deviation is unknown.
Example:
A manufacturer claims that the average weight of its chips packets is 50 g. A customer doubts this and believes the average weight is different. Suppose the customer takes a sample of 30 packets and weighs them, and the sample mean comes out to be 49 g.
Here we can perform a one-sample t-test. The t-statistic is calculated as:
t = (x-bar - µ) / (S / √n)
where
x-bar is the sample mean
µ is the theoretical value or population mean
S is the sample standard deviation
n is the sample size
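After we calculate the t-statistic, we compare it with the t-critical value, which is read from a t-table for the chosen significance level and n - 1 degrees of freedom. For a two-tailed test, if the absolute value of the t-statistic is greater than the t-critical value, we reject the hypothesis that the population mean equals µ.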
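As a rough illustration of the chips example, the sketch below computes the one-sample t-statistic in Python using only the standard library. The article gives the sample mean (49 g), the claimed mean (50 g), and the sample size (30), but not the sample standard deviation, so the value S = 2 g is purely an assumption made for illustration. The critical value 2.045 is the usual t-table entry for a two-tailed test at the 5% significance level with 29 degrees of freedom, and the function name one_sample_t is hypothetical.

```python
import math


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return the one-sample t-statistic: (x-bar - mu) / (S / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Mean, claimed mean, and n come from the chips example.
# sample_sd = 2.0 is an ASSUMED value; the article does not state it.
t_stat = one_sample_t(sample_mean=49.0, pop_mean=50.0, sample_sd=2.0, n=30)

# t-table value for a two-tailed test, alpha = 0.05, df = n - 1 = 29.
T_CRITICAL = 2.045
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

Under the assumed standard deviation, the t-statistic is about -2.74. Its absolute value is larger than 2.045, so in this hypothetical case we would reject the claim that the average weight is 50 g. A larger assumed standard deviation could easily reverse that conclusion.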
Independent Two-sample T-Test
The Independent two-sample T-test is used to compare the means of two independent samples to determine if there is a significant difference between them.
Assumptions for Independent two-sample T-test:
The data should be normally distributed
The sample must be random
The two samples must be independent
Variances of the two groups should be approximately equal
Example:
Consider the first example about the average time spent on the application. Here, we assume that the two samples are independent, which means that no user in the Android sample also uses the iOS version of the application.
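To make the app example concrete, here is a minimal sketch of the equal-variance (pooled) two-sample t-statistic in Python, using only the standard library. The sample means (15 and 12 minutes) and sample sizes (100 each) come from the example, while the two sample standard deviations (5 and 6 minutes) are assumptions made only for illustration. The function name pooled_two_sample_t is hypothetical.

```python
import math


def pooled_two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the equal-variance independent two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means and sample sizes are from the app example.
# The standard deviations 5.0 and 6.0 are ASSUMED; the article does not state them.
t_stat, df = pooled_two_sample_t(15.0, 5.0, 100, 12.0, 6.0, 100)

print(f"t = {t_stat:.3f}, df = {df}")
```

With these assumed spreads the t-statistic is about 3.84 on 198 degrees of freedom, which is well above the two-tailed 5% critical value of roughly 1.97. In that hypothetical case the difference in average time between Android and iOS would be considered statistically significant.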
Paired Sample T-test
The paired sample t-test is used to compare the means of two dependent groups, which is why it is also called a dependent sample t-test.
Typically, we measure the same group at two different times or under two different conditions.
Assumptions for Paired sample T-test:
The differences between the paired observations should be approximately normally distributed
The two groups must be related or paired
Example:
Suppose a college runs a mentorship program to boost the learning of students. How will the college know whether the students' learning has actually improved?
The college can test the students before and after the mentorship program. Here we compare the same sample of students at two different times, so each student contributes a pair of scores.
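The mentorship example can be sketched in Python as follows. A paired t-test is a one-sample t-test on the per-student differences, tested against a mean difference of zero. The scores below are entirely hypothetical (the article gives no data), and the function name paired_t is made up for this illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t-test on two equal-length sequences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences (after minus before).
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# HYPOTHETICAL test scores for 8 students, before and after the program.
before_scores = [60, 65, 70, 58, 62, 68, 74, 66]
after_scores = [64, 70, 71, 63, 65, 70, 80, 69]

t_stat, df = paired_t(before_scores, after_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```

For this made-up data the t-statistic is roughly 6.08 with 7 degrees of freedom. The t-table value for a two-tailed test at the 5% level with 7 degrees of freedom is 2.365, so these hypothetical scores would indicate a significant change after the program.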