In day-to-day work, a data scientist often deals with datasets too large to analyze in full. Loading all of the data at once can exceed the system's available RAM.
In this scenario, scientists and analysts select a subset of the entire dataset and perform their operations on it.
This subset is known as a sample. There are different techniques for drawing a sample from a population. Before covering them, let's look a little more closely at populations and samples.
Population vs Sample
A population is the complete set of individuals or items in a group or category of interest. A sample is a subset drawn from that larger population.
A value computed from a sample is called a statistic, and a value computed from a population is called a parameter. For example, the mean age of every resident of a city is a parameter, while the mean age of 500 surveyed residents is a statistic.
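The difference between a parameter and a statistic can be illustrated with a minimal sketch. The population below is synthetic (randomly generated "ages"), and the sizes and seed are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic ages (illustrative data only).
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
sample = rng.sample(population, k=500)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two means will usually be close but not identical. The statistic is an estimate of the parameter, and it changes from one sample to the next.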
What is Sampling?
Sampling is the process of selecting a sample from a population.
Why do we need Sampling?
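It is easier to work with a sample than with a larger population.
It is cost-effective: collecting and processing data for a sample takes less time and money than doing so for the whole population.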
Different Types of Sampling Techniques
Broadly, we have two types of sampling techniques:
Probability Sampling: In this technique, researchers select the sample at random, so every member of the population has a known, non-zero chance of being chosen.
Non-probability Sampling: In this technique, individuals are selected based on non-random criteria.
Non-probability sampling carries a higher risk of sampling bias, because some members of the population may have no chance of being included.
Types of Probability Sampling Techniques
There are four types of Probability Sampling Techniques:
Simple Random Sampling: In this method, each member of a population has an equal chance of being selected. Random does not mean that we select individuals haphazardly. The selection is made by a chance mechanism, such as a random number generator.
Systematic Sampling: In this technique, we choose individuals at regular intervals (every nth individual), usually after picking a random starting point.
Stratified Sampling: In this technique, we divide the entire population into subgroups (strata) based on a category (e.g., gender). After that, we draw a sample from each of these subgroups.
Cluster Sampling: This also involves dividing the population into subgroups. But instead of sampling individuals from every subgroup, we randomly select entire subgroups (clusters) and include their members in the sample.
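The four probability sampling techniques above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names, the `people` records, and the `gender` and `city` fields are all hypothetical, and the stratified version assumes proportional allocation (the same fraction from every stratum).

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of selection (without replacement)."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Take every n-th member after a random start, with n = len(population) // k."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]

def stratified_sample(population, key, fraction, rng):
    """Split into strata by key, then draw a random sample from each stratum."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

def cluster_sample(population, key, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    clusters = {}
    for item in population:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

# Hypothetical population of 1,000 records (illustrative data only).
rng = random.Random(0)
people = [
    {"id": i, "gender": "F" if i % 2 else "M", "city": f"city_{i % 10}"}
    for i in range(1000)
]

srs = simple_random_sample(people, 50, rng)
sys_s = systematic_sample(people, 50, rng)
strat = stratified_sample(people, lambda p: p["gender"], 0.05, rng)
clus = cluster_sample(people, lambda p: p["city"], 2, rng)

print(len(srs), len(sys_s), len(strat), len(clus))
```

Note the difference between the last two functions: stratified sampling takes some members from every subgroup, while cluster sampling takes all members from only some subgroups.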
Types of Non-probability Sampling Techniques
Convenience Sampling: Researchers choose the individuals who are most accessible to them. It also depends on an individual's willingness to participate.
Quota Sampling: In this technique, we choose individuals non-randomly so that the sample matches the population's proportions for certain traits. Suppose that, in a population, 15% are retired persons, 50% are adult men, and 35% are adult women. The sample must then have the same proportions.
Purposive/Judgement Sampling: In this technique, researchers use their own expertise to select the individuals who are most useful for the research or analysis.
Snowball Sampling: In this technique, existing participants recruit or refer further participants, so the sample grows like a rolling snowball. It is typically used when members of the population are hard to reach.
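As an illustration of quota sampling, the sketch below fills the 15% / 50% / 35% quotas from the example above by taking individuals in the order they happen to arrive, with no random selection. The `quota_sample` function, the `stream` of people, the group labels, and the sample size of 20 are all assumptions made for this example.

```python
def quota_sample(stream, key, quotas):
    """Take individuals in arrival order until each group's quota is full."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in stream:
        group = key(person)
        # Accept the person only if their group still has room.
        if counts.get(group, 0) < quotas.get(group, 0):
            sample.append(person)
            counts[group] += 1
        if counts == quotas:
            break  # every quota is filled
    return sample

# Population shares taken from the example in the text.
shares = {"retired": 0.15, "adult_man": 0.50, "adult_woman": 0.35}
sample_size = 20
quotas = {group: round(sample_size * share) for group, share in shares.items()}

# Hypothetical stream of accessible individuals (illustrative data only).
groups = ["retired", "adult_man", "adult_woman"]
stream = [{"id": i, "group": groups[i % 3]} for i in range(100)]

quota_picked = quota_sample(stream, lambda p: p["group"], quotas)
print(quotas)
print(len(quota_picked))
```

Because individuals are accepted in arrival order rather than at random, the sample matches the population on the chosen traits but can still be biased in other respects.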