Everything a Data Scientist Should Know about Sampling Techniques

Photo by Clay Banks on Unsplash

In day-to-day work, a data scientist deals with huge amounts of data. Often the full dataset is too large to fit in the system's RAM, so it cannot be analyzed in one go.

In this scenario, scientists and analysts select a subset of the entire dataset and perform multiple operations over it.

This subset is known as a sample. There are different techniques for drawing a sample from a population. Before looking at them, let's see a little more about populations and samples.

Population vs Sample

A population is the entire group of individuals or items that we want to draw conclusions about. A sample is a subset drawn from that population.

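A value computed from a sample is called a statistic, while a value computed from the whole population is called a parameter. For example, the mean of a sample is a statistic that estimates the population mean, which is a parameter.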
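To make the difference concrete, here is a minimal sketch using only Python's standard library. The population of ages is made up for illustration, and the sizes (10,000 and 500) and the seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 made-up ages between 18 and 90
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population
population_mean = statistics.mean(population)

# Statistic: computed from a sample of 500 members drawn without replacement
sample = rng.sample(population, k=500)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values should be close but not identical: the statistic is only an estimate of the parameter.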

What is Sampling?

Sampling is the process of selecting a sample from a population.

Why do we need Sampling?

  • It is easier to work with a sample than with the entire population.

  • It is cost-effective: collecting and processing less data takes less time and money.

Different Types of Sampling Techniques

Broadly, we have two types of sampling techniques:

  1. Probability Sampling: In this technique, researchers select individuals at random, so every member of the population has a known, non-zero chance of being chosen.

  2. Non-probability Sampling: In this process, individuals are selected based on non-random criteria.

Because selection is not random, non-probability sampling carries a higher risk of sampling bias.

Types of Probability Sampling Techniques

There are four common types of probability sampling techniques:

  1. Simple Random Sampling: In this method, each member of a population has an equal chance of being selected. Random here does not mean haphazard: selection is left to a chance mechanism, such as a random number generator, rather than to the researcher's judgement.

  2. Systematic Sampling: In this technique, we choose individuals at regular intervals (every nth individual), usually starting from a randomly chosen point.

  3. Stratified Sampling: In this technique, we divide the entire population into subgroups (strata) based on a shared characteristic (e.g., gender). After that, we draw a random sample from each subgroup.

  4. Cluster Sampling: This also involves dividing the population into subgroups. But instead of sampling individuals from every subgroup, we randomly select entire subgroups (clusters) and include their members in the sample.
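The four techniques above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names, the made-up `people` records, and the choice of `gender` as strata and `city` as clusters are all assumptions.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Pick a random start, then take every step-th member."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]

def stratified_sample(population, key, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

def cluster_sample(population, key, n_clusters, rng):
    """Pick whole clusters at random and keep all of their members."""
    clusters = {}
    for member in population:
        clusters.setdefault(key(member), []).append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for c in chosen for m in clusters[c]]

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: 1,000 people with made-up gender and city
people = [
    {"id": i, "gender": "F" if i % 2 == 0 else "M", "city": f"city_{i % 10}"}
    for i in range(1000)
]
rng.shuffle(people)  # so the list order carries no pattern

srs = simple_random_sample(people, 100, rng)
systematic = systematic_sample(people, 100, rng)
stratified = stratified_sample(people, lambda p: p["gender"], 0.1, rng)
clustered = cluster_sample(people, lambda p: p["city"], 2, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

Note how the stratified sample keeps the gender split of the population, while the cluster sample contains every person from just two cities.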

Types of Non-probability Sampling Techniques

  1. Convenience Sampling: Researchers choose the individuals who are most accessible to them. Selection also depends on an individual's willingness to take part.

  2. Quota Sampling: In this technique, we choose individuals non-randomly until the sample matches the population on some traits. Suppose that, in a population, 15% are retired persons, 50% are adult men and 35% are adult women. Then our sample has to keep the same proportions.

  3. Purposive/Judgement Sampling: In this technique, researchers use their own expertise to select the individuals who are most useful for the research/analysis.

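  4. Snowball Sampling: In this technique, existing participants refer or recruit further participants, so the sample grows like a rolling snowball. It is typically used when the population is hard to reach directly.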
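As an illustration of a non-probability technique, here is a minimal sketch of quota sampling with the 15% / 50% / 35% split from the example above. The `quota_sample` function, the made-up population, and the sample size of 100 are assumptions for illustration. The list order stands in for "whoever is easiest to reach", which is exactly why the result is not a random sample.

```python
from collections import Counter

def quota_sample(population, key, quotas):
    """Take the first available members of each group until its quota is full."""
    remaining = dict(quotas)
    sample = []
    for member in population:  # no randomness: we take people in the order we meet them
        group = key(member)
        if remaining.get(group, 0) > 0:
            sample.append(member)
            remaining[group] -= 1
    return sample

# Hypothetical population of 1,000 people matching the proportions in the text
population = (
    [{"id": i, "group": "retired"} for i in range(150)]
    + [{"id": i, "group": "adult man"} for i in range(150, 650)]
    + [{"id": i, "group": "adult woman"} for i in range(650, 1000)]
)

# Quotas for a sample of 100 people: 15%, 50% and 35%
quotas = {"retired": 15, "adult man": 50, "adult woman": 35}

sample = quota_sample(population, lambda p: p["group"], quotas)
counts = Counter(p["group"] for p in sample)
print(counts)
```

The sample matches the population's proportions, but within each group only the first people encountered are included, so sampling bias is still possible.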