## What Is Survey Sampling?

Surveys would be meaningless and incomplete without accounting for the respondents that they’re aimed at. The best survey design practices keep the target population at the core of their thought process.

‘All the residents of the Dharavi slums in Mumbai’, ‘every NGO in Calcutta’ and ‘all students below the age of 16 in Manipur’ are examples of a population; they are countable, finite and well-defined.

When the population is small enough, researchers have the resources to reach out to all of them. This would be the best case scenario, making sure that everybody who matters to the survey is represented accurately. A survey that covers the entire target population is called a census.

However, most surveys cannot cover the entire population. This is when sampling techniques become crucial to your survey.

## Why Is It Important?

### Resource Constraints

If the target population is not small enough, or if the resources at your disposal don’t give you the bandwidth to cover the entire population, it is important to identify a subset of the population to work with – a carefully identified group that is representative of the population. This process is called survey sampling, and it is one of the most important aspects of survey design.

Whatever the sample size, there are fixed costs associated with any survey. Once the survey has begun, the marginal costs associated with gathering more information, from more people, are proportional to the size of the sample.

### Drawing Inferences About the Population

Researchers are not interested in the sample itself, but in the understanding that they can potentially infer from the sample and then apply across the entire population.

A sample survey usually offers greater scope than a census. Working within a given resource constraint, sampling may make it possible to study the population of a larger geographical area or to find out more about the same population by examining an area in greater depth through a smaller sample.

Before we dive into the survey sampling methods at our disposal, it is imperative that we develop a perspective on what an effective sample should look like.

## 3 Features to Keep in Mind While Constructing a Sample

### Consistency

It is important that researchers understand the population on a case-by-case basis and test the sample for consistency before going ahead with the survey. This is especially critical for surveys that track changes across time and space where we need to be confident that any change we see in our data reflects real change – across consistent and comparable samples.

### Diversity

Ensuring diversity of the sample is a tall order, as reaching some portions of the population and convincing them to participate in the survey could be difficult. But to be truly representative of the population, a sample must be as diverse as the population itself and sensitive to the local differences that are unavoidable as we move across the population.

### Transparency

There are several constraints that dictate the size and structure of the sample. It is imperative that researchers discuss these limitations and maintain transparency about the procedures followed while selecting the sample so that the results of the survey are seen with the right perspective.

Now that we understand the necessity of choosing the right sample and have a vision of what an effective sample for your survey should be like, let’s explore the various methods of constructing a sample and understand the relative pros and cons of each of these approaches.

Sampling methods can broadly be classified as probability and non-probability.

## 3 Probability Sampling Techniques

When each entity of the population has a definite, non-zero probability of being incorporated into the sample, the sample is known as a probability sample.

Probability samples are selected in such a way as to be representative of the population. They provide the most valid or credible results because they reflect the characteristics of the population from which they are selected.

Probability sampling techniques include random sampling, systematic sampling, and stratified sampling.

### Random Sampling

**When:** There is a very large population and it is difficult to identify every member of the population.

**How:** The entire process of sampling is done in a single step, with each subject selected independently of the other members of the population. The term "random" has a very precise meaning: you can’t just collect responses on the street and call it a random sample.

**Pros:** In this technique, each member of the population has an equal chance of being selected as a subject.

**Cons:** With very large populations, it is often difficult to identify every member of the population, so the pool of subjects becomes biased. Dialing numbers from a phone book, for instance, may not be truly random: even if the numbers are picked at random, they all correspond to a localized region. A sample created this way might leave out many sections of the population that are significant to the study.

**Use Case:** Want to study and understand the rice consumption pattern across rural India? While it might not be possible to cover every household, you could draw meaningful insights by building your sample from different districts or villages (depending on the scope).
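As a rough illustration of what "random" means here, the sketch below draws a simple random sample from a list of household IDs. The `households` list and the `simple_random_sample` helper are hypothetical names made up for this example, and the sketch assumes you already have a complete list (a sampling frame) to draw from.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw repeatable and auditable
    return rng.sample(frame, n)

# Hypothetical sampling frame: 1,000 household IDs
households = [f"HH-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(households, 50, seed=42)
print(len(sample), "households selected")
```

Recording the seed alongside the results is one simple way to stay transparent about how the sample was drawn.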

### Systematic Sampling

**When:** Your given population is logically homogenous.

**How:** In a systematic sample, after you decide the sample size, arrange the elements of the population in some order and select terms at regular intervals from the list.

**Pros:** The main advantage of systematic sampling over simple random sampling is its simplicity. Another advantage is the assurance that the population will be evenly sampled. Simple random sampling can, by chance, produce a clustered selection of subjects; systematic sampling avoids this.

**Cons:** The possible weakness of the method that may compromise the randomness of the sample is an inherent periodicity of the list. This can be avoided by randomizing the list of your population entities, as you would randomize a deck of cards for instance, before you proceed with systematic sampling.

**Use Case:** Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.
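The "every 10th customer" idea can be sketched in a few lines. This is a minimal sketch, assuming the population is available as a list; `systematic_sample` and `customers` are hypothetical names. The optional `shuffle` flag corresponds to randomizing the list first to guard against periodicity.

```python
import random

def systematic_sample(frame, n, seed=None, shuffle=False):
    """Pick every k-th unit after a random start, where k = len(frame) // n."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)
    units = list(frame)
    if shuffle:
        rng.shuffle(units)      # breaks up any periodic pattern in the list
    k = len(units) // n         # sampling interval
    start = rng.randrange(k)    # random starting point within the first interval
    return [units[start + i * k] for i in range(n)]

# Hypothetical example: 200 customers in order of arrival, sample of 20 (every 10th)
customers = list(range(1, 201))
every_tenth = systematic_sample(customers, 20, seed=7)
print(every_tenth[:5])
```

The random start matters: always beginning with the first person on the list would mean that some members have no chance of being selected.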

### Stratified Sampling

**When:** You can divide your population by characteristics of importance for the research.

**How:** A stratified sample, in essence, tries to recreate the statistical features of the population on a smaller scale. Before sampling, the population is divided by characteristics of importance for the research, for example gender, social class, education level, or religion. Then the population is randomly sampled within each category or stratum. If 38% of the population is college-educated, then 38% of the sample is randomly selected from the college-educated subset of the population.

**Pros:** This method attempts to overcome the shortcomings of random sampling by splitting the population into various distinct segments and selecting entities from each of them. This ensures that every category of the population is represented in the sample. Stratified sampling is often used when one or more of the sections in the population have a low incidence relative to the other sections.

**Cons:** Stratified sampling is the most complex method of sampling. It lays down criteria that may be difficult to fulfill and places a heavy strain on your available resources.

**Use Case:** If 38% of the population is college-educated and 62% of the population have not been to college, then 38% of the sample is randomly selected from the college-educated subset of the population and 62% of the sample is randomly selected from the non-college-going population. Maintaining the ratios while selecting a randomized sample is key to stratified sampling.
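The 38% / 62% split can be sketched as proportional allocation followed by a random draw inside each stratum. The function name, the stratum labels and the population of 1,000 people are all assumptions made up for illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of units.

    Each stratum contributes in proportion to its share of the population.
    Because of rounding, the stratum sizes may not add up to exactly n.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(n * len(units) / total)   # proportional allocation
        sample[name] = rng.sample(units, share)  # random draw within the stratum
    return sample

# Hypothetical population: 380 college-educated and 620 non-college people
population = {
    "college": [f"C{i}" for i in range(380)],
    "non_college": [f"N{i}" for i in range(620)],
}
picked = stratified_sample(population, 100, seed=1)
print({name: len(units) for name, units in picked.items()})
```

With these made-up numbers, the sample of 100 contains 38 college-educated and 62 non-college respondents, mirroring the population ratios.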

## 3 Non-Probability Sampling Techniques

Non-probability sampling techniques include convenience sampling, snowball sampling and quota sampling.

In these techniques, the units that make up the sample are collected with no specific probability structure in mind. The selection is not completely randomized, and hence the resultant sample isn’t truly representative of the population.

### Convenience Sampling

**When:** During preliminary research efforts.

**How:** As the name suggests, the elements of such a sample are picked only on the basis of convenience in terms of availability, reach and accessibility.

**Pros:** The sample is created quickly without adding any additional burden on the available resources.

**Cons:** The likelihood of this approach leading to a sample that is truly representative of the population is very low.

**Use Case:** This method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.

### Snowball Sampling

**When:** You can rely on your initial respondents to refer you to the next respondents.

**How:** Just as the snowball rolls and gathers mass, the sample constructed in this way will grow in size as you move through the process of conducting a survey. In this technique, you rely on your initial respondents to refer you to the next respondents whom you may connect with for the purpose of your survey.

**Pros:** The costs associated with this method are significantly lower, and you will end up with a sample that is very relevant to your study.

**Cons:** The clear downside of this approach is that you may restrict yourself to only a small, largely homogenous section of the population.

**Use Case:** Snowball sampling can be useful when you need the sample to reflect certain features that are difficult to find. To conduct a survey of people who go jogging in a certain park every morning, for example, snowball sampling would be a quick, accurate way to create the sample.
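The referral chain can be pictured as a breadth-first walk outward from your first respondents. The sketch below is only an illustration: the names and the `referrals` mapping are invented, and in a real survey the referrals are collected as you go rather than known in advance.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Start from seed respondents and follow referrals until the target size is reached.

    referrals maps a respondent to the people they refer you to.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                 # survey this respondent
        for referred in referrals.get(person, []):
            if referred not in seen:          # never survey the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical joggers and the people each of them refers
referrals = {
    "Asha": ["Ben", "Chitra"],
    "Ben": ["Dev"],
    "Chitra": ["Dev", "Esha"],
}
joggers = snowball_sample(["Asha"], referrals, target_size=4)
print(joggers)  # ['Asha', 'Ben', 'Chitra', 'Dev']
```

Everyone in the result is reachable from the single seed, which is exactly why such a sample can end up confined to one closely connected section of the population.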

### Quota Sampling

**When:** You can characterize the population based on certain desired features.

**How:** Quota sampling is the non-probability equivalent of stratified sampling that we discussed earlier. It starts with characterizing the population based on certain desired features and assigns a quota to each subset of the population.

**Pros:** This process can be extended to cover several characteristics and varying degrees of complexity.

**Cons:** Though the method is superior to convenience and snowball sampling, it does not offer the statistical insights of any of the probability methods.

**Use Case:** If a survey requires a sample of fifty men and fifty women, a quota sample will survey respondents until the right number of each type has been surveyed. Unlike stratified sampling, the sample isn’t necessarily randomized.
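To make the contrast with stratified sampling concrete, here is a minimal sketch of filling quotas in arrival order. The respondent stream, the `quota_sample` helper and the field names are hypothetical. Note that nothing in it is randomized: whoever shows up first fills the quota.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in the order they arrive until every quota is full."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in respondents:
        group = key(person)
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:   # every quota is filled, so stop surveying
            break
    return sample

# Hypothetical stream of respondents in which men outnumber women two to one
stream = [{"id": i, "gender": "F" if i % 3 == 0 else "M"} for i in range(300)]
picked = quota_sample(stream, {"M": 50, "F": 50}, key=lambda p: p["gender"])
print(len(picked))  # 100
```

In this made-up stream the men's quota fills early, and later male respondents are simply skipped while the women's quota catches up.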


Probability sampling techniques are clearly superior, but the costs can be prohibitive. For the initial stages of a study, non-probability sampling techniques might be sufficient to give you a sense of what you’re dealing with. For detailed insights and results that you can bank upon, move on to the more sophisticated techniques as the study gathers pace and takes a more concrete structure.

Once you have created your sample, optimize your survey quality by choosing the right survey question types.


*Note: This article was originally published on 27 April 2015, then refreshed and updated on 25 July 2017.*

## 33 comments

Very great post. I just stumbled upon your blog and wanted to mention that I have really loved browsing your blog posts. In any case I will be subscribing to your RSS feed, and I hope you write again soon!

Hi Margarito! Thanks for the message, and I’m really glad to hear that you like our blogs. If you want weekly updates on new blog posts, you should also subscribe to our newsletters. http://soco.ps/1pIl93b

Great insight, Christine. I am a master’s student and have never understood validity and reliability in research well. If we could have a blog post on the same, I would really appreciate it.

Thanks Paul! I’ll put that topic on our list.

Thats so helpful. Written in a very clear style with good examples.

thanks Christine very informative and clear.

good

A great insight to us who do surveys now and then with a number of indicators which requires survey.

I had three villages with two enumerators in each village, and I wanted 20 respondents from each enumerator. Would that be quota sampling or convenience sampling?

The first question of the survey for anyone being interviewed was, “Did you attend any of the trainings provided by the project?”

Regards

Hi Daud, thanks for the question! You can use either. If you just want quick feedback on who showed up, convenience sampling is the quickest, easiest way to get your sample. However, quota sampling is more rigorous and will give you more insight into the demographics you’re reaching for just a little extra effort.

Informative and very nicely written! Thanks for taking the time and effort to put this piece together. I have circulated it within my network.

Hi. I am initiating a study on cognitive development for a certain country in children between 6 and 14 years old who are given a test set. In addition, it is desired to calculate the average score obtained by age. The country has 19 states. Each age has approximately 50,000 children, giving a total of 450,000. So what would be the best way to select the sample? I mean, how many states would be enough? What would be the right size per age? If I did it by age, I would get 384 * 9 = 3456, which seems a lot in terms of cost. But if I considered the population as a whole, I would get 384, which would give me about 40 per age group. This seems too little to make population inferences. So what would be the right reasoning? Thank you very much for your help.

Hi Marie,

Thanks for your interest in our input. It is hard to say how many states need to be included in your sample without knowing much about the population you are working with. A lot of it depends on how diverse the population is within your country of study. Make sure you try to represent the demographics of your target population as much as you can.

With that being said, you will likely get more accurate results if you sample based on age. 40 people is definitely not enough to generalize across a population of 50,000. Do you have access to local schools for finding children to survey? Using schools to perform clustered sampling would allow you to decrease your cost per survey. Hope this helps!

Very impressive programme. Want to be part.

Hi, thank you for your blog; it was very helpful. I want to find out the potential users of a product in a given state. Which sample design would be good to use, and why?

Hi,

Very well explained. Good job.

I would like to ask if I want to sample NGOs in my country which is in Malaysia, should I use stratified random sampling (probability sampling) or purposive sampling (non-probability sampling). My target group is NGO who involves in the conservation of biodiversity only.

After I stratified NGOs to NGOs who involves in the conservation of biodiversity only, how should I randomly sample NGOs?

I would love if you can help me to clarify this. Many thanks.

Hi Su, thanks for the note. First, you can use all NGOs involved in the conservation of biodiversity as your population (rather than all NGOs). No stratification needed there! After that, you can choose from probability or non-probability sampling.

Why are you sampling these NGOs? If you’re just looking to get information easily and quickly (for example, if you want to learn about those NGOs’ data needs so you can build a trial product to help them), non-probability sampling will probably be easier. Usually, it will let you get lots of information far more quickly. But if you’re looking to do more rigorous analysis (for example, if you’re writing a national report on the financial status of these NGOs in 2018), then probability sampling is usually better.

Great Job. Simple and to the point

Can we do probability sampling in any modified way for a population where I do not have a sampling frame, for example walk in patients in an OPD of a hospital

What about systematic sampling? You could discreetly sample every nth person who walks into the OPD. As long as there’s no inherent order that people walk in, this should lead to a representative sample.

Hi I am Jerry from the Philippines. I am currently working on a module for introduction to research and scientific writing intended for high school students, more like research made super easy kind of thing (stress on the “super” . . . hehe). It’s a step by step guide in conducting research for beginners. I intend to use it in the classes I handle. I find your discussion very simple yet informative, the same goal I have for my module. I would like to ask permission to use your blog as one of my reference in discussing sampling techniques. Thanks in advance!

Hi Jerry, glad to hear this blog would be helpful for students. Feel free to use it!

P.S. We’ll be publishing an entire ebook on sampling in the next couple of months. Should we send that to you when it’s out?

Yey!!! That would be great. Looking forward to that 🙂

Hi there,

I’m planning to conduct a survey regarding citizens’ awareness on water supply issues, whether the community is aware or unaware with the amount of available water supply. The total population is 8.27 million. It’s a very huge number so how should I proceed with my sampling technique to make sure my survey is representative of the total population?

Thanks a lot!

Hey Richard, good question! The short answer is that 400-1,000 people would be a reasonable sample.

The long answer — take a look at the “Calculating Sample Size” section on page 26 of our data collection ebook (https://socialcops.com/ebooks/data-collection/). If you put your population into the sample size formula (with a 5% margin of error and 95% confidence level), you get a sample of 384 people. That’s the default sample size for big populations.

However, that sample size formula is best for homogenous (i.e. internally similar) populations. If you think that your population has diversity that’s relevant to your research, it’s often a good idea to increase your sample size. This gives you a bit more peace of mind and greater statistical confidence. Plus, if you’re using stratified sampling, a larger sample allows you to stratify your population as much as you need without making your sub-samples too small.

If you’re increasing your sample, you probably won’t need to go above 1,000 people. (That’s generally where private organizations stop. It gets expensive and time consuming to go above that, and that sample size gives you a margin of error of 3%.) Of course, you can choose to make your sample as big as you like. Try playing around with the sample size formulas to see how changing your sample affects your margin of error.

Hope that helps! Let me know if you have any other questions.

P.S. Probabilistic sampling is the best way to go for your project. It’ll give much greater confidence that your sample is truly representative of the 8 million people you’re trying to understand.
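For readers who want to check the numbers in this reply: the sketch below assumes the standard Cochran sample size formula with a finite population correction, which may not be exactly the formula used in the ebook. The function names are made up for illustration, and `p = 0.5` is the usual most conservative assumption about the proportion being measured.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Cochran's formula with finite population correction (z = 1.96 for 95% confidence)."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # size for an infinite population
    return n0 / (1 + (n0 - 1) / population)             # correction for a finite population

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a simple random sample of size n from a large population."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(sample_size(8_270_000)))     # 384
print(round(margin_of_error(1000), 3))   # 0.031, i.e. about 3%
```

Strictly speaking, a required sample size should be rounded up, which gives 385 here; 384 is the commonly quoted figure. Both numbers assume simple random sampling from a fairly homogenous population.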

Hi Christine,

Thank you for writing this blog, it’s great! I have a complicated question: I’m working with a group that has conducted a country-wide survey at the district level using a probability proportional to size (PPS) sampling technique of villages constructed based on population estimates. They then conducted several surveys over time and chose 20 villages with a randomly selected starting place on lists that were organized by size of the village (so, intrinsic stratification based on size), and then once villages were chosen, enumerators randomly selected households within the village to conduct the surveys. So, if all was done correctly, one can calculate statistics and confidently assume they are representative at the district level.

However, now the investigators are asking for statistics to be generated at a level under the district (tehsil) and we are no longer confident that the sample will be representative of the population at this smaller geographic level. So, I’m wondering if there is a way to assess randomness of the sample at the tehsil level? Or, what would you do to find a representative sample at the tehsil level? Any way to weight the sample in some way or do something else? Or, are we stuck with essentially reporting from a “convenience sample” at the tehsil level and simply stating this as a caveat?

Also, we are considering using certain criteria like: time (at least X number of villages within a particular tehsil were included in X number of surveys over time), geography (villages in a particular tehsil are distributed across the tehsil), resampling of villages (villages were not resampled excessively and if they were we will choose data from only one survey instance), urban/rural (villages are adequately distributed across urban and rural areas). What do you think about this approach to choosing tehsils that would give us fairly representative data to analyze? Note that we do not have population size at the tehsil level either. Thank you!

Glad you enjoyed the blog! This is definitely an interesting question. Your list of possible criteria is certainly extensive, but I think it’s actually not needed. You should hold your sample to the same criteria, no matter what geographic level you’re working at. If your tehsil-level sample fulfills the same criteria as the district-level sample you created, then the tehsil-level sample is just as valid.

To clarify – your original sample used PPS sampling, meaning that you created a list of villages for each district, ranked each list by village size, and chose villages from each list at regular intervals. So your core criterion was that your chosen villages were evenly distributed across a list that was ranked by population.

Do your surveyed villages still fit this criterion at tehsil level? You can check this by creating a list of villages for each tehsil, ranked by population (just like before). Then highlight the villages on each tehsil list that you already surveyed. How are these villages distributed on each tehsil list?

Are they fairly evenly spread out? If so, you’re all set! Your sample will hold up well at the tehsil level, and you can calculate tehsil-level statistics. (Though, for full transparency, you should add a caveat that the sample was originally created at the district level).

Alternatively, are the villages for some tehsils clustered at the top or bottom of the list? If so, then your sample unfortunately isn’t rigorous at the tehsil level. I suppose you could try weighting the villages to fix this, but it’ll be quite tricky and prone to error. It would be much safer to report tehsil-level stats as a convenience sample.

Does this make sense? Let me know if you have any questions!

I am doing a comparative study on the behaviour of children in classes 3 and 4, so I want to take 25 students from each class. What kind of sampling is suitable? Please give me an idea.

Hey Arpita! First, probability sampling is the way to go. It’s a lot more reliable, and school is a nice controlled environment where probability sampling is definitely possible. You have a couple of options within probability sampling.

In a school, it’s easy to get a list of all students. So the easiest option is systematic sampling. First, get a list of all the kids’ names in each class. (Make sure the list of names has been randomized!) Then you’ll pick children’s names off the list at regular intervals. For example, if there are 100 kids on each class list, and you need 25 kids per class, you’ll pick every 4th name. That’ll give you a random sample of 25 kids per class.

Systematic sampling assumes that you don’t care about any internal differences between kids in the same class. If you do care about these differences, then you want to use stratified sampling.

How would this work? Here’s an example. Say you think that kids’ grades are probably related to their behavior. First, split the kids in each class into the groups you care about, i.e. divide them into 3 groups based on their class ranking. One group would have the top third of the class, one group would have the middle third, and one group would have the bottom third of the class. Then you can choose one third of your sample from each group. (Like above, randomize each group’s list, then choose 8 or 9 kids off each list to reach 25 per class.)

What if the characteristic you care about isn’t evenly distributed? Then make sure your sample reflects the distribution of your group in each class. For example, say you think that boys and girls behave differently, and you know that 40% of each class is male and 60% is female. You would split each class’ student list into separate lists — one with male students, and one with female students. Then randomize each list, and choose 40% of your sample from the male list and 60% from the female list.

Hope that helps! Let me know if you have any other questions.

Thanks a lot. It really helped me to select a sample.

Would this be purposive or quota sampling? Kindly give me some idea.

Great….