Why It’s Important to Standardize Your Data

Data is most valuable when you have something to compare it to, but these comparisons aren’t helpful if the data is bad or irrelevant.

Data is most valuable when you have something to compare it to. For example, it’s nice to know that your program helped 150 people this year, but that doesn’t tell you what you should do next year. If it’s a 50% decrease from last year (oops…), clearly you have some changes to make. If this is a 50% increase, though, you know that you’re on the right track!

Let’s take this one step further — data comparisons aren’t helpful if data is bad or irrelevant. For example, it may be helpful to compare children’s literacy rates across rural India. Areas with high literacy rates could have something to teach the rest of the country. But it wouldn’t be helpful to include areas that measure literacy rates differently or don’t report their literacy rates accurately.

Data standardization is the process of making sure that your data set can be compared to other data sets. It’s a key part of research, and it’s something that everyone who uses data should consider before they collect, clean, or analyze their first data point.

What is data standardization exactly?

Data standardization is about making sure that data is internally consistent; that is, every value of a given data type follows the same content rules and the same format.

Standardized values are useful for tracking data that isn’t easy to compare otherwise.

For example, suppose you and your friend went to different universities. One day, you both got your midterm grades for your Physics 101 classes. Your professor sticks to the normal grading scale (out of 100), so you got a grade of 84; the test has a mean of 77 and a standard deviation of 6. Your friend’s professor, though, uses his own grading scale, so she got a grade of 452. Her test has a scale of 750, mean of 400, and standard deviation of 100.

Both of you scored above average, but who did better? The raw scores aren’t directly comparable, but you can standardize them. Converting them to percentages shows that you came out ahead, with 84% compared to your friend’s 60% (452 out of 750).

You can also convert the scores to z-scores, which show how each of you performed relative to your classmates. Your z-score is (84 − 77) / 6 ≈ 1.17, meaning that you scored a little more than one standard deviation above average; your friend’s is (452 − 400) / 100 = 0.52, meaning that she scored only about half a standard deviation above average.
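The two conversions above can be reproduced in a few lines of Python. This is only a minimal sketch using the numbers from the example; the function names percent_score and z_score are made up for illustration.

```python
def percent_score(score, max_score):
    """Return the score as a percentage of the maximum possible score."""
    return 100 * score / max_score


def z_score(score, mean, std_dev):
    """Return how many standard deviations the score lies from the mean."""
    return (score - mean) / std_dev


# Values taken from the Physics 101 example.
you_pct = percent_score(84, 100)
friend_pct = percent_score(452, 750)
you_z = z_score(84, mean=77, std_dev=6)
friend_z = z_score(452, mean=400, std_dev=100)

print(f"You:    {you_pct:.1f}%  z = {you_z:.2f}")
print(f"Friend: {friend_pct:.1f}%  z = {friend_z:.2f}")
```

Your z-score works out to 7/6, or about 1.1667, which rounds to 1.17. Either way you standardize, you come out ahead of your friend.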

No matter how you standardize your data, standardization gives both a single data point and the full data set greater meaning.

Why standardized data is so important

The importance of having standardized data for comparison can be seen across the globe. In 2015, the United Nations adopted its “2030 Agenda for Sustainable Development”, which set out key goals and indicators to aid in ending poverty, protecting the planet, and ensuring prosperity for all.

In their General Assembly resolution, the UN stated that:

Quality, accessible, timely and reliable disaggregated data will be needed to help with the measurement of progress and to ensure that no one is left behind. Such data is key to decision making. Data and information from existing reporting mechanisms should be used where possible. We agree to intensify our efforts to strengthen statistical capacities in developing countries, particularly African countries, least developed countries, landlocked developing countries, small island developing States and middle-income countries.
United Nations, “Transforming our world: the 2030 Agenda for Sustainable Development”, 2015

The resolution also recognized that “baseline data for several of the targets remains unavailable,” and included a “call for increased support for strengthening data collection and capacity building in Member States, to develop national and global baselines where they do not yet exist.”

This UN Agenda for Sustainable Development, and their focus on data to help reach their 2030 goals, underscores the importance of data standardization — the success of the UN initiative inherently requires a standardized data set by which to measure progress.

How to standardize your data

So, how can you standardize your data?

First, let’s start with several ways that data can be standardized:

Collect data in common formats: This is when you make sure that your survey is set up to record the same data point in the same format every time. For example, people’s dates of birth shouldn’t be collected as “June 1986”, “21 Jan 1974” and “1956” in the same survey.
Collect data based on pre-set standards: If there are pre-existing international or local standards for how to measure and count a particular data point, stick to them. For example, the SDG indicators are a great international standard that more organizations are adopting today.
Transform data to a common format: During data cleaning, data standardization involves converting different data formats to just one format: for example, DD/MM/YYYY for dates, or “per 100,000 people” for ratios such as the sex ratio.
Convert data to z-scores: Rather than showing a data point on its own scale, z-scores show how many standard deviations a data point is from the mean (average). This conversion happens during data cleaning or analysis.
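As a rough illustration of transforming data to a common format, the sketch below converts dates written in a few different styles to DD/MM/YYYY using Python’s standard datetime module. The list of input formats and the function name to_common_date are assumptions made for this example, not a fixed recipe. Incomplete dates such as “June 1986” or “1956” are returned as None so they can be followed up, rather than guessing the missing day or month.

```python
from datetime import datetime

# Hypothetical list of input formats seen in the raw data.
# Month names are matched using the current locale (English assumed here).
KNOWN_FORMATS = ["%d %b %Y", "%d %B %Y", "%Y-%m-%d", "%d/%m/%Y"]


def to_common_date(raw):
    """Return the date as DD/MM/YYYY, or None if it is not a complete, recognized date."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue  # Try the next known format.
    return None


raw_dates = ["21 Jan 1974", "1974-01-21", "June 1986", "1956"]
cleaned = [to_common_date(d) for d in raw_dates]
print(cleaned)
```

Here the first two entries both become 21/01/1974, while the two incomplete entries are flagged as None for review.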

Data validations are a simple way to ensure that data can only be collected in standardized formats. Check out our blog on how to use 18 different data validations.
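To make the idea of a data validation concrete, here is a minimal sketch of a check that a survey form could run before accepting a date of birth. It assumes DD/MM/YYYY is the chosen standard; the function name is_valid_birth_date is hypothetical, and real survey tools usually offer their own built-in validation settings.

```python
import re
from datetime import datetime


def is_valid_birth_date(value):
    """Accept only real calendar dates written exactly as DD/MM/YYYY and not in the future."""
    # Enforce the exact two-digit/two-digit/four-digit layout first.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        parsed = datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False  # Right layout, but not a real date (e.g. 31/02/1990).
    return parsed <= datetime.now()


for answer in ["01/06/1986", "June 1986", "31/02/1990", "1/6/1986"]:
    print(answer, "->", is_valid_birth_date(answer))
```

Only the first answer passes; the others are rejected at entry time, so they never need to be cleaned later.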

So, what steps should you take to standardize your data? Here is a starter list:

Decide on data standards: How should data points be standardized? Which common standards or formats should the data follow? Are there any baseline measures by which all data points will be analyzed? A variety of elements, including the breadth and intended use of the data being collected, need to be considered.
Figure out where your data is coming from: How often and from where is new data arriving? Knowing data entry points helps you assess how reliable your data is, what data standardization problems you may face, and how you will be able to deal with them during data processing or cleaning.
Set up a great survey: If you’re collecting your own data (rather than using an existing data set), it’s important to think about data standardization while you write your survey. Collect data in a common format to begin with. (It’s easier to start with good data, rather than trying to standardize bad data later.) Add data validations to make sure that data is always reported in the same format, and make sure you collect data in accordance with existing standards.
Understand and clean your data: Once you’ve acquired your data, dig into its structure and values. Make sure that it is correct, clean, complete, formatted and verified before it gets committed into your data collection system and before you take action on that data.
Use existing questions and measures: Whenever it’s possible and reasonable, use pre-existing questions, methodologies, standards and data formats. This will allow you to compare new data against old data.
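As one small example of the “understand and clean your data” step, the sketch below counts missing or blank values per field before the data is committed anywhere. The records and field names are invented for illustration, and a real audit would also check formats, ranges, and duplicates.

```python
def audit_records(records, required_fields):
    """Count missing or blank values for each required field."""
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return missing


# Hypothetical survey rows.
records = [
    {"name": "A", "birth_date": "21/01/1974"},
    {"name": "B", "birth_date": ""},
    {"name": "", "birth_date": None},
]
report = audit_records(records, ["name", "birth_date"])
print(report)
```

A report like this shows at a glance which fields need follow-up before any analysis begins.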

Standardized data is essential for accurate data analysis; it’s easier to draw clear conclusions about your current data when you have other data to measure it against. While standardization techniques might vary depending on the individual initiatives, without them, data loses much of the meaning it was intended to have in the first place.

Image credit: Martin Adams on Unsplash