You may be familiar with principles for good data visualization when it comes to ordinary bar plots, scatterplots, and line plots. However, geospatial data visualization has its own set of principles for effective and honest communication.
The answer to what is the “best” style of geospatial data visualization often depends on the type of data at hand and what you hope to communicate as a storyteller. Therefore, it is important to understand the fundamental differences among your thematic mapping options.
Accordingly, this blog compares and contrasts four of the most common types of thematic maps: choropleth, dot density, proportional symbols, and 3D choropleth. To emphasize the strengths of each option, I have visualized the same data set through each thematic mapping style, noting what kinds of stories each has to tell.
Note that I use the term “thematic map” only to distinguish it from a general reference map. Rather than navigation, the purpose of a thematic map is to visualize some kind of data in geographic space. In this context, I am discussing thematic maps that place colors, circles, dots, or polygons on top of a reference map.
Read More: Recently, SocialCops has published several resources on geospatial data visualization.
- Vasavi Ayalasomayajula highlighted seven techniques to visualize geospatial data.
- I looked at how R packages, including leaflet, could be used to create an interactive choropleth.
- SocialCops also published a free Introduction to GIS in R course, which included lengthy lessons on creating static, animated and interactive maps.
About the Data Set
The data set used in this blog covers the three latest decades of Indian census figures on district-level household access to electricity and latrines.
In addition to being easy to understand, I find this data set instructive for three reasons. First, the population density is highly uneven, which is common in geospatial data of human settlements. Second, it has information that we want to communicate in a raw count in some cases, but also as a standardized rate in others. The tension between visualizing a raw count versus a standardized rate highlights the fundamental differences among these thematic mapping options.
Lastly, on top of teaching about data visualization principles, this data set holds stories that are valuable in their own right. At a fine-grain level, and cross-tabulated by key social identities, this data set tracks thirty years of progress (and lack thereof, in some cases) for hundreds of millions of people towards access to two fundamental amenities needed for healthy, productive lives. The data gives rich context for stories of public health, economic growth, competitive federalism, and societal inequality, to name a few.
Read More: For more information on the data set, please see my earlier SocialCops blog.
About the Software
Once again, I’ll use Shiny to quickly compare each type of thematic map using the same data set. Instead of leaflet, however, I’ll use the mapdeck library from David Cooley. Mapdeck is a relatively new R library under active development, which makes it easy to plot interactive maps using Mapbox GL and Deck.gl.
Note: Shiny is an R package for interactive web apps. See RStudio’s documentation to get started.
The Shiny app in these examples can be found here: https://shiny.socialcops.com/thematic_mapping/
Note: The code for wrangling the data and creating the visualization can be found in this blog’s GitHub repository. See the folder “shiny_thematic_mapping” for the code creating the visualization. See the dots.R script for how I created the data behind the dot density map. See the 3d_choropleth.R script for how I created the data behind the other maps.
Choropleth Maps
The choropleth is likely the most common thematic map. This style colors enumeration units (such as districts in India, in this case) according to the value of some quantity.
In the example below, we can see how the percentage of households with access to electricity sharply varies by district with respect to time, societal cross-section, or demographic cross-section.
Many parts of India, particularly South India and Gujarat, become more yellow as more households have gained access to electricity over time. At the same time, other regions, such as parts of Uttar Pradesh and Bihar, have remained stubbornly blue and purple, representing very low levels of electricity access.
It is important to note that here I am using a choropleth to map a rate or a ratio as opposed to a raw count. A percentage is one example of a standardized rate. Our data must fall in the range of 0 to 100.
Standardized data is very different from raw count data. The number of households with electricity in a district is an example of a raw count. It has not been manipulated or transformed in any way. Converting this raw count to a percentage standardizes it.
R packages (such as mapdeck) will let you create a choropleth with raw count data, but it is usually a bad idea. Depending on the range of your data, color is often a bad way to communicate the magnitude of differences among enumeration units. Percentages are a better way to show these differences and quickly give insight into a number of important questions. (My previous blog post explored some of these questions in greater detail.)
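As a minimal sketch of the difference between a raw count and a standardized rate, the base R snippet below converts household counts into a percentage. The two districts, their figures, and the column names are made up for illustration; they are not taken from the census data.

```r
# Two hypothetical districts (made-up figures, not census data).
districts <- data.frame(
  district               = c("A", "B"),
  households             = c(2000000, 40000),
  households_electricity = c(1500000, 36000)
)

# Standardize the raw count: share of households with electricity, 0 to 100.
districts$pct_electricity <-
  100 * districts$households_electricity / districts$households

districts[, c("district", "households_electricity", "pct_electricity")]
```

District A has far more electrified households in raw terms, yet district B has the higher rate (90% versus 75%). A choropleth colored by the raw count would rank them the other way around.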
However, a choropleth with percentage data conceals the vastly different populations within individual enumeration units. Geographically large enumeration units draw more attention than smaller units that may in fact hold more people. This is a big problem for Indian districts, where geographically-small metropolitan districts (such as Bangalore) have very large populations compared to larger but more sparsely-populated, primarily rural districts (such as Leh).
Read More: This is the same dilemma encountered in the common red-and-blue maps of U.S. election results. This article from Issie Lapowsky in Wired dives into a wide number of mapping techniques such as dasymetric dot density maps, value-by-alpha maps, and 3D prisms for interpreting the 2016 US presidential election.
When we look at the sea of colors in the choropleth above, we have to remember that India’s population density, like elsewhere in the world, is hardly uniform. Although the choropleth effectively shows relative differences in a standardized rate, it can’t clearly represent the magnitude of differences in raw counts. How might we address this challenge?
Dot Density Maps
A dot density map is one option that can address this problem. Unlike a choropleth, a dot density map is often an excellent choice to spatially visualize raw count data, because it randomly assigns a certain number of dots within each enumeration unit according to a value. This makes it very easy to ascertain where values cluster geographically.
Typically, dot density maps are more complicated to create in R than choropleths. The dots.R script I wrote draws heavily on functions from a blog by Andreas Beger. His functions are modifications of the sf::st_sample() function. However, the recent addition of an “exact” argument to the sf::st_sample() function should make this process much simpler in the future.
Read More: In addition to Beger’s blog cited above, Paul Campbell has an excellent post on creating dot density maps in R, as does Tarak. However, the newly-added “exact” argument to sf::st_sample() should simplify the code in these examples.
Our Shiny app includes an example of a “one-to-many” dot density map, where each dot represents 25,000 households. (Note that thus far, we always manipulated the data in some way. In the choropleth, I converted raw counts to a percentage. In the dot density map, I divided raw counts by a chosen dot value, in this case 25,000.)
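The sketch below shows roughly how the dot value and the “exact” argument fit together. It is not the code from dots.R: the square “district” and its household count are hypothetical stand-ins for a real boundary and census figure, and it assumes a version of sf in which st_sample() accepts exact.

```r
library(sf)

set.seed(2011)  # make the random dot placement reproducible

# Hypothetical district: a unit square standing in for a real boundary.
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
district <- st_sfc(st_polygon(list(ring)))

households <- 180000  # made-up raw count for this district
dot_value  <- 25000   # households represented by one dot

# One-to-many: keep whole dots only (180000 / 25000 = 7.2, so 7 dots).
n_dots <- households %/% dot_value

# exact = TRUE asks st_sample() to return exactly n_dots points
# placed at random inside the polygon.
dots <- st_sample(district, size = n_dots, type = "random", exact = TRUE)

length(dots)
```

For a full map, the same steps would be repeated for each district, with the resulting points combined into one layer.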
Unlike a choropleth, a dot density map lets me depict raw population growth and where it clusters over time. The dot density map also represents the populations of dense urban centers more accurately relative to sparsely-populated rural areas.
The map below compares households in India with access to both electricity and latrines versus those with neither of the two amenities. The first case shows a large concentration of dots in India’s largest urban centers (places like Delhi, Bangalore, Mumbai, and Kolkata) and in the state of Kerala. Toggling to depict households with access to neither electricity nor a latrine, however, shows a major shift to Uttar Pradesh and Bihar.
Examining the same data through the choropleth highlights the stark changes in colors, but the dot density map can better represent the raw differences in these populations.
Dot density maps also have the unique ability to map multivariate data. For example, we can plot both urban and rural populations at the same time using different colored dots. This may be their most valuable advantage compared to other geospatial visualizations.
With each dot representing 25,000 households, the map below shows that the overwhelming majority of 2011 households in India without access to electricity or a latrine were rural.
Dot Density Map Weaknesses
Of course the dot density map has its own failings. Perhaps most importantly, we can’t get numeric data from the map. Although it is possible to examine clusters of populations for any given parameter, calculating exactly how many people are in any given category is usually not possible.
By choosing to map population counts, we lose insight into the percentages. Do most households in a certain district have access to electricity? Using the choropleth, we can easily match a district’s color to the percentage given in the legend. By contrast, the dot density map is a poor choice for answering this kind of question.
Another downside is that the final appearance of the dot density map can depend on two subjective factors: dot value and dot size. What value should a single dot represent? 25,000 households or 50,000 households? We may have some methods for deciding this, but no definitive answer. Secondly, how large (either in pixels or a unit like meters, depending on your software) should each dot be? Both questions can have a large impact on the map’s appearance.
Also, there are other more technical considerations. For instance, how do you handle populations falling just below the dot value threshold? If a single dot represents 25,000 households and a district has 24,999 households in the given category, should the map assign a dot? Techniques like stochastic rounding that account for probability are useful in this situation. Compared to a choropleth, the dot density map may be more difficult for an average viewer to interpret.
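To make the stochastic rounding idea concrete, here is a minimal base R sketch. The helper stochastic_round() and the household counts are hypothetical and are not taken from the scripts behind this post. The fractional part of a district’s dot count is treated as the probability of rounding up.

```r
# Hypothetical helper: round each value down, then round up with
# probability equal to its fractional part.
stochastic_round <- function(x) {
  lower <- floor(x)
  lower + as.integer(runif(length(x)) < (x - lower))
}

set.seed(1)  # reproducible draws

households <- c(24999, 60000, 5000)  # made-up district counts
dot_value  <- 25000                  # households represented by one dot

# 24999 / 25000 is just under 1, so that district almost always gets a dot;
# 5000 / 25000 = 0.2, so that district gets a dot about one time in five.
n_dots <- stochastic_round(households / dot_value)
n_dots
```

Across many districts, the expected number of dots matches the unrounded total, so small districts are not systematically dropped from the map.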
With all of these concerns in mind, is there a way to map both percentage and raw count?
Proportional Symbols Maps
One method for mapping both the percentage and the raw total of a given variable is through a proportional symbols map, informally known as a bubble map. In this type of thematic map, we draw a symbol (usually a circle) from the center of the enumeration unit. We can assign the radius of the circle to reflect the raw total and the color to represent the percentage.
Compared to a choropleth, some of the geographic information of the map has been lost – district shapes are covered by circles. Nevertheless, the proportional symbols map shows enough geospatial information for many use cases. Without needing the exact geographic shapes, the circles can still reveal regional trends.
Read More: Several packages make it easy to create proportional symbols maps in R. See the tm_bubbles() function of the tmap package or the propSymbolsChoroLayer() function.
In our Shiny app, the colors of the circles communicate the percentage of households having access to electricity or a latrine – a fact I was unable to show in the dot density map. At the same time, the size of the circles reflects the raw count, solving the choropleth’s problem of concealing population totals.
The example below traces India’s household access to latrines from 1991 to 2011 on a proportional symbols map. It tells two stories simultaneously. First, through the increasing size of the circles, we see that, in raw terms, India’s population with access to a latrine has grown considerably since 1991. Second, the colors of the circles communicate a typical value for access to latrines in a particular district.
Note that the dot density map shows the first story, but not the second, whereas the choropleth tells the second story, but not the first.
To see this story for just one district, let’s look at Bangalore. Each decade, the size of the circle representing Bangalore has grown substantially, suggesting that its population with access to a latrine has increased in raw terms. At the same time, the color changes from the 70-80% band, to 80-90%, to 90-100%, suggesting that the rate of access has improved alongside the increase in population.
Proportional Symbols Map Weaknesses
The proportional symbols map can flexibly handle many types of data, but a common problem is congestion when the number of enumeration units is high. With 640 districts in 2011, this is certainly a problem for this particular data set. If instead we were perhaps dealing with Indian states or large cities, this option might have been more effective.
When facing congestion, we often need to scale or transform the circles by some factor. Like choosing a dot value in the dot density map, this can also be somewhat arbitrary.
In mapdeck, add_scatterplot() requires the circle radius to be in meters, so I divided the raw household counts by 10. This scaling works better for some parameter selections than others.
Another problem is in the interpretation of circle sizes. A two-dimensional quantity such as area can be difficult to interpret accurately compared to a one-dimensional quantity such as length. Sometimes you will find a “graduated” symbols map where symbol size is binned to a few categories to make it easier to match a circle to the size that it represents. In this case, however, a legend for circle size is regrettably absent.
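The base R sketch below compares three ways of turning a raw count into a circle radius. Only the first (dividing by 10) is the approach described above. The square-root scaling and the binned “graduated” sizes are common alternatives that I am adding for illustration, and the counts, maximum radius, and bin breaks are all made-up values.

```r
households <- c(50000, 200000, 800000)  # made-up district counts

# 1. The scaling described above: radius in meters = count / 10.
radius_linear <- households / 10

# 2. Alternative: make circle AREA proportional to the count, so the
#    radius grows with the square root. max_radius is an arbitrary choice.
max_radius  <- 80000
radius_area <- max_radius * sqrt(households / max(households))

# 3. Alternative: "graduated" symbols, binning counts into a few sizes.
#    The breaks and the radius for each class are arbitrary choices.
size_class <- cut(
  households,
  breaks = c(0, 1e5, 5e5, Inf),
  labels = c("small", "medium", "large")
)
class_radius     <- c(small = 20000, medium = 40000, large = 80000)
radius_graduated <- unname(class_radius[as.character(size_class)])

data.frame(households, radius_linear, radius_area, size_class, radius_graduated)
```

With linear scaling, a district with 16 times the households gets a circle with 256 times the area. Square-root scaling keeps the area ratio at 16, which is usually closer to how readers judge circle size.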
Do we have any other options to depict both a raw count and a percentage geographically?
3D Choropleth Maps
One last option is a 3D choropleth, made possible by the elevation argument in mapdeck.
This visually-striking option manages to map three important quantities. It maps the percentage to color and the raw count to height, while maintaining the geographic shape of each enumeration unit. The choropleth could achieve only the first and third items; the proportional symbols map only the first two.
Below, we can see how the raw population of households with access to a latrine has grown (shown by the rising height of the district shapes), particularly from 2001 to 2011. The traditional choropleth fails to capture this population growth because it cannot communicate raw counts. At the same time, the colors of the district shapes communicate typical values in a way that the dot density map failed to do.
The example below explores the distribution of India’s 2011 population with access to neither electricity nor a latrine. The low purple areas represent districts with small populations where most households have both electricity and a latrine.
Compared to the dot density map, the third dimension allows us to vividly see the concentration of India’s population without these key amenities in the states of Uttar Pradesh, Bihar, and West Bengal. Although the colors are the same as they would be in a flat choropleth, the height parameter adds an entirely new dimension to the story.
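As a rough sketch of how the two quantities could be prepared and passed to mapdeck, the snippet below computes a percentage for color and a scaled count for height. This is not the code from 3d_choropleth.R. The district figures, the column names, the divide-by-10 elevation scaling, and the plot_3d_choropleth() wrapper are all assumptions for illustration. The wrapper is only defined here, because calling it requires an sf object holding the district polygons and a Mapbox access token.

```r
# Made-up district figures (not census data).
districts <- data.frame(
  district           = c("A", "B", "C"),
  households         = c(900000, 300000, 50000),
  households_latrine = c(810000, 120000, 10000)
)

# Color: standardized rate (0 to 100).
districts$pct_latrine <-
  100 * districts$households_latrine / districts$households

# Height: raw count, scaled by an arbitrary factor to give meters.
districts$elevation_m <- districts$households_latrine / 10

# Hypothetical wrapper: districts_sf is assumed to be an sf object with the
# two columns above attached to each district polygon.
plot_3d_choropleth <- function(districts_sf, token) {
  map <- mapdeck::mapdeck(token = token, pitch = 45)
  mapdeck::add_polygon(
    map,
    data        = districts_sf,
    fill_colour = "pct_latrine",  # percentage mapped to color
    elevation   = "elevation_m",  # raw count mapped to height
    legend      = TRUE
  )
}
```

As with circle sizes, the elevation scaling is a judgment call. Too small a factor flattens the map, while too large a factor lets tall districts hide their neighbors.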
3D Choropleth Weaknesses
Not surprisingly though, the 3D choropleth suffers from a common dilemma for any kind of 3D visualization. It can be difficult to see in its entirety. The view is routinely blocked or obscured by other parts of the visualization. Without being able to rotate and tilt the visualization at will, and sometimes even then, it is very difficult to comprehend the details of the entire map. In contrast, the other thematic mapping options in this blog are perfectly viable as static maps.
Using the mapdeck library and a single source of data, this post has attempted to highlight the strengths and weaknesses of the most common thematic maps, including choropleths, dot density maps, proportional symbols maps, and 3D choropleths.
Although this is far from an exhaustive list of thematic mapping options, hopefully it has introduced the idea of tradeoffs inherent in visualization choices depending on the type of data at hand and the story that you hope to communicate. At the same time, I hope it has helped to unearth some of the most important stories in the history of global development.
Read More: For more resources on geospatial data visualization, be sure to check out some of the links below.
- For more examples on the mechanics of mapmaking in R, please see my contributions in Lessons 3 and 4 of SocialCops’ free course, Introduction to GIS in R.
- Axis Maps has an excellent cartography guide covering principles of map design, pertaining to all of the thematic maps covered in this post.
- Claus Wilke’s chapter “Visualizing geospatial data” in his open-source book Fundamentals of Data Visualization introduces the idea of projections and choropleth mapping.
- Kieran Healy’s chapter on maps in his open source book Data Visualization: A practical introduction tests out a wide range of choropleths with a lot of in-text R code.
- Although not strictly geospatial, the gold standard of data storytelling in international development is the late Hans Rosling’s country comparison of life expectancy and income. His animated graphics inspired a number of contributors to recreate his visualizations with a variety of programming languages and tools.