Quantitative Symbology Break Comparison Exercise
- We will be visualizing block-level population counts for Allegheny County. We used this data last week when we practiced joins and buffers to determine populations along a bus route. Add the following layers: alleghenyCountyBlocks and the ACblockPop table.
- Join the population count table to the block-level polygons using the GEOIDString field in the table (we are joining string-formatted columns).
- You now have block polygons with populations to tinker with. Open the layer properties >> Symbology, then navigate to "graduated colors" under "quantities". One setting needs a tweak first: click Classify, then click the "sampling" button and change the sample size to any number over 30,519 (the number of blocks) so that every block is included when the classes are calculated. Apply these settings.
- Now go back into the classification method dialog box. Take a moment and review what the data looks like. Now investigate the thought questions below.
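For anyone who wants to see what the join step is doing under the hood, here is a minimal sketch in plain Python. It is not how the GIS software implements joins. The field names GEOID and POP and all of the values are made up for illustration; only GEOIDString comes from the exercise.

```python
# Hypothetical stand-ins for the two layers. Field names GEOID and POP,
# and every value below, are assumptions for illustration only.
blocks = [
    {"GEOID": "420030103001000", "label": "block A"},
    {"GEOID": "420030103001001", "label": "block B"},
    {"GEOID": "420030103001002", "label": "block C"},
]
pop_table = [
    {"GEOIDString": "420030103001000", "POP": 120},
    {"GEOIDString": "420030103001001", "POP": 0},
]


def join_by_key(left_rows, right_rows, left_key, right_key):
    """Attach fields from right_rows to left_rows where the keys are equal."""
    lookup = {row[right_key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        merged = dict(row)
        match = lookup.get(row[left_key])
        if match is not None:
            for field, value in match.items():
                if field != right_key:
                    merged[field] = value
        joined.append(merged)
    return joined


joined = join_by_key(blocks, pop_table, "GEOID", "GEOIDString")

# A numeric key never equals a string key, so this row finds no match.
mismatched = join_by_key(
    [{"GEOID": 420030103001000}], pop_table, "GEOID", "GEOIDString"
)

for row in joined:
    print(row)
print(mismatched)
```

Block C has no row in the table, so it comes through without a population value. The mismatched example shows why both join columns need the same format: a number and a string holding the same digits do not compare as equal.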
We care about how we symbolize data on maps because not every method of visualization conveys the same types of information equally well. To choose an appropriate break pattern, we first need to identify the use case for the map. The use case then determines which information we want to preserve (or even draw out) and which we can discard as extra.
Example: Let's imagine we work for a retail company that markets budget take-and-bake meals. The team is trying to find a great place for their new store. We'll assume each person is equally likely to buy the product (a strong assumption) so all we need to think about is population in the blocks. The data we want to isolate is concentrated population, and we don't care much about outliers.
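To make the idea of a "break pattern" concrete, here is a small sketch comparing two common methods, equal interval and quantile, on a lopsided set of counts. The numbers are invented, not taken from the Allegheny County data, and the functions are simplified versions of what the classification dialog does, written only for illustration.

```python
import math

# Made-up block populations with one large outlier (not the real data).
populations = [0, 5, 10, 12, 20, 25, 30, 40, 55, 60, 80, 1000]


def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * i for i in range(1, k)]
    breaks.append(hi)  # use the exact maximum as the last upper bound
    return breaks


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(i * n / k) - 1] for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count how many values fall at or below each upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts


ei = equal_interval_breaks(populations, 4)
qt = quantile_breaks(populations, 4)
print("equal interval:", ei, class_counts(populations, ei))
print("quantile:      ", qt, class_counts(populations, qt))
```

With these invented numbers, equal interval puts eleven of the twelve blocks in the lowest class because the single outlier stretches the range, while quantile puts three blocks in each class. Which behavior is preferable depends on the use case.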
Guided Exploration Questions
- Look at the histogram for this data set. How would you describe its shape? Is it lopsided or a nice normal curve?
- What are the mean, median, min, max, and standard deviation? (Check out the Wikipedia explanation of standard deviation.)
- Let's imagine that we are transportation route planners and we want to look for routes with lots of people in Allegheny County as a whole. PAUSE: What kind of variation is such a person interested in seeing?
- Given this, which break approach would be most ideal for this use case? Why? How many breaks? Create an effective visualization for this use case.
- NOW let's imagine we are working for the department of health. A major flu epidemic has broken out and we don't have a vaccine that is very effective. Flu spreads more rapidly when people are packed together in densely populated areas. What kind of variation are we interested in visualizing in the data? What do we want to be able to determine?
- Given this flu scenario, tinker with the break types and normalization. What configuration makes the most sense? Why? Make a symbolized map that shows an effective break pattern for this use case.
- What principles of symbology design can we agree on given this exercise?
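As a companion to the questions above, here is a rough sketch of the summary statistics and of what normalizing by area does to a raw count. All values, the field names, and the square-kilometer units are assumptions made for illustration; the real numbers come from the classification dialog and the layer's attribute table.

```python
import statistics

# Made-up block populations (not the real Allegheny County data).
populations = [0, 5, 10, 12, 20, 25, 30, 40, 55, 60, 80, 1000]

summary = {
    "mean": statistics.mean(populations),
    "median": statistics.median(populations),
    "min": min(populations),
    "max": max(populations),
    # Population standard deviation: treats the list as the full set of blocks.
    "stdev": statistics.pstdev(populations),
}
print(summary)

# Hypothetical blocks: raw count versus count normalized by area.
sample_blocks = [
    {"name": "large suburban block", "pop": 400, "area_sq_km": 2.0},
    {"name": "small urban block", "pop": 150, "area_sq_km": 0.1},
]


def density(block):
    """People per square kilometer; None when the area is zero or missing."""
    area = block.get("area_sq_km")
    if not area:
        return None
    return block["pop"] / area


for block in sample_blocks:
    print(block["name"], block["pop"], density(block))
```

In this invented sample the mean sits far above the median, which is one sign of a lopsided distribution. The two sample blocks also swap rank once the count is divided by area, which is the effect the normalization setting has on the map.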