subsection on memorable figures. Closes clauswilke#48.

gedonis · Nov 12, 2018 · 7365830
1 parent 04dd78e
commit 7365830

Showing 3 changed files with 76 additions and 14 deletions.
diff --git a/bibliography.bib b/bibliography.bib
@@ -214,3 +214,31 @@ @misc{UCI_repo_2017
   url = {https://archive.ics.uci.edu/ml},
   institution = {University of California, Irvine, School of Information and Computer Sciences}
 }
+
+@article{Bateman_et_al_2010,
+  author = {Bateman, S. and Mandryk, R. and Gutwin, C. and Genest, A. and McDine, D. and Brooks, C.},
+  title = {Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts},
+  journal = {ACM Conference on Human Factors in Computing Systems},
+  year = 2010,
+  pages = {2573--2582},
+  doi = {10.1145/1753326.1753716}
+}
+
+@article{Borgo_et_al_2012,
+  author = {Borgo, R. and Abdul-Rahman, A. and Mohamed, F. and Grant, P. W. and Reppa, I. and Floridi, L.},
+  title = {An Empirical Study on Using Visual Embellishments in Visualization},
+  journal = {IEEE Transactions on Visualization and Computer Graphics},
+  volume = 18,
+  year = 2012,
+  pages = {2759--2768},
+  doi = {10.1109/TVCG.2012.197}
+}
+
+@article{Haroz_et_al_2015,
+  author = {S. Haroz and R. Kosara and S. L. Franconeri},
+  title = {{ISOTYPE} Visualization: Working Memory, Performance, and Engagement with Pictographs},
+  journal = {ACM Conference on Human Factors in Computing Systems},
+  year = 2015,
+  pages = {1191--1200},
+  doi = {10.1145/2702123.2702275}
+}
diff --git a/geospatial_data.Rmd b/geospatial_data.Rmd
@@ -105,16 +105,32 @@ ggplot(world_sf) +
   )
 ```
 
-```{r usa-orthographic, fig.width = 4.5, fig.asp = 1}
+Of the 50 U.S. states, 48 (also referred to as the "lower 48") are contiguous and easy to visualize at once. The remaining two, Alaska and Hawaii, are located a substantial distance away from the lower 48 (Figure \@ref(fig:usa-orthographic)).
+
+
+(ref:usa-orthographic) Locations of Alaska, Hawaii, and the lower 48 states shown on a globe.
+
+```{r usa-orthographic, fig.width = 5.5, fig.asp = 1, fig.cap = '(ref:usa-orthographic)'}
 cenlat <- 35
 cenlong <- -130
 
 draw_ocean(cenlat, cenlong, lwd = 0.25)
-draw_land(map_polys$usa, cenlat, cenlong, col = "#FF0000B0") 
-draw_land(map_polys$world_no_usa, cenlat, cenlong, col = "#C0C0C0B0") 
+draw_land(map_polys$usa, cenlat, cenlong, col = "#D00000D0") 
+draw_land(map_polys$world_no_usa, cenlat, cenlong, col = "#C0C0C0B0")
+par(family = dviz_font_family_condensed, ps = 12)
+text(
+#  x = c(0.38, 0.05, -0.4),
+#  y = c(0.15, 0.49, -0.1),
+  x = c(0.36, -0.17, -0.4),
+  y = c(0.13, 0.49, -0.1),
+  labels = c("lower 48", "Alaska", "Hawaii"),
+  col = c("white", "white", "black")
+)
 ```
 
-```{r usa-true-albers}
+(ref:usa-true-albers) Locations of Alaska, Hawaii, and the lower 48 states shown with an area-preserving Albers projection (ESRI:102003, commonly used to project the lower 48 states).
+
+```{r usa-true-albers, fig.cap = '(ref:usa-true-albers)'}
 longs <- -180:-20
 lats <- rep(89.9, length(longs))
 earth_boundary <- sf::st_sfc(
@@ -137,8 +153,8 @@ ggplot(us_states_geoms$true_albers) +
     size = 0.5/.pt
   ) +
   coord_sf(xlim = c(-6721002, 2685733), ylim = c(-1634610, 4888053), expand = FALSE, ndiscr = 1000) +
-  scale_x_continuous(breaks = -20*c(3:10)) +
-  scale_y_continuous(breaks = (1:9)*10) +
+  scale_x_continuous(name = NULL, breaks = -20*c(3:10)) +
+  scale_y_continuous(name = NULL, breaks = (1:9)*10) +
   theme_dviz_grid(font_size = 12, rel_small = 1) +
   theme(
     panel.background = element_rect(fill = "#56B4E950"),

diff --git a/telling_a_story.Rmd b/telling_a_story.Rmd
@@ -148,9 +148,12 @@ flights_grouped %>%
     scale_y_continuous(expand = c(0, 0), name = "mean arrival delay (min.)") +
     scale_x_discrete(name = NULL) +
     geom_col() + 
-    coord_flip() +
+    coord_flip(clip = "off") +
     theme_dviz_vgrid(rel_small = 1) +
-    theme(axis.ticks.y = element_blank())
+    theme(
+      axis.line.y = element_blank(),
+      axis.ticks.y = element_blank()
+    )
 ```
 
 (ref:number-of-flights-nyc) Number of flights out of the New York City area in 2013, by airline. Delta and American are the fourth and fifth largest carriers by flights out of the New York City area. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
@@ -164,9 +167,12 @@ flights_grouped %>%
     scale_y_continuous(expand = c(0, 0), name = "number of flights") +
     scale_x_discrete(name = NULL) +
     geom_col() + 
-    coord_flip() +
+    coord_flip(clip = "off") +
     theme_dviz_vgrid(rel_small = 1) +
-    theme(axis.ticks.y = element_blank())
+    theme(
+      axis.line.y = element_blank(),
+      axis.ticks.y = element_blank()
+    )
 ```
 
 
@@ -235,6 +241,9 @@ ggplot(flights_grouped, aes(x = wday(time_hour, label = TRUE, week_start = 1)))
 
 ## Make your figures memorable
 
+Simple, clean figures such as basic bar plots have the advantage that they avoid distractions, are easy to read, and let your audience focus on the most important points you want to bring across. However, this simplicity can come with a disadvantage: Figures can end up looking generic. They lack any features that stand out and make them memorable. If I showed you ten bar graphs in quick succession, you'd have a hard time telling them apart and remembering afterwards what they showed. For example, if you take a quick look at Figure \@ref(fig:petownership-bar), you will notice its visual similarity to Figure \@ref(fig:number-of-flights-nyc), which I discussed earlier in this chapter. Yet the two figures have nothing in common other than that they are bar charts. Figure \@ref(fig:number-of-flights-nyc) showed the number of flights out of the New York City area by airline, whereas Figure \@ref(fig:petownership-bar) shows the most popular pets in U.S. households. Neither figure has any element that helps you intuitively perceive what topic it covers, and therefore neither figure is particularly memorable.
+
+(ref:petownership-bar) Number of households having one or more of the most popular pets: dogs, cats, fish, or birds. This bar graph is perfectly clear but not necessarily memorable. The "cats" column has been highlighted solely to create visual similarity with Figure \@ref(fig:number-of-flights-nyc). Data source: 2012 U.S. Pet Ownership & Demographics Sourcebook, American Veterinary Medical Association.
 
 ```{r petownership-bar, fig.asp = .5, fig.cap = '(ref:petownership-bar)'}
 
@@ -255,10 +264,11 @@ poultry	1020000
 
 
 df$pet <- factor(df$pet, levels = rev(df$pet))
-df <- dplyr::filter(df, households > 3500000)
+df <- filter(df, households > 3500000) %>%
+  mutate(highlight = ifelse(pet == "cats", "yes", "no"))
 
-ggplot(df, aes(x = pet, y = households)) +
-  geom_col(alpha = 0.8, fill = "#0072B2", color = NA) +
+ggplot(df, aes(x = pet, y = households, fill = highlight)) +
+  geom_col() +
   geom_label(
     aes(label = paste0(signif(households*1e-6, 2), "M")),
     hjust = 0,
@@ -281,6 +291,7 @@ ggplot(df, aes(x = pet, y = households)) +
     #position = "right",
     expand = c(0, 0)
   ) +
+  scale_fill_manual(values = c("#B0B0B0D0", "#BD3828D0"), guide = "none") +
   theme_dviz_vgrid(font_size = 12, rel_small = 1) +
   theme(
     axis.line = element_blank(),
@@ -290,7 +301,11 @@ ggplot(df, aes(x = pet, y = households)) +
 
 ```
 
-If I showed you ten bargraphs in quick succession you'd have a hard time keeping them apart, and afterwards remembering what they showed.
+Research on human perception shows that more visually complex and unique figures are more memorable [@Bateman_et_al_2010; @Borgo_et_al_2012]. However, memorability alone is not that useful for a data visualization. At the extreme, a figure could be highly memorable but utterly confusing. Such a figure would not be a good data visualization, even if it works well as a stunning piece of art. At the other extreme, figures may be very clear but forgettable and boring, and those figures may not have the impact we might hope for either. In general, we want to strike a balance between the two extremes and make our figures both memorable and clear. (The intended audience matters as well, however. If a figure is intended for a technical scientific publication, we will generally worry less about memorability than if the figure is intended for a broadly read newspaper or blog.)
+
+We can make a figure more memorable by adding visual elements that reflect features of the data, for example drawings or pictograms of the objects that the dataset is about. One common approach is to show the data values themselves in the form of repeated images, such that each copy of an image corresponds to a defined amount of the represented variable. For example, we can replace the bars in Figure \@ref(fig:petownership-bar) with repeated images of a dog, a cat, a fish, and a bird, drawn to a scale such that each complete animal corresponds to five million households (Figure \@ref(fig:petownership-isotype)). Visually, Figure \@ref(fig:petownership-isotype) still functions as a bar plot, but we have now added some visual complexity that makes the figure more memorable, and we have also shown the data using images that directly reflect what the data mean. After only a quick glance at the figure, you may be able to remember that many more households have dogs or cats than fish or birds. Importantly, in such visualizations, we want to use the images to represent the data, rather than simply to adorn the visualization or to annotate the axes. In psychological experiments, the latter choices tend to be distracting rather than helpful [@Haroz_et_al_2015].
+
+(ref:petownership-isotype) Number of households having one or more of the most popular pets, shown as an isotype graph. Each complete animal represents 5 million households who have that kind of pet. Data source: 2012 U.S. Pet Ownership & Demographics Sourcebook, American Veterinary Medical Association.
 
 ```{r petownership-isotype, fig.asp = .5, fig.cap = '(ref:petownership-isotype)'}
 
@@ -337,6 +352,9 @@ ggplot(df, aes(x = pet, y = households, image = pet)) +
 
 ```
 
+Visualizations such as Figure \@ref(fig:petownership-isotype) are often called isotype plots. The word isotype was introduced as an acronym for International System Of TYpographic Picture Education, and strictly speaking it refers to logo-like simplified pictograms that represent objects, animals, plants, or people [@Haroz_et_al_2015]. However, I think it makes sense to use the term isotype plot more broadly, for any type of visualization where repeated copies of the same image are used to indicate the magnitude of a value. After all, the prefix "iso" means "the same" and "type" can mean a particular kind, class, or group.
+
+
 ## Be consistent but don't be repetitive
 
 When discussing compound figures in Chapter \@ref(compound-figures), I mentioned that it is important to use a consistent visual language for the different parts of a larger figure. The same is true across figures. If we make three figures that are all part of one larger story, then we need to design those figures so they look like they belong together. Using a consistent visual language does not mean, however, that everything should look exactly the same. On the contrary. It is important that figures describing different analyses look visually distinct, so that your audience can easily recognize where one analysis ends and another one starts. This is best achieved by using different visualization approaches for different parts of the overarching story. If you have used a bar plot already, next use a scatterplot, or a boxplot, or a line plot. Otherwise, the different analyses will blur together in your audience's mind, and they will have a hard time distinguishing one part of the story from another. For example, if we redesign Figure \@ref(fig:athletes-composite-good) from Chapter \@ref(compound-figures) so it uses only bar plots, the result is noticeably less distinct and more confusing (Figure \@ref(fig:athletes-composite-repetitive)).