diff --git a/figures/jpeg_example.xcf b/figures/jpeg_example.xcf
new file mode 100644
index 00000000..6035e5cd
Binary files /dev/null and b/figures/jpeg_example.xcf differ
diff --git a/figures/jpeg_example1.jpg b/figures/jpeg_example1.jpg
new file mode 100644
index 00000000..c44a1123
Binary files /dev/null and b/figures/jpeg_example1.jpg differ
diff --git a/figures/jpeg_example2.jpg b/figures/jpeg_example2.jpg
new file mode 100644
index 00000000..5065814d
Binary files /dev/null and b/figures/jpeg_example2.jpg differ
diff --git a/figures/jpeg_example3.jpg b/figures/jpeg_example3.jpg
new file mode 100644
index 00000000..990711d4
Binary files /dev/null and b/figures/jpeg_example3.jpg differ
diff --git a/figures/jpeg_example4.jpg b/figures/jpeg_example4.jpg
new file mode 100644
index 00000000..14a8c95e
Binary files /dev/null and b/figures/jpeg_example4.jpg differ
diff --git a/figures/jpeg_example_combined.idraw b/figures/jpeg_example_combined.idraw
new file mode 100644
index 00000000..a6a6220c
Binary files /dev/null and b/figures/jpeg_example_combined.idraw differ
diff --git a/image_file_formats.Rmd b/image_file_formats.Rmd
index 10d06206..3bff9839 100644
--- a/image_file_formats.Rmd
+++ b/image_file_formats.Rmd
@@ -1,3 +1,23 @@
+```{r include=FALSE, cache=FALSE}
+set.seed(7654)
+options(digits = 3)
+
+knitr::opts_chunk$set(
+  echo = FALSE,
+  cache = FALSE,
+  fig.align = 'center',
+  fig.width = 6,
+  fig.asp = 0.618, # 1 / phi
+  fig.show = "hold"
+)
+
+options(dplyr.print_min = 6, dplyr.print_max = 6)
+
+library(cowplot)
+
+source("R/misc.R")
+source("R/themes.R")
+```
 # Understanding commonly used image file formats
 
 Anybody who is making figures for data visualization will eventually have to know a few things about how figures are stored on the computer. There are many different image file formats, and each has its own set of benefits and disadvantages.
@@ -31,9 +51,15 @@ raw Raw Image File bitmap digital photography, ne
 gif Graphics Interchange Format bitmap outdated, do not use
 
-## Lossy and lossless bitmap graphics
+## Lossless and lossy compression of bitmap graphics
+
+Most bitmap file formats employ some form of data compression to keep file sizes manageable. There are two fundamental types of compression: lossless and lossy. Lossless compression guarantees that the compressed image is pixel-for-pixel identical to the original image, whereas lossy compression accepts some image degradation in return for smaller file sizes.
+
+To understand which approach is appropriate when, it is helpful to have a basic understanding of how these different compression algorithms work. Let's first consider lossless compression. Imagine an image with a black background, where large areas of the image are solid black and thus many black pixels appear right next to each other. Each black pixel can be represented by three zeros in a row: 0 0 0, representing zero intensities in the red, green, and blue color channels of the image. The areas of black background in the image correspond to thousands of zeros in the image file. Now assume somewhere in the image are 1000 consecutive black pixels, corresponding to 3000 zeros. Instead of writing out all these zeros, we could store simply the total number of zeros we need, e.g. by writing 3000 0. In this way, we have conveyed the exact same information with only two numbers, the count (here, 3000) and the value (here, 0). Over the years, many clever tricks along these lines have been developed, and modern lossless image formats (such as png) can store bitmap data with impressive efficiency. However, all lossless compression algorithms perform best when images have large areas of uniform color, and therefore Table \@ref(tab:file-formats) lists png as optimized for line drawings.
+
+Photographic images rarely have multiple pixels of identical color and brightness right next to each other. Instead they have gradients and other somewhat regular patterns on many different scales. Therefore, lossless compression of these images often doesn't work very well, and lossy compression has been developed as an alternative. The key idea of lossy compression is that some details in an image are too subtle for the human eye, and those can be discarded without obvious degradation in the image quality. For example, consider a gradient of 1000 pixels, each with a slightly different color value. Chances are the gradient will look nearly the same if it is drawn with only 200 different colors, with every five adjacent pixels drawn in the exact same color.
 
-For example, in the widely-used RGB format, a black pixel can be represented by three zeroes in a row: 0 0 0. Now imagine an image with a black background, where large areas of the image are solid black and thus many black pixels appear right next to each other. You can imagine that storing this image information could require thousands of consecutive zeroes. If however, instead of writing out all the zeroes, we store the number of consecutive zeroes we need, then we have conveyed the exact same information with only two numbers, the count (e.g., 3000) and the value (here, 0). This is the key idea behind lossless compression of image data. We store the exact color of each individual pixel, but we do so in a space-efficient manner.
+The most widely used lossy image format is jpeg (Table \@ref(tab:file-formats)), and indeed many digital cameras output images as jpeg by default.
 
 ## Image resolutions, dots per inch