Friday, July 27, 2012

ggplot2: A little twist on back-to-back bar charts

Sangyoon Lee

Background

While thinking about ways to represent incoming and outgoing flows in a business process, I considered export-import charts like the one shown on the Learning R blog. However, as that post's author acknowledges, it is difficult to compare individual values using these charts. Even so, I still wanted this graph for an at-a-glance view before breaking it into facets and comparing individual values.

My Solution

In the Learning R article, the author chooses to show multiple categories of import and export using stacked bars. Instead of representing multiple categories, I decided to use the color intensity of the bars' fill as visual reinforcement of information the graph already contains. Import and export are represented by red and blue, respectively, and the varying intensity facilitates the visual comparison the reader must make between bars that are not side by side.

In the example below, I use the same subset of data as in the motivating post. Please refer to the linked article for the data. Make sure to click on the link that says "Access the subset used in this post in here." rather than going to the Eurostat website, and save the file as "trade.csv" in the working directory. These are monthly trade data for the 27 European Union countries by broad economic categories (BEC), in millions of euros.

First, load the necessary packages.


For convenient and powerful data manipulation, plyr and reshape provide functions like ddply and melt. The relatively new scales package is required for the functions that format numbers and colors on ggplot2 scales, such as comma and muted.
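
The loading step is described in prose only. Assuming the four packages named in this post are installed, it would presumably look something like this (a reconstruction, not the author's original listing):

```r
library(ggplot2)  # plotting
library(plyr)     # ddply() for split-apply-combine summaries
library(reshape)  # melt() for wide-to-long reshaping
library(scales)   # comma() and muted() for axis labels and colours
```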

Next, import the data, calculate the trade balance (export - import), and melt the data for ggplot2.

trade <- read.csv("trade.csv", header = TRUE, 
   stringsAsFactors = FALSE)
balance <- ddply(trade, .(Time), summarise, 
   balance = sum(EXP - IMP))
trade.m <- melt(trade, id.vars = c("BEC","Time"))

After the melt step, add another line to aggregate over BEC. This will further simplify the structure.

trade.a <- ddply(trade.m, c("Time", "variable"), summarise,
                 value = sum(value))

At this point, the data will look like this:

> head(trade.a)
     Time variable    value
1 2008M05      EXP 273153.2
2 2008M05      IMP 260789.1
3 2008M06      EXP 284994.7
4 2008M06      IMP 273033.0
5 2008M07      EXP 284681.6
6 2008M07      IMP 271122.2
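
To try the reshape and aggregation steps without downloading trade.csv, here is a minimal sketch on a tiny data frame. The numbers and the two category names ("Food", "Fuel") are made up for illustration; only the column names follow the real file as used above.

```r
library(plyr)
library(reshape)

# Made-up numbers: two months, two hypothetical BEC categories
toy <- data.frame(
  BEC  = c("Food", "Fuel", "Food", "Fuel"),
  Time = c("2008M05", "2008M05", "2008M06", "2008M06"),
  EXP  = c(10, 20, 30, 40),
  IMP  = c(5, 15, 20, 30),
  stringsAsFactors = FALSE
)

# Trade balance per month, summed over categories
toy.balance <- ddply(toy, .(Time), summarise, balance = sum(EXP - IMP))

# Wide to long: EXP and IMP become rows, labelled in a "variable" column
toy.m <- melt(toy, id.vars = c("BEC", "Time"))

# Collapse the categories, leaving one EXP row and one IMP row per month
toy.a <- ddply(toy.m, c("Time", "variable"), summarise, value = sum(value))
toy.a
```

The result has the same three columns as trade.a (Time, variable, value), so it can stand in for trade.a when experimenting with the layers below.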

We step through one layer at a time.

Layer 1: Start with export bars. We will add import data on the bottom of this graph.

ggplot(trade.a, aes(x=Time)) + 
   geom_bar(data = subset(trade.a, variable == "EXP"),
   aes(y=value, fill = value), stat = "identity")

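Layer 2: Add the import data as negative values so that it attaches back-to-back to the export data. Label the axes, format the y-axis with commas, and map the fill to a diverging red-white-blue gradient centered at zero.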

last_plot() + 
  geom_bar(data = subset(trade.a, variable == "IMP"), 
           aes(y = -value, fill = -value), stat = "identity") +
  scale_y_continuous(labels = comma) + 
  xlab("") + 
  ylab("Export  -  Import") + 
  scale_fill_gradient2(low = muted("red"), mid = "white", 
                       high = muted("blue"), midpoint = 0, space = "rgb")
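
To see why negating the import values turns those bars red, it may help to look at the colour mapping on its own. The sketch below uses the scales helpers div_gradient_pal and rescale_mid to approximate what a diverging fill scale does with signed values. The three input values are made up, and this is an illustration of the idea rather than a trace of ggplot2's internals.

```r
library(scales)

# The same three colours passed to scale_fill_gradient2() above
pal <- div_gradient_pal(low = muted("red"), mid = "white", high = muted("blue"))

# Made-up fill values: a large import (negative), zero, and a large export
fill_values <- c(-250000, 0, 250000)

# rescale_mid() pins the midpoint (0) to 0.5 on the palette's 0-1 input range
positions <- rescale_mid(fill_values, mid = 0)

# One colour per value: the low colour, the mid colour, the high colour
pal(positions)
```

Values closer to zero land nearer the white midpoint, which is what makes small bars paler than large ones.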

Layer 3: Now add the balance trend line and a horizontal reference line at zero, and remove the meaningless legend.

last_plot() + 
  geom_line(data = balance, aes(Time, balance, group = 1), size = 1) + 
  geom_hline(yintercept = 0, colour = "grey90") + 
  theme(legend.position = "none")  # hide the fill legend

Layer 4: Finally, reformat the x-axis labels so they are easier to read, putting the month above the two-digit year. This produces the final plot.

# one label per distinct month, in axis order
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", sort(unique(trade.a$Time)))
last_plot() + scale_x_discrete(labels = labels)
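
To see what the regular expression in this step produces, here is the same substitution applied to a few month codes. The first two come from the data shown earlier; the third is made up to show a different year.

```r
months <- c("2008M05", "2008M06", "2009M01")

# Group 1 captures the two-digit year, group 2 the month;
# the replacement puts the month first, then a line break, then the year
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", months)
labels
# "05\n08" "06\n08" "01\n09"
```

Each label therefore renders as two short lines, which keeps the axis readable when there are many months.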

The resulting plot shows the overall export and import trend, with different color intensities to reinforce the size of each bar. This eases the cognitive burden placed on readers when they visually compare export versus import.
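
The layers above can also be assembled in a single expression instead of through repeated last_plot() calls. The following is a minimal sketch on made-up numbers, assuming a recent ggplot2 in which theme() is the way to hide the legend. The toy.a and toy.balance data frames are hypothetical stand-ins for trade.a and balance.

```r
library(ggplot2)
library(scales)

# Made-up aggregated data in the same shape as trade.a
toy.a <- data.frame(
  Time     = rep(c("2008M05", "2008M06", "2008M07"), each = 2),
  variable = rep(c("EXP", "IMP"), times = 3),
  value    = c(30, 20, 70, 50, 40, 45),
  stringsAsFactors = FALSE
)

# Made-up balance (export - import) per month, in the same shape as balance
toy.balance <- data.frame(
  Time    = c("2008M05", "2008M06", "2008M07"),
  balance = c(10, 20, -5),
  stringsAsFactors = FALSE
)

p <- ggplot(toy.a, aes(x = Time)) +
  # exports above the axis
  geom_bar(data = subset(toy.a, variable == "EXP"),
           aes(y = value, fill = value), stat = "identity") +
  # imports below the axis, negated so the fill turns red
  geom_bar(data = subset(toy.a, variable == "IMP"),
           aes(y = -value, fill = -value), stat = "identity") +
  geom_hline(yintercept = 0, colour = "grey90") +
  geom_line(data = toy.balance, aes(Time, balance, group = 1)) +
  scale_y_continuous(labels = comma) +
  # a labelling function avoids building the label vector by hand
  scale_x_discrete(labels = function(x) {
    gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1", x)
  }) +
  scale_fill_gradient2(low = muted("red"), mid = "white",
                       high = muted("blue"), midpoint = 0) +
  xlab("") + ylab("Export  -  Import") +
  theme(legend.position = "none")
p
```

Passing a function to the labels argument means the labels always match whatever months end up on the axis.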

While the overall trend shows more exports than imports, the story can be more complicated once subcategories are considered. The United States economy is an example: an aggregated import-export chart for the USA shows import bars significantly larger than export bars, but when it is broken into categories, some of them, agricultural goods in particular, tell a different story from the overall trend.

Still, this graph provides a quick at-a-glance look at exports and imports before digging deeper into individual categories for further analysis.
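
As a starting point for that deeper look, the same back-to-back idea can be split into one panel per category with facet_wrap. The sketch below is only one possible approach, on made-up long-format data shaped like trade.m; the category names and the helper column signed are hypothetical.

```r
library(ggplot2)

# Made-up long-format data in the same shape as trade.m
toy.m <- data.frame(
  BEC      = rep(c("Food", "Fuel"), each = 4),
  Time     = rep(rep(c("2008M05", "2008M06"), each = 2), times = 2),
  variable = rep(c("EXP", "IMP"), times = 4),
  value    = c(10, 5, 30, 20, 20, 25, 40, 30),
  stringsAsFactors = FALSE
)

# Flip the sign of imports so they hang below the axis
toy.m$signed <- ifelse(toy.m$variable == "IMP", -toy.m$value, toy.m$value)

# One back-to-back panel per category
p <- ggplot(toy.m, aes(x = Time, y = signed, fill = variable)) +
  geom_bar(stat = "identity", position = "identity") +
  facet_wrap(~ BEC)
p
```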

2 comments:

  1. Great post :) Thanks
  2. Thanks, your example came just as I was trying to figure out how to construct an age-sex pyramid to display age distribution by sex. Build a back-to-back chart, flip the coordinates, and voila, a population pyramid!