Lab 5 - Part D
Bar Plots & Dot Plots

Bar and dot plots are used in two ways: (1) to display proportions of categories, and (2) to compare class means, e.g. for experimental treatments or sampling sites. In either case, you don’t have a sensible scale on your x-axis, i.e. your independent variable is a factor, such as “Control”, “Nitrogen”, “Phosphorus”, “Nitrogen & Phosphorus” fertilization. Because the bars/dots can be grouped, these graph types work well for factorial experiments with multiple treatments (or for hierarchical sampling designs).

5.15. Bar charts for factorial designs

We are already familiar with a dataset from a factorial experiment: our lentil data from Lab 2 on data tables and data management. Let’s set up as usual: create a new folder and use an empty workspace shortcut to start R. Then load, check, and attach the dataset. Write a three-liner that (1) resets your graphics parameters, (2) sets your window size to 6” wide and 4” high, and (3) sets the global graphics parameters to your preferences (size, font size, font type). Remember to run and re-run each graphic starting with the graphics.off() command.

We are going to use a very handy package with scientific graphing functions for factorial designs. Go ahead and install the package sciplot (you recall how we installed extension packages in previous labs, right? Otherwise ask the TA or me), and try the following code:

    library(sciplot)
    par(mfrow=c(2, 2))
    bargraph.CI(x.factor=FARM, response=YIELD)
    bargraph.CI(x.factor=VARIETY, response=YIELD)
    bargraph.CI(x.factor=VARIETY, group=FARM, response=YIELD, legend=TRUE)
    bargraph.CI(x.factor=FARM, group=VARIETY, response=YIELD, legend=TRUE)

You see there are multiple ways to break and group your results. I personally like the last version best, and we can go ahead and customize it a bit. Play with the options to see what they do!
    bargraph.CI(x.factor=FARM, group=VARIETY, response=YIELD, legend=TRUE,
      xlab="Location", ylab="Yield (kg/ha)", ylim=c(0,800),
      cex.names=1, cex.lab=1,
      x.leg=6, y.leg=700, cex.leg=1, leg.lab=c("Var A","Var B","Var C"),
      col=grey.colors(3),
      uc=TRUE, lc=TRUE, err.width=0.1, err.col="black", err.lty=1)

You can also do cross-hatching instead of colors. Try this for an old-fashioned look (add this to your plot code in place of the col option above; it will not run by itself):

    col="black", density=c(20,10,0), angle=c(45,-45,0),

By default, the error bars represent the standard error of the mean, but we can modify this with the following additional customization. The first function shows the standard deviation instead, and the second replaces the default standard error with a 95% confidence interval (approximately the mean plus or minus 1.96 standard errors). We will soon learn the statistical theory behind this, so you can program any confidence interval. Again, add this to your plot code (one line at a time). This will not run by itself:

    ci.fun=function(x) {c(mean(x)-sd(x), mean(x)+sd(x))}
    ci.fun=function(x) {c(mean(x)-1.96*se(x), mean(x)+1.96*se(x))}

See if you can create a classic bar chart with standard deviations indicated only above the mean.

5.16. Line plots for factorial designs

If your treatments are in an ordered sequence, e.g. “Control”, “1 x Nitrogen”, “2 x Nitrogen”, “4 x Nitrogen”, you may prefer a line graph. Multi-factor experiments can then be represented by different lines (e.g. one line for “Farm 1” and one line for “Farm 2”, or whatever your second treatment is). For illustration, I have generated a second dataset that is structured exactly like the farm dataset, but here the treatment is no longer a class variable (genetic varieties) as before. Instead we have an ordered sequence of nitrogen fertilizer levels.
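R sorts factor levels alphabetically unless told otherwise, so an ordered treatment sequence may not come out in dose order after import. The following is a minimal sketch of how the order could be checked and set; the treatment labels and yields are made up for illustration, and the actual labels in your dataset may differ.

```r
# Hypothetical treatment labels and made-up yields (not the lab data)
nitrogen <- c("Control", "1xN", "2xN", "4xN", "Control", "1xN", "2xN", "4xN")
yield    <- c(10, 14, 18, 20, 12, 15, 19, 23)

# Default level order is alphabetical, which need not match the dose order
default_levels <- levels(factor(nitrogen))
print(default_levels)

# Set the level order explicitly to follow the treatment sequence
nitrogen <- factor(nitrogen, levels = c("Control", "1xN", "2xN", "4xN"))
print(levels(nitrogen))

# Group summaries are returned in level order, i.e. now in dose order
means <- tapply(yield, nitrogen, mean)
print(means)
```

Functions that group by a factor generally follow the level order, so setting it once before attaching the dataset should carry through to the plots.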
The most appropriate graph would be a line plot with standard errors, which works exactly like the bargraph.CI code above. Import the dataset “fertilizer.csv”, which is downloadable from the website, run the code below, and try to customize the plot to your liking (for more customization options run: ?lineplot.CI). See if you can create the plot below the code.

    library(sciplot)
    lineplot.CI(x.factor=NITROGEN, group=FARM, response=YIELD,
      col=c("black","gray"), pch=c(1,16), cex=1.5, lty=c(1,2),
      xlab="Nitrogen treatment (t/ha)", ylab="Yield (kg/ha)",
      legend=TRUE, x.leg=0.9, y.leg=27)

5.17. Dot charts

Dot charts can be used as an alternative to bar charts. They are generally easier to read when you have many treatments, due to their high data-to-ink ratio. Also, their axis does not need to start at 0 to convey the correct sense of the treatment effect (a problem with bar charts). Dot plots are the most highly ranked graph type for perceptual accuracy (see the Nature article in the reading material: Fig 1c, right, and Table 1, rank 1). However, bar charts are still a very familiar chart type and are widely used despite their relatively low information density. Do use them if you only have a small number of treatment factors.

Unfortunately, the dot plot function does not automatically calculate summary statistics, so we have to do that ourselves. Use Excel or the plyr package to summarize the “fertilizer.csv” data:

    library(plyr)
    dat3 = ddply(dat, .(FARM,NITROGEN), summarise,
      X=mean(YIELD), N=length(YIELD), SE=sd(YIELD)/sqrt(length(YIELD)))
    head(dat3)
    attach(dat3)

Now let’s try some dot plots. We use the dotchart2 function of the package Hmisc, which you have to install first.
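Before plotting, it can be worth confirming what the summary step computes. The sketch below reproduces the same three statistics (mean, sample size, standard error) per FARM x NITROGEN combination in base R. The data frame is made up and only borrows the column names from the lab dataset, and se_mean and check are illustrative names, not part of any package.

```r
# Made-up data that only mimics the column names of the lab dataset
dat <- data.frame(
  FARM     = rep(c("Farm1", "Farm2"), each = 6),
  NITROGEN = rep(rep(c("N1", "N2", "N3"), each = 2), times = 2),
  YIELD    = c(10, 12, 15, 17, 20, 24, 11, 13, 14, 18, 22, 26)
)

# Standard error of the mean for one group of observations
se_mean <- function(x) sd(x) / sqrt(length(x))

# One row per FARM x NITROGEN combination for each statistic
X  <- aggregate(YIELD ~ FARM + NITROGEN, data = dat, FUN = mean)
N  <- aggregate(YIELD ~ FARM + NITROGEN, data = dat, FUN = length)
SE <- aggregate(YIELD ~ FARM + NITROGEN, data = dat, FUN = se_mean)

# Combine into one summary table with the same column names as dat3
check <- data.frame(X[, c("FARM", "NITROGEN")],
                    X = X$YIELD, N = N$YIELD, SE = SE$YIELD)
print(check)
```

The rows may come out in a different order than with ddply, so compare the two tables by group rather than by row position.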
As with bar plots, there are different ways to group things:

    graphics.off()
    windows(width=4, height=4)
    par(cex=1, family="sans", mar=c(5,5,5,5))
    library(Hmisc)
    dotchart2(X, FARM, groups=NITROGEN)
    dotchart2(X, NITROGEN, groups=FARM)

As before, you can fully customize your graph with a number of familiar and graph-specific options that you can look up via ?dotchart2. Play with the options to remind yourself what they do:

    dotchart2(X, NITROGEN, groups=FARM,
      xlab="Yield (kg/ha)", xlim=c(5,30),
      cex.labels=1, cex.group.labels=1, groupfont=3,
      width.factor=1.8, lty=2, lcolor="black",
      pch=21, col="black", bg="gray", dotsize=1.5,
      auxdata=N, auxtitle="N", sort.=FALSE)

You can indicate standard errors by adding symbols (note that you have to include the add and reset.par options for this to work properly):

    dotchart2(X+SE, NITROGEN, groups=FARM, pch="|", add=TRUE, reset.par=FALSE)
    dotchart2(X-SE, NITROGEN, groups=FARM, pch="|", add=TRUE, reset.par=FALSE)

Alternatively, you can use the same principle to add additional data. For illustration, I subset the dat3 dataset into Farm 1 (rows 1-3) and Farm 2 (rows 4-6) and add the Farm 2 data to the original Farm 1 plot. See if you can write additional lines to add and color the standard errors:

    dotchart2(X[1:3], NITROGEN[1:3],
      xlab="Yield (kg/ha)", xlim=c(5,30),
      width.factor=1.8, lty=2, lcolor="black",
      pch=21, col="black", bg="red", dotsize=1.5)
    dotchart2(X[4:6], NITROGEN[4:6], add=TRUE, reset.par=FALSE,
      pch=21, col="black", bg="blue", dotsize=1.5)
    text(8, 2.8, "Farm 1", col="red")
    text(14, 2.8, "Farm 2", col="blue")