Introduction to the ggplot2 Package
One way to plot in R is to use the ggplot2 package in the Tidyverse. “gg” stands for the “Grammar of Graphics”, and the package provides a convenient and consistent way to make almost any plot you want. Typically it’s easiest to plot data in ggplot2 using a dataframe, which means we usually don’t need to make major changes to our data if we decide to change how we’re plotting something.
Components of a Good Plot
The function used to make a plot is ggplot(), which comes from the ggplot2 package. As a reminder, before using the ggplot2 package for the first time we have to install it with install.packages("ggplot2"). After that we don’t need to re-install it; we just need to load the package at the beginning of our script with library(ggplot2).
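The install-once, load-every-time pattern can be sketched as follows. This is one possible approach rather than a required one: requireNamespace() is a base R function that checks whether a package is available, so the install step is skipped when ggplot2 is already present.

```r
# Install ggplot2 only if it is not already available on this machine
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package; this is needed at the start of every new R session
library(ggplot2)
```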
There is a consistent template that we’ll need to use to get our plots to work:
ggplot(data = <DATA>,
mapping = aes(<MAPPINGS>)) +
<GEOM_FUNCTION>()
Here we have three main components:
* the data call – this will almost always refer to a dataframe
* the aesthetic mappings – these are the variables we’re plotting, and the specifications of how we want them to be displayed
* the geom function – each type of plot has its own geom function (e.g. geom_point() for a scatterplot, geom_line() to plot time series, etc.)
One small point: unlike other Tidyverse packages, ggplot2 does not use the pipe %>% to link its functions together. Instead, we use the addition operator +.
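To make the template concrete, here is a minimal sketch that fills in all three components. It assumes the built-in mtcars dataset, which is not part of this lesson and is used only because it ships with R; the object name p is likewise just for illustration.

```r
library(ggplot2)

# mtcars is a built-in R dataset, used here only for illustration
p <- ggplot(data = mtcars,                   # the data call
            mapping = aes(x = wt, y = mpg)) +  # the aesthetic mappings
  geom_point()                               # the geom function

# Printing the plot object draws it
p
```

Note that the + goes at the end of a line, not at the start of the next one; otherwise R treats the first line as a complete expression and the geom is never added.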
As a quick example, we’ll make a rough scatterplot of some long-term ecological data on salamanders and trout.
library(tidyverse)
library(lterdatasampler)
df <- lterdatasampler::and_vertebrates
names(df)
## [1] "year" "sitecode" "section" "reach"
## [5] "pass" "unitnum" "unittype" "vert_index"
## [9] "pitnumber" "species" "length_1_mm" "length_2_mm"
## [13] "weight_g" "clip" "sampledate" "notes"
The most basic version of the plot we may want to make is a scatterplot of two continuous variables, say length and weight:
ggplot() +
geom_point(data = df, aes(x = length_1_mm, y = weight_g))
Notice that here we place the data and the aesthetic mappings (mapping = aes()) inside the geom function itself, rather than in ggplot(). Specifying them in each geom individually allows us to plot from multiple dataframes or variables in a single figure.
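As a sketch of why per-geom data and mappings are useful, the example below draws two layers from two different dataframes. Both dataframes (observed and trend) and their values are made up purely for illustration and are not part of the vertebrate dataset.

```r
library(ggplot2)

# Two small made-up dataframes, purely for illustration
observed <- data.frame(length_mm = c(40, 55, 70, 85),
                       weight_g  = c(1.1, 2.9, 6.2, 11.0))
trend    <- data.frame(length_mm = c(40, 85),
                       weight_g  = c(0.5, 10.5))

# Each geom gets its own data and aesthetic mappings
p <- ggplot() +
  geom_point(data = observed, aes(x = length_mm, y = weight_g)) +
  geom_line(data = trend, aes(x = length_mm, y = weight_g), colour = "blue")

p
```

If every layer used the same dataframe and variables, it would usually be simpler to supply them once in ggplot(), as in the template above.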
In the Iterative Plotting section we’ll go over how to add components to go from an ugly plot like this to a publication-ready graphic.