
Introduction to the ggplot2 Package

When working with ecological data, visualization is crucial for understanding patterns and communicating results. The ggplot2 package provides a powerful and flexible way to create publication-quality plots in R. The “gg” in ggplot2 stands for “Grammar of Graphics,” which means it follows a systematic approach to building plots layer by layer.

The Grammar of Graphics

Think of creating a plot like building with LEGO blocks. Each piece (or layer) builds upon the previous one to create the final structure. In ggplot2, we build plots using seven fundamental layers:

Figure: The seven layers of ggplot2 (data, aesthetics, geometries, facets, statistics, coordinates, and theme)
  • Data: The dataset you want to visualize
  • Aesthetics: How your data maps to visual properties (x and y positions, colors, sizes, etc.)
  • Geometries: The type of plot you want to create (points, lines, bars, etc.)
  • Facets: How to split your plot into multiple panels
  • Statistics: Statistical transformations of your data (for example, counting values into bins or fitting a trend line)
  • Coordinates: The coordinate system for your plot
  • Theme: The overall appearance of your plot

For now, we’ll focus on the first three layers (data, aesthetics, and geometries) to create a basic plot. We’ll explore the other layers in more detail in the next section.
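To connect the list to code, here is a minimal sketch in which each of the seven components appears once. The `fish` data frame is a small hypothetical stand-in, not real LTER data, and the functions picked for each component (`facet_wrap()`, `stat_smooth()`, `coord_cartesian()`, `theme_minimal()`) are only one option among many.

```r
library(ggplot2)

# Hypothetical toy data (not real measurements), used only for illustration
fish <- data.frame(
  species   = rep(c("Trout", "Salamander"), each = 4),
  length_mm = c(58, 61, 89, 107, 40, 52, 65, 70),
  weight_g  = c(1.75, 1.95, 5.6, 10.5, 1.2, 2.0, 3.1, 3.8)
)

p <- ggplot(data = fish,                           # 1. Data
            aes(x = length_mm, y = weight_g)) +    # 2. Aesthetics
  geom_point() +                                   # 3. Geometries
  facet_wrap(~ species) +                          # 4. Facets: one panel per species
  stat_smooth(method = "lm", formula = y ~ x) +    # 5. Statistics: fitted linear trend
  coord_cartesian(xlim = c(0, 120)) +              # 6. Coordinates: zoom the x-axis
  theme_minimal()                                  # 7. Theme
```

Typing `p` at the console draws the plot. Only `geom_point()` and `stat_smooth()` add layers in the narrow sense; the other components change how those layers are split, positioned, and styled.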

The + Operator in ggplot2

Most tidyverse code chains functions together with the pipe operator (%>%), but ggplot2 uses the + operator to add layers to a plot. That is because we’re not transforming data step by step (as with pipes), but building up a plot by adding visual elements:

# Adding multiple layers with + (assumes df_vert is already loaded; see the next section)
ggplot(data = df_vert) +
  geom_point(aes(x = length_1_mm, y = weight_g)) +  # First layer
  geom_smooth(aes(x = length_1_mm, y = weight_g))   # Second layer

Each + adds a new layer to your plot, building it up piece by piece. This is different from the pipe operator, which passes data from one function to another.
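The two operators often appear in the same expression: %>% prepares the data, and + takes over once `ggplot()` has started the plot. A minimal sketch, assuming dplyr is installed (it is part of the tidyverse) and using a handful of length and weight values in a small data frame named `fish`:

```r
library(dplyr)    # provides %>% and filter()
library(ggplot2)

fish <- data.frame(
  length_1_mm = c(58, 61, 89, 58, 93),
  weight_g    = c(1.75, 1.95, 5.6, 2.15, 6.9)
)

p <- fish %>%
  filter(length_1_mm > 60) %>%                   # %>% passes data between functions
  ggplot(aes(x = length_1_mm, y = weight_g)) +   # ggplot() receives the filtered data...
  geom_point()                                   # ...and + adds layers from here on
```

Replacing that last + with %>% would pass the plot object to `geom_point()` as its first argument (`mapping`), which ggplot2 rejects with an error.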

Creating Your First Plot

Let’s create a simple scatter plot using data about vertebrates from the H.J. Andrews Experimental Forest Long Term Ecological Research (LTER) site in Oregon. First, we need to load the required packages and data:

# Load required packages
library(tidyverse) # Automatically loads ggplot2
library(lterdatasampler) # Contains the data we'll use

# Load the data
df_vert <- lterdatasampler::and_vertebrates

# Look at the structure of our data
str(df_vert)
tibble [32,209 × 16] (S3: tbl_df/tbl/data.frame)
 $ year       : num [1:32209] 1987 1987 1987 1987 1987 ...
 $ sitecode   : chr [1:32209] "MACKCC-L" "MACKCC-L" "MACKCC-L" "MACKCC-L" ...
 $ section    : chr [1:32209] "CC" "CC" "CC" "CC" ...
 $ reach      : chr [1:32209] "L" "L" "L" "L" ...
 $ pass       : num [1:32209] 1 1 1 1 1 1 1 1 1 1 ...
 $ unitnum    : num [1:32209] 1 1 1 1 1 1 1 1 1 1 ...
 $ unittype   : chr [1:32209] "R" "R" "R" "R" ...
 $ vert_index : num [1:32209] 1 2 3 4 5 6 7 8 9 10 ...
 $ pitnumber  : num [1:32209] NA NA NA NA NA NA NA NA NA NA ...
 $ species    : chr [1:32209] "Cutthroat trout" "Cutthroat trout" "Cutthroat trout" "Cutthroat trout" ...
 $ length_1_mm: num [1:32209] 58 61 89 58 93 86 107 131 103 117 ...
 $ length_2_mm: num [1:32209] NA NA NA NA NA NA NA NA NA NA ...
 $ weight_g   : num [1:32209] 1.75 1.95 5.6 2.15 6.9 5.9 10.5 20.6 9.55 13 ...
 $ clip       : chr [1:32209] "NONE" "NONE" "NONE" "NONE" ...
 $ sampledate : Date[1:32209], format: "1987-10-07" "1987-10-07" "1987-10-07" "1987-10-07" ...
 $ notes      : chr [1:32209] NA NA NA NA ...
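The `str()` output already shows NA values in several columns. Before plotting, it can be worth checking the two columns you plan to map, because `geom_point()` drops any row with a missing x or y value and prints a warning about the removed rows. The sketch below uses a small hypothetical data frame with deliberately missing values; to check the real data, run the same pipeline on `df_vert`. It assumes tidyr (part of the tidyverse) for `drop_na()`.

```r
library(dplyr)   # summarise() and %>%
library(tidyr)   # drop_na()

# Hypothetical stand-in for df_vert with deliberately missing values
fish <- data.frame(
  length_1_mm = c(58, 61, NA, 89),
  weight_g    = c(1.75, NA, 5.6, 5.6)
)

# Count missing values in the columns we plan to map to x and y
missing_counts <- fish %>%
  summarise(
    missing_length = sum(is.na(length_1_mm)),
    missing_weight = sum(is.na(weight_g))
  )

# Keep only rows where both length and weight are present
fish_complete <- fish %>%
  drop_na(length_1_mm, weight_g)
```

Dropping the incomplete rows yourself is optional; it simply makes the removal explicit instead of leaving it to a warning at plot time.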

Now, let’s create a scatter plot showing the relationship between length and weight for the animals in this dataset (which includes both cutthroat trout and salamanders). We’ll build this plot layer by layer:

# Create the plot
ggplot() +  # Start with an empty plot
  geom_point(  # Add points
    data = df_vert,  # Specify the data
    aes(  # Define the aesthetics
      x = length_1_mm,  # Map length to x-axis
      y = weight_g  # Map weight to y-axis
    )
  )
Figure: Basic scatter plot of length vs. weight

Let’s break down what we just did:

  • ggplot(): Creates a new plot object
  • geom_point(): Adds points to the plot (other options include geom_line(), geom_bar(), etc.)
  • data = df_vert: Specifies which dataset to use
  • aes(): Defines how variables in the data map to visual properties
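Because `ggplot()` returns an object, you can store a partial plot in a variable and add different geometries to it later. A short sketch with a few hypothetical values; `base`, `scatter`, and `line_plot` are illustrative names only:

```r
library(ggplot2)

# Hypothetical length and weight values for illustration
fish <- data.frame(
  length_1_mm = c(58, 61, 89, 93, 107),
  weight_g    = c(1.75, 1.95, 5.6, 6.9, 10.5)
)

# A plot object with data and aesthetics but no layers yet
base <- ggplot(data = fish, aes(x = length_1_mm, y = weight_g))

# Adding different geoms to the same base object gives different plot types
scatter   <- base + geom_point()   # one point per row
line_plot <- base + geom_line()    # rows connected in order of x
```

Adding a layer with + returns a new object, so `base` itself stays unchanged and can be reused.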

Two Ways to Specify Data

There are two ways to specify which data to use in your plot:

  • Global data: Specify the data once in the ggplot() function. This data will be used by all layers unless overridden:
# Using global data
ggplot(data = df_vert) +  # Data specified here
  geom_point(aes(x = length_1_mm, y = weight_g))  # No need to specify data again
  • Layer-specific data: Specify the data in each layer. This is useful when you want to use different datasets in different layers:
# Using layer-specific data
ggplot() +  # No global data
  geom_point(
    data = df_vert,  # Data specified here
    aes(x = length_1_mm, y = weight_g)
  )
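The two approaches can be combined: global data and aesthetics act as defaults, and a single layer overrides the data to highlight a subset. A sketch using a few hypothetical values and dplyr's `filter()`; `big_fish` and the 100 mm cutoff are illustrative choices. Aesthetics set in `ggplot()` are inherited by every layer by default, so the second `geom_point()` only needs new data.

```r
library(dplyr)    # filter() and %>%
library(ggplot2)

# Hypothetical length and weight values for illustration
fish <- data.frame(
  length_1_mm = c(58, 61, 89, 58, 93, 86, 107, 131),
  weight_g    = c(1.75, 1.95, 5.6, 2.15, 6.9, 5.9, 10.5, 20.6)
)

# Subset to highlight in a separate layer
big_fish <- fish %>% filter(length_1_mm > 100)

p <- ggplot(data = fish, aes(x = length_1_mm, y = weight_g)) +  # global data and aesthetics
  geom_point(colour = "grey60") +                 # uses the global data
  geom_point(data = big_fish, colour = "red")     # overrides data, inherits aesthetics
```

Because the highlighted layer is added last, its red points are drawn on top of the grey ones.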

Key Points to Remember

  • ggplot2 uses a layered approach to building plots
  • Each layer adds a new element to your plot
  • Layers are added using the + operator (not the pipe %>%)
  • You can specify data globally in ggplot() or locally in each layer
  • The basic structure is: ggplot() + geom_*()

In the next section, we’ll explore how to make this basic plot more informative and visually appealing by adding more layers and customizing its appearance.