Skip to content

Geographical Mapping

Spatial visualization is fundamental in ecology and evolutionary biology for understanding species distributions, habitat preferences, and spatial patterns in biological data. This article explores techniques for creating geographical maps using R, demonstrating these concepts with a dataset on American pika populations from Niwot Ridge, Colorado.

Introduction to Geospatial Plotting

Geospatial visualization is essential in ecology and evolutionary biology for several reasons:

  • Understanding species distributions and habitat preferences
  • Identifying spatial patterns in biological data
  • Visualizing environmental gradients
  • Communicating field study locations and sampling designs
  • Analyzing spatial relationships between different variables

Creating effective geographical visualizations requires understanding several key concepts:

  • Coordinate systems and their transformations
  • Map projections and their properties
  • Spatial data structures
  • Integration of different data types (points, polygons, rasters)

The Dataset: Niwot Ridge American Pika

We’ll be working with data from a study of American pikas (Ochotona princeps) at Niwot Ridge, Colorado. The dataset includes stress hormone measurements from pika fecal samples collected between June and September 2018, along with spatial information about sampling locations.

Let’s load the required packages and examine the data:

library(tidyverse)
library(lterdatasampler) # for the data
library(ggmap)           # for map-based visualization
library(sf)              # for spatial data handling
library(viridis)         # for color palettes

# Load the data
df_pikas <- lterdatasampler::nwt_pikas
str(df_pikas)
tibble [109 × 8] (S3: tbl_df/tbl/data.frame)
 $ date              : Date[1:109], format: "2018-06-08" "2018-06-08" "2018-06-08" "2018-06-13" ...
 $ site              : Factor w/ 3 levels "Cable Gate","Long Lake",..: 1 1 1 3 3 3 3 3 3 3 ...
 $ station           : Factor w/ 20 levels "Cable Gate 1",..: 1 2 3 14 15 16 17 18 19 9 ...
 $ utm_easting       : num [1:109] 451373 451411 451462 449317 449342 ...
 $ utm_northing      : num [1:109] 4432963 4432985 4432991 4434093 4434141 ...
 $ sex               : Factor w/ 1 level "male": 1 1 1 1 1 NA 1 NA 1 1 ...
 $ concentration_pg_g: num [1:109] 11563 10629 10924 10414 13531 ...
 $ elev_m            : num [1:109] 3343 3353 3358 3578 3584 ...

The dataset contains the following variables:

  • date: Observation date of the fecal sample
  • site: Location where the sample was collected within Niwot Ridge
  • station: Specific sampling station
  • utm_easting: GPS East-West coordinate (UTM NAD83, Zone 13)
  • utm_northing: GPS North-South coordinate (UTM NAD83, Zone 13)
  • sex: Sex of the observed pika
  • concentration_pg_g: Glucocorticoid metabolite concentration
  • elev_m: Elevation in meters

Coordinate Systems and Spatial Data Processing

One of the most critical aspects of geographical mapping is handling coordinate systems correctly. Our data comes with coordinates in the Universal Transverse Mercator (UTM) system, but we’ll need to work with different coordinate systems for various purposes.

Converting to a Spatial Object

First, we’ll convert our data frame into a spatial object using the sf (Simple Features) package:

# Convert to sf object
df_pikas <- st_as_sf(x = nwt_pikas, 
                     coords = c("utm_easting", "utm_northing"))
df_pikas
Simple feature collection with 109 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 448884 ymin: 4432963 xmax: 451617 ymax: 4435616
CRS:           NA
# A tibble: 109 × 7
   date       site       station       sex   concentration_pg_g elev_m         geometry
   <date>     <fct>      <fct>         <fct>              <dbl>  <dbl>          <POINT>
 1 2018-06-08 Cable Gate Cable Gate 1  male              11563.  3343. (451373 4432963)
 2 2018-06-08 Cable Gate Cable Gate 2  male              10629.  3353. (451411 4432985)
 3 2018-06-08 Cable Gate Cable Gate 3  male              10924.  3358. (451462 4432991)
 4 2018-06-13 West Knoll West Knoll 3  male              10414.  3578. (449317 4434093)
 5 2018-06-13 West Knoll West Knoll 4  male              13531.  3584. (449342 4434141)
 6 2018-06-13 West Knoll West Knoll 5  NA                 7799.  3595. (449323 4434273)
 7 2018-06-13 West Knoll West Knoll 6  male               4715.  3603. (449243 4434156)
 8 2018-06-13 West Knoll West Knoll 7  NA                 2087.  3616. (449223 4434233)
 9 2018-06-13 West Knoll West Knoll 8  male              12899.  3606. (449244 4434292)
10 2018-06-13 West Knoll West Knoll 10 male              11329.  3567. (449438 4434201)

This conversion creates a geometry column containing points instead of separate coordinate columns. Note the CRS: NA line. At this stage, the points don’t have any information about their coordinate reference system (CRS).

Understanding Coordinate Reference Systems

A Coordinate Reference System (CRS) is fundamental in spatial data handling. It defines:

  • How the Earth’s 3D surface is represented in 2D (spatial projection)
  • How the Earth’s shape is modeled (datum)
  • The units of measurement
  • The orientation of the coordinate axes

For more information please visit: https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/intro-to-coordinate-reference-systems/

Our data uses the UTM system, specifically Zone 13 with the NAD83 datum, which can be found by looking up the metadata of this dataset with ?nwt_pikas.

We can set this using the EPSG code 26913:

# Set the coordinate reference system to UTM 13N
EPSG_CODE_UTM_13N <- 26913
df_pikas <- st_set_crs(df_pikas, EPSG_CODE_UTM_13N)
df_pikas
Simple feature collection with 109 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 448884 ymin: 4432963 xmax: 451617 ymax: 4435616
Projected CRS: NAD83 / UTM zone 13N
# A tibble: 109 × 7
   date       site       station       sex   concentration_pg_g elev_m         geometry
 * <date>     <fct>      <fct>         <fct>              <dbl>  <dbl>      <POINT [m]>
 1 2018-06-08 Cable Gate Cable Gate 1  male              11563.  3343. (451373 4432963)
 2 2018-06-08 Cable Gate Cable Gate 2  male              10629.  3353. (451411 4432985)
 3 2018-06-08 Cable Gate Cable Gate 3  male              10924.  3358. (451462 4432991)
 4 2018-06-13 West Knoll West Knoll 3  male              10414.  3578. (449317 4434093)
 5 2018-06-13 West Knoll West Knoll 4  male              13531.  3584. (449342 4434141)
 6 2018-06-13 West Knoll West Knoll 5  NA                 7799.  3595. (449323 4434273)
 7 2018-06-13 West Knoll West Knoll 6  male               4715.  3603. (449243 4434156)
 8 2018-06-13 West Knoll West Knoll 7  NA                 2087.  3616. (449223 4434233)
 9 2018-06-13 West Knoll West Knoll 8  male              12899.  3606. (449244 4434292)
10 2018-06-13 West Knoll West Knoll 10 male              11329.  3567. (449438 4434201)

Notice the updated Projected CRS: NAD83 / UTM zone 13N which confirms the assignment of a CRS has been successful.

Transforming Between Coordinate Systems

For web mapping and integration with various mapping services, we need to transform our coordinates to the WGS84 system (EPSG:4326), which uses latitude and longitude. This is the standard system used by GPS and most online mapping services:

# Transform to WGS84 (standard GPS latitude/longitude)
EPSG_CODE_UNIVERSAL_GPS <- 4326
df_pikas <- st_transform(df_pikas, EPSG_CODE_UNIVERSAL_GPS)
df_pikas
Simple feature collection with 109 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -105.5993 ymin: 40.0455 xmax: -105.5672 ymax: 40.06931
Geodetic CRS:  WGS 84
# A tibble: 109 × 7
   date       site       station       sex   concentration_pg_g elev_m             geometry
 * <date>     <fct>      <fct>         <fct>              <dbl>  <dbl>          <POINT [°]>
 1 2018-06-08 Cable Gate Cable Gate 1  male              11563.  3343.    (-105.57 40.0455)
 2 2018-06-08 Cable Gate Cable Gate 2  male              10629.  3353.  (-105.5696 40.0457)
 3 2018-06-08 Cable Gate Cable Gate 3  male              10924.  3358.  (-105.569 40.04576)
 4 2018-06-13 West Knoll West Knoll 3  male              10414.  3578. (-105.5942 40.05556)
 5 2018-06-13 West Knoll West Knoll 4  male              13531.  3584. (-105.5939 40.05599)
 6 2018-06-13 West Knoll West Knoll 5  NA                 7799.  3595. (-105.5942 40.05718)
 7 2018-06-13 West Knoll West Knoll 6  male               4715.  3603. (-105.5951 40.05612)
 8 2018-06-13 West Knoll West Knoll 7  NA                 2087.  3616. (-105.5954 40.05682)
 9 2018-06-13 West Knoll West Knoll 8  male              12899.  3606. (-105.5951 40.05735)
10 2018-06-13 West Knoll West Knoll 10 male              11329.  3567. (-105.5928 40.05654)

And once again we can confirm an appropriate transformation due to the presence of this line: Geodetic CRS: WGS 84.

One last thing before plotting, let’s extract out the longitude and latitude to their own columns:

df_pikas$lon <- st_coordinates(df_pikas)[, 1]
df_pikas$lat <- st_coordinates(df_pikas)[, 2]

Plotting without a map underlay

A modified scatterplot

ggplot(df_pikas, aes(x = lon, y = lat, fill = site)) +
  # shape 21 and color black are an excellent choice when we want fill to define the color
  # of the points rather than "color". In this case, the outline of the points is black, for
  # better visibility down the line when we add the map background.
  geom_point(
    size = 2,
    shape = 21,
    color = "black"
  ) +
  labs(
    title = "Location of Sampling Stations (no background map)",
    x = "Longitude (Degrees)",
    y = "Latitude (Degrees)",
    fill = "Site"
  ) +
  # Changes latitude and longitude to degrees
  coord_sf(crs = EPSG_CODE_UNIVERSAL_GPS) + 
  ggthemes::theme_base()

Plotting with a map underlay

Bounding Boxes

Map APIs

ggmap::ggmap(map) +
  geom_point(
    data = df_pikas,
    aes(x = lon, y = lat, fill = site),
    size = 2,
    shape = 21,
    color = "black"
  ) +
  coord_sf(crs = EPSG_CODE_UNIVERSAL_GPS) +
  labs(
    title = "Location of Sampling Stations (with map background)",
    x = "Longitude (Degrees)",
    y = "Latitude (Degrees)",
    color = "Site",
    fill = "Site"
  ) +
  ggthemes::theme_base()

Adding contours

ggmap::ggmap(map) +
  ggdensity::geom_hdr_lines(
    data = df_pikas,
    # Here we define the percentiles to be used for the HDR lines
    probs = c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99),
    # after_stat() is used to access the calculated values from the geom_hdr_lines() function
    # probs is an internal reference to the probs argument we defined above
    aes(x = lon, y = lat, color = after_stat(probs)),
    alpha = 0.7
  ) +
  geom_point(
    data = df_pikas,
    aes(x = lon, y = lat, fill = site),
    size = 2,
    shape = 21,
    color = "black"
  ) +
  coord_sf(crs = EPSG_CODE_UNIVERSAL_GPS) +
  labs(
    title = "Location of Sampling Stations (with HDR lines)",
    x = "Longitude (Degrees)",
    y = "Latitude (Degrees)",
    color = "Percentile Geospatial Density",
    fill = "Site"
  ) +
  ggthemes::theme_base() +
  # Make the legend taller
  theme(legend.key.height = unit(1, "cm"))

Concluding Points