Special Type: Dates

Dates and times are fundamental data types in many analyses, especially in ecology and evolutionary biology, where temporal patterns are often of interest. Base R provides the core tools for handling date and datetime objects, and the lubridate package, part of the Tidyverse, makes them considerably easier to use. This article walks through common operations for creating, manipulating, and formatting dates in R.

Loading Necessary Packages

We’ll start by loading the tidyverse, which attaches lubridate as a core package from version 2.0.0 onward (with older versions, call library(lubridate) separately), and lterdatasampler for the example datasets.

library(tidyverse) # Attaches lubridate (tidyverse >= 2.0.0)
library(lterdatasampler)

Creating Dates

Dates can be created in R in several ways. You can construct them from individual components (year, month, day) or convert them from character strings. Here, we create a tibble with year, month, and day columns, and then generate date objects using both base R’s as.Date() and lubridate’s make_date().

# Date creation
df_date_creation <- tibble(
  year = sample(2010:2020, size = 200, replace = TRUE),
  month = sample(1:12, size = 200, replace = TRUE),
  day = sample(1:31, size = 200, replace = TRUE)
) %>%
  mutate(
    # Base R
    date_base = as.Date(
      paste(year, month, day, sep = "-"), # join the year, month, and day with a hyphen
      format = "%Y-%m-%d" # Indicate the format of the date
    ),

    # Tidyverse
    date_tidyverse = make_date(year, month, day)
  )
str(df_date_creation)
tibble [200 × 5] (S3: tbl_df/tbl/data.frame)
 $ year          : int [1:200] 2016 2010 2016 2018 2014 2011 2011 2020 2020 2015 ...
 $ month         : int [1:200] 11 3 1 9 9 10 11 5 10 6 ...
 $ day           : int [1:200] 9 4 22 20 7 30 6 23 24 3 ...
 $ date_base     : Date[1:200], format: "2016-11-09" "2010-03-04" "2016-01-22" "2018-09-20" ...
 $ date_tidyverse: Date[1:200], format: "2016-11-09" "2010-03-04" "2016-01-22" "2018-09-20" ...
# Note: The actual sampled year, month, day values will vary due to sample().
# Invalid dates (e.g., Feb 30) will result in NA.

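The str() output shows that both the date_base and date_tidyverse columns are of the Date class. make_date() is usually the more straightforward choice when the components are already numeric, because it takes them directly and needs neither paste() nor a format string. Note that if a day, month, and year combination is invalid (e.g. February 30th), both as.Date() and make_date() return NA for that row. Because day is sampled from 1 to 31 here, a few such NA values are expected.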
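To make the NA behaviour concrete, here is a minimal sketch that builds one real and one impossible date by hand. The parts data frame and its values are made up for illustration and are not part of the sampled data above.

```r
library(lubridate)

# Hypothetical components: the second row describes February 30th, which does not exist
parts <- data.frame(year = c(2023, 2023), month = c(2, 2), day = c(28, 30))

# Base R: paste the parts into a string, then parse it with an explicit format
base_dates <- as.Date(
  paste(parts$year, parts$month, parts$day, sep = "-"),
  format = "%Y-%m-%d"
)

# lubridate: pass the numeric components directly
lub_dates <- make_date(parts$year, parts$month, parts$day)

# Both approaches return NA for the impossible date
is.na(base_dates)
is.na(lub_dates)

# Flag the offending rows before any further analysis
invalid_rows <- parts[is.na(lub_dates), ]
invalid_rows
```

Checking for NA right after construction is a cheap way to catch impossible dates before they silently drop out of grouped summaries.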

Working with an Example Dataset

For many of the following examples, we’ll use the arc_weather dataset from the lterdatasampler package. This dataset contains weather observations, including a date column.

arctic <- lterdatasampler::arc_weather
str(arctic)
tibble [11,171 × 5] (S3: tbl_df/tbl/data.frame)
 $ date          : Date[1:11171], format: "1988-06-01" "1988-06-02" "1988-06-03" "1988-06-04" ...
 $ station       : chr [1:11171] "Toolik Field Station" "Toolik Field Station" "Toolik Field Station" "Toolik Field Station" ...
 $ mean_airtemp  : num [1:11171] 8.4 6 5.8 1.8 6.8 5.2 2.2 9.4 13.1 17.7 ...
 $ daily_precip  : num [1:11171] 0 0 0 0 2.5 0 7.6 0 0 0 ...
 $ mean_windspeed: num [1:11171] NA NA NA NA NA NA NA NA NA 3.9 ...

The date column in the arctic tibble is already in Date format, which is convenient for our examples.

Extracting Components from Dates

Lubridate provides a set of intuitive functions to extract specific components from a date object, such as the year, month, day, day of the week, or day of the year. We’ll use the date column from our arctic dataset for these examples.

Year, Month, Day of Year

# Get the year
head(year(arctic$date))

# Get the month (numeric by default, or as a label)
head(month(arctic$date))
head(month(arctic$date, label = TRUE)) # Abbreviated month name
head(month(arctic$date, label = TRUE, abbr = FALSE)) # Full month name

# Get the day of the year
head(yday(arctic$date))
# Output for year()
[1] 1988 1988 1988 1988 1988 1988

# Output for month()
[1] 6 6 6 6 6 6

# Output for month(label = TRUE)
[1] Jun Jun Jun Jun Jun Jun
Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec

# Output for month(label = TRUE, abbr = FALSE)
[1] June June June June June June
Levels: January < February < March < April < May < June < July < August < September < October < November < December

# Output for yday()
[1] 153 154 155 156 157 158

Day of the Week

The wday() function extracts the day of the week. By default, it returns a number (1 for Sunday, 7 for Saturday). Using label = TRUE gives the name of the day.

# Get the day of the week
head(wday(arctic$date)) # Numeric (1 for Sunday, 7 for Saturday by default)
head(wday(arctic$date, label = TRUE)) # Abbreviated day name
head(wday(arctic$date, label = TRUE, abbr = FALSE)) # Full day name
# Output for wday()
[1] 4 5 6 7 1 2

# Output for wday(label = TRUE)
[1] Wed Thu Fri Sat Sun Mon
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

# Output for wday(label = TRUE, abbr = FALSE)
[1] Wednesday Thursday  Friday    Saturday  Sunday    Monday   
Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < Friday < Saturday
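If your analysis treats Monday as the first day of the week, the numbering can be shifted with the week_start argument of wday(). The short sketch below uses the first date of the arctic dataset as a stand-alone value so that it runs without the dataset loaded.

```r
library(lubridate)

d <- ymd("1988-06-01")  # a Wednesday

# Default numbering: Sunday = 1, so Wednesday = 4
default_num <- wday(d)

# Monday-based numbering: Monday = 1, so Wednesday = 3
monday_num <- wday(d, week_start = 1)

default_num
monday_num
```

Only the numeric codes change; the weekday itself is the same in both cases.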

Date Arithmetic

Performing arithmetic with dates is a common task, such as adding or subtracting time periods or calculating the duration between two dates. Lubridate provides helper functions like days(), weeks(), months(), and years() to create period objects that can be added to or subtracted from dates. For calculating differences, R’s built-in difftime() or direct subtraction can be used.

Adding and Subtracting Time Periods

The today() function from lubridate conveniently gives the current system date.

# Create a sample date (current date)
today_date <- today() # Gets the current date
today_date

# Add/subtract days, months, years
today_date + days(10) # 10 days from today
today_date - weeks(2) # 2 weeks ago
today_date + months(3) # 3 months from today
today_date + years(1) # 1 year from today
# Output for today_date (will be the current date when run)
[1] "2023-10-27" # Example: if today is October 27, 2023

# Output for today_date + days(10)
[1] "2023-11-06" # Example: 10 days after Oct 27, 2023

# Output for today_date - weeks(2)
[1] "2023-10-13" # Example: 2 weeks before Oct 27, 2023

# Output for today_date + months(3)
[1] "2024-01-27" # Example: 3 months after Oct 27, 2023

# Output for today_date + years(1)
[1] "2024-10-27" # Example: 1 year after Oct 27, 2023
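One edge case is worth knowing when adding months: if the resulting day does not exist, plain addition returns NA. The sketch below, using a made-up date, contrasts this with lubridate's %m+% operator, which rolls back to the last valid day of the target month.

```r
library(lubridate)

jan31 <- ymd("2023-01-31")

# "February 31" does not exist, so period addition gives NA
plain <- jan31 + months(1)

# %m+% rolls back to the last day of February instead
rolled <- jan31 %m+% months(1)

# add_with_rollback() is the function form of the same operation
rolled_fn <- add_with_rollback(jan31, months(1))

plain      # NA
rolled     # "2023-02-28"
rolled_fn  # "2023-02-28"
```

Which behaviour is appropriate depends on the question; NA is the safer default when an impossible date should be noticed rather than adjusted.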

Calculating Differences Between Dates

When you subtract one Date object from another, R returns a difftime object, which represents the time difference in days by default.

date1 <- ymd("2023-01-15")
date2 <- ymd("2023-07-20")

# Difference in days (default for Date objects)
date_diff_days <- date2 - date1
date_diff_days
Time difference of 186 days

The difftime() function offers more control over the units of the difference (e.g., “weeks”, “auto”). Lubridate’s duration objects (created with functions like ddays() or by converting difftime objects with as.duration()) represent exact time spans.

# Using difftime for more control over units
difftime(date2, date1, units = "weeks")
difftime(date2, date1, units = "auto") # base R picks the unit (days here)

# Duration objects for precise time spans
diff_days <- date2 - date1 # A difftime object
as.duration(diff_days) # Convert it to a lubridate Duration (stored in seconds)
# Output for difftime(units = "weeks")
Time difference of 26.57143 weeks

# Output for difftime(units = "auto")
Time difference of 186 days

# Output for as.duration()
[1] "16070400s (~26.57 weeks)"
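Often the difference is needed as a plain number, or in calendar units such as whole months. The following sketch reuses the two example dates; interval() and integer division by months(1) are one way to count completed months, shown here as an illustration rather than the only approach.

```r
library(lubridate)

date1 <- ymd("2023-01-15")
date2 <- ymd("2023-07-20")

# Plain number of days, dropping the difftime class
n_days <- as.numeric(date2 - date1, units = "days")

# Whole calendar months completed between the two dates
# (Jan 15 -> Jul 15 is six months; the remaining five days do not count)
span <- interval(date1, date2)
whole_months <- span %/% months(1)

n_days        # 186
whole_months  # 6
```

Converting to a plain number is useful before storing the result in a column that other functions expect to be numeric.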

Modifying Date Components

Lubridate’s update() function is a versatile tool for changing specific components of a date object, such as the year, month, or day, while leaving other components unchanged. This is often more straightforward than deconstructing and reconstructing dates manually.

# Use update() to change specific parts of a date object
original_date <- ymd("2023-08-15")
original_date

# Change the year
update(original_date, year = 2024)

# Change the month and day
update(original_date, month = 12, day = 25)
# Output for original_date
[1] "2023-08-15"

# Output for update(original_date, year = 2024)
[1] "2024-08-15"

# Output for update(original_date, month = 12, day = 25)
[1] "2023-12-25"
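update() also works on whole vectors, and lubridate's component accessors have assignment forms for the same purpose. A small sketch with made-up dates (visits is a hypothetical vector):

```r
library(lubridate)

visits <- ymd(c("2021-03-10", "2022-07-04"))

# Move every date to the same year, e.g. to compare seasons across years
same_year <- update(visits, year = 2000)

# Assignment form: set the day of each date to 1
first_of_month <- visits
day(first_of_month) <- 1

same_year       # "2000-03-10" "2000-07-04"
first_of_month  # "2021-03-01" "2022-07-01"
```

update() returns a modified copy, whereas the assignment form changes the object it is applied to.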

Rounding Dates

Lubridate offers functions like floor_date(), ceiling_date(), and round_date() to round date (or datetime) objects to a specified unit. This is useful for aggregating data to coarser time periods, like the start of the week, month, or year.

# Round dates to the nearest unit (e.g., week, month, year)
messy_date <- ymd("2023-04-17")
messy_date

# Round to the start of the month (floor)
floor_date(messy_date, "month")

# Round to the start of the year
floor_date(messy_date, "year")

# Round to the nearest week (default starts on Sunday)
round_date(messy_date, "week")

# Last day of the month: ceiling_date() returns the first day of the
# next month, so subtract one day from it
ceiling_date(messy_date, "month") - days(1)
# Output for messy_date
[1] "2023-04-17"

# Output for floor_date(messy_date, "month")
[1] "2023-04-01"

# Output for floor_date(messy_date, "year")
[1] "2023-01-01"

# Output for round_date(messy_date, "week")
[1] "2023-04-16"

# Output for ceiling_date(messy_date, "month") - days(1)
[1] "2023-04-30"
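To illustrate the aggregation use case, the sketch below floors a few daily values to the start of their month and sums them per month. The obs data frame is invented for the example, and base R's aggregate() is used only to keep the block self-contained; a group_by() and summarise() pipeline would work the same way.

```r
library(lubridate)

# Hypothetical daily observations spanning a month boundary:
# Jan 30, Jan 31, Feb 1, Feb 2
obs <- data.frame(
  date = as.Date("2023-01-30") + 0:3,
  value = c(1, 2, 3, 4)
)

# Every date is mapped to the first day of its month
obs$month_start <- floor_date(obs$date, "month")

# Sum the values within each month
monthly <- aggregate(value ~ month_start, data = obs, FUN = sum)
monthly
```

Grouping on the floored date keeps the key as a real date, so the result still sorts and plots chronologically.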

Parsing Dates from Character Strings

Often, dates are imported as character strings. Lubridate simplifies the conversion of these strings into proper date objects with a family of functions that identify the order of year (y), month (m), and day (d) components in the string. Examples include ymd(), mdy(), dmy(). If time components are also present, functions like ymd_hms() (year-month-day hour-minute-second) can be used.

date_str1 <- "2023-10-25"
ymd(date_str1)

date_str2 <- "October 25, 2023"
mdy(date_str2)

date_str3 <- "25/10/2023"
dmy(date_str3)

# Lubridate can also parse datetime strings
date_str_time <- "2023-10-25 14:30:00"
ymd_hms(date_str_time)
# Output for ymd(date_str1)
[1] "2023-10-25"

# Output for mdy(date_str2)
[1] "2023-10-25"

# Output for dmy(date_str3)
[1] "2023-10-25"

# Output for ymd_hms(date_str_time)
# Result is a POSIXct (datetime) object
[1] "2023-10-25 14:30:00 UTC"

These functions are quite flexible and can often correctly parse dates even if the separators (like hyphens, slashes, or spaces) vary, as long as the order of components matches the function name.
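The flip side of this flexibility is that the function name is the only thing telling lubridate which component comes first. The sketch below, with made-up strings, shows the same text parsed two ways and how unparseable input is reported as NA.

```r
library(lubridate)

ambiguous <- "03/04/2023"

# The same string gives different dates depending on the assumed order
as_dmy <- dmy(ambiguous)  # 3 April 2023
as_mdy <- mdy(ambiguous)  # 4 March 2023

# Strings that cannot be parsed become NA; quiet = TRUE suppresses
# the "failed to parse" warning
mixed <- ymd(c("2023-10-25", "not a date"), quiet = TRUE)

as_dmy  # "2023-04-03"
as_mdy  # "2023-03-04"
mixed   # "2023-10-25" NA
```

Counting the NA values after parsing is a quick check that the chosen order matches the data.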

Formatting Dates into Strings

Conversely, you might need to convert date objects into character strings in a specific format, for example, for reporting or plotting. Base R’s format() function is very powerful for this, using C-style format codes (e.g., %Y for 4-digit year, %m for 2-digit month, %d for 2-digit day). Lubridate also offers a stamp() function for creating formatting templates.

Using Base R format()

# Base R format() function is powerful
a_date <- ymd("2023-12-01")
format(a_date, "%Y/%m/%d")          # YYYY/MM/DD
format(a_date, "%B %d, %Y")        # MonthName Day, Year
format(a_date, "%A, %B %e, %Y")    # Weekday, MonthName Day (padded with a space, not a zero), Year
format(a_date, "It's %A, the %j day of %Y.") # Custom string with date parts
# Output for format(a_date, "%Y/%m/%d")
[1] "2023/12/01"

# Output for format(a_date, "%B %d, %Y")
[1] "December 01, 2023"

# Output for format(a_date, "%A, %B %e, %Y")
[1] "Friday, December  1, 2023"

# Output for format(a_date, "It's %A, the %j day of %Y.")
[1] "It's Friday, the 335 day of 2023."
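Month and weekday names produced by %B and %A follow the session locale, whereas the numeric codes do not. The base R sketch below shows a few numeric formats that are handy as grouping keys, and a reminder that format() always returns character.

```r
d <- as.Date("2023-12-01")

# A sortable year-month key
ym_key <- format(d, "%Y-%m")

# Day of year and day of month as numbers (no padding)
day_of_year <- as.integer(format(d, "%j"))
day_of_month <- as.integer(format(d, "%d"))

# format() returns character, so convert before doing arithmetic
year_class <- class(format(d, "%Y"))

ym_key        # "2023-12"
day_of_year   # 335
day_of_month  # 1
year_class    # "character"
```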

Using Lubridate stamp()

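Lubridate’s stamp() function provides a convenient way to create date/time formatters from example strings. You provide a string that looks like the desired output, and stamp() returns a function that formats dates accordingly. When the example is ambiguous, as with 03/01/99 (month first or day first?), stamp() has to guess and prints a message naming the format it selected, so check that message; the orders argument can be used to state the intended order.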

# Lubridate's stamp() for template-based formatting
# Define a template string and stamp() fills it in
stamp_template <- stamp("Signed on Monday, March 1, 1999")
stamp_template(a_date) # a_date is still "2023-12-01"

stamp_template_short <- stamp("Date: 03/01/99")
stamp_template_short(a_date)
# Output for stamp_template(a_date)
[1] "Signed on Friday, December 01, 2023"

# Output for stamp_template_short(a_date)
[1] "Date: 12/01/23"

Example: Working with the arctic Dataset

Let’s apply some of these date manipulation techniques to the arctic dataset. We can, for instance, count observations per year or calculate average monthly temperatures.

Count Observations Per Year

# First, ensure no NA dates for reliability in grouping
arctic_cleaned <- arctic %>%
  filter(!is.na(date))

arctic_cleaned %>%
  mutate(year = year(date)) %>%
  count(year)
# A tibble: 31 × 2
    year     n
   <dbl> <int>
 1  1988   214
 2  1989   365
 3  1990   365
 4  1991   365
 5  1992   366
 6  1993   365
 7  1994   365
 8  1995   365
 9  1996   366
10  1997   365
# ℹ 21 more rows
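Counts of 365 or 366 suggest that a year is complete, but a direct check for missing days is easy to write. The following base R sketch uses a small invented series (observed) rather than the arctic data, with one day deliberately left out.

```r
# Toy daily series with 2023-01-03 deliberately missing
observed <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05"))

# Every calendar day between the first and last observation
expected <- seq(min(observed), max(observed), by = "day")

# Days that should be present but are not
missing_days <- expected[!expected %in% observed]
missing_days  # "2023-01-03"
```

The same pattern can be applied to a date column from any dataset by substituting it for observed.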

Average Temperature in July per Year

# What is the average temperature in July for each year?
arctic_cleaned %>%
  mutate(
    year = year(date),
    month = month(date, label = TRUE, abbr = FALSE) # Full month name (follows the session locale)
  ) %>%
  filter(month == "July", !is.na(mean_airtemp)) %>%
  group_by(year) %>%
  summarise(avg_july_temp = mean(mean_airtemp)) %>%
  head() # Show first few years
# A tibble: 6 × 2
   year avg_july_temp
  <dbl>         <dbl>
1  1988         12.0 
2  1989         12.4 
3  1990         12.7 
4  1991          9.24
5  1992         12.7 
6  1993         13.8 
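Filtering on the month name works when the session locale is English; comparing against the month number avoids that dependency. Here is a minimal sketch of the same July average on an invented data frame (toy and its values are assumptions for illustration), using base R subsetting and tapply() to stay self-contained.

```r
library(lubridate)

# Hypothetical temperatures: two July days in 2020, one August day, one July day in 2021
toy <- data.frame(
  date = ymd(c("2020-07-01", "2020-07-02", "2020-08-01", "2021-07-15")),
  temp = c(10, 14, 20, 9)
)

# Keep July rows by month number, which does not depend on the locale
july <- toy[month(toy$date) == 7, ]

# Mean July temperature per year
july_means <- tapply(july$temp, year(july$date), mean)
july_means
```

In the dplyr pipeline above, the equivalent change would be to filter on month(date) == 7.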

Conclusion

Working with dates and times is a core skill in data analysis. Base R together with the lubridate package covers the whole workflow shown here: creating dates, extracting components, doing arithmetic, modifying and rounding, parsing, and formatting. Treating temporal data as proper Date objects rather than as plain strings or numbers helps avoid subtle errors and keeps grouping and summarising by time straightforward.