Special Type: Dates
Dates and times are fundamental data types in many analyses, especially in ecology and evolutionary biology where temporal patterns are often of interest. R provides robust tools for handling date and datetime objects, largely through base R functions and enhanced significantly by the lubridate package, which is part of the Tidyverse. This article will guide you through common operations for creating, manipulating, and formatting dates in R.
Loading Necessary Packages
We’ll start by loading the tidyverse, which (from tidyverse 2.0.0 onward) automatically attaches lubridate for date-time manipulation, and lterdatasampler, which provides example datasets.
library(tidyverse) # Attaches lubridate automatically (tidyverse >= 2.0.0)
library(lterdatasampler)
Creating Dates
Dates can be created in R in several ways. You can construct them from individual components (year, month, day) or convert them from character strings. Here, we create a tibble with year, month, and day columns, and then generate date objects using both base R’s as.Date() and lubridate’s make_date().
# Date creation
df_date_creation <- tibble(
  year  = sample(2010:2020, 200, replace = TRUE),
  month = sample(1:12, 200, replace = TRUE),
  day   = sample(1:31, 200, replace = TRUE)
) %>%
  mutate(
    # Base R: paste the parts into a string, then parse it
    date_base = as.Date(
      paste(year, month, day, sep = "-"), # join the year, month, and day with a hyphen
      format = "%Y-%m-%d"                  # indicate the format of the date
    ),
    # Tidyverse (lubridate): build the date directly from the numbers
    date_tidyverse = make_date(year, month, day)
  )
str(df_date_creation)
tibble [200 × 5] (S3: tbl_df/tbl/data.frame)
$ year : int [1:200] 2016 2010 2016 2018 2014 2011 2011 2020 2020 2015 ...
$ month : int [1:200] 11 3 1 9 9 10 11 5 10 6 ...
$ day : int [1:200] 9 4 22 20 7 30 6 23 24 3 ...
$ date_base : Date[1:200], format: "2016-11-09" "2010-03-04" "2016-01-22" "2018-09-20" ...
$ date_tidyverse: Date[1:200], format: "2016-11-09" "2010-03-04" "2016-01-22" "2018-09-20" ...
# Note: The actual sampled year, month, day values will vary due to sample().
# Invalid dates (e.g., Feb 30) will result in NA.
The str() output shows that both date_base and date_tidyverse columns are of the Date class. make_date() is the more concise option because it takes the numeric components directly, with no intermediate string or format specification. Both approaches return NA for combinations that do not exist (e.g., February 30th), so some of the randomly sampled rows above will contain NA dates.
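To see the NA behaviour in isolation, the sketch below builds a few hand-picked dates, one valid and one impossible, with each approach. The values are arbitrary; 2024 appears in the last call only because it is a leap year, so 29 February exists in it.

```r
library(lubridate)

# Feb 28 exists, Feb 30 never does: the second element becomes NA
created <- make_date(2023, 2, c(28, 30))
created

# Base R behaves the same way once an explicit format is given
parsed <- as.Date(c("2023-02-28", "2023-02-30"), format = "%Y-%m-%d")
parsed

# Leap years are respected: Feb 29 is invalid in 2023 but valid in 2024
leap_check <- make_date(c(2023, 2024), 2, 29)
leap_check

# A quick sanity check: how many constructed dates failed?
sum(is.na(created))
```

Applied to the tibble above, the same check would be sum(is.na(df_date_creation$date_tidyverse)).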
Working with an Example Dataset
For many of the following examples, we’ll use the arc_weather dataset from the lterdatasampler package. This dataset contains daily weather observations from Toolik Field Station, including a date column.
arctic <- lterdatasampler::arc_weather
str(arctic)
tibble [11,171 × 5] (S3: tbl_df/tbl/data.frame)
$ date : Date[1:11171], format: "1988-06-01" "1988-06-02" "1988-06-03" "1988-06-04" ...
$ station : chr [1:11171] "Toolik Field Station" "Toolik Field Station" "Toolik Field Station" "Toolik Field Station" ...
$ mean_airtemp : num [1:11171] 8.4 6 5.8 1.8 6.8 5.2 2.2 9.4 13.1 17.7 ...
$ daily_precip : num [1:11171] 0 0 0 0 2.5 0 7.6 0 0 0 ...
$ mean_windspeed: num [1:11171] NA NA NA NA NA NA NA NA NA 3.9 ...
The date column in the arctic tibble is already in Date format, which is convenient for our examples.
Extracting Components from Dates
Lubridate provides a set of intuitive functions to extract specific components from a date object, such as the year, month, day, day of the week, or day of the year. We’ll use the date column from our arctic dataset for these examples.
Year, Month, Day of Year
# Get the year
head(year(arctic$date))
# Get the month (numeric by default, or as a label)
head(month(arctic$date))
head(month(arctic$date, label = TRUE)) # Abbreviated month name
head(month(arctic$date, label = TRUE, abbr = FALSE)) # Full month name
# Get the day of the year
head(yday(arctic$date))
# Output for year()
[1] 1988 1988 1988 1988 1988 1988
# Output for month()
[1] 6 6 6 6 6 6
# Output for month(label = TRUE)
[1] Jun Jun Jun Jun Jun Jun
Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
# Output for month(label = TRUE, abbr = FALSE)
[1] June June June June June June
Levels: January < February < March < April < May < June < July < August < September < October < November < December
# Output for yday()
[1] 153 154 155 156 157 158
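Because yday() counts from 1 January, the same calendar date falls on a different day of the year in leap years, which matters when comparing seasonal events across years. A small sketch with arbitrary dates:

```r
library(lubridate)

# 1 March is day 61 in a leap year (2020) but day 60 otherwise (2023)
doy <- yday(ymd(c("2020-03-01", "2023-03-01")))
doy

# leap_year() flags the years where this one-day shift occurs
is_leap <- leap_year(c(2020, 2023))
is_leap

# Other extractors follow the same pattern
mday(ymd("2020-03-01"))    # day of the month
quarter(ymd("2020-03-01")) # calendar quarter
```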
Day of the Week
The wday() function extracts the day of the week. By default, it returns a number (1 for Sunday, 7 for Saturday). Using label = TRUE gives the name of the day.
# Get the day of the week
head(wday(arctic$date)) # Numeric (1 for Sunday, 7 for Saturday by default)
head(wday(arctic$date, label = TRUE)) # Abbreviated day name
head(wday(arctic$date, label = TRUE, abbr = FALSE)) # Full day name
# Output for wday()
[1] 4 5 6 7 1 2
# Output for wday(label = TRUE)
[1] Wed Thu Fri Sat Sun Mon
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# Output for wday(label = TRUE, abbr = FALSE)
[1] Wednesday Thursday Friday Saturday Sunday Monday
Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < Friday < Saturday
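The numbering depends on which day is treated as the start of the week. wday() takes a week_start argument (7 = Sunday, the default unless the lubridate.week.start option has been changed; 1 = Monday). A sketch using the first date in arc_weather, typed in by hand:

```r
library(lubridate)

d <- ymd("1988-06-01")  # a Wednesday

wday_sun <- wday(d)                 # week starts on Sunday: Wednesday is 4
wday_mon <- wday(d, week_start = 1) # week starts on Monday: Wednesday is 3
c(wday_sun, wday_mon)

# With a Monday start, Saturday and Sunday are 6 and 7, which makes a
# weekend flag independent of locale-specific day names
is_weekend <- wday(d, week_start = 1) >= 6
is_weekend
```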
Date Arithmetic
Performing arithmetic with dates is a common task, such as adding or subtracting time periods or calculating the duration between two dates. Lubridate provides helper functions like days(), weeks(), months(), and years() to create period objects that can be added to or subtracted from dates. For calculating differences, R’s built-in difftime() or direct subtraction can be used.
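Underneath the helpers, base R stores a Date as a number of days since 1970-01-01, so adding a plain number shifts a date by that many days. A base-R sketch with an arbitrary date:

```r
d <- as.Date("2023-10-27")

shifted <- d + 10   # plain numbers are interpreted as days
shifted             # "2023-11-06"

as.numeric(d)       # 19657: days since 1970-01-01

# seq() builds regular date sequences, e.g. the first day of each month
firsts <- seq(as.Date("2023-01-01"), by = "month", length.out = 3)
firsts
```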
Adding and Subtracting Time Periods
The today() function from lubridate conveniently gives the current system date.
# Create a sample date (current date)
today_date <- today() # Gets the current date
today_date
# Add/subtract days, months, years
today_date + days(10) # 10 days from today
today_date - weeks(2) # 2 weeks ago
today_date + months(3) # 3 months from today
today_date + years(1) # 1 year from today
# Output for today_date (will be the current date when run)
[1] "2023-10-27" # Example: if today is October 27, 2023
# Output for today_date + days(10)
[1] "2023-11-06" # Example: 10 days after Oct 27, 2023
# Output for today_date - weeks(2)
[1] "2023-10-13" # Example: 2 weeks before Oct 27, 2023
# Output for today_date + months(3)
[1] "2024-01-27" # Example: 3 months after Oct 27, 2023
# Output for today_date + years(1)
[1] "2024-10-27" # Example: 1 year after Oct 27, 2023
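Adding months() is a calendar operation, so it can ask for a date that does not exist, such as 31 February; lubridate returns NA in that case. The %m+% and %m-% operators instead roll back to the last valid day of the target month. A sketch with month-end dates chosen to trigger the problem:

```r
library(lubridate)

jan31 <- ymd("2023-01-31")

# Asks for 2023-02-31, which does not exist, so the result is NA
plain <- jan31 + months(1)
plain

# %m+% rolls back to the last valid day of February instead
rolled <- jan31 %m+% months(1)
rolled

# %m-% does the same when subtracting
rolled_back <- ymd("2023-03-31") %m-% months(1)
rolled_back
```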
Calculating Differences Between Dates
When you subtract one Date object from another, R returns a difftime object, which represents the time difference in days by default.
date1 <- ymd("2023-01-15")
date2 <- ymd("2023-07-20")
# Difference in days (default for Date objects)
date_diff_days <- date2 - date1
date_diff_days
Time difference of 186 days
The difftime() function offers more control over the units of the difference (e.g., “weeks” or “auto”). Lubridate’s duration objects (created with functions like ddays() or by converting difftime objects with as.duration()) represent exact time spans, measured in seconds.
# Using difftime for more control over units
difftime(date2, date1, units = "weeks")
difftime(date2, date1, units = "auto") # base R picks a sensible unit
# Duration objects for precise time spans
duration_diff <- date2 - date1 # Creates a difftime object
as.duration(duration_diff)
# Output for difftime(units = "weeks")
Time difference of 26.57143 weeks
# Output for difftime(units = "auto")
Time difference of 186 days
# Output for as.duration()
[1] "16070400s (~26.57 weeks)"
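For modelling or plotting, it is often easier to work with a plain number than a difftime, and sometimes the question is how many whole calendar months have passed. A sketch reusing the two dates above, with as.numeric() for the conversion and a lubridate interval divided by a period for the month count:

```r
library(lubridate)

date1 <- ymd("2023-01-15")
date2 <- ymd("2023-07-20")

# Strip the difftime class to get a plain number of days
n_days <- as.numeric(date2 - date1)
n_days                                   # 186

# The same difference as a plain number of weeks
n_weeks <- as.numeric(difftime(date2, date1, units = "weeks"))

# Whole calendar months between the dates (Jan 15 -> Jul 15 is 6 months)
n_months <- interval(date1, date2) %/% months(1)
n_months                                 # 6

# Fractional length of the interval in months
time_length(interval(date1, date2), "month")
```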
Modifying Date Components
Lubridate’s update() function is a versatile tool for changing specific components of a date object, such as the year, month, or day, while leaving other components unchanged. This is often more straightforward than deconstructing and reconstructing dates manually.
# Use update() to change specific parts of a date object
original_date <- ymd("2023-08-15")
original_date
# Change the year
update(original_date, year = 2024)
# Change the month and day
update(original_date, month = 12, day = 25)
# Output for original_date
[1] "2023-08-15"
# Output for update(original_date, year = 2024)
[1] "2024-08-15"
# Output for update(original_date, month = 12, day = 25)
[1] "2023-12-25"
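update() also works on whole vectors. One use in seasonal analyses is moving every observation onto a single reference year so that different years can be overlaid on a shared axis. The sketch below uses 2000 as the reference year, an arbitrary choice that happens to be a leap year, so 29 February remains valid; the input dates are made up.

```r
library(lubridate)

obs_dates <- ymd(c("2016-02-29", "2019-07-04", "2021-12-31"))

# Keep each month and day, replace the year with a common reference year
common_year <- update(obs_dates, year = 2000)
common_year
```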
Rounding Dates
Lubridate offers functions like floor_date(), ceiling_date(), and round_date() to round date (or datetime) objects to a specified unit. This is useful for aggregating data to coarser time periods, like the start of the week, month, or year.
# Round dates down, up, or to the nearest unit (e.g., week, month, year)
messy_date <- ymd("2023-04-17")
messy_date
# Round to the start of the month (floor)
floor_date(messy_date, "month")
# Round to the start of the year
floor_date(messy_date, "year")
# Round to the nearest week (default starts on Sunday)
round_date(messy_date, "week")
# Round to the end of the month (ceiling_date gives start of next unit)
# A common way to get the last day of the month:
ceiling_date(messy_date, "month") - days(1)
# Output for messy_date
[1] "2023-04-17"
# Output for floor_date(messy_date, "month")
[1] "2023-04-01"
# Output for floor_date(messy_date, "year")
[1] "2023-01-01"
# Output for round_date(messy_date, "week")
[1] "2023-04-16"
# Output for ceiling_date(messy_date, "month") - days(1)
[1] "2023-04-30"
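The aggregation use case looks like this in practice: floor_date() gives every day the same label as the other days in its week, and that label becomes the grouping key. A sketch on two weeks of made-up daily values, with weeks starting on Monday via week_start = 1:

```r
library(dplyr)
library(lubridate)

# Two weeks of made-up daily values starting on Monday 2023-04-10
daily <- data.frame(
  date  = ymd("2023-04-10") + 0:13,
  value = 1:14
)

weekly <- daily %>%
  # Label each day with the Monday that starts its week
  mutate(week = floor_date(date, "week", week_start = 1)) %>%
  group_by(week) %>%
  summarise(mean_value = mean(value))

weekly
```

Swapping "week" for "month" or "year" gives monthly or yearly summaries with the same pattern.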
Parsing Dates from Character Strings
Often, dates are imported as character strings. Lubridate simplifies the conversion of these strings into proper date objects with a family of functions that identify the order of year (y), month (m), and day (d) components in the string. Examples include ymd(), mdy(), and dmy(). If time components are also present, functions like ymd_hms() (year-month-day hour-minute-second) can be used.
date_str1 <- "2023-10-25"
ymd(date_str1)
date_str2 <- "October 25, 2023"
mdy(date_str2)
date_str3 <- "25/10/2023"
dmy(date_str3)
# Lubridate can also parse datetime strings
date_str_time <- "2023-10-25 14:30:00"
ymd_hms(date_str_time)
# Output for ymd(date_str1)
[1] "2023-10-25"
# Output for mdy(date_str2)
[1] "2023-10-25"
# Output for dmy(date_str3)
[1] "2023-10-25"
# Output for ymd_hms(date_str_time)
# Result is a POSIXct (datetime) object
[1] "2023-10-25 14:30:00 UTC"
These functions are quite flexible and can often correctly parse dates even if the separators (like hyphens, slashes, or spaces) vary, as long as the order of components matches the function name.
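A sketch of that flexibility on made-up strings: ymd() copes with different separators as long as the order is year-month-day, and when a single column mixes orders, parse_date_time() accepts several candidate orders.

```r
library(lubridate)

# Same year-month-day order, different separators (or none at all)
mixed_sep <- ymd(c("2023-10-25", "2023/10/25", "20231025", "2023 10 25"))
mixed_sep

# A column that mixes year-first and day-first strings
mixed_order <- parse_date_time(c("2023-10-25", "25/10/2023"),
                               orders = c("ymd", "dmy"))
mixed_order           # POSIXct datetimes in UTC

as_date(mixed_order)  # convert to Date objects
```

Candidate orders cannot resolve genuinely ambiguous strings such as "03/04/2023", which is valid as both day-first and month-first.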
Formatting Dates into Strings
Conversely, you might need to convert date objects into character strings in a specific format, for example, for reporting or plotting. Base R’s format() function is very powerful for this, using strftime-style format codes documented in ?strptime (e.g., %Y for 4-digit year, %m for 2-digit month, %d for 2-digit day). Lubridate also offers a stamp() function for creating formatting templates.
Using Base R format()
# Base R format() function is powerful
a_date <- ymd("2023-12-01")
format(a_date, "%Y/%m/%d") # YYYY/MM/DD
format(a_date, "%B %d, %Y") # MonthName Day, Year
format(a_date, "%A, %B %e, %Y") # Weekday, MonthName Day (space-padded), Year
format(a_date, "It's %A, day %j of %Y.") # Custom string with date parts
# Output for format(a_date, "%Y/%m/%d")
[1] "2023/12/01"
# Output for format(a_date, "%B %d, %Y")
[1] "December 01, 2023"
# Output for format(a_date, "%A, %B %e, %Y")
[1] "Friday, December  1, 2023"
# Output for format(a_date, "It's %A, day %j of %Y.")
[1] "It's Friday, day 335 of 2023."
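The %e code replaces the leading zero with a space rather than dropping it, which appears as a double space inside a sentence. One way to get an unpadded day, sketched in base R, is to convert the day to an integer before pasting it in. Weekday and month names come from the session’s locale, so they may differ on non-English systems.

```r
a_date <- as.Date("2023-12-01")

format(a_date, "%d")  # "01": zero-padded
format(a_date, "%e")  # " 1": space-padded

# Convert the day to an integer so it prints without any padding
day_num <- as.integer(format(a_date, "%d"))
pretty  <- paste0(format(a_date, "%A, %B "), day_num, format(a_date, ", %Y"))
pretty
```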
Using Lubridate stamp()
Lubridate’s stamp() function provides a convenient way to create date/time formatters from example strings. You provide a string that looks like the desired output, and stamp() creates a function that will format dates accordingly.
# Lubridate's stamp() for template-based formatting
# Define a template string and stamp() fills it in
stamp_template <- stamp("Signed on Monday, March 1, 1999")
stamp_template(a_date) # a_date is still "2023-12-01"
stamp_template_short <- stamp("Date: 03/01/99")
stamp_template_short(a_date)
# Output for stamp_template(a_date)
[1] "Signed on Friday, December 01, 2023"
# Output for stamp_template_short(a_date)
[1] "Date: 12/01/23"
Example: Working with the arctic Dataset
Let’s apply some of these date manipulation techniques to the arctic dataset. We can, for instance, count observations per year or calculate average monthly temperatures.
Count Observations Per Year
# First, ensure no NA dates for reliability in grouping
arctic_cleaned <- arctic %>%
filter(!is.na(date))
arctic_cleaned %>%
mutate(year = year(date)) %>%
count(year)
# A tibble: 31 × 2
year n
<dbl> <int>
1 1988 214
2 1989 365
3 1990 365
4 1991 365
5 1992 366
6 1993 365
7 1994 365
8 1995 365
9 1996 366
10 1997 365
# ℹ 21 more rows
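Counts of 365 and 366 suggest complete years, but a count alone does not show which days are missing in an incomplete year. A direct check compares the observed dates against a full daily sequence. The sketch uses a made-up record with two gaps; the same lines would work on arctic_cleaned$date.

```r
library(lubridate)

# Made-up record with two missing days (Jan 3 and Jan 4)
obs_dates <- ymd(c("2023-01-01", "2023-01-02", "2023-01-05", "2023-01-06"))

# Every day that should be present between the first and last observation
all_days <- seq(min(obs_dates), max(obs_dates), by = "day")

# Days with no observation
missing_days <- all_days[!all_days %in% obs_dates]
missing_days
```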
Average Temperature in July per Year
# What is the average temperature in July for each year?
arctic_cleaned %>%
mutate(
year = year(date),
month = month(date, label = TRUE, abbr = FALSE) # Get full month name
) %>%
filter(month == "July", !is.na(mean_airtemp)) %>%
group_by(year) %>%
summarise(avg_july_temp = mean(mean_airtemp)) %>%
head() # Show first few years
# A tibble: 6 × 2
year avg_july_temp
<dbl> <dbl>
1 1988 12.0
2 1989 12.4
3 1990 12.7
4 1991 9.24
5 1992 12.7
6 1993 13.8
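Filtering on the month label works, but labels follow the session’s locale, so month == "July" can match nothing on a system set to another language. Filtering on the month number avoids that. A sketch on a tiny made-up table with the same column names as the arctic data used above:

```r
library(dplyr)
library(lubridate)

obs <- data.frame(
  date         = ymd(c("2020-07-01", "2020-07-02", "2020-08-01", "2021-07-15")),
  mean_airtemp = c(10, 12, 20, 9)
)

july_means <- obs %>%
  filter(month(date) == 7, !is.na(mean_airtemp)) %>%  # 7 means July in any locale
  group_by(year = year(date)) %>%
  summarise(avg_july_temp = mean(mean_airtemp))

july_means
```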
Conclusion
Working with dates and times is a critical skill in data analysis. R, especially when augmented with the lubridate package, provides a comprehensive and user-friendly toolkit for these tasks. From basic creation and component extraction to complex arithmetic, parsing, and formatting, these functions help ensure that temporal data is handled accurately and efficiently, paving the way for insightful analyses.