Working With Dates
Like factors, dates are a data type that can be a nuisance to work with, but they are often essential for understanding our data. EEB-type samples are often taken at inconsistent sampling intervals, and we don’t always keep this information in concise formats.
We’ll focus here on a few key activities to do with dates:
- Taking a date data type and extracting sub-components from it (i.e. year, month, day, week)
- Turning a non-date data type into a date
Extracting Date Components
Let’s use an example time series of precipitation and temperature data collected in Alaska:
library(tidyverse)
library(lterdatasampler)
df <- lterdatasampler::arc_weather
We can see what we’re dealing with here:
head(df)
## # A tibble: 6 × 5
## date station mean_airtemp daily_precip mean_windspeed
## <date> <chr> <dbl> <dbl> <dbl>
## 1 1988-06-01 Toolik Fiel… 8.4 0 NA
## 2 1988-06-02 Toolik Fiel… 6 0 NA
## 3 1988-06-03 Toolik Fiel… 5.8 0 NA
## 4 1988-06-04 Toolik Fiel… 1.8 0 NA
## 5 1988-06-05 Toolik Fiel… 6.8 2.5 NA
## 6 1988-06-06 Toolik Fiel… 5.2 0 NA
So we already have a date column of the special data type date. This makes our life easier, since extracting information from a properly typed date column is much simpler than extracting it from text or numbers.
The best tool in our toolbox for this type of task is the lubridate package. This package has a ton of great functions that let us work with dates more easily. Let’s test it out. Say we want to make a vector that has just the years extracted from our date column. We can do that very easily with the lubridate::year() function:
library(lubridate)
year <- lubridate::year(df$date)
# print the first ten entries
year[1:10]
## [1] 1988 1988 1988 1988 1988 1988 1988 1988 1988 1988
And we see that this worked! In practice, the most common use of these functions is to make new columns in a dataframe. Let’s go ahead and make a new column each for year, month, week, and day in our df dataframe:
df %>%
  # lubridate's extractor functions are vectorized, so mutate() can apply
  # them to the whole date column at once (no rowwise() step is needed)
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date),
    week = lubridate::week(date),
    day = lubridate::day(date)
  )
## # A tibble: 11,171 × 9
## date station mean_airtemp daily_precip mean_windspeed
## <date> <chr> <dbl> <dbl> <dbl>
## 1 1988-06-01 Toolik Fie… 8.4 0 NA
## 2 1988-06-02 Toolik Fie… 6 0 NA
## 3 1988-06-03 Toolik Fie… 5.8 0 NA
## 4 1988-06-04 Toolik Fie… 1.8 0 NA
## 5 1988-06-05 Toolik Fie… 6.8 2.5 NA
## 6 1988-06-06 Toolik Fie… 5.2 0 NA
## 7 1988-06-07 Toolik Fie… 2.2 7.6 NA
## 8 1988-06-08 Toolik Fie… 9.4 0 NA
## 9 1988-06-09 Toolik Fie… 13.1 0 NA
## 10 1988-06-10 Toolik Fie… 17.7 0 3.9
## # … with 11,161 more rows, and 4 more variables: year <dbl>,
## # month <dbl>, week <dbl>, day <int>
And here we can see the new columns have been made for us!
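One common reason to extract components is to summarise by them. As a minimal sketch, assuming a small made-up data frame (`toy` and its values are hypothetical and not taken from `arc_weather`), the extracted year and month can serve as grouping variables:

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for a weather time series
toy <- data.frame(
  date = as.Date(c("1988-06-01", "1988-06-15", "1988-07-01", "1989-06-10")),
  mean_airtemp = c(8.4, 10.0, 12.0, 9.0)
)

monthly <- toy %>%
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date),
    doy = lubridate::yday(date) # day of the year, 1 to 366
  ) %>%
  # one group per year-month combination
  dplyr::group_by(year, month) %>%
  dplyr::summarise(
    mean_temp = mean(mean_airtemp, na.rm = TRUE),
    n_obs = dplyr::n(),
    .groups = "drop"
  )

monthly
```

Grouping on both year and month keeps June 1988 separate from June 1989; grouping on month alone would pool them, which may or may not be what a given analysis needs.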
To start thinking about forming dates, we’ll use some fake data to make our lives easier. We can imagine the opposite scenario to the one above: we have some entries for, let’s say, year and month, but no full date.
This is a somewhat more challenging problem, because there is a decision to make: what day should we default to? This question deserves careful consideration for each problem that arises, and there is no one-size-fits-all solution. However, assuming you have decided that there is a simple assumption you can make (e.g. you will assume the data were collected on the first of the month), we can use this to make a new date column from our existing data.
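To make the consequences of that choice concrete, here is a minimal sketch comparing three possible assumptions. The vectors `yr` and `mo` and their values are hypothetical, and which assumption is appropriate depends entirely on how the data were collected:

```r
library(lubridate)

# Hypothetical year/month records with no day information
yr <- c(2019, 2020, 2020)
mo <- c(6, 2, 12)

# Assumption A: sampled on the first of the month
first_of_month <- lubridate::make_date(yr, mo, 1)

# Assumption B: sampled mid-month (the 15th exists in every month)
mid_month <- lubridate::make_date(yr, mo, 15)

# Assumption C: sampled on the last day of the month;
# days_in_month() accounts for month length and leap years
last_of_month <- lubridate::make_date(
  yr, mo, unname(lubridate::days_in_month(first_of_month))
)

data.frame(yr, mo, first_of_month, mid_month, last_of_month)
```

The three columns can differ by up to a month for the same record, which matters if the dates are later used to compute intervals between samples.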
Let’s generate some fake data to work with:
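df <- data.frame(
  # 200 random draws (with replacement) for each column
  year = sample(2010:2020, size = 200, replace = TRUE),
  month = sample(1:12, size = 200, replace = TRUE),
  # we'll make a set of fake sampled data here;
  # 12.5:16.6 expands to 12.5, 13.5, 14.5, 15.5, 16.5
  observation = sample(12.5:16.6, size = 200, replace = TRUE)
)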
The first thing to do is add a day column holding the value we decided on, the first day of each month. That’s easy enough:
df <- df %>%
  dplyr::mutate(
    day = 1
  )
head(df)
## year month observation day
## 1 2011 10 16.5 1
## 2 2012 1 16.5 1
## 3 2011 8 13.5 1
## 4 2019 6 15.5 1
## 5 2020 3 13.5 1
## 6 2019 7 14.5 1
Great, we have the info we need. Now, we can go ahead and make a date column by combining our three other variables using the lubridate::make_date() function:
df <- df %>%
  # make_date() is vectorized, so it builds every row's date in one call
  # (no rowwise() step is needed)
  dplyr::mutate(
    date = lubridate::make_date(year, month, day)
  )
head(df)
## year month observation day date
## 1 2011 10 16.5 1 2011-10-01
## 2 2012 1 16.5 1 2012-01-01
## 3 2011 8 13.5 1 2011-08-01
## 4 2019 6 15.5 1 2019-06-01
## 5 2020 3 13.5 1 2020-03-01
## 6 2019 7 14.5 1 2019-07-01
Great! We can check this visually: looking across the first few rows, each date matches the year, month, and day beside it, so our function worked as it should.
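A visual check only covers the rows we happen to print. As a hedged sketch of a programmatic alternative, we can extract the components back out of the new dates and compare them with the inputs. The `parts` data frame here is a hypothetical stand-in for the fake data above:

```r
library(lubridate)

# Hypothetical inputs, mirroring the year/month/day columns above
parts <- data.frame(
  year = c(2011, 2012, 2019),
  month = c(10, 1, 6),
  day = 1
)
parts$date <- lubridate::make_date(parts$year, parts$month, parts$day)

# Round trip: components extracted from the new dates should match the inputs
round_trip_ok <- all(
  lubridate::year(parts$date) == parts$year,
  lubridate::month(parts$date) == parts$month,
  lubridate::day(parts$date) == parts$day
)

# Also confirm that none of the constructed dates are missing
no_missing <- !anyNA(parts$date)

# stopifnot() raises an error if either check fails
stopifnot(round_trip_ok, no_missing)
```

Because the check covers every row, it scales to data that are too long to inspect by eye.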
Now you know how to move from dates to components and vice versa!