A Little About Dates in R

Before we launch into any analysis that contains dates, we should know a few important nuggets about how R handles date-like objects.

R has three built-in date/time classes:
- Date
- POSIXct
- POSIXlt
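
As a rough illustration of how the three classes differ, the sketch below uses only base R and looks at how each one stores its value. The example dates are arbitrary and chosen so the underlying numbers are easy to read.

```r
# Date: a calendar day, stored as the number of days since 1970-01-01
d <- as.Date("1970-01-10")
unclass(d)        # 9

# POSIXct: an instant in time, stored as seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
as.numeric(ct)    # 86400

# POSIXlt: the same instant, stored as a list of named components
lt <- as.POSIXlt(ct)
lt$mday           # 2
lt$hour           # 0
```

In short, Date and POSIXct are compact numbers that work well as data frame columns, while POSIXlt is convenient when you want to pull out individual pieces such as the day or hour.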

Base R

First, base R can read a string of text and convert it to a date class. To help it read the date, you must tell R what format your character string is in. Below are several examples. You can see all the possible format codes by running ?strptime in your R console.

strptime("October 16, 1984", format = "%B %e, %Y")
## [1] "1984-10-16 EDT"
strptime("16 October, 1984", format = "%e %B, %Y")
## [1] "1984-10-16 EDT"
strptime("16-- October, 1984", format = "%e-- %B, %Y")
## [1] "1984-10-16 EDT"
class(strptime("16-- October, 1984", format = "%e-- %B, %Y"))
## [1] "POSIXlt" "POSIXt"
birthday = strptime("16-- October, 1984", format = "%e-- %B, %Y")

As you can see, strptime returns an object of class POSIXlt, which also inherits from POSIXt.
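
The same format codes work with as.Date() when you only need a calendar date. The sketch below is a small illustration with made-up input strings; it uses numeric codes (%d, %m, %Y) because month names such as %B depend on your locale. It also shows that a format that does not match the string gives NA rather than an error, which is easy to miss.

```r
# Parse a day/month/year string with numeric format codes
d1 <- as.Date("16/10/1984", format = "%d/%m/%Y")
format(d1)                 # "1984-10-16"
format(d1, "%Y/%m/%d")     # "1984/10/16"

# A format that does not match the string silently returns NA
bad <- as.Date("1984-10-16", format = "%d/%m/%Y")
is.na(bad)                 # TRUE
```

Checking for NA after parsing is a cheap way to catch a wrong format string early.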

lubridate

A second, easier way to have R recognize dates is to use the lubridate package. Thanks again, Hadley.

library(lubridate)

lubridate also lets R read character strings as dates. However, instead of having to tell R the exact format of your string (which can be difficult), lubridate tries many formats to recognize it. You simply pick the function whose name matches the order of the year, month, and day in your string: ymd(), mdy(), dmy(), and so on.

mdy("June 14, 2018")
## [1] "2018-06-14"
dmy("14 June, 2018")
## [1] "2018-06-14"
dmy("14-- June, 2018")
## [1] "2018-06-14"
class(dmy("14-- June, 2018"))
## [1] "Date"

You’ll notice that lubridate creates an object of class Date. To convert it to POSIXlt, wrap the call in as.POSIXlt(), as in the following code.

class(as.POSIXlt(mdy("June 14, 2018")))
## [1] "POSIXlt" "POSIXt"
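
The lubridate parsers are also vectorized, and there are variants such as ymd_hms() for strings that include a time. The sketch below uses made-up strings to show both; note that ymd_hms() returns POSIXct rather than Date, and that a string that cannot be a real date comes back as NA with a warning.

```r
library(lubridate)

# One call parses a whole vector, even with mixed separators
days <- ymd(c("2018-06-14", "2018/06/15", "20180616"))
class(days)                 # "Date"
format(days[3])             # "2018-06-16"

# Strings with a time component need the _hms variant
stamp <- ymd_hms("2018-06-14 13:45:00", tz = "UTC")
class(stamp)                # "POSIXct" "POSIXt"
hour(stamp)                 # 13

# An impossible date yields NA (lubridate also emits a warning)
bad <- suppressWarnings(ymd("2018-02-30"))
is.na(bad)                  # TRUE
```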

We also need to make sure our date is in the correct time zone. This matters more when the date includes a time.

date = as.POSIXlt(dmy("14 June, 2018")) 
date
## [1] "2018-06-14 UTC"
date = force_tz(date, tzone = "America/New_York")
date
## [1] "2018-06-14 EDT"
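
It may help to contrast force_tz() with lubridate's with_tz(), since the two are easy to confuse. In this sketch (the timestamp is an arbitrary example), force_tz() keeps the clock time and changes the instant it refers to, while with_tz() keeps the instant and changes how the clock time is displayed.

```r
library(lubridate)

noon_utc <- ymd_hms("2018-06-14 12:00:00", tz = "UTC")

# force_tz: same clock reading (12:00), relabeled as New York time,
# so it now refers to a different moment
forced <- force_tz(noon_utc, tzone = "America/New_York")
format(forced, "%H:%M")                    # "12:00"
as.numeric(forced) - as.numeric(noon_utc)  # 14400 seconds (4 hours later)

# with_tz: same moment, shown as the New York clock would read it
shifted <- with_tz(noon_utc, tzone = "America/New_York")
format(shifted, "%H:%M")                   # "08:00"
as.numeric(shifted) - as.numeric(noon_utc) # 0
```

Use force_tz() when the data was recorded in local time but read in with the wrong zone, and with_tz() when you just want to view a correct timestamp in another zone.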

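When a date vector is of class POSIXlt, all of its information is stored as a list, and you can extract specific components from that list. Note that the list stores months counted from 0 and years counted from 1900, which is why date$mon and date$year differ from the lubridate accessors month() and year() below.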

date
## [1] "2018-06-14 EDT"
unlist(date)
##      sec      min     hour     mday      mon     year     wday     yday 
##      "0"      "0"      "0"     "14"      "5"    "118"      "4"    "164" 
##    isdst     zone   gmtoff 
##      "1"    "EDT" "-14400"
date$mon
## [1] 5
month(date)
## [1] 6
date$year
## [1] 118
year(date)
## [1] 2018
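
The offsets between the raw POSIXlt components and the lubridate accessors can be checked directly. This sketch rebuilds the same calendar date in UTC (so it runs without the earlier objects) and assumes lubridate's default setting in which wday() counts Sunday as day 1.

```r
library(lubridate)

d <- as.POSIXlt("2018-06-14", tz = "UTC")

d$mon + 1      # 6    months are stored as 0-11
month(d)       # 6
d$year + 1900  # 2018 years are stored as years since 1900
year(d)        # 2018
d$yday + 1     # 165  day of year is stored as 0-365
yday(d)        # 165
d$wday         # 4    weekday is stored as 0-6, Sunday = 0
wday(d)        # 5    lubridate default: 1-7, Sunday = 1
```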

You can also do arithmetic on these date vectors, such as subtracting one date from another or adding a period of hours, days, months, or years.

date - birthday
## Time difference of 12294 days
birthday + hours(4)
## [1] "1984-10-16 04:00:00 EDT"
birthday + days(4)
## [1] "1984-10-20 EDT"
date + years(4) + months(9)
## [1] "2023-03-14 EDT"
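
A few related operations are worth knowing when doing date arithmetic. The sketch below re-creates the two dates as plain Date objects (the names from, to, and jan31 are just for illustration) and shows how to get a difference in other units, an age in years, and what happens when adding a month lands on a day that does not exist.

```r
library(lubridate)

from <- ymd("1984-10-16")
to   <- ymd("2018-06-14")

# Subtraction gives a difftime; as.numeric() extracts the day count
gap <- to - from
as.numeric(gap)                       # 12294

# difftime() lets you choose the units explicitly
difftime(to, from, units = "weeks")

# An interval can be measured in calendar years
age <- time_length(interval(from, to), "years")
floor(age)                            # 33

# Adding a month to January 31 has no valid answer, so lubridate returns NA
jan31 <- ymd("2018-01-31")
jan31 + months(1)                     # NA
# %m+% rolls back to the last valid day of the month instead
jan31 %m+% months(1)                  # "2018-02-28"
```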