2 Intermediate R
https://learn.datacamp.com/courses/intermediate-r
Main functions and concepts covered in this BP chapter:
- Relational (> >= == != <= <) and Logical (& |) Operators
if()
else()
else if()
while()
andfor()
loops, withbreak
andnext
- Functions, and writing your own using
function()
- Installing and loading packages
install.packages()
library()
require()
lapply()
,sapply()
,vapply()
unlist()
- Some useful functions (see 2.5)
- Regular Expressions (RegEx)
grepl()
andgrep()
for searchingsub()
gsub()
for replacing- Note: regular expressions are crucial in many aspects of programming, so learning how to use them at some point would be beneficial
- Dates and times
Packages used in this chapter:
## Load all packages used in this chapter
# There are no packages used in this chapter
# The exception is that ggplot2 is loaded below, but it's loaded as an example of how to load a package
# So, for this chapter only we'll load it down there
Datasets used in this chapter:
2.1 Conditionals and Control Flow
See section 1.2.5 for additional discussion.
Relational and Logical Operators:
<
less than>
greater than<=
less than or equal to>=
greater than or equal to==
equal!=
not equal&
and|
or
In general, you can think of TRUE == 1
and FALSE == 0
. That means:
## [1] TRUE
R determines greater than based on alphabetical order "a" < "b"
. For example:
## [1] TRUE
## [1] TRUE
Logical comparisons also work with vectors, both vector compared to constant and vector compared to another vector:
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
# vector compared to constant
linkedin > 15
## [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE
## [1] FALSE TRUE TRUE FALSE FALSE TRUE FALSE
2.1.0.1 And, Or
And (&
) is TRUE if both parts are TRUE:
## [1] FALSE
## [1] TRUE
## [1] FALSE
or (|
) is TRUE if one part or the other, or both, are TRUE. Using the same example as above for and, all three are true because at least one side is true. It’s only FALSE if we add a fourth example were neither comparison is TRUE.
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
2.1.0.2 Not (!)
not (!
) is an exclamation point. It can be used to check if two things are NOT equal. It can also be used to swap TRUE and FALSE (if something is TRUE, putting !
before it makes it false and if something is FALSE, putting !
before it makes it TRUE)
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
Can also use !
before function that returns logical
## [1] TRUE
## [1] FALSE
You can put multiple conditions in parentheses and NOT the entire thing:
## [1] TRUE
## [1] FALSE
2.1.0.3 Double and and or
Double and/or (&&
||
) only compares the first entry in a vector vs returning TRUE/FALSE for each element in a vector
## [1] FALSE FALSE TRUE FALSE
## [1] TRUE TRUE TRUE FALSE
ERROR: in x && y: ‘length = 4’ in coercion to ‘logical(1)’
ERROR: in x || y: ‘length = 4’ in coercion to ‘logical(1)’
2.1.1 Conditional (If Else) Statements
if()
Statement: Runs specified code if the condition is met
else()
and else if()
Statements: Used with an if statement for what to do if initial if()
isn’t true
x <- 2
if(x<0) {
print("x is a negative number")
} else if(x == 0) {
print("x is equal to 0")
} else {
print("x is a positive number")
}
## [1] "x is a positive number"
2.1.1.1 ifelse
I meant to Google “R if else” but left out the space and found the ifelse() function. It lets you do a simple version of if()
and else()
in one line, just like the Excel If() function from QDM.
## [1] "It is"
## [1] "It is not"
2.2 Loops
2.2.1 While loop
While loops continue to execute the code over and over as long as the condition remains true.
Basic while loop
## [1] "ctr is set to 1"
## [1] "ctr is set to 2"
## [1] "ctr is set to 3"
## [1] "ctr is set to 4"
## [1] "ctr is set to 5"
## [1] "ctr is set to 6"
## [1] "ctr is set to 7"
## [1] "reached the end with value ctr = 8"
2.2.1.1 break in a while loop
The while loop keeps going until the condition is met. You can also add a break()
statement to stop the while loop.
# break out of loop if i is divisible by 4
i <- 1
while (i <= 10) {
print(paste("i: ",i))
if ((i)%%4==0) {
break
}
i <- i + 1
}
## [1] "i: 1"
## [1] "i: 2"
## [1] "i: 3"
## [1] "i: 4"
## [1] "reached the end with value i = 4"
2.2.2 For Loop
for([variable] in [sequence]) {}
sequence can be numbers. In this case, it works similar to a while loop except you don’t need to manually go to the next value of the counter
## [1] 1
## [1] 2
## [1] 3
## [1] 4
Sequence can also be matrices, lists, etc, and let R figure it out for you. Here’s an example with a list:
# The nyc list is already specified
nyc <- list(pop = 8405837,
boroughs = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"),
capital = FALSE)
# Loop version 1
for (n in nyc) {
print(n)
}
## [1] 8405837
## [1] "Manhattan" "Bronx" "Brooklyn" "Queens"
## [5] "Staten Island"
## [1] FALSE
## [1] 8405837
## [1] "Manhattan" "Bronx" "Brooklyn" "Queens"
## [5] "Staten Island"
## [1] FALSE
You can print out a matrix leaving it to go through the elements, or nest the for loops
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [1] 1
## [1] 4
## [1] 7
## [1] 2
## [1] 5
## [1] 8
## [1] 3
## [1] 6
## [1] 9
## [1] "m[1,1] = 1"
## [1] "m[1,2] = 2"
## [1] "m[1,3] = 3"
## [1] "m[2,1] = 4"
## [1] "m[2,2] = 5"
## [1] "m[2,3] = 6"
## [1] "m[3,1] = 7"
## [1] "m[3,2] = 8"
## [1] "m[3,3] = 9"
2.2.2.1 next and break in a for loop
The break
statement abandons the active loop: the remaining code in the loop is skipped and the loop is not iterated over anymore.
The next
statement skips the remainder of the code in the loop, but continues the iteration.
# The linkedin vector has already been defined for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
# Extend the for loop
for (li in linkedin) {
if (li > 10) {
print("You're popular!")
} else {
print("Be more visible!")
}
# Add if statement with break
if(li>16){
print("This is ridiculous, I'm outta here!")
break
}
# Add if statement with next
if(li<5){
print( "This is too embarrassing!")
next
}
print(li)
}
## [1] "You're popular!"
## [1] 16
## [1] "Be more visible!"
## [1] 9
## [1] "You're popular!"
## [1] 13
## [1] "Be more visible!"
## [1] 5
## [1] "Be more visible!"
## [1] "This is too embarrassing!"
## [1] "You're popular!"
## [1] "This is ridiculous, I'm outta here!"
2.3 Functions
Functions take an input or inputs and the do something with it, sometimes making changes, sometimes printing something, sometimes returning something.
Type a ?
before a function to find out what it does (it opens in the help window to the right).
args(function_name)
returns a short list of the arguments so you don’t need to look trough the longer help description. From trying it, it doesn’t seem to work very well though (e.g., look at ?mean
versus args(mean)
)
## function (x, na.rm = FALSE)
## NULL
Reading about the function can help you get thing to work correctly when they don’t initially. For example, suppose there’s an NA value and you try to take the mean. If you read about the mean()
function, you learn how to deal with the NA value.
## [1] NA
## [1] 12.33333
2.3.1 Required vs optional arguments
Here’s what is listed for mean: mean(x, trim = 0, na.rm = FALSE, ...)
x
is required. It doesn’t have a default, so if you don’t supply x
, it won’t work
trim
and na.rm
both have default values. This makes them optional arguments. You can specify the default value explicitly if you want, but if you don’t (i.e., if you just leave them out entirely), the function will use the default value. If you don’t want to use the default value, then you need to specify another value.
2.3.2 Writing your own function
Use the function()
function to write your own function.
Assign it into a variable with the name you want to name the function (i.e., what you’ll call later to use your function).
Arguments for your function go in parentheses of the function()
function.
The body of the function goes in curly brackets.
It will return whatever comes last in the function, but if you want to return something earlier, use the return()
function.
divideAndAdd1 <- function(a,b) {
if(b == 0) {
return("Can't divide by 0")
}
a/b + 1
}
divideAndAdd1(10,0)
## [1] "Can't divide by 0"
## [1] 6
2.3.3 Function Scoping
Function Scoping implies that variables that are defined inside a function are not accessible outside that function. In this example, the variable used as the argument for the function and the one used inside the function aren’t accessible outside the function. Trying to use them gives an error.
pow_two <- function(x_is_an_arg) {
y_is_inside_function <- x_is_an_arg ^ 2
return(y_is_inside_function)
}
pow_two(4)
## [1] 16
ERROR: in eval(expr, envir, enclos): object ‘y_is_inside_function’ not found
ERROR: in eval(expr, envir, enclos): object ‘x_is_an_arg’ not found
2.3.3.1 By value, not by reference
R functions pass arguments by value, not by reference. This means that when you call a function and use a variable as an argument when you call the function, if the function changes that argument’s value, once the function is done and returns, the value of that variable is unchanged. Here’s an example:
## [1] 7
## [1] 8
## [1] 7
This is true even if the name of the variable is the same
## [1] 7
## [1] 8
## [1] 7
2.3.3.2 Nested functions
It’s fine to write a function, and then write a second function that calls that first function (just like it’s fine to call built in functions inside custom functions).
Here’s an example:
interpret <- function(num_views) {
if (num_views > 15) {
print("You're popular!")
return(num_views)
} else {
print("Try to be more visible!")
return(0)
}
}
# views: vector with data to interpret
# return_sum: return total number of views on popular days
interpret_all <- function(views, return_sum = TRUE) {
count <- 0
for (v in views) {
count <- count + interpret(v)
}
if (return_sum == TRUE) {
count
} else {
NULL
}
}
# Call the interpret_all() function on both linkedin and facebook
facebook
## [1] 17 7 5 16 8 13 14
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] 33
2.3.4 Install/Load Packages
Before you can load a package it must be installed. You only install once but you must load the package in each session you want to use it.
install.packages ("ggplot2")
installs the ggplot2
package
Do NOT leave an install.packages()
line in a file so that every time you run the file it installs the package. This makes it run very slow.
library(ggplot2)
loads the ggplot2
function once it is loaded
search ()
will return the packages currently loaded. You don’t have to run this function unless you want to see packages that are loaded. Usually we don’t run this (so don’t think you have to run this each time you load a package).
# do NOT leave this line so that it runs each time you build the book!!!!!
# install.packages("ggplot2")
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
ERROR: in qplot(mtcars\(wt, mtcars\)hp): could not find function “qplot”
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] ".GlobalEnv" "package:ggplot2" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
You can also load packages using require()
. require()
returns TRUE if it’s already installed (and loads it) and FALSE if it’s not.
## [1] TRUE
## Loading required package: not_a_real_package
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'not_a_real_package'
## [1] FALSE
This is most useful if you want to check if a package is installed, install it if it’s not, and then load it.
The following example calls require(ggplot2)
. If it is installed, it loads it and returns TRUE because the package is installed. The not (!) before require means the TRUE become FALSE, so the if statement is not executed. However, if the package is not installed require(ggplot2)
returns FALSE. The not (!) makes that TRUE, so the if statement is executed. The if statement installs it and then loads it. In this way you know that the package will be installed, but it won’t install it unless it’s not already installed.
2.4 The apply family
2.4.1 Overview
- lapply()
- apply function over list or vector
- output = list
- sapply()
- apply function over list or vector
- try to simplify list to array
- vapply()
- apply function over list or vector
- explicitly specify output format
2.4.2 lapply()
lapply ([list/vector], [function])
applies the function to each item in the list or vector, but the output will always be a list.
unlist ()
can turn a list back into a vector.
How to split data!
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
# Split names from birth year
split_math <- strsplit (pioneers, split = ":")
split_math
## [[1]]
## [1] "GAUSS" "1777"
##
## [[2]]
## [1] "BAYES" "1702"
##
## [[3]]
## [1] "PASCAL" "1623"
##
## [[4]]
## [1] "PEARSON" "1857"
## [1] "list"
Use lapply ()
to change each element in the list split_math
# Convert to lowercase strings: split_low
split_low <- lapply (split_math, tolower)
# Take a look at the structure of split_low
str (split_low)
## List of 4
## $ : chr [1:2] "gauss" "1777"
## $ : chr [1:2] "bayes" "1702"
## $ : chr [1:2] "pascal" "1623"
## $ : chr [1:2] "pearson" "1857"
when you add the function in lapply ()
do NOT put parenthesis after it lapply (x, function)
. If the function has more arguments put them after another comma in lapply()
.
2.4.2.1 Anonymous function
You can also use lapply
on a function that isn’t explicitly defined. Anonymous functions are when you don’t assign your own function to a name with <-
instead just type the function where you want it. Here’s an example
## [[1]]
## [1] "gauss" "1777"
##
## [[2]]
## [1] "bayes" "1702"
##
## [[3]]
## [1] "pascal" "1623"
##
## [[4]]
## [1] "pearson" "1857"
# lapply with anonymous function that returns NULL if name has more than 5 characters
res <- lapply(split_low, function(x) {
if (nchar(x[1]) > 5) {
return(NULL)
} else {
return(x[2])
}
})
res
## [[1]]
## [1] "1777"
##
## [[2]]
## [1] "1702"
##
## [[3]]
## NULL
##
## [[4]]
## NULL
Example of what unlist()
does
## [[1]]
## [1] "1777"
##
## [[2]]
## [1] "1702"
##
## [[3]]
## NULL
##
## [[4]]
## NULL
## [1] "list"
## [1] "1777" "1702"
## [1] "character"
So it turns the list into a vector (in this case, character vector), which also removes the NULL elements
2.4.3 sapply()
Does the same thing as lapply
, except it then tries to turn the resulting list into a an array (a vector or matrix) if it can usingunlist()
. If it can’t, it leaves it as a list.
temp <- list(c(3,7,9,6,-1),c(6,9,12,13,5),c(4,8,3,-1,-3),c(1,4,7,2,-2),c(5,7,9,4,2),c(-3,5,8,9,4),c(3,6,9,4,1))
resL <- lapply(temp,min)
resL
## [[1]]
## [1] -1
##
## [[2]]
## [1] 5
##
## [[3]]
## [1] -3
##
## [[4]]
## [1] -2
##
## [[5]]
## [1] 2
##
## [[6]]
## [1] -3
##
## [[7]]
## [1] 1
## [1] "list"
## [1] -1 5 -3 -2 2 -3 1
## [1] "numeric"
If sapply
can’t turn it into a vector, it leaves it as a list. Here’s an example
# Definition of below_zero()
below_zero <- function(x) {
return(x[x < 0])
}
# Apply below_zero over temp using sapply(): freezing_s
freezing_s <- sapply(temp, below_zero)
# Apply below_zero over temp using lapply(): freezing_l
freezing_l <- lapply(temp, below_zero)
# Are freezing_s and freezing_l identical?
identical(freezing_s, freezing_l)
## [1] TRUE
identical()
can tell you if two or more vectors, matrices, or lists are the same. Because sapply
couldn’t turn the result into a vector, the result of sapply
and lapply
are identical.
Here is how it deals with NULL:
# temp is already available in the workspace
# Definition of print_info()
print_info <- function(x) {
cat("The average temperature is", mean(x), "\n")
}
# Apply print_info() over temp using sapply()
res <- sapply(temp, print_info)
## The average temperature is 4.8
## The average temperature is 9
## The average temperature is 2.2
## The average temperature is 2.4
## The average temperature is 5.4
## The average temperature is 4.6
## The average temperature is 4.6
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL
##
## [[6]]
## NULL
##
## [[7]]
## NULL
## [1] "list"
## The average temperature is 4.8
## The average temperature is 9
## The average temperature is 2.2
## The average temperature is 2.4
## The average temperature is 5.4
## The average temperature is 4.6
## The average temperature is 4.6
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL
##
## [[6]]
## NULL
##
## [[7]]
## NULL
## [1] "list"
2.4.4 vapply()
You must specify the output you are looking for. This makes vapply()
safer than sapply()
because you know what structure the function will return. If it can’t return that, it gives an error (which is good because we know something didn’t work the way we expected it to).
The two examples below demonstrate how to specify the data type vapply()
will return.
cities <- c("New York", "Paris", "London", "Tokyo","Rio", "Cape Town")
vapply(cities, nchar, numeric(1))
## New York Paris London Tokyo Rio Cape Town
## 8 5 6 5 3 9
# If you tell it to expect something numeric with length 2 instead of 1, we get an error
vapply(cities, nchar, numeric(2))
ERROR: in vapply(cities, nchar, numeric(2)): values must be length 2, ## but FUN(X[[1]]) result is length 1
first_and_last <- function(name) {
name <- gsub(" ", "", name)
letters <- strsplit(name, split = "")[[1]]
return(c(first = min(letters), last = max(letters)))
}
vapply(cities, first_and_last, character(2))
## New York Paris London Tokyo Rio Cape Town
## first "e" "a" "d" "k" "i" "a"
## last "Y" "s" "o" "y" "R" "w"
## If you tell this one to expect numeric length 2, you get an error
vapply(cities, first_and_last, numeric(2))
ERROR: in vapply(cities, first_and_last, numeric(2)): values must be type ‘double’, ## but FUN(X[[1]]) result is type ‘character’
2.5 Utilities
2.5.1 Easy & Useful Functions
rev()
reverses elementssort()
sorts elementsprint()
identical()
abs()
gives absolute value of each vector elementround()
rounds each vector element (1.1 to 1)sum()
adds up all vector elementsmean()
finds the average of all elements in a vectorlist()
creates a list of different elementslog()
a logical (gives TRUE or FALSE)ch()
a character stringseq(Arg1, arg2, by = #)
generates a sequence of numbers,arg1
tells where to start the sequence,arg2
says where to end the sequence,by
argument specifies the increment value
## [1] 1 4 7 10
rep(vector, times = #)
replicates its input (typically a vector or list),times
argument specifies how often the replication should happen. OR use theeach()
argument to specify how often each individual element should be replicated
## [1] 1 4 7 10 1 4 7 10
## [1] 1 1 4 4 7 7 10 10
sort(vector, decreasing=FALSE)
sorts an input vector by numerics, characters, or logical values by ascending order, settingdecreasing=TRUE
will reverse the order of arranging
## [1] 1 1 4 4 7 7 10 10
str()
shows content of data structures in concise wayis.*(X)
checks if data structureX
is a * (list, vector,matrix…)as.*(X)
converts data structureX
to a * (list, vector,matrix…)unlist()
converts list to vectorappend()
adds elements to a vector or list, merges vectors or listsrev()
reverses listdiff()
calculates the difference between consecutive elements
2.5.2 Regular Expressions
Uses of regular expressions:
- a sequence of characters and metacharacters that form a search pattern, which you can use to match strings
- check whether certain patterns exist in a text
- to replace patterns with R elements
- to extract certain patterns out of a string
2.5.2.1 Searching for patters (grepl, grep)
The grepl()
function: returns TRUE when a pattern is found in the corresponding character string
The grep()
function: returns a vector of indices of the character strings that contains the pattern
Both grep()
and grepl()
needs a pattern
and an x
argument, where pattern
is the regular expression you want to match for, and the x
argument is the character vector from which matches should be sought
Meta characters:
- pattern = “a” searches for any “a” anywhere
- pattern = “^a” searches for “a” only at the beginning
- pattern = “a$” searches for “a” only at the end
- pattern = “a|i” searches for “a” or “i” (i.e., | is the “or” meta character)
- A few more are listed below
- There are many others. See
?regex
(and DC has a course on regex)
Regular character add-ons to make a more robust pattern
:
@
, because a valid email must contain an at-sign..*
, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any character between the at-sign and the “.edu” portion of an email address.\\.edu$
, to match the “.edu” part of the email at the end of the string. The\\
part escapes the dot: it tells R that you want to use the.
as an actual character.
Example
# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv", "my.education@gmail.com")
# Use grepl() to match for "edu"
grepl(pattern = "edu", emails)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE
## [1] 1 2 4 5 7
## [1] "john.doe@ivyleague.edu" "education@world.gov"
## [3] "invalid.edu" "quant@bigdatacollege.edu"
## [5] "my.education@gmail.com"
Correct the code so it matches only “.edu”
## [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE
# Use grep() to match for .edu addresses more robustly, save result to hits
hits <- grep(pattern = "@.*\\.edu$", emails)
# Subset emails using hits
emails[hits]
## [1] "john.doe@ivyleague.edu" "quant@bigdatacollege.edu"
2.5.2.2 Replacing (sub, gsub)
The sub(pattern = <regex>, replacement = <str>, x = <str>)
looks for a pattern
from x
and replaces the first match from that pattern with replacement
The gsub(pattern = <regex>, replacement = <str>, x = <str>)
looks for a pattern
from x
and replaces all matches from that pattern with replacement
Regular expressions used in the example below:
.*
: “any character that is matched zero or more times”.\\s
: Match a space. The “s” is normally a character (i.e., the letter “s”). “” is a space. The extra “" is the”escape” character that results in it included “”..[0-9]+
: Match the numbers 0 to 9, at least once (+).([0-9]+)
: The parentheses are used to make parts of the matching string available to define the replacement.- The
\\1
in the replacement argument ofsub()
gets set to the string that is captured by the regular expression[0-9]+
# Use sub() to convert the .edu email domains to datacamp.edu
sub(pattern="@.*\\.edu$", replacement="@datacamp.edu", x=emails)
## [1] "john.doe@datacamp.edu" "education@world.gov"
## [3] "dalai.lama@peace.org" "invalid.edu"
## [5] "quant@datacamp.edu" "cookie.monster@sesame.tv"
## [7] "my.education@gmail.com"
# Use sub() to convert all valid email domains to datacamp.edu
sub(pattern="@.*\\.*$", replacement="@datacamp.edu", x=emails)
## [1] "john.doe@datacamp.edu" "education@datacamp.edu"
## [3] "dalai.lama@datacamp.edu" "invalid.edu"
## [5] "quant@datacamp.edu" "cookie.monster@datacamp.edu"
## [7] "my.education@datacamp.edu"
Example: Awards
awards <- c("Won 1 Oscar.",
"Won 1 Oscar. Another 9 wins & 24 nominations.",
"1 win and 2 nominations.",
"2 wins & 3 nominations.",
"Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
"4 wins & 1 nomination.")
sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)
## [1] "Won 1 Oscar." "24" "2" "3" "2"
## [6] "1"
2.5.3 Dates and Times
Dates are represented by Date
objects, which store the number of days since January 1, 1970. Note that this is different from Excel, that stores dates as the number of days since January 1, 1900
# Get the current date: today
today <- Sys.Date()
# How many days since January 1, 1970?
unclass(today)
## [1] 19793
Times are represented by POSIXct
objects, which store the number of seconds since January 1st, 1970
- Sys.time()
# Get the current time: now
now <- Sys.time()
# How many seconds since January 1, 1970?
unclass(now)
## [1] 1710121043
Dates and times before January 1, 1970 are negative
as.Date()
takes a character string of the date and uses a set of symbols to format the date. Here are the symbols:
- %Y: 4-digit year (1982)
- %y: 2-digit year (82)
- %m: 2-digit month (01)
- %d: 2-digit day of the month (13)
- %A: weekday (Wednesday)
- %a: abbreviated weekday (Wed)
- %B: month (January)
- %b: abbreviated month (Jan)
The default formats are "%Y-%m-%d"
or "%Y/%m/%d"
Convert Character strings to dates:
## [1] "1982-01-13"
## [1] "1982-01-13"
## [1] "1982-01-13"
# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"
# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2, format = "%Y-%m-%d")
date3 <- as.Date(str3, format = "%d/%B/%Y")
Convert dates to character strings using different date notation:
## [1] "11 March, 2024"
## [1] "Today is a Monday!"
## [1] "Thursday"
## [1] "15"
## [1] "Jan 2006"
as.POSIXct()
converts a character string to a POSIXct
object
format()
converts a POSIXct
object to a character string
Symbols to format the time:
- %H: military hours as a decimal number (00-23)
- %I: hours as a decimal number (01-12)
- %M: minutes as a decimal number
- %S: seconds as a decimal number
- %T: shorthand notation for the typical format %H:%M:%S
- %p: AM/PM indicator
The default formats is "%Y-%m-%d %H:%M:%S"
Examples:
# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"
# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2, format = "%Y-%m-%d %H:%M:%S")
# Convert times to formatted strings
format(time1, "%M")
## [1] "01"
## [1] "02:23 PM"
2.5.3.1 Calculations with dates and times
## [1] "2024-03-12"
## [1] "2024-03-10"
## Time difference of 5 days
And with Times:
## [1] "2024-03-11 02:37:22 UTC"
## [1] "2024-03-10 01:37:22 UTC"
birth <- as.POSIXct("1879-03-14 14:37:23")
death <- as.POSIXct("1955-04-18 03:47:12")
einstein <- death - birth
einstein
## Time difference of 27792.55 days