R S3 OOP & Dplyr

Introduction

R has three popular OOP systems, of which S3 is the oldest. S3 will also be the least recognisable to developers familiar with OOP in other typed languages like Java and C++. It has a little more in common with OO techniques in legacy JavaScript (pre-TypeScript) and Perl.

While S3 might be the weakest of the three in terms of purist OOP, it is probably the most important of the three in terms of the R language, toolchain and ecosystem.

While R did not set out to be an object-oriented programming language, the importance of S3 can be confirmed by looking at which OOP systems the most popular R packages use. For example, dplyr uses S3.

Let's figure out how S3 works and whether we can use it with dplyr in our own subclasses.

library(dplyr)
library(sloop)
aapl_stock <- tribble(
    ~symbol, ~date, ~open, ~high, ~low, ~close, ~volume, ~adjusted,
    "AAPL", "2021-07-30",  144.,  146.,  144.,  146., 70382000, 146.,
    "AAPL", "2021-08-02",  146.,  147.,  145.,  146., 62880000, 145.)


# Inspect the tibble with class() and attributes().
# Which classes, if any, does a tibble inherit from?
class(aapl_stock)
## [1] "tbl_df"     "tbl"        "data.frame"
# What attributes does a tibble possess?
attributes(aapl_stock)
## $names
## [1] "symbol"   "date"     "open"     "high"     "low"      "close"    "volume"  
## [8] "adjusted"
## 
## $row.names
## [1] 1 2
## 
## $class
## [1] "tbl_df"     "tbl"        "data.frame"
# How does method dispatch work for a typical dplyr function?
s3_dispatch(mutate(aapl_stock))
##    mutate.tbl_df
##    mutate.tbl
## => mutate.data.frame
##    mutate.default
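
To make the dispatch order above concrete, here is a minimal sketch in base R. The generic `describe()` and its methods are hypothetical names invented for illustration; they are not part of dplyr or sloop. `UseMethod()` walks the class vector from left to right and calls the first method it finds, falling back to the default method when nothing matches.

```r
# Hypothetical generic, for illustration only (not part of dplyr or sloop).
describe <- function(x, ...) UseMethod("describe")

# Method for plain data frames.
describe.data.frame <- function(x, ...) "a data.frame method was called"

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) "the default method was called"

# An object with the same class vector as a tibble, built without any packages.
df <- structure(
    list(symbol = "AAPL", close = 146),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -1L)
)

# No describe.tbl_df or describe.tbl exists, so dispatch falls through
# to describe.data.frame, mirroring the s3_dispatch() output above.
describe(df)

# An object with no matching class ends up at the default method.
describe(1:3)
```

The same fall-through explains why `mutate()` on a tibble lands on `mutate.data.frame`: no more specific method is registered for `tbl_df` or `tbl`.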

Use Case

So far we have confirmed that dplyr tibbles and functions are built on S3.

Tidyquant is an R package used to fetch and process trading and economic data in tibble format. It produces regular dplyr tibbles with no domain-specific subclassing, which look just like the tibble we’ve already created.

A subclass might be useful to store metadata. For example, with a tibble holding stock price data it might be nice to associate details such as:

  • Data source (Yahoo|Alpha Vantage)
  • Stock exchange (NYSE|Nasdaq|LSE|Nikkei)

For this subclass to be useful, we should be able to use out-of-the-box dplyr methods to process the subclass without losing our subclass type and attributes. Let's try it.

# Add custom class and attributes.
# As this is an experiment, we do this in an ad hoc fashion.
# Normally we would create a constructor.

class(aapl_stock) <- c("stock", class(aapl_stock))
class(aapl_stock)
## [1] "stock"      "tbl_df"     "tbl"        "data.frame"
attr(aapl_stock, "exchange") <- "NYSE"


# But will the customisation survive typical tidy dplyr functions?

review_stock <- function(obj) {
    print(paste("is custom subclass stock? ",  "stock" %in% class(obj))) # is our custom subclass maintained?
    print(paste("has custom exchange attribute?", attr(obj, "exchange") )) # is our custom attribute maintained?
}

mtcars |> review_stock() # Sanity check test with a negative example
## [1] "is custom subclass stock?  FALSE"
## [1] "has custom exchange attribute? "
aapl_stock |> head(1)  |> review_stock() 
## [1] "is custom subclass stock?  TRUE"
## [1] "has custom exchange attribute? NYSE"
aapl_stock |> mutate(morecols = "yay")  |> review_stock() 
## [1] "is custom subclass stock?  TRUE"
## [1] "has custom exchange attribute? NYSE"
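
The `%in% class(obj)` test in `review_stock()` works, but base R also provides `inherits()` for the same check. The following is a minimal sketch that uses a plain data frame rather than a tibble so that it runs without extra packages; `is_stock()` is a hypothetical helper name, not something provided by dplyr.

```r
# Hypothetical helper: TRUE when the object carries the "stock" class.
is_stock <- function(obj) inherits(obj, "stock")

# A plain data frame tagged with the custom class and attribute.
prices <- data.frame(symbol = "AAPL", close = 146)
class(prices) <- c("stock", class(prices))
attr(prices, "exchange") <- "NYSE"

is_stock(prices)   # the custom class is present
is_stock(mtcars)   # negative example, as in the sanity check above

# With which = TRUE, inherits() reports where each requested class
# sits in the class vector (0 would mean "not present").
inherits(prices, c("data.frame", "stock"), which = TRUE)
```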

Generic functions

# Let's create a stock-specific method for the "generic" print() function.
# The signature mirrors the generic, print(x, ...), so extra arguments are accepted.
print.stock <- function(x, ...) {
    cat(paste("Exchange: ", attr(x, "exchange"), "\n"))
    cat(paste("Source: ", attr(x, "source"), "\n"))
    cat("Prices: ")
    NextMethod()
}

aapl_stock |> print()
## Exchange:  NYSE 
## Source:   
## Prices: # A tibble: 2 x 8
##   symbol date        open  high   low close   volume adjusted
##   <chr>  <chr>      <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1 AAPL   2021-07-30   144   146   144   146 70382000      146
## 2 AAPL   2021-08-02   146   147   145   146 62880000      145
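
`NextMethod()` hands the object, together with any extra arguments, to the next method in the class vector. The sketch below illustrates this in base R with a plain data frame. The class name `stock_df` is hypothetical and was chosen so that the example does not overwrite `print.stock`.

```r
# Hypothetical class "stock_df", used here to avoid clashing with print.stock.
print.stock_df <- function(x, ...) {
    cat("Exchange:", attr(x, "exchange"), "\n")
    # NextMethod() forwards x and any extra arguments to the next method
    # in the class vector, here print.data.frame.
    NextMethod()
    # Return the object invisibly, as print methods conventionally do.
    invisible(x)
}

prices <- structure(
    data.frame(symbol = "AAPL", close = 146),
    class = c("stock_df", "data.frame"),
    exchange = "NYSE"
)

# row.names = FALSE is an argument of print.data.frame; it reaches that
# method through `...` and NextMethod().
print(prices, row.names = FALSE)
```

Accepting `...` in the method signature is what lets callers pass such arguments through to the underlying print method.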

The Constructor

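# Constructor: tags `data` with the "stock" class and its metadata attributes.
new_stock_tbl <- function(data, exchange = NULL, source = "yahoo", class = NULL) {
    if (is.null(class)) {
        # Default: prepend "stock" so the existing tibble classes are kept.
        class(data) <- c("stock", class(data))
    } else {
        # A supplied class vector replaces the existing one entirely.
        class(data) <- class
    }
    attr(data, "exchange") <- exchange
    attr(data, "source") <- source
    return(data)
}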

aapl <- tribble(
    ~symbol, ~date, ~open, ~high, ~low, ~close, ~volume, ~adjusted,
    "AAPL", "2021-07-30",  144.,  146.,  144.,  146., 70382000, 146.,
    "AAPL", "2021-08-02",  146.,  147.,  145.,  146., 62880000, 145.) |>
new_stock_tbl(exchange = "NYSE")

print(aapl)
## Exchange:  NYSE 
## Source:  yahoo 
## Prices: # A tibble: 2 x 8
##   symbol date        open  high   low close   volume adjusted
##   <chr>  <chr>      <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1 AAPL   2021-07-30   144   146   144   146 70382000      146
## 2 AAPL   2021-08-02   146   147   145   146 62880000      145
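
A constructor is often paired with a validator that checks the object before it is used. The following is a minimal sketch under stated assumptions: `validate_stock_tbl()` and its list of required columns are hypothetical, chosen to match the example data in this post, and are not taken from tidyquant or dplyr. It uses plain data frames so that it runs in base R.

```r
# Hypothetical validator: the required columns are an assumption
# chosen to match the example data in this post.
validate_stock_tbl <- function(data) {
    required <- c("symbol", "date", "close")
    missing_cols <- setdiff(required, names(data))
    if (!inherits(data, "stock")) {
        stop("`data` must have class 'stock'", call. = FALSE)
    }
    if (length(missing_cols) > 0) {
        stop("missing columns: ", paste(missing_cols, collapse = ", "),
             call. = FALSE)
    }
    if (!is.character(attr(data, "source"))) {
        stop("the `source` attribute must be a character string", call. = FALSE)
    }
    invisible(data)
}

# A valid object passes through unchanged.
ok <- data.frame(symbol = "AAPL", date = "2021-07-30", close = 146)
class(ok) <- c("stock", class(ok))
attr(ok, "source") <- "yahoo"
validate_stock_tbl(ok)

# An object without a `close` column is rejected with an error,
# which is captured here as a message.
bad <- data.frame(symbol = "AAPL", date = "2021-07-30")
class(bad) <- c("stock", class(bad))
attr(bad, "source") <- "yahoo"
result <- tryCatch(validate_stock_tbl(bad), error = function(e) conditionMessage(e))
result
```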

Conclusion

  • S3 and tidy/dplyr work together: we can create subclasses and process them with dplyr without losing the subclass specialisation.
  • S3 is easy to use.
  • S3 is prevalent throughout the packages and tools used by a typical R program and developer.