
R Classes

In this article, we’ll explore how to create and work with custom data structures in R using object-oriented programming (OOP). We’ll learn about two different class systems in R: S3 and S4. These systems allow us to create specialized data structures that can represent complex biological concepts, like species, populations, or ecological measurements, in a way that’s both intuitive and powerful.

Object Oriented Programming (OOP)

What are objects?

Think of objects as containers that hold both data and the operations that can be performed on that data. In biology, we often work with complex entities that have multiple characteristics and behaviors. For example, a species might have attributes like its scientific name, habitat, and population size, along with behaviors like reproduction and migration patterns.

In programming, we can represent these real-world entities as objects. An object is like a custom data structure that:

  • Stores related data together (like a species’ name, habitat, and population size)
  • Includes functions that work specifically with that data (like calculating population growth or checking if a species is endangered)
  • Can be reused and extended to create more specific types of objects (like creating a subclass for endangered species)

This approach to programming, where we organize code around objects rather than just functions, is called Object-Oriented Programming (OOP).

Components of an object in R

In R, objects have several key components:

  • Attributes and Slots – These are the data stored in the object. In S3 classes, we call them attributes, while in S4 classes, we call them slots. For example, a species object might have attributes such as:
    • scientific_name
    • common_name
    • habitat
    • population_size
  • Methods – These are functions that work specifically with objects of a particular class. For example:
    • print(): A built-in function that determines how the object is displayed in a readable format
    • summary(): A built-in function that provides a statistical summary of the object
    • calculate_growth_rate(): A custom method to calculate population growth
  • Class – The class defines what type of object it is. By extension, R will use this to understand which methods we can use with the object. Classes are typically defined as a character string, such as:
    • “species”
    • “population”
    • “ecosystem”

R has several object-oriented systems; this article covers two of them, S3 and S4. S3 is simpler and more flexible, while S4 is more formal and strict. We’ll explore both systems in detail.
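As a quick illustration of these components, the sketch below inspects a built-in object rather than a custom one. It assumes only the mtcars dataset that ships with R; the variable name cars_head is introduced here for illustration.

```r
# Take a small slice of a built-in data frame
cars_head <- head(mtcars, 3)

# The class is stored as a character string
class(cars_head)

# The class is itself one of the object's attributes
names(attributes(cars_head))

# inherits() tests class membership; isS4() tells S3-style objects from S4 ones
inherits(cars_head, "data.frame")
isS4(cars_head)
```

A data frame is an S3-style object, so isS4() returns FALSE here.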

S3 Classes

Creation

S3 classes are R’s original object-oriented system. They’re informal and flexible, making them great for quick prototyping and simple data structures. There are two main ways to create S3 objects:

Method 1: Using the structure function directly:

donkey <- structure(
  # The attributes/slots of the object
  list(
    species = "Equus asinus",
    age = 10,
    weight = 180
  ),
  # The class of the object
  class = "animal"
)
donkey
$species
[1] "Equus asinus"

$age
[1] 10

$weight
[1] 180

attr(,"class")
[1] "animal"

This creates a simple animal object with three attributes: species, age, and weight. The structure function takes two main arguments:

  • A list containing the object’s attributes
  • The class name (in this case, “animal”)

Method 2: Using a constructor function. The constructor function should follow these principles:

  1. It should have the prefix “new_”, followed by the name of the class (i.e. new_myclass).
  2. The arguments of the function should be the attributes.
  3. Because S3 classes do not implement validation, this functionality should be carried out in the constructor function itself.

new_animal <- function(species, age, weight) {
  # Input validation
  if (!is.character(species)) stop("Species must be a character string")
  if (!is.numeric(age) || age < 0) stop("Age must be a non-negative number")
  if (!is.numeric(weight) || weight <= 0) stop("Weight must be a positive number")

  # Create the object
  structure(
    list(
      species = species,
      age = age,
      weight = weight
    ),
    class = "animal"
  )
}

tiger <- new_animal("Panthera tigris", age = 5, weight = 200)
tiger
$species
[1] "Panthera tigris"

$age
[1] 5

$weight
[1] 200

attr(,"class")
[1] "animal"
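To see the constructor's validation at work, here is a minimal sketch that passes an invalid weight. It repeats a compact copy of new_animal so that it runs on its own, and uses tryCatch() to capture the error message instead of stopping the script; the variable name bad_weight is an illustrative choice.

```r
# Compact copy of the constructor so this block is self-contained
new_animal <- function(species, age, weight) {
  if (!is.character(species)) stop("Species must be a character string")
  if (!is.numeric(age) || age < 0) stop("Age must be a non-negative number")
  if (!is.numeric(weight) || weight <= 0) stop("Weight must be a positive number")
  structure(
    list(species = species, age = age, weight = weight),
    class = "animal"
  )
}

# A negative weight fails the third check; tryCatch() returns the message
bad_weight <- tryCatch(
  new_animal("Panthera tigris", age = 5, weight = -200),
  error = function(e) conditionMessage(e)
)
bad_weight
```

No object is created in this case; bad_weight holds the text "Weight must be a positive number".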

Indexing/Querying and Mutating

Once we’ve created an S3 object, we can access and modify its attributes using the $ operator, just like with lists or data frames. This makes S3 objects very intuitive to work with. However, this flexibility comes with a responsibility – you need to be careful about maintaining consistency in your objects.

tiger$species
tiger$age
tiger$weight
[1] "Panthera tigris"
[1] 5
[1] 200
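Modification works the same way as access. The sketch below uses a separate object, cub, and a hypothetical helper validate_animal (neither is part of the article's code) to show that $ assignment performs no checks of its own, so any validation has to be rerun by hand.

```r
# A fresh S3 object, built directly with structure()
cub <- structure(
  list(species = "Panthera tigris", age = 1, weight = 40),
  class = "animal"
)

# Updating an existing element behaves exactly like a list
cub$weight <- 45

# Nothing prevents an invalid value or a brand-new element
cub$age <- -1
cub$colour <- "orange"

# Hypothetical helper that re-checks an existing animal object
validate_animal <- function(x) {
  stopifnot(inherits(x, "animal"))
  if (!is.numeric(x$age) || x$age < 0) stop("Age must be a non-negative number")
  if (!is.numeric(x$weight) || x$weight <= 0) stop("Weight must be a positive number")
  invisible(x)
}

# The invalid age is only caught when we ask for the check explicitly
check <- tryCatch(validate_animal(cub), error = function(e) conditionMessage(e))
check
```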

Methods

Methods are functions that work specifically with objects of a particular class. In S3, a method is created by naming a function with the pattern function_name.class_name, which tells R which function to run when function_name is called on an object of that class. For example, when we call print(tiger), R looks for a function named print.animal because tiger is of class “animal”. To create a print method for our animal class:

print.animal <- function(x, ...) {
  cat("Animal Object:\n")
  cat("Species:", x$species, "\n")
  cat("Age:", x$age, "years\n")
  cat("Weight:", x$weight, "kg\n")
  invisible(x)
}

# The print method has now become available for all animal objects
print(tiger)
Animal Object:
Species: Panthera tigris 
Age: 5 years
Weight: 200 kg
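The same naming convention applies to generics we write ourselves. As a minimal sketch, the hypothetical generic describe_size below (the name and the 100 kg cut-off are assumptions made for illustration) calls UseMethod() so that R picks the method from the object's class.

```r
# A generic only needs to call UseMethod(); dispatch is based on class(x)
describe_size <- function(x, ...) {
  UseMethod("describe_size")
}

# Method for "animal" objects; the 100 kg threshold is arbitrary
describe_size.animal <- function(x, ...) {
  if (x$weight >= 100) "large" else "small"
}

# Fallback used when no class-specific method exists
describe_size.default <- function(x, ...) {
  stop("describe_size() is not defined for class ", class(x)[1])
}

tiger <- structure(
  list(species = "Panthera tigris", age = 5, weight = 200),
  class = "animal"
)
describe_size(tiger)
```

With a weight of 200, the call returns "large". Objects of any other class reach describe_size.default and raise an error.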

Inheritance

Inheritance allows us to create new classes that are based on existing ones. The new class (called a subclass) inherits all the attributes and methods of the original class (called the parent class or superclass) but can add new attributes and methods of its own.

For example, we might want to create a special class for domestic animals that includes all the attributes of a regular animal plus additional information about their owner:

create_domestic_animal <- function(species, age, weight, owner) {
  # Create base animal
  animal <- new_animal(species, age, weight)

  # Add domestic-specific attributes
  animal$owner <- owner

  # Set class hierarchy
  class(animal) <- c("domestic_animal", "animal")

  return(animal)
}

dog <- create_domestic_animal("Canis lupus familiaris", 3, 25, "John Doe")
dog
Animal Object:
Species: Canis lupus familiaris 
Age: 3 years
Weight: 25 kg

Notice that the owner is not printed. R found no print method for “domestic_animal”, so it moved on to the next class in the object’s class vector and used print.animal, which knows nothing about the owner attribute. To fix this, we can implement print for the domestic animal subclass:

print.domestic_animal <- function(x, ...) {
  cat("Domestic Animal Object:\n")
  cat("Species:", x$species, "\n")
  cat("Age:", x$age, "years\n")
  cat("Weight:", x$weight, "kg\n")
  cat("Owner:", x$owner, "\n")
  # Return the object invisibly, as print methods conventionally do
  invisible(x)
}

# This will use the print.domestic_animal method instead of the print.animal method
print(dog)
Domestic Animal Object:
Species: Canis lupus familiaris 
Age: 3 years
Weight: 25 kg
Owner: John Doe 
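A subclass method does not have to repeat the parent's code. The sketch below uses a hypothetical generic describe (an assumption for illustration, not one of the article's functions) to show NextMethod(), which hands control to the method of the next class in the class vector.

```r
# Hypothetical generic used only for this illustration
describe <- function(x, ...) UseMethod("describe")

# Parent-class method: builds the basic description
describe.animal <- function(x, ...) {
  paste0(x$species, " (", x$weight, " kg)")
}

# Subclass method: reuses the parent's result and appends the owner
describe.domestic_animal <- function(x, ...) {
  # NextMethod() runs describe.animal, the next class in class(x)
  paste0(NextMethod(), ", owned by ", x$owner)
}

dog <- structure(
  list(species = "Canis lupus familiaris", age = 3, weight = 25, owner = "John Doe"),
  class = c("domestic_animal", "animal")
)

describe(dog)
inherits(dog, "animal")
```

The order of the class vector matters: R tries "domestic_animal" first and falls back to "animal", which is why inherits(dog, "animal") is TRUE.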

S4 Classes

S4 classes are a more formal and strict object-oriented system in R. They provide better structure and type checking than S3 classes, making them ideal for complex data structures and when you need to ensure data integrity. S4 classes are particularly important when working with Bioconductor packages, which are widely used in bioinformatics.

S4 classes are implemented in the methods package, which ships with R. You can attach it to the current session with:

library(methods)

Alternatively, to avoid function name clashes, we can call functions from this package explicitly with the :: syntax, which is what we do in the rest of this article.

Creation

Creating S4 classes is more formal than S3. We use the setClass function to define a class, specifying:

  • The name of the class
  • The slots (attributes) and their types
  • Default values for slots
  • Validation rules to ensure data integrity

methods::setClass(
  # Name of the class
  "plant",

  # Slots can be thought of as pre-specified attributes of the class
  slots = list(
    # Keys are the names of the slots;
    # values are the types of the slots
    species = "character",
    height = "numeric",
    is_flowering = "logical"
  ),

  # The prototype argument is used to specify the default values for the slots
  prototype = list(
    # Keys are the names of the slots;
    # values are the default values for the slots
    is_flowering = TRUE
  ),

  # the validity argument is given a function that
  # is responsible for checking the validity of the object
  # if the object is not valid, the function should return a string
  # describing the error, and the object will not be created;
  # if the object is valid, the function should return TRUE
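  validity = function(object) {
    if (length(object@species) != 1) {
      return("Species must be a single string")
    }
    # Check the length first so that a missing or vector-valued height
    # produces a readable message instead of an error inside if ()
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be a single positive number")
    }
    TRUE
  }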
)

Once a class is defined, we create instances of the class via the new function:

lily <- methods::new(
  # The class of the object
  "plant",

  # The values for the slots
  species = "Lilium",
  height = 1.0,
  is_flowering = TRUE
)
lily
An object of class "plant"
Slot "species":
[1] "Lilium"

Slot "height":
[1] 1

Slot "is_flowering":
[1] TRUE

When we create an S4 object using new, R automatically validates the object according to our class definition. If any validation fails, R will raise an error with a descriptive message. This strict validation helps prevent errors and ensures our objects always contain valid data.
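The two kinds of check can be seen in a small sketch. It defines a hypothetical class "herb", a cut-down stand-in for "plant" so that the block runs on its own, and captures the error messages with tryCatch(). The exact wording of R's messages may vary between versions, so the code only stores them.

```r
# Hypothetical class, used only for this illustration
methods::setClass(
  "herb",
  slots = list(species = "character", height = "numeric"),
  validity = function(object) {
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be positive")
    }
    TRUE
  }
)

# Wrong type: a character value cannot go into a numeric slot
type_error <- tryCatch(
  methods::new("herb", species = "Mentha", height = "tall"),
  error = function(e) conditionMessage(e)
)

# Right type, but the value breaks the rule in the validity function
rule_error <- tryCatch(
  methods::new("herb", species = "Mentha", height = -1),
  error = function(e) conditionMessage(e)
)

type_error
rule_error
```

In both cases no object is created; the second message includes the string returned by the validity function.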

Indexing/Querying and Mutating

Unlike S3 objects, which use the $ operator, S4 objects use the @ operator to access and modify slots. This is one of the key differences between S3 and S4, and it helps enforce the stricter rules of S4 classes.

lily@species
lily@height
lily@is_flowering
[1] "Lilium"
[1] 1
[1] TRUE

Because slots are fixed by the class definition, @ can only reach slots that were declared when the class was created. Reading or assigning an undeclared slot raises an error, which prevents the accidental creation of new attributes that $ would allow on an S3 object. Existing slots can be updated by assignment:

lily@height <- 1.5
lily@height
[1] 1.5

When a slot is modified with @, R checks that the new value has the type declared for that slot, so assigning a character string to height raises an error. The validity function, however, is not rerun on assignment: a negative height is accepted silently until methods::validObject() is called on the object. Calling validObject() after modifying slots keeps the object consistent with its class definition.
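The sketch below illustrates what slot assignment does and does not check. It uses a hypothetical class "fern" so that it is self-contained: assigning a value of the wrong type fails immediately, whereas a value of the right type that breaks the validity rule is only reported once methods::validObject() is called.

```r
# Hypothetical class, used only for this illustration
methods::setClass(
  "fern",
  slots = list(species = "character", height = "numeric"),
  validity = function(object) {
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be positive")
    }
    TRUE
  }
)

bracken <- methods::new("fern", species = "Pteridium aquilinum", height = 0.8)

# Wrong type: the assignment itself raises an error
type_error <- tryCatch(
  {
    bracken@height <- "tall"
    "no error"
  },
  error = function(e) "type error"
)

# Right type, invalid value: the assignment goes through
bracken@height <- -1

# Only an explicit validObject() call runs the validity function
validity_error <- tryCatch(
  methods::validObject(bracken),
  error = function(e) conditionMessage(e)
)

type_error
bracken@height
validity_error
```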

Methods

S4 methods are defined using the setMethod function, which requires:

  • The name of the method
  • The class it applies to
  • The function that implements the method

# Defining a show method (equivalent to print)
methods::setMethod(
  "show", # The name of the method
  "plant", # The class that the method is for

  # The function that defines the method
  function(object) {
    cat("Plant Object:\n")
    cat("Species:", object@species, "\n")
    cat("Height:", object@height, "m\n")
    cat("Flowering:", ifelse(object@is_flowering, "Yes", "No"), "\n")
  }
)

# Define a summary method
methods::setMethod(
  "summary",
  "plant",
  function(object, ...) {
    cat("Plant Summary:\n")
    cat("Species:", object@species, "\n")
    cat("Height:", object@height, "m\n")
    cat("Flowering:", ifelse(object@is_flowering, "Yes", "No"), "\n")
    cat("Size Category:", ifelse(object@height < 5, "Small",
      ifelse(object@height < 15, "Medium", "Large")
    ), "\n")
  }
)
# The show method has now become available for all plant objects
lily
Plant Object:
Species: Lilium 
Height: 1.5 m
Flowering: Yes 

Inheritance

S4 inheritance is more formal than S3. When creating a subclass, we use the contains argument to specify the parent class. The subclass inherits all slots and methods from the parent class but can add new slots and methods of its own.

methods::setClass(
  "tree", # The name of the subclass
  contains = "plant", # The name of the parent class
  slots = list(
    trunk_diameter = "numeric"
  ),
  validity = function(object) {
    if (object@trunk_diameter <= 0) {
      return("Trunk diameter must be positive")
    }
    TRUE
  }
)

As with S3 classes, we can override the methods for the subclass. We use the setMethod function to do this:

methods::setMethod(
  "show", # The name of the method
  "tree", # The class that the method is for

  # The function that defines the method
  function(object) {
    methods::callNextMethod() # Call the plant show method first
    cat("Trunk Diameter:", object@trunk_diameter, "m\n")
  }
)

maple <- methods::new("tree",
  species = "Acer platanoides",
  height = 20,
  is_flowering = TRUE,
  trunk_diameter = 0.5
)
show(maple)
Plant Object:
Species: Acer platanoides 
Height: 20 m
Flowering: Yes 
Trunk Diameter: 0.5 m

When we call show(maple), R first looks for a method specifically defined for the “tree” class. Since we defined one, it uses that method, which first calls the parent class’s show method (using callNextMethod()) and then adds the tree-specific information about trunk diameter.
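Methods that a subclass does not override are inherited as they are. The sketch below uses two hypothetical classes, "demo_plant" and "demo_tree" (stand-ins so the block runs on its own), and a summary method that returns a string rather than printing, to show that a method written for the parent also serves the child.

```r
# Hypothetical parent and child classes
methods::setClass(
  "demo_plant",
  slots = list(species = "character", height = "numeric")
)
methods::setClass(
  "demo_tree",
  contains = "demo_plant",
  slots = list(trunk_diameter = "numeric")
)

# A summary method defined for the parent class only
methods::setMethod(
  "summary",
  "demo_plant",
  function(object, ...) {
    paste(object@species, "is", object@height, "m tall")
  }
)

oak <- methods::new("demo_tree",
  species = "Quercus robur",
  height = 25,
  trunk_diameter = 1.2
)

# The child has its own slot plus the parent's slots
methods::slotNames("demo_tree")

# is() confirms that the object also counts as a demo_plant
methods::is(oak, "demo_plant")

# No summary method exists for demo_tree, so the parent's method is used
summary(oak)
```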

Generic Functions

Generic functions are a powerful feature of R’s object-oriented systems. They allow the same function name to work differently depending on the class of the object it’s applied to. This is called method dispatch, and it makes your code more intuitive and flexible.

Single Argument Generics

Let’s look at how to create a generic function that works with a single argument. We’ll create a function to calculate the volume of our plants and trees:

# Defining a generic function
methods::setGeneric(
  # The name of the generic function
  "calculate_volume",

  # A placeholder that will be replaced by the method
  function(object) methods::standardGeneric("calculate_volume")
)

R is now expecting a method to be defined for this “calculate_volume” generic function. Let’s do so for our plant and tree classes:

# Defining an implementation of the generic function for plant objects
methods::setMethod(
  # The name of the method
  "calculate_volume",

  # The "signature" of the method; aka the types of the arguments
  # In this case, the method is for plant objects
  "plant",

  # The function that defines the method
  function(object) {
    cat("Calculating volume for a plant...\n")
    # Assumption of a thin cylinder for the volume
    pi * 0.01^2 * object@height
  }
)

# Defining an implementation of the generic function for tree objects
methods::setMethod(
  "calculate_volume",
  "tree",
  function(object) {
    cat("Calculating volume for a tree...\n")
    pi * (object@trunk_diameter / 2)^2 * object@height
  }
)

Now we can use the generic function to calculate the volume of our plant and tree objects:

calculate_volume(lily)
Calculating volume for a plant...
[1] 0.0004712389
calculate_volume(maple)
Calculating volume for a tree...
[1] 3.926991

R dispatches the correct implementation automatically!
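Dispatch only succeeds when a method exists for the argument's class or one of its parents. As a minimal sketch, the hypothetical generic stem_volume and class "demo_stem" below (both assumptions for illustration) show how to check for a method with methods::hasMethod() and what happens when none is found.

```r
# Hypothetical class and generic
methods::setClass("demo_stem", slots = list(height = "numeric"))

methods::setGeneric(
  "stem_volume",
  function(object) methods::standardGeneric("stem_volume")
)

# Thin-cylinder volume, following the same assumption as the plant method
methods::setMethod(
  "stem_volume",
  "demo_stem",
  function(object) pi * 0.01^2 * object@height
)

# A method is registered for demo_stem but not for plain numbers
has_stem <- methods::hasMethod("stem_volume", "demo_stem")
has_numeric <- methods::hasMethod("stem_volume", "numeric")

# Calling the generic on an unsupported class raises an error
no_method <- tryCatch(
  stem_volume(3),
  error = function(e) conditionMessage(e)
)

has_stem
has_numeric
no_method
```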

Multiple Argument Generics

Generic functions can also dispatch on more than one argument, which is known as multiple dispatch. This is particularly useful for operations that involve two objects, whether of the same class or of different classes. For example, we might want to check whether two plants are different species:

# Defining a generic function with multiple arguments
methods::setGeneric(
  "are_different_species",
  function(plant1, plant2) methods::standardGeneric("are_different_species")
)

# Defining a method for plant-plant combination
methods::setMethod(
  # The name of the method
  "are_different_species",

  # The "signature" of the method, i.e. the classes of the arguments.
  # With more than one argument, we use the signature() function
  # to specify the class of each argument
  signature(plant1 = "plant", plant2 = "plant"),

  # The function that defines the method
  function(plant1, plant2) {
    plant1@species != plant2@species
  }
)

rose <- methods::new("plant",
  species = "Rosa",
  height = 0.5,
  is_flowering = TRUE
)
are_different_species(rose, lily)
[1] TRUE
are_different_species(rose, rose)
[1] FALSE

The setGeneric function creates a new generic function that can work with different classes. It defines a placeholder that will be replaced by the appropriate method implementation based on the class of the object it’s called with. This is the foundation of R’s method dispatch system.
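With several methods registered, R chooses the one whose signature best matches the classes of all dispatched arguments. The sketch below illustrates this with hypothetical classes "crop" and "fruit_tree" and a hypothetical generic pairing; none of these names come from the article.

```r
# Hypothetical parent and child classes
methods::setClass("crop", slots = list(species = "character"))
methods::setClass("fruit_tree", contains = "crop")

methods::setGeneric(
  "pairing",
  function(a, b) methods::standardGeneric("pairing")
)

# General method: any two crops
methods::setMethod(
  "pairing",
  methods::signature(a = "crop", b = "crop"),
  function(a, b) "crop with crop"
)

# More specific method: both arguments are fruit trees
methods::setMethod(
  "pairing",
  methods::signature(a = "fruit_tree", b = "fruit_tree"),
  function(a, b) "tree with tree"
)

wheat <- methods::new("crop", species = "Triticum aestivum")
apple <- methods::new("fruit_tree", species = "Malus domestica")

pairing(wheat, wheat)
pairing(apple, apple)

# Mixed arguments: only the general method applies, via inheritance
pairing(apple, wheat)
```

The mixed call returns "crop with crop" because wheat is not a fruit_tree, which rules out the more specific method.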

Generics on R objects

R’s object-oriented systems also work with classes you did not write yourself: you can define a new generic and give it a method for an existing class. For example, we can create a generic whose method extracts the slope from a linear model (an object of class “lm”):

# Defining a generic function for a lm object
methods::setGeneric(
  "get_slope",
  function(x, ...) methods::standardGeneric("get_slope")
)

# Implementing the generic function for a lm object
methods::setMethod(
  "get_slope",
  "lm",
  function(x, ...) {
    x$coefficients[2]
  }
)

example_lm <- lm(mpg ~ hp, data = mtcars)
get_slope(example_lm)
         hp 
-0.06822828 

This demonstrates how we can attach custom functionality to existing R objects. In fact, this is often how third-party packages allow base R objects to work with their functions.
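The reverse direction also works: an existing function can be given a method for a custom class. As a minimal sketch, the hypothetical class "transect" below (an assumption for illustration) stores a vector of plant heights and teaches base R's length() how to count them.

```r
# Hypothetical class holding several height measurements
methods::setClass("transect", slots = list(heights = "numeric"))

# Give the existing length() function a method for transect objects
methods::setMethod(
  "length",
  "transect",
  function(x) length(x@heights)
)

survey <- methods::new("transect", heights = c(0.4, 1.2, 0.9))

# length() now dispatches to the method defined above
length(survey)
```

The call returns 3, the number of measurements stored in the heights slot, while length() keeps its usual behaviour for every other object.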

Concluding Remarks

  • Use S3 for simple data structures and quick prototyping. Attributes/slots are accessed here using the $ operator.
  • Use S4 when you need strict type checking and data integrity and/or when you need multiple dispatch of methods. Attributes/slots are accessed here using the @ operator.
  • Third-party packages, such as those found in Bioconductor, often use the S4 class system.

Remember that the best approach is often to start with S3 for simplicity and move to S4 if you need more structure and type safety.