R Classes
In this article, we’ll explore how to create and work with custom data structures in R using object-oriented programming (OOP). We’ll learn about two different class systems in R: S3 and S4. These systems allow us to create specialized data structures that can represent complex biological concepts, like species, populations, or ecological measurements, in a way that’s both intuitive and powerful.
Object Oriented Programming (OOP)
What are objects?
Think of objects as containers that hold both data and the operations that can be performed on that data. In biology, we often work with complex entities that have multiple characteristics and behaviors. For example, a species might have attributes like its scientific name, habitat, and population size, along with behaviors like reproduction and migration patterns.
In programming, we can represent these real-world entities as objects. An object is like a custom data structure that:
- Stores related data together (like a species’ name, habitat, and population size)
- Includes functions that work specifically with that data (like calculating population growth or checking if a species is endangered)
- Can be reused and extended to create more specific types of objects (like creating a subclass for endangered species)
This approach to programming, where we organize code around objects rather than just functions, is called Object-Oriented Programming (OOP).
Components of an object in R
In R, objects have several key components:
- Attributes and Slots – These are the data stored in the object. In S3 classes, we call them attributes, while in S4 classes, we call them slots. For example, a species object might have attributes such as:
- scientific_name
- common_name
- habitat
- population_size
- Methods – These are functions that work specifically with objects of a particular class. For example:
- print(): An in-built function which determines how the object is shown in a readable format
- summary(): An in-built function that provides a statistical summary of the object
- calculate_growth_rate(): A custom method to calculate population growth
- Class – The class defines what type of object it is. By extension, R will use this to understand which methods we can use with the object. Classes are typically defined as a character string, such as:
- “species”
- “population”
- “ecosystem”
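To make these three components concrete, here is a minimal sketch that builds an object by hand and inspects each part. The object heron, its fields, and the summary.species method are invented for this illustration and are not part of any package.

```r
# A hypothetical object, built by hand for illustration only
heron <- list(scientific_name = "Ardea cinerea", population_size = 120)
class(heron) <- "species"

# Data: the values stored inside the object
names(heron)

# Class: the label R uses to decide which methods apply
class(heron)
inherits(heron, "species")

# Methods: a function named <generic>.<class> is picked up automatically
summary.species <- function(object, ...) {
  cat(object$scientific_name, "with", object$population_size, "individuals\n")
  invisible(object)
}
summary(heron)
```

Calling summary(heron) runs summary.species because the class of heron is “species”.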
R provides two different systems for creating objects: S3 and S4. S3 is simpler and more flexible, while S4 is more formal and strict. We’ll explore both systems in detail.
S3 Classes
Creation
S3 classes are R’s original object-oriented system. They’re informal and flexible, making them great for quick prototyping and simple data structures. There are two main ways to create S3 objects:
Method 1: Using the structure() function directly:
donkey <- structure(
# The attributes/slots of the object
list(
species = "Equus asinus",
age = 10,
weight = 180
),
# The class of the object
class = "animal"
)
donkey
$species
[1] "Equus asinus"
$age
[1] 10
$weight
[1] 180
attr(,"class")
[1] "animal"
This creates a simple animal object with three attributes: species, age, and weight. The structure() function takes two main arguments:
- A list containing the object’s attributes
- The class name (in this case, “animal”)
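As a small sketch of what structure() is doing here, the same object can be built in two steps: create the list, then set its class attribute. The names donkey_fields, donkey_a and donkey_b are used only for this comparison.

```r
# Build the object in two ways: in one call, or as a list that is then given a class
donkey_fields <- list(species = "Equus asinus", age = 10, weight = 180)

donkey_a <- structure(donkey_fields, class = "animal")

donkey_b <- donkey_fields
class(donkey_b) <- "animal"

# Both routes give the same object
identical(donkey_a, donkey_b)

# unclass() strips the class and leaves the underlying list
unclass(donkey_a)
```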
Method 2: Using a constructor function. The constructor function should follow these principles:
- It should have the prefix “new_”, followed by the name of the class (e.g. new_myclass).
- The arguments of the function should be the attributes.
- Because S3 classes do not implement validation, this functionality should be carried out in the constructor function itself.
new_animal <- function(species, age, weight) {
# Input validation
if (!is.character(species)) stop("Species must be a character string")
if (!is.numeric(age) || age < 0) stop("Age must be a non-negative number")
if (!is.numeric(weight) || weight <= 0) stop("Weight must be a positive number")
# Create the object
structure(
list(
species = species,
age = age,
weight = weight
),
class = "animal"
)
}
tiger <- new_animal("Panthera tigris", age = 5, weight = 200)
tiger
$species
[1] "Panthera tigris"
$age
[1] 5
$weight
[1] 200
attr(,"class")
[1] "animal"
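To see the validation at work, the sketch below passes an invalid weight to the constructor and captures the resulting error with tryCatch(). The constructor is repeated in condensed form so the block runs on its own; the variable name result is arbitrary.

```r
# Condensed copy of the constructor so the example is self-contained
new_animal <- function(species, age, weight) {
  if (!is.character(species)) stop("Species must be a character string")
  if (!is.numeric(age) || age < 0) stop("Age must be a non-negative number")
  if (!is.numeric(weight) || weight <= 0) stop("Weight must be a positive number")
  structure(list(species = species, age = age, weight = weight), class = "animal")
}

# tryCatch() captures the error message so the script keeps running
result <- tryCatch(
  new_animal("Panthera tigris", age = 5, weight = -200),
  error = function(e) conditionMessage(e)
)
result
```

Because the checks run before structure() is called, no object is created when an argument is invalid.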
Indexing/Querying and Mutating
Once we’ve created an S3 object, we can access and modify its attributes using the $ operator, just like with lists or data frames. This makes S3 objects very intuitive to work with. However, this flexibility comes with a responsibility – you need to be careful about maintaining consistency in your objects.
tiger$species
tiger$age
tiger$weight
[1] "Panthera tigris"
[1] 5
[1] 200
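Modification works the same way. The sketch below uses a separate object, leopard, so the tiger object stays untouched, and shows that S3 accepts any assignment without complaint. The helper validate_animal is hypothetical; it is one possible way to re-check an object after changing it.

```r
leopard <- structure(
  list(species = "Panthera pardus", age = 4, weight = 60),
  class = "animal"
)

# Modify an existing element
leopard$age <- 5

# Nothing stops us from adding a new element or storing a nonsensical value
leopard$habitat <- "savanna"
leopard$weight <- -50

# A hypothetical helper that re-checks the object after modification
validate_animal <- function(x) {
  if (!is.numeric(x$weight) || x$weight <= 0) stop("Weight must be a positive number")
  invisible(x)
}

check <- tryCatch(validate_animal(leopard), error = function(e) conditionMessage(e))
check
```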
Methods in S3 are created by following a specific naming convention: function_name.class_name. This tells R which function to use when that function is called on an object of a particular class. For example, when we call print(tiger), R looks for a function named print.animal because tiger is of class “animal”.
Methods
Methods are functions that work specifically with objects of a particular class. In S3, methods are created by naming the function with the pattern function_name.class_name. For example, to create a print method for our animal class:
print.animal <- function(x, ...) {
cat("Animal Object:\n")
cat("Species:", x$species, "\n")
cat("Age:", x$age, "years\n")
cat("Weight:", x$weight, "kg\n")
invisible(x)
}
# The print method has now become available for all animal objects
print(tiger)
Animal Object:
Species: Panthera tigris
Age: 5 years
Weight: 200 kg
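The same naming pattern works for generics of our own, not only for built-in ones like print. A minimal sketch, assuming a made-up generic called describe: the generic calls UseMethod(), which hands the call to describe.animal for animal objects and to describe.default for everything else.

```r
# A hypothetical generic: UseMethod() hands the call to <generic>.<class>
describe <- function(x, ...) UseMethod("describe")

# Method used for objects of class "animal"
describe.animal <- function(x, ...) {
  paste(x$species, "aged", x$age, "years")
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  "No description available"
}

tiger <- structure(
  list(species = "Panthera tigris", age = 5, weight = 200),
  class = "animal"
)

describe(tiger)  # dispatches to describe.animal
describe(42)     # dispatches to describe.default
```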
Inheritance
Inheritance allows us to create new classes that are based on existing ones. The new class (called a subclass) inherits all the attributes and methods of the original class (called the parent class or superclass) but can add new attributes and methods of its own.
For example, we might want to create a special class for domestic animals that includes all the attributes of a regular animal plus additional information about their owner:
create_domestic_animal <- function(species, age, weight, owner) {
# Create base animal
animal <- new_animal(species, age, weight)
# Add domestic-specific attributes
animal$owner <- owner
# Set class hierarchy
class(animal) <- c("domestic_animal", "animal")
return(animal)
}
dog <- create_domestic_animal("Canis lupus familiaris", 3, 25, "John Doe")
dog
Animal Object:
Species: Canis lupus familiaris
Age: 3 years
Weight: 25 kg
Notice that the owner is not printed. This is because R falls back on the print.animal method we wrote earlier, which knows nothing about the owner attribute. To get around this, we can implement print for the domestic_animal subclass:
print.domestic_animal <- function(x, ...) {
cat("Domestic Animal Object:\n")
cat("Species:", x$species, "\n")
cat("Age:", x$age, "years\n")
cat("Weight:", x$weight, "kg\n")
cat("Owner:", x$owner, "\n")
invisible(x)
}
# This will use the print.domestic_animal method instead of the print.animal method
print(dog)
Domestic Animal Object:
Species: Canis lupus familiaris
Age: 3 years
Weight: 25 kg
Owner: John Doe
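The subclass method above repeats the lines already written in print.animal. One way to avoid the repetition, sketched here, is NextMethod(), which calls the method of the next class in the class vector. Note that running this block replaces the print.domestic_animal definition above, and that the header line then reads “Animal Object:” because it comes from the parent method.

```r
# Parent method, repeated so the example is self-contained
print.animal <- function(x, ...) {
  cat("Animal Object:\n")
  cat("Species:", x$species, "\n")
  cat("Age:", x$age, "years\n")
  cat("Weight:", x$weight, "kg\n")
  invisible(x)
}

# Print the shared fields via the parent method, then add the extra field
print.domestic_animal <- function(x, ...) {
  NextMethod()
  cat("Owner:", x$owner, "\n")
  invisible(x)
}

dog <- structure(
  list(species = "Canis lupus familiaris", age = 3, weight = 25, owner = "John Doe"),
  class = c("domestic_animal", "animal")
)

print(dog)
inherits(dog, "animal")
```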
S4 Classes
S4 classes are a more formal and strict object-oriented system in R. They provide better structure and type checking than S3 classes, making them ideal for complex data structures and when you need to ensure data integrity. S4 classes are particularly important when working with Bioconductor packages, which are widely used in bioinformatics.
In R, S4 classes are implemented in the methods package. You can attach this package with:
library(methods)
Alternatively, to reduce the chance of function names clashing, we will call functions from this package explicitly using the :: syntax.
Creation
Creating S4 classes is more formal than S3. We use the setClass() function to define a class, specifying:
- The name of the class
- The slots (attributes) and their types
- Default values for slots
- Validation rules to ensure data integrity
methods::setClass(
# Name of the class
"plant",
# Slots can be thought of as pre-specified attributes of the class
slots = list(
# Keys are the names of the slots;
# values are the types of the slots
species = "character",
height = "numeric",
is_flowering = "logical"
),
# The prototype argument is used to specify the default values for the slots
prototype = list(
# Keys are the names of the slots;
# values are the default values for the slots
is_flowering = TRUE
),
# the validity argument is given a function that
# is responsible for checking the validity of the object
# if the object is not valid, the function should return a string
# describing the error, and the object will not be created;
# if the object is valid, the function should return TRUE
validity = function(object) {
if (length(object@species) != 1) {
return("Species must be a single string")
}
if (length(object@height) != 1 || object@height <= 0) {
return("Height must be a single positive number")
}
TRUE
}
)
Once a class is defined, we create instances of the class via the new() function:
lily <- methods::new(
# The class of the object
"plant",
# The values for the slots
species = "Lilium",
height = 1.0,
is_flowering = TRUE
)
lily
An object of class "plant"
Slot "species":
[1] "Lilium"
Slot "height":
[1] 1
Slot "is_flowering":
[1] TRUE
When we create an S4 object using new(), R automatically validates the object according to our class definition. If any validation fails, R will raise an error with a descriptive message. This strict validation helps prevent errors and ensures that newly created objects contain valid data.
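The sketch below triggers both kinds of failure: a value that breaks the validity function and a value of the wrong type for its slot. The class definition is repeated in condensed form so the block runs on its own, and the names msg_validity and msg_type are arbitrary.

```r
# Condensed copy of the class definition so the example is self-contained
methods::setClass(
  "plant",
  slots = list(species = "character", height = "numeric", is_flowering = "logical"),
  prototype = list(is_flowering = TRUE),
  validity = function(object) {
    if (length(object@species) != 1) return("Species must be a single string")
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be a single positive number")
    }
    TRUE
  }
)

# Fails the validity function: height is negative
msg_validity <- tryCatch(
  methods::new("plant", species = "Lilium", height = -1),
  error = function(e) conditionMessage(e)
)

# Fails the slot type check: height must be numeric, not character
msg_type <- tryCatch(
  methods::new("plant", species = "Lilium", height = "tall"),
  error = function(e) conditionMessage(e)
)

msg_validity
msg_type
```

In both cases new() stops with an error and no object is created.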
Indexing/Querying and Mutating
Unlike S3 objects, which use the $ operator, S4 objects use the @ operator to access and modify slots. This is one of the key differences between S3 and S4, and it helps enforce the stricter rules of S4 classes.
lily@species
lily@height
lily@is_flowering
[1] "Lilium"
[1] 1
[1] TRUE
The @ operator is a key difference between S3 and S4 classes. While S3 uses the more flexible $ operator, S4’s @ operator enforces stricter access rules. This means you can only access slots that were explicitly defined when the class was created, helping prevent accidental creation of new attributes.
lily@height <- 1.5
lily@height
[1] 1.5
When modifying S4 slots with @, R checks only that the new value has the type declared for the slot; assigning a character string to height, for example, raises an error. The validity function is not re-run on assignment, so a value of the right type but out of range (like a negative height) is accepted silently. To re-check an object after modifying it, call methods::validObject() on it.
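The difference between the slot type check and the validity function can be seen in a short sketch. It uses a separate object, orchid, so the lily object is left alone, and repeats the class definition in condensed form so the block runs on its own.

```r
# Condensed copy of the class definition so the example is self-contained
methods::setClass(
  "plant",
  slots = list(species = "character", height = "numeric", is_flowering = "logical"),
  prototype = list(is_flowering = TRUE),
  validity = function(object) {
    if (length(object@species) != 1) return("Species must be a single string")
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be a single positive number")
    }
    TRUE
  }
)
orchid <- methods::new("plant", species = "Orchis", height = 0.3)

# A value of the wrong type is rejected at assignment
type_error <- tryCatch(
  {
    orchid@height <- "tall"
    "no error"
  },
  error = function(e) "type error caught"
)

# A negative number has the right type, so the assignment itself succeeds
orchid@height <- -1

# The validity function only runs again when validObject() is called
validity_error <- tryCatch(
  methods::validObject(orchid),
  error = function(e) conditionMessage(e)
)

type_error
validity_error
```

A common pattern is therefore to call methods::validObject() at the end of any function that modifies slots.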
Methods
S4 methods are defined using the setMethod() function, which requires:
- The name of the method
- The class it applies to
- The function that implements the method
# Defining a show method (equivalent to print)
methods::setMethod(
"show", # The name of the method
"plant", # The class that the method is for
# The function that defines the method
function(object) {
cat("Plant Object:\n")
cat("Species:", object@species, "\n")
cat("Height:", object@height, "m\n")
cat("Flowering:", ifelse(object@is_flowering, "Yes", "No"), "\n")
}
)
# Define a summary method
methods::setMethod(
"summary",
"plant",
function(object, ...) {
cat("Plant Summary:\n")
cat("Species:", object@species, "\n")
cat("Height:", object@height, "m\n")
cat("Flowering:", ifelse(object@is_flowering, "Yes", "No"), "\n")
cat("Size Category:", ifelse(object@height < 5, "Small",
ifelse(object@height < 15, "Medium", "Large")
), "\n")
}
)
# The show method has now become available for all plant objects
lily
Plant Object:
Species: Lilium
Height: 1.5 m
Flowering: Yes
Inheritance
S4 inheritance is more formal than S3. When creating a subclass, we use the contains argument to specify the parent class. The subclass inherits all slots and methods from the parent class but can add new slots and methods of its own.
methods::setClass(
"tree", # The name of the subclass
contains = "plant", # The name of the parent class
slots = list(
trunk_diameter = "numeric"
),
validity = function(object) {
if (length(object@trunk_diameter) != 1 || object@trunk_diameter <= 0) {
return("Trunk diameter must be a single positive number")
}
TRUE
}
)
As with S3 classes, we can override the methods for the subclass. We use the setMethod() function to do this:
methods::setMethod(
"show", # The name of the method
"tree", # The class that the method is for
# The function that defines the method
function(object) {
methods::callNextMethod() # Call the plant show method first
cat("Trunk Diameter:", object@trunk_diameter, "m\n")
}
)
maple <- methods::new("tree",
species = "Acer platanoides",
height = 20,
is_flowering = TRUE,
trunk_diameter = 0.5
)
show(maple)
Plant Object:
Species: Acer platanoides
Height: 20 m
Flowering: Yes
Trunk Diameter: 0.5 m
When we call show(maple), R first looks for a method specifically defined for the “tree” class. Since we defined one, it uses that method, which first calls the parent class’s show method (using callNextMethod()) and then adds the tree-specific information about trunk diameter.
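Inheritance covers more than methods: a subclass object also counts as an instance of its parent class, and the parent's validity function is checked for it too. A minimal sketch, with both class definitions repeated in condensed form so the block runs on its own; the object oak and the variable parent_check are made up for the example.

```r
# Condensed copies of the class definitions so the example is self-contained
methods::setClass(
  "plant",
  slots = list(species = "character", height = "numeric", is_flowering = "logical"),
  prototype = list(is_flowering = TRUE),
  validity = function(object) {
    if (length(object@species) != 1) return("Species must be a single string")
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be a single positive number")
    }
    TRUE
  }
)
methods::setClass(
  "tree",
  contains = "plant",
  slots = list(trunk_diameter = "numeric"),
  validity = function(object) {
    if (length(object@trunk_diameter) != 1 || object@trunk_diameter <= 0) {
      return("Trunk diameter must be a single positive number")
    }
    TRUE
  }
)

oak <- methods::new("tree", species = "Quercus robur", height = 25, trunk_diameter = 0.8)

# A tree is also a plant
methods::is(oak, "plant")

# The class of the object followed by its superclasses
methods::is(oak)

# The validity check inherited from "plant" still applies to a tree
parent_check <- tryCatch(
  methods::new("tree", species = "Quercus robur", height = -5, trunk_diameter = 0.8),
  error = function(e) conditionMessage(e)
)
parent_check
```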
Generic Functions
Generic functions are a powerful feature of R’s object-oriented systems. They allow the same function name to work differently depending on the class of the object it’s applied to. This is called method dispatch, and it makes your code more intuitive and flexible.
Single Argument Generics
Let’s look at how to create a generic function that works with a single argument. We’ll create a function to calculate the volume of our plants and trees:
# Defining a generic function
methods::setGeneric(
# The name of the generic function
"calculate_volume",
# A placeholder that will be replaced by the method
function(object) methods::standardGeneric("calculate_volume")
)
R is now expecting a method to be defined for this “calculate_volume” generic function. Let’s do so for our plant and tree classes:
# Defining an implementation of the generic function for plant objects
methods::setMethod(
# The name of the method
"calculate_volume",
# The "signature" of the method; aka the types of the arguments
# In this case, the method is for plant objects
"plant",
# The function that defines the method
function(object) {
cat("Calculating volume for a plant...\n")
# Assumption of a thin cylinder for the volume
pi * 0.01^2 * object@height
}
)
# Defining an implementation of the generic function for tree objects
methods::setMethod(
"calculate_volume",
"tree",
function(object) {
cat("Calculating volume for a tree...\n")
pi * (object@trunk_diameter / 2)^2 * object@height
}
)
Now we can use the generic function to calculate the volume of our plant and tree objects:
calculate_volume(lily)
Calculating volume for a plant...
[1] 0.0004712389
calculate_volume(maple)
Calculating volume for a tree...
[1] 3.926991
R dispatches the correct implementation automatically!
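Dispatch also follows the inheritance chain: if a subclass has no method of its own, R uses the method of its parent class. The sketch below assumes a hypothetical subclass shrub and a hypothetical generic estimate_volume (named differently from calculate_volume so nothing above is overwritten).

```r
# Condensed copy of the parent class so the example is self-contained
methods::setClass(
  "plant",
  slots = list(species = "character", height = "numeric", is_flowering = "logical"),
  prototype = list(is_flowering = TRUE),
  validity = function(object) {
    if (length(object@species) != 1) return("Species must be a single string")
    if (length(object@height) != 1 || object@height <= 0) {
      return("Height must be a single positive number")
    }
    TRUE
  }
)

# A hypothetical subclass that adds no slots and no methods
methods::setClass("shrub", contains = "plant")

# A hypothetical generic with a method for "plant" only
methods::setGeneric(
  "estimate_volume",
  function(object) methods::standardGeneric("estimate_volume")
)
methods::setMethod(
  "estimate_volume",
  "plant",
  function(object) pi * 0.01^2 * object@height
)

hazel <- methods::new("shrub", species = "Corylus avellana", height = 4)

# No method exists for "shrub", so dispatch falls back to the "plant" method
estimate_volume(hazel)
```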
Multiple Argument Generics
Generic functions can also work with multiple arguments. This is particularly useful when you want to define operations between objects of the same class. For example, we might want to check if two plants are different species:
# Defining a generic function with multiple arguments
methods::setGeneric(
"are_different_species",
function(plant1, plant2) methods::standardGeneric("are_different_species")
)
# Defining a method for plant-plant combination
methods::setMethod(
# The name of the method
"are_different_species",
# The "signature" of the method; aka the types of the arguments.
# If there is more than one argument, we use the
# signature() function to specify the type of each argument
signature(plant1 = "plant", plant2 = "plant"),
# The function that defines the method
function(plant1, plant2) {
plant1@species != plant2@species
}
)
rose <- methods::new("plant",
species = "Rosa",
height = 0.5,
is_flowering = TRUE
)
are_different_species(rose, lily)
[1] TRUE
are_different_species(rose, rose)
[1] FALSE
The setGeneric() function creates a new generic function that can work with different classes. It defines a placeholder that will be replaced by the appropriate method implementation based on the class of the object it’s called with. This is the foundation of R’s method dispatch system.
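With several arguments in the signature, R chooses a method from the combination of their classes. The sketch below uses two hypothetical classes, flower and pollinator, and a hypothetical generic describe_pair, all unrelated to the classes defined above. The class “ANY” matches every class, so it can serve as a fallback.

```r
# Two hypothetical classes, used only for this illustration
methods::setClass("flower", slots = list(species = "character"))
methods::setClass("pollinator", slots = list(species = "character"))

methods::setGeneric(
  "describe_pair",
  function(a, b) methods::standardGeneric("describe_pair")
)

# One method per combination of argument classes
methods::setMethod(
  "describe_pair",
  methods::signature(a = "flower", b = "flower"),
  function(a, b) "two flowers"
)
methods::setMethod(
  "describe_pair",
  methods::signature(a = "flower", b = "pollinator"),
  function(a, b) paste(b@species, "visits", a@species)
)

# "ANY" matches every class, so this method acts as a fallback
methods::setMethod(
  "describe_pair",
  methods::signature(a = "ANY", b = "ANY"),
  function(a, b) "no specific method for this pair"
)

clover <- methods::new("flower", species = "Trifolium repens")
bee <- methods::new("pollinator", species = "Apis mellifera")

describe_pair(clover, clover)  # flower-flower method
describe_pair(clover, bee)     # flower-pollinator method
describe_pair(bee, clover)     # no exact match, so the ANY-ANY fallback is used
```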
Generics on R objects
One of the most powerful features of R’s object-oriented systems is that you can define your own generics and methods for classes that already exist in R, not just for classes you created yourself. For example, we can create a generic with a method that extracts the slope from a linear model (an object of class “lm”):
# Defining a generic function for a lm object
methods::setGeneric(
"get_slope",
function(x, ...) methods::standardGeneric("get_slope")
)
# Implementing the generic function for a lm object
methods::setMethod(
"get_slope",
"lm",
function(x, ...) {
x$coefficients[2]
}
)
example_lm <- lm(mpg ~ hp, data = mtcars)
get_slope(example_lm)
hp
-0.06822828
This demonstrates how we can extend existing R objects to work with custom functionality. In fact, this is often how third party packages allow base R objects to work with their functions.
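The same approach can be applied to S3 classes of our own, such as the animal class from earlier. As a sketch: methods::setOldClass() registers an S3 class name so that S4 generics can dispatch on it. The generic get_species is a hypothetical name chosen for this example.

```r
# An S3 object like the ones created earlier in the article
tiger <- structure(
  list(species = "Panthera tigris", age = 5, weight = 200),
  class = "animal"
)

# Register the S3 class so that S4 generics can dispatch on it
methods::setOldClass("animal")

# A hypothetical S4 generic with a method for the S3 class
methods::setGeneric(
  "get_species",
  function(x, ...) methods::standardGeneric("get_species")
)
methods::setMethod(
  "get_species",
  "animal",
  function(x, ...) x$species
)

get_species(tiger)
```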
Concluding Remarks
- Use S3 for simple data structures and quick prototyping. Attributes/slots are accessed here using the $ operator.
- Use S4 when you need strict type checking and data integrity and/or when you need multiple dispatch of methods. Attributes/slots are accessed here using the @ operator.
- Third party packages such as those found in Bioconductor will often use the S4 series of class objects.
Remember that the best approach is often to start with S3 for simplicity and move to S4 if you need more structure and type safety.