Reading and Writing Files in R

Soon after beginning to use R as a biologist, you will likely want to analyze some data. These data will often be stored as external files on your computer, with the most common data file types being Excel Spreadsheet files (.xls(x)), Comma Separated Value files (.csv), or Plain Text (.txt) files.

Since it is best practice to never directly edit your raw data files (because what if you need to go back to the original version!?), a common workflow is to:

  1. Read your data into R
  2. Perform some kind of analysis, creating new data objects, figures, or outputs, then
  3. Save (aka “write”) those data objects/figures/outputs to your computer for later use

Steps 1 & 3 will be covered here.
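The three-step workflow can be sketched end to end in base R. This is only an illustration under stated assumptions: the "raw" file is a stand-in created from the built-in airquality dataset, it is written to a temporary folder rather than a real project folder, and the names raw_path, monthly, and out_path are made up for this example.

```r
# Stand-in for a raw data file (assumption: in practice this file already exists)
raw_path <- file.path(tempdir(), "airquality_raw.csv")
write.csv(airquality, raw_path, row.names = FALSE)

# 1. Read the data into R
raw <- read.csv(raw_path)

# 2. Analyze: mean temperature per month, stored as a new data object
monthly <- aggregate(Temp ~ Month, data = raw, FUN = mean)

# 3. Write the result to a NEW file, leaving the raw file untouched
out_path <- file.path(tempdir(), "airquality_monthly_temp.csv")
write.csv(monthly, out_path, row.names = FALSE)
```

The raw file is only ever read, never overwritten, which is the point of the workflow.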

As with all things programming, there are multiple ways to perform these tasks, both in base R and via packages.

Reading Files

When reading in a new file, first check it manually to ensure it is formatted in a readable way, then note the path on your computer where the file is stored. Note: see the section on workflow and the here package.

Comma-Separated Value .csv Files

This is the recommended file type for reading data into R as it is flexible and easily transferred directly into a dataframe.

We can read in the file by assigning it directly to an object.

Base R

data <- read.csv("/path/to/file/file-name.csv", header = TRUE)

Here the first argument in the read.csv() function is the path pointing to the file on your computer. The second argument, header = TRUE, tells R that the first row of the .csv file contains the column names and is not part of the data themselves.
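To see what the header argument changes, here is a small sketch that writes a tiny made-up .csv file to a temporary location and reads it both ways. The file contents and the object names with_header and no_header are invented for illustration.

```r
# Create a tiny example file (hypothetical data) in a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,3", "B,5"), tmp)

# header = TRUE: the first row becomes the column names
with_header <- read.csv(tmp, header = TRUE)
names(with_header)  # "site" "count"
nrow(with_header)   # 2

# header = FALSE: the first row is treated as data, columns get default names
no_header <- read.csv(tmp, header = FALSE)
names(no_header)    # "V1" "V2"
nrow(no_header)     # 3
```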

Tidyverse

data <- readr::read_csv("/path/to/file/file-name.csv")

Excel sheets .xls(x)

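The best way to read an Excel sheet into R is with the readxl package. The call is almost identical to the ones above (use readxl::read_xlsx() for .xlsx files, or readxl::read_excel() to have the format detected from the file):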

data <- readxl::read_xls("/path/to/file/file-name.xls", sheet = "SHEETNAME")
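If you do not know the sheet names in a workbook, readxl can list them before you read anything. The sketch below assumes the readxl package is installed and uses one of the example workbooks that ships with it, so no file of your own is needed; the object name iris_xl is made up for this example.

```r
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names so we know what to pass to `sheet =`
excel_sheets(path)

# Read one sheet by name; read_excel() detects .xls vs .xlsx from the file
iris_xl <- read_excel(path, sheet = "iris")
head(iris_xl)
```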

Plain Text .txt

Reading a text file is similar to reading a .csv file:

data <- read.delim("/path/to/file/file-name.txt")

The option above assumes that the file is tab-delimited.

We could also use the function read.table() which is flexible in that we could define what type of delimiter the file uses. For example, a space-delimited text file is a data file similar to a .csv but instead of using a comma, a tab, or some other delimiter, it simply uses spaces. We could use read.table() for this task like so:

data <- read.table("/path/to/file/file-name.txt", sep = " ")

And we could extrapolate this to other less common types of delimiters supported by the function as well.
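As a sketch of one of those less common delimiters, here is a semicolon-delimited file read with read.table(). The file contents are made up for illustration. Note that, unlike read.csv(), read.table() does not assume a header row when every row has the same number of fields, so header = TRUE is given explicitly.

```r
# Create a small semicolon-delimited file (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("site;count", "A;3", "B;5"), tmp)

# Tell read.table() which delimiter to use and that the first row is a header
dat <- read.table(tmp, sep = ";", header = TRUE)
dat
```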

Tidyverse

data <- readr::read_tsv("/path/to/file/file-name.txt")

Writing Files

Writing files is nearly identical to reading files.

Comma-Separated Value .csv Files

Base R

write.csv(data_object, "/path/to/new/file/file-name.csv")
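One detail worth knowing about base R: write.csv() also writes the row names as an extra first column unless told otherwise. The sketch below uses a made-up two-row data frame and temporary files to show the difference; the object names are invented for this example.

```r
df_small <- data.frame(site = c("A", "B"), count = c(3, 5))

with_rn <- tempfile(fileext = ".csv")
no_rn   <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column
write.csv(df_small, with_rn)
# row.names = FALSE: only the data columns are written
write.csv(df_small, no_rn, row.names = FALSE)

names(read.csv(with_rn))  # "X" "site" "count"
names(read.csv(no_rn))    # "site" "count"
```

Setting row.names = FALSE avoids picking up a stray "X" column when the file is read back in.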

Tidyverse

readr::write_csv(data_object, "/path/to/new/file/file-name.csv")

Excel sheets .xls(x)

Writing a file to an Excel sheet is slightly unusual and wouldn’t be recommended (.csv files are generally a better choice), but if you need to, there are a number of packages to assist, the easiest of which is writexl.

library(writexl)
writexl::write_xlsx(data_object, "/path/to/new/file/file-name.xlsx")

Plain Text .txt

Base R

write.table(data_object, "/path/to/new/file/file-name.txt")
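By default, write.table() separates values with spaces, quotes character values, and writes row names. If a tab-delimited file is wanted (the format read.delim() expects), the relevant arguments can be set explicitly. A minimal sketch with a made-up data frame and a temporary file:

```r
df_small <- data.frame(site = c("A", "B"), count = c(3, 5))
tab_path <- tempfile(fileext = ".txt")

# Tab-delimited, no quotes around text, no row-name column
write.table(df_small, tab_path, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(tab_path)[1]  # "site\tcount"

# The file can be read straight back with read.delim()
back <- read.delim(tab_path)
back
```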

Tidyverse

readr::write_delim(data_object, "/path/to/new/file/file-name.txt")

Writing Figures

It is common to create a figure in R and want to save it to the local machine. If you are plotting in ggplot2, then there are specific functions for those objects, but if you are plotting in base R, you will use built-in functions to save as whatever file type you want.

Using the example built-in dataset airquality, we can make an example plot to save.

Saving an image created in base R

plot(x = airquality$Day, y = airquality$Temp, col = "red")

If this is the plot we wish to save, we can save it as any common figure file type (e.g. .png, .jpg, .tif, .pdf etc.)

# first use the command for the file type
png(filename = "/path/to/plot/saving_plot.png")
# then we create the plot
plot(x = airquality$Day, y = airquality$Temp, col = "red")
# this command saves the file and closes the connection to the file
dev.off()

Above, the first line png(filename = "/path/to/plot/saving_plot.png") opens a graphics device connected to a file (aka makes a place for it on the computer), then the plot function plot(x = airquality$Day, y = airquality$Temp, col = "red") actually draws the plot into that device, and the last line dev.off() writes the actual file and closes the connection opened by the first line of code.
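The same open, plot, close pattern applies to the other file types. As a sketch, here is the equivalent with the pdf() device, writing to a temporary folder so it runs anywhere; the width and height values (in inches) and the name out_pdf are arbitrary choices for this example.

```r
out_pdf <- file.path(tempdir(), "saving_plot.pdf")

# open a PDF device; width and height are given in inches
pdf(file = out_pdf, width = 6, height = 4)
# draw the plot into the open device
plot(x = airquality$Day, y = airquality$Temp, col = "red")
# close the device, which finishes writing the file
dev.off()

# confirm the file was created
file.exists(out_pdf)
```

If dev.off() is forgotten, the device stays open and later plots keep going to the file instead of the plotting window.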

Saving a ggplot2 object

It is best to use the ggsave() function from ggplot2 to save these plots.

ggplot2::ggsave("/path/to/plot/saving_plot.png",
                plot_object)

Note here that we can change the type of file that gets saved by simply changing the extension on the file name.
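A fuller sketch of ggsave(), assuming ggplot2 is installed. The plot object is built from the built-in airquality dataset, the file goes to a temporary folder, and the size and resolution values are arbitrary choices for illustration.

```r
library(ggplot2)

# Build a plot object to save
plot_object <- ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_point(colour = "red")

out_png <- file.path(tempdir(), "saving_plot.png")

# Setting the size and resolution explicitly makes the saved figure reproducible
ggsave(out_png, plot = plot_object, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_png)
```

Without width and height, ggsave() uses the size of the current graphics device, so the output can differ between sessions.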

Reading & Writing Examples

To get some practice with reading and writing files, and to make sure you understand the process, let’s use a built-in dataframe that ships with R.

We’ll select a built-in dataframe, write it to our computers, and then read it back in. Let’s use the iris dataset.

df <- iris
head(df)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Now that we have our dataframe df, let’s write it out to a place on our computers. Here is the tricky bit. As we mentioned in the workflow section, everyone’s computer will have slightly different “paths”, i.e. the sequence of folders that the computer uses to point to a particular file in a particular place. On my computer, I’ll be using a simple location in my Documents folder. For me, that location is “~/Documents/iris.csv”, but it will be slightly different on each operating system (e.g. Windows vs. macOS). So, go ahead now and find the path that you want to save to. If you’re still unclear on paths, you can check out the “Working Directories” section on the workflow page.

We’ll use the Tidyverse method with the readr package, so we need to load this first so we can use it. If you don’t have it installed yet, use install.packages("readr") to install it.

library(readr)

Let’s write out our file:

readr::write_csv(df, "~/Documents/iris.csv")

This returns no message, which is a good sign. To check that it worked, we could manually go to that location on our computers, but better yet, let’s try reading the file back in and see if it works!

We’ll assign it a different name when we read it in for clarity.

iris_df <- readr::read_csv("~/Documents/iris.csv")
## Rows: 150 Columns: 5
## ── Column specification ──────────────────────────────────────────────
## Delimiter: ","
## chr (1): Species
## dbl (4): Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

And we see the file has been read into our R session.

If the file hasn’t been read in, DO NOT PANIC. The most likely source of the error is a misspelling. Double check that you have spelt the paths correctly in both your writing and reading steps, and manually check in your computer to see if you can find the file where you think you saved it. You might have accidentally saved it elsewhere!

Here’s an example of a common error we might encounter during this type of task:

iris_df <- readr::read_csv("~/Docoments/iris.csv")
## Error: '~/Docoments/iris.csv' does not exist.

We see that no file was read in! How come? Can you spot the error in the above code? It turns out “Documents” is misspelt as “Docoments”. This type of small and easily solved error is the most common when reading and writing files.
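A quick way to catch this kind of mistake is to ask R whether the path exists before trying to read it. The sketch below uses a temporary folder in place of ~/Documents so it runs on any machine; good_path and typo_path are made-up names for this example.

```r
# Write a file to a known location (a temporary folder stands in for ~/Documents)
good_path <- file.path(tempdir(), "iris.csv")
write.csv(iris, good_path, row.names = FALSE)

# The same path with a typo in the file name
typo_path <- file.path(tempdir(), "irs.csv")

file.exists(good_path)  # TRUE
file.exists(typo_path)  # FALSE

# List the .csv files that are actually in the folder to spot the right name
list.files(dirname(good_path), pattern = "\\.csv$")
```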

Once we’ve read in our data, it’s almost always worth taking a quick look to make sure it looks as we expect. There are two easy ways to do so. The first is just to look at the first 6 rows of data using the head() function:

head(iris_df)
## # A tibble: 6 × 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa 
## 6          5.4         3.9          1.7         0.4 setosa

The other useful thing to do is double check the structure of the data we’re working with by calling the str() function. This gives us the structure of our data object, including the class of the object itself (i.e. is it a data frame, a matrix, a list?) as well as its sub-components. For a data frame, we’ll see what type of data each column is. This is important since particular methods only work on columns of particular types (e.g. you can’t take a numerical mean of a character column).

str(iris_df)
## spc_tbl_ [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Sepal.Length = col_double(),
##   ..   Sepal.Width = col_double(),
##   ..   Petal.Length = col_double(),
##   ..   Petal.Width = col_double(),
##   ..   Species = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
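One thing the str() output shows is that Species came back as a character column, whereas in the built-in iris dataset it is a factor, because a .csv file stores only text and not R column types. If the factor is needed, one option is to declare the column type when reading. This is a sketch assuming readr is installed; it writes to a temporary folder rather than ~/Documents, and iris_fct is a made-up name.

```r
library(readr)

path <- file.path(tempdir(), "iris.csv")
write_csv(iris, path)

# Declare Species as a factor; the other columns are still guessed by readr
iris_fct <- read_csv(
  path,
  col_types = cols(Species = col_factor()),
  show_col_types = FALSE
)

class(iris_fct$Species)   # "factor"
levels(iris_fct$Species)  # "setosa" "versicolor" "virginica"
```

An alternative is to convert after reading, for example with factor(iris_df$Species).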