Tutorial Aims:

  1. Understand what RMarkdown is and why you should use it
  2. Learn how to construct an RMarkdown file
  3. Export an RMarkdown file into many file formats

1. What is R Markdown?

R Markdown allows you to create documents that serve as a neat record of your analysis. In the world of reproducible research, we want other researchers to easily understand what we did in our analysis, otherwise nobody can be certain that we analysed our data properly. You might choose to create an RMarkdown document as an appendix to a paper or project assignment that you are doing, upload it to an online repository such as GitHub, or simply keep it as a personal record so you can quickly look back at your code and see what you did. RMarkdown presents your code alongside its output (graphs, tables, etc.) with conventional text to explain it, a bit like a notebook.

R Markdown makes use of Markdown syntax. Markdown is a very simple ‘markup’ language which provides methods for creating documents with headers, images, links etc. from plain text files, while keeping the original plain text file easy to read. You can convert Markdown documents to many other file types like .html or .pdf to display the headers, images etc.

When you use an RMarkdown file (.Rmd), you can use conventional Markdown syntax alongside chunks of code written in R (or other programming languages!). When you knit the RMarkdown file, the Markdown formatting is processed and the R code is evaluated, and an output file (HTML, PDF, etc.) is produced.

Why Use R Markdown

  • Everything is in one place.
  • Documents with embedded code are reproducible
  • The document will serve as a record for how you arrived at the results you include in your papers/presentations
  • You can pass on your code to readers in addition to the report content
  • Documents can also be used for future data releases and/or different subsets of data

2. Download R Markdown

To get RMarkdown working in RStudio, the first thing you need is the rmarkdown package, which you can get from CRAN by running the following commands in R or RStudio:

install.packages("rmarkdown")

Then load the package into your project:

library(rmarkdown)

3. Create an RMarkdown file

To create a new RMarkdown file (.Rmd), select File -> New File -> R Markdown in RStudio, then choose the file type you want to create. For now we will focus on a .html Document, which can be easily converted to other types later.

The newly created .Rmd file comes with basic instructions, but we want to create our own RMarkdown script, so go ahead and delete everything in the example file.

4. The YAML Header

At the top of any RMarkdown script is a YAML header section enclosed by ---. By default this includes a title, author, date and the file type you want to output to. Rules in the header section will alter the whole document. Have a quick flick through the available options to familiarise yourself with the sort of things you can alter by adding an option to the YAML header.

Through YAML you can set:

  • font (size and style)
  • default figure options (height, width, etc.)
  • a reference to custom CSS (Cascading Style Sheets) code

By default, the title, author, date and output format are printed at the top of your .html document. This is the minimum you should put in your header section.
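As a rough illustration, a minimal header might contain just these four fields. The sketch below writes one from R into a temporary .Rmd file so you can see what the top of the script looks like. The title, author and date values are placeholders invented for this example.

```r
# Minimal YAML header; the field values are placeholder assumptions
header <- c(
  "---",
  "title: \"My first R Markdown document\"",
  "author: \"Your Name\"",
  "date: \"2024-01-01\"",
  "output: html_document",
  "---"
)

# Write the header plus one line of body text to a temporary .Rmd file
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(header, "", "Some introductory text."), rmd_path)

# Print the file so you can see exactly what the top of the script contains
cat(readLines(rmd_path), sep = "\n")
```

In practice you would simply type the lines between the two --- markers at the top of your own .Rmd file.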

Now that we have our first piece of content, we can test the .Rmd file by compiling it to .html. To compile your .Rmd file into a .html document, you should press the Knit button in the taskbar.

[Screenshot: the Knit button]

By default, RStudio opens a separate preview window to display the output of your .Rmd file. If you want the output to be displayed in the Viewer window in RStudio (the same window where you would see plotted figures, packages and file paths), select “View in Pane” from the drop down menu that appears when you click on the Knit button in the taskbar, or in the Settings gear icon drop down menu next to the Knit button.

A preview appears, and a .html file is also saved to the same folder where you saved your .Rmd file.
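As far as I know, the Knit button is a front end for rmarkdown::render(), so the same compilation can also be run from the console. A minimal sketch, assuming the rmarkdown package and pandoc are both available. The file is a throwaway example written to a temporary folder.

```r
# Write a tiny throwaway .Rmd file (contents are illustrative only)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Render test\"",
  "output: html_document",
  "---",
  "",
  "Some *italic* text."
), rmd_path)

# Only attempt to render if rmarkdown and pandoc are actually installed
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the output file it created
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(file.exists(html_path))
} else {
  message("rmarkdown or pandoc not found; skipping render")
}
```

The .html file is written next to the .Rmd file, just as when you use the Knit button.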

5. Markdown Syntax

You can use regular markdown rules in your R Markdown document. Once you knit your document, the output will display text formatted according to the following simple rules.

Formatting Text

Here are a few common formatting commands:

*Italic* or _italics_ : Italic

**Bold** or __Bold__ : Bold

~~strikethrough~~ : strikethrough

superscript^2^ : superscript²

subscript~2~ : subscript₂

You can also use HTML

<i>italics</i> : italics

<b>bold</b> : bold

This is `code` in text

This is code in text

# Heading 1

Heading 1

## Heading 2

Heading 2

### Heading 3

Heading 3

#### Heading 4

Heading 4

##### Heading 5

Heading 5

###### Heading 6

Heading 6

Note that when a # symbol is placed inside a code chunk it acts as a normal R comment, but when placed in text it controls the header size.

Lists
Unordered List

- Item
    + Sub-item
- Item
- Item

  • Item
      ◦ Sub-item
  • Item
  • Item

* Unordered List item

  • Unordered List item

1. Ordered List Item

  1. Ordered List Item


Paragraphs that start with > are converted to block quotes.

> It went a lot faster with two people digging.

Including Equations

You can include equations in your text using LaTeX syntax and surrounding it by a pair of double dollar signs $$. This is useful to specify models.

Below is an equation for a simple linear regression:

$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$

Include inline equations by using a pair of single $ signs.

The sample mean of $y$ is given by $\bar{y}=\sum\limits_{i=1}^{n}\frac{y_i}{n}$

The sample mean of \(y\) is given by \(\bar{y}=\sum\limits_{i=1}^{n}\frac{y_i}{n}\)

6. Code Chunks

Below the YAML header is the space where you will write your code, accompanying explanation and any outputs. Code that is included in your .Rmd document should be enclosed by three backticks ``` (grave accents), with {r} after the opening set to mark the code as R. These are known as code chunks:

```{r}

norm <- rnorm(100, mean=0, sd=1)

```

norm <- rnorm(100, mean=0, sd=1)
norm[1:5]
## [1]  0.63280247  3.12971515 -0.46914854 -0.95642818 -0.09618048

You can quickly insert a code chunk in RStudio using a button in the toolbar.

[Screenshot: the insert code chunk button]

You can run an individual chunk of code at any time by clicking on the small green arrow.

[Screenshot: the green arrow that runs an individual chunk]

The output of the code will appear just beneath the code chunk.

More on Code Chunks

It's important to remember when you are creating an RMarkdown file that if you want to run code that refers to an object, for example

print(dataframe)

you must include the code that defines dataframe, just like in a normal R script. For example:

A <- c("a","b","c","d")
B <- c(5,15,25,30)

dataframe = data.frame(A,B)
print(dataframe)
##   A  B
## 1 a  5
## 2 b 15
## 3 c 25
## 4 d 30

Or if you are loading a dataframe from a .csv file, you must include the code in the .Rmd:

# dataframe <- read.csv('datasets/my_data.csv')

Similarly, if you are using any packages in your analysis, you will have to load them in the .Rmd file using library() as in a normal R script.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
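Objects and packages have to be created or loaded inside the .Rmd because the Knit button runs the document in a fresh R session, so anything sitting in your current workspace is not visible to it. A rough base R analogy, using a new empty environment to stand in for that fresh session (the name fresh is just for illustration):

```r
# A new, empty environment standing in for the fresh session used when knitting
fresh <- new.env()

# Nothing has been created in it yet, so dataframe is not there
before <- exists("dataframe", envir = fresh, inherits = FALSE)
print(before)

# Run the defining code inside that environment, as a chunk in the .Rmd would
local({
  A <- c("a", "b", "c", "d")
  B <- c(5, 15, 25, 30)
  dataframe <- data.frame(A, B)
}, envir = fresh)

# Now the object exists there and can be used
after <- exists("dataframe", envir = fresh, inherits = FALSE)
print(after)
print(nrow(get("dataframe", envir = fresh)))
```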

Hiding Code Chunks

If you don’t want the code of a particular code chunk to appear in the final document, but still want to show the output (e.g. a plot), then you can include echo=FALSE in the code chunk instructions.

```{r, echo=FALSE}

A <- c("a","b","c","d")
B <- c(5,15,25,30)

dataframe <- data.frame(A,B)
print(dataframe)

```

Similarly, you might want to create an object, but include neither the code nor the output in the final .html file. To do this you can use include=FALSE.

```{r, include=FALSE}

# div is assumed to be a data frame that has already been loaded
richness <- div %>%
  group_by(taxonGroup) %>%
  summarise(Species_richness = n_distinct(taxonName))

```

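In some cases, when you load packages into RStudio, various messages and warnings are printed, such as the ones displayed above when loading dplyr. If you don’t want warnings to appear you can use warning=FALSE. Attach and masking notices like those from dplyr are messages rather than warnings, so they are hidden with message=FALSE.

```{r, warning=FALSE, message=FALSE}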

library(tidyverse)

```
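R treats messages and warnings as two different kinds of condition, and the warning and message chunk options act on them separately. A small base R sketch of the difference, using a made-up function noisy() that emits one of each:

```r
# Hypothetical function that emits one message and one warning
noisy <- function() {
  message("this is a message")
  warning("this is a warning")
  invisible("done")
}

# Catch each kind of condition separately to show that they are distinct
seen <- character(0)
result <- withCallingHandlers(
  noisy(),
  message = function(m) {
    seen <<- c(seen, "message")
    invokeRestart("muffleMessage")  # roughly what message=FALSE does
  },
  warning = function(w) {
    seen <<- c(seen, "warning")
    invokeRestart("muffleWarning")  # roughly what warning=FALSE does
  }
)

print(seen)
```

If a chunk still prints unwanted text after setting one of the two options, it is worth trying the other.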

Code Chunk Options

Code chunks accept optional arguments

```{r name, eval=TRUE, warning=TRUE, message=TRUE}

code goes here

```

  • name - This is not necessary, but it is good practice to label your code chunks. Two code chunks cannot have the same name

  • echo - Whether to display the code chunk or just show the results. If you want the code embedded in your document but don’t want the reader of the document to see it, you can set echo=FALSE

  • eval - Whether to run the code in the code chunk. This can be used if you want to display the code but not to have it run.

  • warning - Whether to display warning messages in the document

  • message - Whether to display code messages in the document

  • results - Whether and how to display the text output of the code. For example, results='hide' runs the code but leaves its printed output out of the document.

Note: The default for echo, eval, warning and message is TRUE
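If you want the same options for every chunk, knitr lets you change the defaults once, typically in a setup chunk near the top of the document. A minimal sketch, assuming the knitr package is installed. The particular values chosen here are only an example.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Remember the current defaults so they can be restored afterwards
  old_opts <- knitr::opts_chunk$get(c("echo", "warning", "message"))

  # Show code everywhere, but hide warnings and messages by default
  knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
  print(knitr::opts_chunk$get("warning"))

  # Restore the previous defaults (only needed when experimenting in the console)
  do.call(knitr::opts_chunk$set, old_opts)
}
```

Options written in the curly braces of an individual chunk still take precedence over these defaults.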

Code Chunk Figure Options

There is a whole set of optional arguments just for displaying figures

```{r name, fig.height=6, fig.width=4, dpi=300, fig.align='center'}

code goes here

```

  • fig.height, fig.width - Specify the height and width of the figure to make it fit into the space you have available.

  • dpi - Specifies the pixels per inch. This effectively controls the size of the objects (text, lines, etc.) in your figure.

  • fig.align - Specify whether your figure appears right, left, or center aligned.

Inserting Figures

Inserting a graph into RMarkdown is easy. The more energy-demanding aspect might be adjusting the formatting.

By default, RMarkdown will place graphs by maximising their height, while keeping them within the margins of the page and maintaining aspect ratio. If you have a particularly tall figure, this can mean a really huge graph. In the following example we modify the dimensions of the figure we created above. To manually set the figure dimensions, you can insert an instruction into the curly braces.

```{r, fig.width=4, fig.height=3}

print(dataframe)
boxplot(B~A, data=dataframe)

```

A <- c('a','a','b','b')
B <- c(5,10,15,20)
dataframe <- data.frame(A,B)
print(dataframe)
##   A  B
## 1 a  5
## 2 a 10
## 3 b 15
## 4 b 20
boxplot(B~A, data=dataframe)

Inserting Tables

Standard R Markdown

R Markdown can print the contents of a data frame easily by enclosing the name of the data frame in a code chunk:

print(dataframe)
##   A  B
## 1 a  5
## 2 a 10
## 3 b 15
## 4 b 20

This can look a bit messy, especially with data frames with a lot of columns. Including a formal table requires more effort.

kable() function from knitr package

The most aesthetically pleasing and simple table formatting function I have found is kable() in the knitr package. The first argument tells kable to make a table out of the object dataframe, and digits=2 rounds numbers to two decimal places. Remember to load the knitr package in your .Rmd file as well.

library(knitr)
kable(dataframe, digits=2)
A B
a 5
a 10
b 15
b 20

pander function from pander package

If you want a bit more control over the content of your table you can use pander() in the pander package. Imagine I want the 3rd column to appear in italics:

library(pander)
plant <- c("a","b","c")
temp <- c(20,20,20)
growth <- c(0.65,0.95,0.15)
dataframe <- data.frame(plant, temp, growth)
emphasize.italics.cols(3) # make the 3rd column italics
pander(dataframe) # create the table
plant temp growth
a 20 0.65
b 20 0.95
c 20 0.15

Find more info in the pander package documentation.

Manually creating tables using markdown

You can also manually create small tables using markdown syntax. This should be put outside of any code chunks.

For example, the following markdown

| Plant | Temp. | Growth |
|:-----:|:-----:|:------:|
| A     | 20    | 0.65   |
| B     | 20    | 0.95   |
| C     | 20    | 0.15   |

will produce

Plant   Temp.   Growth
A       20      0.65
B       20      0.95
C       20      0.15

The :-----: tells markdown that the line above should be treated as a header and the lines below should be treated as the body of the table. Text alignment of the columns is set by the position of the colons (:):

Syntax Alignment
:----: Centre
:---- Left
----: Right
----- Auto
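Typing the pipes and dashes by hand gets tedious for anything but the smallest table. As a sketch of how the syntax is put together, the base R helper below builds the lines of such a table from a data frame. The function name df_to_pipe_table and its align argument are made up for this example, and in real documents kable() is likely the more convenient route.

```r
# Hypothetical helper: turn a data frame into the lines of a markdown table
df_to_pipe_table <- function(df, align = ":----:") {
  # Header row with the column names
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  # Separator row: one alignment marker per column
  rule <- paste0("|", paste(rep(align, ncol(df)), collapse = "|"), "|")
  # Body rows: paste the columns together row by row
  cells <- unname(lapply(df, as.character))
  body <- paste0("| ", do.call(paste, c(cells, sep = " | ")), " |")
  c(header, rule, body)
}

growth_data <- data.frame(
  Plant  = c("A", "B", "C"),
  Temp   = c(20, 20, 20),
  Growth = c(0.65, 0.95, 0.15)
)

cat(df_to_pipe_table(growth_data), sep = "\n")
```

Changing align to ":----" or "----:" switches every column to left or right alignment. The printed lines can be pasted into the text part of the .Rmd file, outside any code chunk.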

Creating tables from model outputs

Using tidy() from the broom package, we can create tables of our model outputs and insert them into our markdown file. The example below fits a simple linear model, saves the summary output table as a new R object, and then adds it to the markdown file.

library(broom)
library(pander)
A <- c(20,15,10)
B <- c(1,2,3)

lm_test <- lm(A~B) # Creating a linear model

table_obj <- tidy(lm_test) # using tidy() to create a new R object called table_obj

pander(table_obj, digits=3) # using pander() to view the created table, with 3 sig figs
term estimate std.error statistic p.value
(Intercept) 25 4.07e-15 6.14e+15 1.04e-16
B -5 1.88e-15 -2.65e+15 2.4e-16

This model fits its three data points exactly, so R warns that the summary may be unreliable. By using warning=FALSE as a chunk option, any warnings produced will be printed in the console when knitting but will not appear in the output document.
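If broom is not available, a similar table can be pulled out of the model summary with base R. This is a sketch rather than a replacement for tidy(): the column names differ, and the data values below are invented so that the fit is not exact.

```r
# Invented data: roughly linear, but not a perfect fit
B <- c(1, 2, 3, 4, 5)
A <- c(20.5, 14.8, 10.4, 4.9, 0.2)

lm_test <- lm(A ~ B)

# coef(summary()) returns a matrix of estimates, standard errors, t and p values
coef_table <- as.data.frame(coef(summary(lm_test)))

# Move the row names into a proper column, as tidy() does with its term column
coef_table <- cbind(term = rownames(coef_table), coef_table)
rownames(coef_table) <- NULL

print(coef_table, digits = 3)
```

The resulting data frame can be passed to kable() or pander() in the same way as the tidy() output.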

Visualizations

Base R

A picture often says more than words (or lines of code output) and R has a rich visualization toolbox that allows us to make powerful visualizations of our data.

Let’s generate some random numbers and visualize them:

draws <- rnorm(100) # standard normal (mean=0, sd=1)
hist(draws)

draws2 <- rnorm(100, mean=5, sd=2)
hist(draws2)

ggplot2

The base R visualization framework can be somewhat challenging to work with. A good alternative is the ggplot2 package, part of the larger tidyverse which includes more useful packages for data manipulation and analysis. ggplot2 uses a visualization framework based on the grammar of graphics philosophy.

ggplot2 works best with a dataframe as input. As examples we will use the mpg data, which ships with ggplot2, and the mtcars data, which is available by default in every R installation.

library(ggplot2)

ggplot(mpg, aes(displ, hwy, colour=class)) + geom_point()

Let’s create a new plot using code chunk figure options we saw earlier

gg <- ggplot(mtcars, aes(hp,mpg)) + 
      geom_point(aes(color=as.factor(cyl)), size=5) +
      geom_smooth(method="lm", se=FALSE) +
      labs(x="Horsepower", y="Miles per Gallon (mpg)", color="# of Cylinders") +
      theme_bw()

gg
## `geom_smooth()` using formula 'y ~ x'

7. Creating .pdf files in RMarkdown

Creating .pdf documents for printing in A4 requires a bit more fiddling around. RStudio uses the LaTeX compiling system to make .pdf documents.

The easiest way to use LaTeX is to install the TinyTeX distribution from within RStudio.

Restart your R session (Session -> Restart R), then run these lines in the console:

install.packages("tinytex")
tinytex::install_tinytex()
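Before knitting to PDF it can be worth checking that R can actually find a LaTeX installation. A small sketch using base R's Sys.which(). The helper name has_latex is made up for this example, and pdflatex is assumed to be the engine in use.

```r
# Hypothetical helper: TRUE if the given LaTeX engine is on the system PATH
has_latex <- function(engine = "pdflatex") {
  unname(nzchar(Sys.which(engine)))
}

if (has_latex()) {
  message("LaTeX found at: ", Sys.which("pdflatex"))
} else {
  message("No pdflatex found; install TinyTeX first, then restart R")
}
```

If the check still fails straight after installing TinyTeX, restarting the R session is a reasonable first thing to try.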

Becoming familiar with LaTeX will give you a lot more options to make your R markdown .pdf look pretty, as LaTeX commands are mostly compatible with R Markdown, though some googling is often required.

To compile a .pdf instead of a .html document, change output: from html_document to pdf_document, or use the dropdown menu from the “Knit” button.

[Screenshot: the Knit to PDF option in the Knit dropdown menu]

Bonus task!

Convert one of the R scripts from the previous lessons into a well-commented and easy-to-follow R Markdown document.