R Markdown allows you to create documents that serve as a neat record of your analysis. In the world of reproducible research, we want other researchers to easily understand what we did in our analysis; otherwise nobody can be certain that you analysed your data properly. You might choose to create an RMarkdown document as an appendix to a paper or project assignment that you are doing, upload it to an online repository such as GitHub, or simply keep it as a personal record so you can quickly look back at your code and see what you did. RMarkdown presents your code alongside its output (graphs, tables, etc.) with conventional text to explain it, a bit like a notebook.
R Markdown makes use of Markdown syntax. Markdown is a very simple ‘markup’ language which provides methods for creating documents with headers, images, links etc. from plain text files, while keeping the original plain text file easy to read. You can convert Markdown documents to many other file types like .html or .pdf to display the headers, images etc.
When you use an RMarkdown file (.Rmd), you can use conventional Markdown syntax alongside chunks of code written in R (or other programming languages!). When you knit the RMarkdown file, the Markdown formatting and the R code are evaluated, and an output file (HTML, PDF, etc.) is produced.
To get RMarkdown working in RStudio, the first thing you need is the rmarkdown package, which you can get from CRAN by running the following command in R or RStudio:
install.packages('rmarkdown')
Then load the package into your R session:
library(rmarkdown)
To create a new RMarkdown file (.Rmd), select File -> New File -> R Markdown in RStudio, then choose the file type you want to create. For now we will focus on a .html Document, which can be easily converted to other types later.
The newly created .Rmd file comes with basic instructions, but we want to create our own RMarkdown script, so go ahead and delete everything in the example file.
At the top of any RMarkdown script is a YAML header section enclosed by `---`. By default this includes a title, author, date and the file type you want to output to. Options set in the header apply to the whole document. Have a flick through quickly to familiarise yourself with the sort of things you can alter by adding an option to the YAML header.
Through YAML you can set:
- font (size and style)
- default figure options (height, width, etc.)
- a reference to custom CSS (Cascading Style Sheets) code
By default, the title, author, date and output format are printed at the top of your .html document. This is the minimum you should put in your header section.
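To see what a minimal header looks like, the sketch below writes a small .Rmd file to a temporary location. It then reads the header back with rmarkdown's yaml_front_matter(). The title, author and date values are placeholders. In a real document you would type the header at the top of your .Rmd file rather than generate it from R.

```r
library(rmarkdown)

# Write a minimal .Rmd file to a temporary location. The YAML header sits
# between two lines of three dashes; everything below it is the document body.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "My first R Markdown document"',  # placeholder title
  'author: "Your Name"',                     # placeholder author
  'date: "2024-01-01"',                      # placeholder date
  "output: html_document",
  "---",
  "",
  "Text and code chunks go here."
), rmd_path)

# yaml_front_matter() parses only the header and returns it as a named list
header <- yaml_front_matter(rmd_path)
str(header)
```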
Now that we have our first piece of content, we can test the .Rmd file by compiling it to .html. To compile your .Rmd file into a .html document, press the Knit button in the toolbar.
By default, RStudio opens a separate preview window to display the output of your .Rmd file. If you want the output to be displayed in the Viewer pane in RStudio (the same pane where you would see plotted figures, packages and file paths), select “View in Pane” from the drop-down menu that appears when you click on the Knit button in the toolbar, or from the Settings gear icon drop-down menu next to the Knit button.
A preview appears, and a .html file is also saved to the same folder where you saved your .Rmd file.
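The Knit button does roughly the same as calling rmarkdown::render() on the file, which can be handy from a script or the console. The sketch below writes a throwaway .Rmd file to a temporary folder and renders it. The file name knit_demo.Rmd is made up. The pandoc_available() check is there because render() relies on pandoc, which RStudio bundles but a plain R installation may lack.

```r
library(rmarkdown)

# A tiny .Rmd file with a header and one code chunk, written to a temp folder
rmd_path <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c(
  "---",
  'title: "Knit demo"',
  "output: html_document",
  "---",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_path)

# render() needs pandoc; skip rendering if it cannot be found
if (pandoc_available()) {
  html_path <- render(rmd_path, quiet = TRUE)  # returns the output file path
  print(html_path)                             # .html saved next to the .Rmd
}
```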
You can use regular markdown rules in your R Markdown document. Once you knit your document, the output will display text formatted according to the following simple rules.
Here are a few common formatting commands:
*Italic* or _italics_ : Italic
**Bold** or __Bold__ : Bold
~~strikethrough~~ : strikethrough
superscript^2^ : superscript²
subscript~2~ : subscript₂
You can also use HTML:
<i>italics</i> : italics
<b>bold</b> : bold
This is `code` in text : This is code in text
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
Note that when a # symbol is placed inside a code chunk it acts as a normal R comment, but when placed in text it controls the header size.
- Item
    + Sub-item
- Item
- Item
* Unordered List item
1. Ordered List Item
Paragraphs that start with > are converted to block quotes:
> It went a lot faster with two people digging.
Links are written as `[Display Text](Link Url)`, for example:
`[Google](https://www.google.com)`
LaTeX-style equations are written between dollar signs, so `$A = \pi \times r^{2}$` produces:
\(A = \pi \times r^{2}\)
The `$` symbols tell R Markdown to use LaTeX equation syntax. Don’t include a space after the opening `$` or before the closing `$`.
You can display an equation on its own line by writing it in LaTeX syntax and surrounding it with a pair of double dollar signs, `$$`. This is useful for specifying models.
Below is the equation for a simple linear regression:
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]
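To connect the symbols to R, the sketch below simulates data from this model and fits it with lm(). The values β₀ = 2 and β₁ = 3 are hypothetical, and the errors εᵢ are drawn from a standard normal distribution. The estimated coefficients should land close to the values used in the simulation.

```r
set.seed(1)                                # make the simulated data reproducible
x <- 1:100
epsilon <- rnorm(100, mean = 0, sd = 1)    # error term, epsilon_i
y <- 2 + 3 * x + epsilon                   # beta_0 = 2, beta_1 = 3 (hypothetical)

fit <- lm(y ~ x)   # lm() estimates beta_0 (intercept) and beta_1 (slope of x)
coef(fit)
```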
Include inline equations by using a pair of single `$` signs. For example, `The sample mean of $y$ is given by $\bar{y}=\sum\limits_{i=1}^{n}\frac{y_i}{n}$` produces:
The sample mean of \(y\) is given by \(\bar{y}=\sum\limits_{i=1}^{n}\frac{y_i}{n}\)
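As a quick check that the formula does what it says, the sketch below applies it term by term to a small made-up vector. It then compares the result with R's built-in mean().

```r
y <- c(4, 8, 15, 16, 23, 42)   # made-up example values
n <- length(y)

# The formula term by term: add up y_i / n for i = 1, ..., n
y_bar <- sum(y / n)
y_bar
mean(y)   # base R's mean() gives the same value
```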
Below the YAML header is the space where you will write your code, accompanying explanation and any outputs. Code that is included in your .Rmd document should be enclosed by three backwards apostrophes ``` (grave accents!). These are known as code chunks, and look like this:
```{r}
norm <- rnorm(100, mean=0, sd=1)
```
norm <- rnorm(100, mean=0, sd=1)
norm[1:5]
## [1] 0.63280247 3.12971515 -0.46914854 -0.95642818 -0.09618048
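Because rnorm() draws random numbers, the values printed above will differ every time the document is knitted. If you want a reproducible record, one option is to fix the random number generator with set.seed() before drawing. The seed 42 below is arbitrary.

```r
set.seed(42)                              # any integer works; 42 is arbitrary
norm <- rnorm(100, mean = 0, sd = 1)
first_draws <- norm[1:5]

set.seed(42)                              # resetting the same seed...
norm_again <- rnorm(100, mean = 0, sd = 1)
identical(first_draws, norm_again[1:5])   # ...gives the same draws: TRUE
```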
You can quickly insert a code chunk in RStudio using a button in the toolbar.
You can run an individual chunk of code at any time by clicking on the small green arrow.
The output of the code will appear just beneath the code chunk.
It’s important to remember when you are creating an RMarkdown file that if you want to run code that refers to an object, for example
# print(dataframe)
you must include the code that creates dataframe, just as in a normal R script. For example:
A <- c("a","b","c","d")
B <- c(5,15,25,30)
dataframe = data.frame(A,B)
print(dataframe)
## A B
## 1 a 5
## 2 b 15
## 3 c 25
## 4 d 30
Or, if you are loading a data frame from a .csv file, you must include that code in the .Rmd:
# dataframe <- read.csv('datasets/my_data.csv')
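Since datasets/my_data.csv is only an example path, the sketch below first writes the small data frame from above to a temporary .csv file, then reads it back. The read.csv() line is the part that belongs in your .Rmd. The Knit button runs the document in a fresh R session, so objects that exist only in your console are not available.

```r
# Save the small data frame from above as a .csv in a temporary folder
A <- c("a", "b", "c", "d")
B <- c(5, 15, 25, 30)
csv_path <- file.path(tempdir(), "my_data.csv")
write.csv(data.frame(A, B), csv_path, row.names = FALSE)

# This is the kind of line that must appear in the .Rmd itself
dataframe <- read.csv(csv_path)
print(dataframe)
```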
Similarly, if you are using any packages in your analysis, you will have to load them in the .Rmd file using library(), as in a normal R script.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
If you don’t want the code of a particular code chunk to appear in the final document, but still want to show the output (e.g. a plot), you can include echo=FALSE in the code chunk instructions:
```{r, echo=FALSE}
A <- c("a","b","c","d")
B <- c(5,15,25,30)
dataframe = data.frame(A,B)
print(dataframe)
```
You might also want to create an object without including either its code or its output in the final .html file. To do this you can use include=FALSE. The code is still run, so the object is available to later chunks:
```{r, include=FALSE}
# assumes `div` (with columns taxonGroup and taxonName) was created earlier and dplyr is loaded
richness <- div %>% group_by(taxonGroup) %>% summarise(Species_richness = n_distinct(taxonName))
```
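The chunk above relies on a data frame called div from an earlier analysis, which is not created in this tutorial. To try the include=FALSE chunk on its own, you could first build a small hypothetical stand-in. The groups and species names below are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for `div`: one row per record, with the
# taxonomic group and species name of each observation
div <- data.frame(
  taxonGroup = c("Bird", "Bird", "Bird", "Bird", "Mammal", "Mammal"),
  taxonName  = c("Robin", "Robin", "Wren", "Blackbird", "Fox", "Badger")
)

# Number of distinct species recorded in each group
richness <- div %>%
  group_by(taxonGroup) %>%
  summarise(Species_richness = n_distinct(taxonName))
richness
```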
In some cases, loading a package prints messages, such as the ones displayed above when loading dplyr. These are messages rather than warnings, so to hide them use message=FALSE (use warning=FALSE to hide warnings):
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
```
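The “Attaching package” lines printed by library(dplyr) are messages, not warnings, which is why hiding them needs message=FALSE rather than warning=FALSE. The sketch below uses a made-up function, noisy(), to show that R treats the two as separate kinds of condition. Each kind can be intercepted independently, and knitr’s two chunk options follow the same distinction.

```r
# A hypothetical function that emits one message and one warning
noisy <- function() {
  message("this is a message")
  warning("this is a warning")
  "finished"
}

seen <- character(0)
result <- withCallingHandlers(
  noisy(),
  message = function(m) {
    seen <<- c(seen, "message")
    invokeRestart("muffleMessage")   # hide the message and keep running
  },
  warning = function(w) {
    seen <<- c(seen, "warning")
    invokeRestart("muffleWarning")   # hide the warning and keep running
  }
)
seen     # "message" "warning"
result   # "finished"
```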
Code chunks accept optional arguments
```{r name, echo=TRUE, eval=TRUE, warning=TRUE, message=TRUE}
# code goes here
```
name - This is not necessary, but it is good practice to label your code chunks. Two code chunks cannot have the same name
echo - Whether to display the code chunk or just show the results. If you want the code embedded in your document but don’t want the reader of the document to see it, you can set echo=FALSE
eval - Whether to run the code in the code chunk. This can be used if you want to display the code but not to have it run.
warning - Whether to display warning messages in the document
message - Whether to display messages produced by the code in the document
results - How to display the printed (text) results of the code, e.g. 'markup' (the default), 'asis', 'hold' or 'hide'.
Note: The default for echo, eval, warning and message is TRUE
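If you want the same options for every chunk, knitr also lets you change the defaults once. This is usually done in a setup chunk near the top of the .Rmd. A minimal sketch, assuming you want to hide code and warnings throughout the document:

```r
library(knitr)

# Change the defaults for all subsequent chunks in the document;
# individual chunks can still override them, e.g. {r, echo=TRUE}
opts_chunk$set(echo = FALSE, warning = FALSE)

opts_chunk$get("echo")      # FALSE
opts_chunk$get("warning")   # FALSE
```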
There is a whole set of optional arguments just for displaying figures
```{r name, fig.height=6, fig.width=4, dpi=300, fig.align='center'}
# code goes here
```
fig.height, fig.width - Specify the height and width of the figure to make it fit into the space you have available.
dpi - Specifies the resolution in dots (pixels) per inch. Together with fig.width and fig.height, it determines how many pixels the figure contains (for example, 4 inches at 300 dpi is 1200 pixels wide).
fig.align - Specify whether your figure appears right, left, or center aligned.
Inserting a graph into RMarkdown is easy; the more energy-demanding aspect might be adjusting the formatting.
By default, RMarkdown draws figures at a fixed size set by the output format (7 × 5 inches for html_document), so a particularly tall or wide figure may not display well. In the following example we set the dimensions of a figure manually by inserting options into the curly braces of the chunk header.
```{r, fig.width=4, fig.height=3}
print(dataframe)
boxplot(B~A, data=dataframe)
```
A <- c('a','a','b','b')
B <- c(5,10,15,20)
dataframe <- data.frame(A,B)
print(dataframe)
## A B
## 1 a 5
## 2 a 10
## 3 b 15
## 4 b 20
boxplot(B~A, data=dataframe)
R Markdown can easily print the contents of a data frame if you call print() on it in a code chunk:
print(dataframe)
## A B
## 1 a 5
## 2 a 10
## 3 b 15
## 4 b 20
This can look a bit messy, especially with data frames with a lot of columns. Including a formal table requires more effort.
The most aesthetically pleasing and simple table formatting function I have found is kable() in the knitr package. The first argument tells kable to make a table out of the object dataframe, and digits=2 rounds numbers to two decimal places. Remember to load the knitr package in your .Rmd file as well:
library(knitr)
kable(dataframe, digits=2)
A | B |
---|---|
a | 5 |
a | 10 |
b | 15 |
b | 20 |
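To see that digits counts decimal places rather than significant figures, you can check how kable() rounds a value with many digits. The number 123.456 below is just an illustration. It is shown as 123.46 (five significant figures), not 120.

```r
library(knitr)

x <- data.frame(value = 123.456)

# digits = 2 rounds numeric columns to two decimal places
tab <- kable(x, format = "pipe", digits = 2)
tab
```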
If you want a bit more control over the content of your table you can use pander() in the pander package. Imagine I want the 3rd column to appear in italics:
library(pander)
plant <- c("a","b","c")
temp <- c(20,20,20)
growth <- c(0.65,0.95,0.15)
dataframe <- data.frame(plant, temp, growth)
emphasize.italics.cols(3) # make the 3rd column italics
pander(dataframe) # create the table
plant | temp | growth |
---|---|---|
a | 20 | 0.65 |
b | 20 | 0.95 |
c | 20 | 0.15 |
You can find more information in the pander package documentation.
You can also manually create small tables using markdown syntax. This should be put outside of any code chunks.
For example, the following markdown produces a three-column table:
Plant | Temp. | Growth |
---|---|---|
A | 20 | 0.65 |
B | 20 | 0.95 |
C | 20 | 0.15 |
The separator row of dashes (such as :-----:) tells markdown that the line above should be treated as the header and the lines below as the body of the table. Text alignment of each column is set by the position of the colons (:):
Syntax | Alignment |
---|---|
`:----:` | Centre |
`:----` | Left |
`----:` | Right |
`-----` | Auto |
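If you would rather have R write such a table for you, kable() can produce the same pipe syntax. Its align argument takes one letter per column ('l', 'c' or 'r'), which become the colon markers described above. A sketch using the plant data:

```r
library(knitr)

plants <- data.frame(
  Plant  = c("A", "B", "C"),
  Temp   = c(20, 20, 20),
  Growth = c(0.65, 0.95, 0.15)
)

# Left-align the first column, centre the second, right-align the third
md_table <- kable(plants, format = "pipe", align = c("l", "c", "r"))
md_table
```

You can paste the printed lines into the text part of your .Rmd, or simply call kable() inside a chunk and let knitr insert the table.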
Using tidy() from the broom package, we can create tables of our model outputs and insert these tables into our markdown file. The example below fits a simple linear model, saves its summary table as a new R object, and then adds it to the markdown file with pander().
library(broom)
library(pander)
A <- c(20,15,10)
B <- c(1,2,3)
lm_test <- lm(A~B) # Creating a linear model
table_obj <- tidy(lm_test) # using tidy() to create a new R object called table_obj
pander(table_obj, digits=3) # using pander() to view the created table, with 3 sig figs
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 25 | 4.07e-15 | 6.14e+15 | 1.04e-16 |
B | -5 | 1.88e-15 | -2.65e+15 | 2.4e-16 |
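In the example above, A is an exact linear function of B (A = 25 - 5B). The fit is therefore perfect, and the standard errors and p-values are essentially zero. With noisy data the tidied table is more informative. A sketch using the built-in mtcars data, modelling fuel efficiency (mpg) against horsepower (hp):

```r
library(broom)
library(pander)

fit <- lm(mpg ~ hp, data = mtcars)  # miles per gallon vs horsepower
fit_table <- tidy(fit)              # one row per coefficient

pander(fit_table, digits = 3)       # formatted table with 3 significant figures
```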
By using warning=FALSE as a chunk option, any warnings produced will be printed in the console when knitting but will not appear in the produced document.
A picture often says more than words (or lines of code output) and R has a rich visualization toolbox that allows us to make powerful visualizations of our data.
Let’s generate some random numbers and visualize them:
draws <- rnorm(100) # standard normal (mean=0, sd=1)
hist(draws)
draws2 <- rnorm(100, mean=5, sd=2)
hist(draws2)
The base R visualization framework can be somewhat challenging to work with. A good alternative is the ggplot2 package, part of the larger tidyverse, which includes more useful packages for data manipulation and analysis. ggplot2 uses a visualization framework based on the grammar of graphics philosophy.
ggplot2 works best with a data frame as input. As a first example we will use the mpg data, which comes with ggplot2; the plot after it uses the mtcars data, which is available by default in every R installation.
library(ggplot2)
ggplot(mpg, aes(displ, hwy, colour=class)) + geom_point()
Let’s create a new plot using code chunk figure options we saw earlier
gg <- ggplot(mtcars, aes(hp,mpg)) +
geom_point(aes(color=as.factor(cyl)), size=5) +
geom_smooth(method="lm", se=FALSE) +
labs(x="Horsepower", y="Miles per Gallon (mpg)", color="# of Cylinders") +
theme_bw()
gg
## `geom_smooth()` using formula 'y ~ x'
Creating .pdf documents for printing in A4 requires a bit more fiddling around. RStudio uses the LaTeX typesetting system to make .pdf documents.
The easiest way to use LaTeX is to install the TinyTeX distribution from within RStudio.
Restart your R session (Session -> Restart R), then run these lines in the console:
install.packages("tinytex")
tinytex::install_tinytex()
Becoming familiar with LaTeX will give you a lot more options to make your R markdown .pdf look pretty, as LaTeX commands are mostly compatible with R Markdown, though some googling is often required.
To compile a .pdf instead of a .html document, change output: from html_document to pdf_document, or use the drop-down menu from the “Knit” button.
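The same switch can be made from R. This sketch assumes pandoc and a LaTeX distribution such as TinyTeX are installed; the file name pdf_demo.Rmd is made up. The header below requests pdf_document, and the guard skips rendering where pdflatex cannot be found.

```r
library(rmarkdown)

# A minimal .Rmd whose header asks for PDF output
rmd_path <- file.path(tempdir(), "pdf_demo.Rmd")
writeLines(c(
  "---",
  'title: "PDF demo"',
  "output: pdf_document",
  "---",
  "",
  "The area of a circle is $A = \\pi \\times r^{2}$."
), rmd_path)

# PDF output needs both pandoc and a LaTeX installation (e.g. TinyTeX)
if (pandoc_available() && nzchar(Sys.which("pdflatex"))) {
  pdf_path <- render(rmd_path, quiet = TRUE)  # returns the .pdf path
  print(pdf_path)
}
```

render() also accepts an output_format argument, for example render(rmd_path, output_format = "pdf_document"), which overrides whatever output the header specifies.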
Convert one of the R scripts from the previous lessons into a well-commented and easy-to-follow R Markdown document.