R Syntax

A program in R is made up of three things: variables, comments, and keywords. Variables store data, comments improve code readability, and keywords are reserved words that have a specific meaning to the R interpreter.

Exploring the R environment

An environment can be thought of as a collection of objects (functions, variables, etc.). The global environment is created when we first start the R interpreter, and any variable we define at the top level is stored in it.

We can use the ls() function to show what variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.
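As a minimal sketch of these two functions, the snippet below defines a variable and a function and then inspects the environment. The names radius and circle_area are made up for illustration, and the comments assume the code is run at the top level of a session.

```r
# Define one variable and one function (both names are illustrative).
radius <- 2
circle_area <- function(r) pi * r^2

# ls() returns the names of the objects defined in the current environment.
ls()

# environment() returns the current environment; at the top level of a
# session this is the global environment.
environment()
```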

Working Directory

When using R, the suggested first step is to set a working directory. The working directory is the folder where R looks for files to read and, by default, saves files and data while we are working on a project.

The getwd() function takes no arguments and returns the location of the current working directory. Run it in the console to check where you are:

getwd()
## [1] "C:/Users/Just Nick/Desktop/Analysis/Denaco/R"

If we feel the current location isn’t the right place, we can set the working directory using the setwd() function.

# set working directory to parent folder
setwd("../")

In RStudio, there’s an easier way to change the working directory. From the menu bar, go to Session, then Set Working Directory, and select Choose directory …. RStudio then shows the equivalent setwd() call in the console.

dir() lists all the files in the current working directory.

dir()

If you’re working on several projects, it is recommended that you use a separate working directory for each project. Before you read, save or write any files and data, remember to check that the working directory is the one you expect.
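The functions above can be combined into a small round trip. This is only a sketch: it uses the base R function tempdir() as a scratch folder, and the file name wd_example.txt is a made-up example.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Switch to a scratch folder; tempdir() returns the temporary directory
# that R creates for the current session.
setwd(tempdir())

# Create an empty file (hypothetical name) and check that dir() lists it.
file.create("wd_example.txt")
"wd_example.txt" %in% dir()

# Go back to where we started.
setwd(old_wd)
```

Saving the old location first makes it easy to return to it, which helps when a script has to visit several folders.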

Variables

Variables are containers for storing data values.

In R, a variable is created by assigning a value to an identifier. A valid variable name may consist of letters, digits, periods (.) and underscores (_). Note that a name that starts with a period must not have a digit as its second character.

The Assignment Operator <-

In R, we can use both = and <- as assignment operators.

However, <- is preferred in most cases, because = has a second meaning: inside a function call it names an argument instead of creating a variable.

name <- "John Doe"
name
## [1] "John Doe"
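The difference between the two operators shows up inside a function call. The sketch below uses mean() purely as an example; the variable names avg1, avg2 and y are illustrative.

```r
# Inside a function call, = names an argument; it does not create a variable.
avg1 <- mean(x = c(2, 4, 6))   # x is only the argument name of mean()

# Inside a function call, <- assigns first and then passes the value on.
avg2 <- mean(y <- c(2, 4, 6))  # this also creates y in the current environment

# y now exists as a regular variable.
y
```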

Variable Names

A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume).

Rules for R variables are:

  • A variable name must start with a letter or a period (.) and can be a combination of letters, digits, periods (.) and underscores (_). If it starts with a period, the period cannot be followed by a digit.

  • A variable name cannot start with a number or underscore (_)

  • Variable names are case-sensitive (age, Age and AGE are three different variables)

  • Reserved words cannot be used as variables (TRUE, FALSE, NULL, if…)
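The rules above can be tried out directly. In this sketch all variable names are made up for illustration, and the invalid assignments are left commented out because each of them would stop the script with an error.

```r
# Valid names: start with a letter, or with a period not followed by a digit.
total_volume <- 120
.hidden_value <- 5
car.name2 <- "Volvo"

# Names are case-sensitive, so these are two separate variables.
age <- 30
Age <- 31

# Each of the following lines would fail if uncommented:
# 2nd_value <- 1   # starts with a digit
# _value <- 1      # starts with an underscore
# .2way <- 1       # period followed by a digit
# TRUE <- 1        # reserved word

# make.names() (base R) converts arbitrary strings into valid names.
make.names(c("2nd_value", "total_volume"))
```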

Data Types

Variables can store data of different types, and different types can do different things.

The data type of an object can be found using the class() function. The basic classes in R are:

# numeric
x <- 10.5
class(x)
## [1] "numeric"
# integer
x <- 1000L
class(x)
## [1] "integer"
# complex
x <- 9i + 3
class(x)
## [1] "complex"
# character/string
x <- "R is exciting"
class(x)
## [1] "character"
# logical/boolean
x <- TRUE
class(x)
## [1] "logical"

R Strings

Character values, or strings, are used for storing text. A string is surrounded by either single quotation marks or double quotation marks.

string1 <- "This is a string"
string2 <- "You can include a single 'quote' string inside a double quote string"

string1
## [1] "This is a string"
string2
## [1] "You can include a single 'quote' string inside a double quote string"

Escape characters

string2 <- "You can include a double \"quote\" string inside a double quote string. Use the escape character \\"

string2
## [1] "You can include a double \"quote\" string inside a double quote string. Use the escape character \\"
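Note that the backslashes in the printed output above are not part of the string itself; print() just shows the string the way it would be typed. A small sketch of the difference, using the base R functions cat() and nchar() and a made-up example string:

```r
# A string that contains escaped double quotes and an escaped newline.
quoted <- "She said \"hello\"\nand left."

# print() displays the string the way it would be typed, escapes included.
print(quoted)

# cat() writes the actual characters: real quotes and a real line break.
cat(quoted, "\n")

# An escaped quote counts as a single character.
nchar("\"")
```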

Combining Strings

Use the paste() function to combine strings. By default, the pieces are separated by a single space.

f_name <- "John"
s_name <- "Doe"

paste(f_name,s_name)
## [1] "John Doe"
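The separator can be changed. The sketch below shows the sep and collapse arguments of paste() and the related paste0() function, all of which are part of base R; the separators chosen are just examples.

```r
f_name <- "John"
s_name <- "Doe"

# sep sets the text placed between the pieces.
paste(s_name, f_name, sep = ", ")      # "Doe, John"

# paste0() joins the pieces with no separator at all.
paste0(f_name, s_name)                 # "JohnDoe"

# collapse joins the elements of a single vector into one string.
paste(c(f_name, s_name), collapse = "-")   # "John-Doe"
```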

R Data Structures

R Vectors

A vector is an ordered collection of items that are all of the same type.

To combine items into a vector, use the c() function and separate the items with commas.

x <- c(1,2,3,4)

Creation of vectors in R

  1. Using c() concatenate

The c() function creates a vector by combining values. If the values have different data types, R converts all of them to the most flexible type present, in the order logical, integer, numeric, complex, character.

x <- c(1,2,3,4)
print(x)
## [1] 1 2 3 4
print(class(x))
## [1] "numeric"

  2. Using the : operator

x <- 1:10

x
##  [1]  1  2  3  4  5  6  7  8  9 10

  3. Using the seq() function

x <- seq(1,6)
x
## [1] 1 2 3 4 5 6
x <- seq(1,10,by=2)
x
## [1] 1 3 5 7 9
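When c() receives values of different types, the result still has a single type. The following sketch shows how the type of the resulting vector changes; the values are arbitrary examples.

```r
# Logical and integer values give an integer vector.
class(c(TRUE, 2L))       # "integer"

# Integer and decimal values give a numeric vector.
class(c(2L, 3.5))        # "numeric"

# As soon as one value is a string, everything becomes character.
class(c(3.5, "text"))    # "character"

mixed <- c(1, "a", TRUE)
mixed                    # "1" "a" "TRUE"
```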

Lists in R

A list is an ordered, changeable collection that can hold objects of different types. Lists can contain numbers, strings, vectors, matrices and even other lists.

To create a list, use the list() function.

my_list = list("banana","mango","orange")
my_list
## [[1]]
## [1] "banana"
## 
## [[2]]
## [1] "mango"
## 
## [[3]]
## [1] "orange"
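Unlike a vector, a list keeps the type of each element. Here is a small sketch with a hypothetical record list whose element names and values are made up for illustration; str() is a base R function that prints the structure of an object.

```r
# A list holding a string, a numeric vector, a logical and another list.
record <- list(
  name = "John Doe",
  scores = c(90, 85.5),
  passed = TRUE,
  nested = list("a", 1)
)

# str() gives a compact overview of the elements and their types.
str(record)

# Lists are changeable: an element can be replaced after creation.
record$passed <- FALSE
length(record)   # 4
```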

Matrices in R

A matrix is a two dimensional data set with columns and rows.

A column is a vertical representation of data, while a row is a horizontal representation of data.

A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to set the number of rows and columns. By default the values fill the matrix column by column; pass byrow=TRUE to fill it row by row.

my_matrix = matrix(c(1:16),nrow=4,ncol=4)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16

Factors

Factors are used to categorize data. To create a factor, use the factor() function and pass a vector as the argument.

gender <- factor(c("male","female","male","female"))
gender
## [1] male   female male   female
## Levels: female male

To print only the levels, use the levels() function.

levels(gender)
## [1] "female" "male"
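By default the levels are sorted alphabetically, as in the output above. When the categories have a natural order, the levels argument of factor() can set it explicitly. A sketch with a made-up sizes vector:

```r
# Set the order of the levels explicitly instead of alphabetically.
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)        # "small" "medium" "large"

# Internally each value is stored as the position of its level.
as.integer(sizes)    # 1 3 2 1

# table() counts how often each level occurs.
table(sizes)
```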

Data Frames

A data frame stores data in a table format.

A data frame can hold different types of data. While the first column can be character, the second and third can be numeric or logical. However, all the values within a single column must have the same type.

Use the data.frame() function to create a data frame.

countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
capitals = c("Beijing","New Delhi","Brasilia","Washington DC","Addis Ababa","Cairo")
density = c(153,464,25,36,115,103)

my_df <- data.frame(countries=countries,
           capitals=capitals,
           density=density)

my_df
##   countries      capitals density
## 1     China       Beijing     153
## 2     India     New Delhi     464
## 3    Brazil      Brasilia      25
## 4       USA Washington DC      36
## 5  Ethiopia   Addis Ababa     115
## 6     Egypt         Cairo     103

Accessing a Specific column of your data frame

my_df$countries
## [1] "China"    "India"    "Brazil"   "USA"      "Ethiopia" "Egypt"
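To see that each column of a data frame has its own single type, we can inspect a small example. The items data frame and its values below are hypothetical; stringsAsFactors = FALSE is passed so that text columns stay character on older R versions as well.

```r
# A small data frame with one character, one numeric and one logical column.
items <- data.frame(item = c("pen", "book"),
                    price = c(1.5, 12),
                    in_stock = c(TRUE, FALSE),
                    stringsAsFactors = FALSE)

# Number of rows and columns.
nrow(items)   # 2
ncol(items)   # 3

# sapply() applies class() to every column and returns the column types.
sapply(items, class)
```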

Indexing and subsetting

There are many different ways to subset an object. R has three subsetting operators, [], [[]] and $, and they behave slightly differently for the different data structures.

Subsetting vectors

Subsetting a vector always returns another vector.

You can access the items of a vector by referring to their index numbers inside brackets []. Indexing starts at 1: the first item has index 1, the second item has index 2, and so on.

countries = c("China","India","Brazil","USA","Ethiopia","Egypt")

countries[4]
## [1] "USA"

You can also access multiple elements by referring to different index positions with the c() function

countries = c("China","India","Brazil","USA","Ethiopia","Egypt")


# access both USA and Ethiopia
countries[c(4,5)]
## [1] "USA"      "Ethiopia"

Excluding and removing elements

If we use a negative number as the index of a vector, R will return every element except for the one specified.

countries = c("China","India","Brazil","USA","Ethiopia","Egypt")


# access all countries except USA and Ethiopia
countries[-c(4,5)]
## [1] "China"  "India"  "Brazil" "Egypt"
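A negative index only excludes elements from the result; the original vector is left unchanged. To actually remove an element, assign the result back to the variable. A short sketch with a made-up fruits vector:

```r
fruits <- c("apple", "banana", "cherry", "mango")

# Excluding element 2 returns a shorter copy ...
without_banana <- fruits[-2]

# ... but the original vector still has all four elements.
length(fruits)   # 4

# Assigning the result back removes the element for good.
fruits <- fruits[-2]
length(fruits)   # 3
```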

Change an Item

To change the value of a specific item, refer to the index number

countries[3] = "Brasil"

countries
## [1] "China"    "India"    "Brasil"   "USA"      "Ethiopia" "Egypt"

Matrix subsetting

Sample Matrix

my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16

Indexing matrices with [] takes two arguments: the first expression is applied to the rows, the second to the columns:

my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)

my_matrix[c(3,4),c(3,4)]
##      [,1] [,2]
## [1,]   11   12
## [2,]   15   16

To retrieve all the rows for the specified columns, leave the row index empty:

my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)

my_matrix[,c(3,4)]
##      [,1] [,2]
## [1,]    3    4
## [2,]    7    8
## [3,]   11   12
## [4,]   15   16

To retrieve all columns for specified rows

my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)

# all columns for row 1 and 2
my_matrix[1:2,]
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
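One detail worth knowing: when a single row or column is selected, R simplifies the result to a plain vector. The drop argument of [] keeps the matrix shape. The sketch below rebuilds the same 4 x 4 matrix under the shorter, illustrative name m.

```r
m <- matrix(1:16, nrow = 4, ncol = 4, byrow = TRUE)

# A single element: row 2, column 3.
m[2, 3]                        # 7

# Selecting one row returns a vector, which has no dimensions.
first_row <- m[1, ]
dim(first_row)                 # NULL

# drop = FALSE keeps the result as a matrix with one row.
first_row_matrix <- m[1, , drop = FALSE]
dim(first_row_matrix)          # 1 4
```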

Subsetting lists

We can use element indices and [] to subset lists. Using [] always returns a list.

my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))
my_list[1]
## $fruits
## [1] "Banana"    "Orange"    "Mango"     "Pineapple"

To extract individual elements of a list, we use the double-square bracket function: [[]]

my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))

my_list[[1]]
## [1] "Banana"    "Orange"    "Mango"     "Pineapple"
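The difference between [] and [[]] is easiest to see from the class of the result. This sketch rebuilds the same list and checks both forms:

```r
my_list <- list(fruits = c("Banana", "Orange", "Mango", "Pineapple"),
                veges = c("avocado", "cabbage", "tomato"))

# [] returns a shorter list that still wraps the element.
class(my_list[1])      # "list"

# [[]] returns the element itself, here a character vector.
class(my_list[[1]])    # "character"

# Because [[]] returns the vector, it can be indexed further.
my_list[[1]][2]        # "Orange"
```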

The $ operator

The $ operator is a shorthand way for extracting single elements by name:

my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))

my_list$veges
## [1] "avocado" "cabbage" "tomato"

Check if Item Exists

thislist <- list("apple", "banana", "cherry")

"apple" %in% thislist
## [1] TRUE

Subsetting Data Frames

We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a data frame

Using the [] operator with one argument will act the same way as for lists, where each list element corresponds to a column. The resulting object will be a data.frame

countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
capitals = c("Beijing","New Delhi","Brasilia","Washington DC","Addis Ababa","Cairo")
density = c(153,464,25,36,115,103)

my_df <- data.frame(countries=countries,
           capitals=capitals,
           density=density)

my_df[1]
##   countries
## 1     China
## 2     India
## 3    Brazil
## 4       USA
## 5  Ethiopia
## 6     Egypt
my_df["capitals"]
##        capitals
## 1       Beijing
## 2     New Delhi
## 3      Brasilia
## 4 Washington DC
## 5   Addis Ababa
## 6         Cairo

With two arguments, [] behaves the same way as for matrices: the first argument selects rows and the second selects columns.

my_df[1:2,c("countries","capitals")]
##   countries  capitals
## 1     China   Beijing
## 2     India New Delhi

[[]] extracts a single column as a vector.

my_df[['capitals']]
## [1] "Beijing"       "New Delhi"     "Brasilia"      "Washington DC"
## [5] "Addis Ababa"   "Cairo"

$ provides a convenient shorthand to extract columns by name.

my_df$capitals
## [1] "Beijing"       "New Delhi"     "Brasilia"      "Washington DC"
## [5] "Addis Ababa"   "Cairo"
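The row argument of [] can also be a logical condition, which keeps only the rows where the condition is TRUE. The sketch below is one possible use, not the only way to filter rows. It reuses the countries and density values from above, while the threshold of 100 and the variable name dense are arbitrary choices for illustration.

```r
my_df <- data.frame(countries = c("China", "India", "Brazil", "USA",
                                  "Ethiopia", "Egypt"),
                    density = c(153, 464, 25, 36, 115, 103),
                    stringsAsFactors = FALSE)

# Keep the rows whose density is above 100; the empty column index
# means "all columns".
dense <- my_df[my_df$density > 100, ]
dense

# Rows and a single column can be selected in one step.
my_df[my_df$density > 100, "countries"]
```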

Packages

R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones.

CRAN

CRAN (the Comprehensive R Archive Network) is the official repository. It is a network of FTP and web servers maintained by the R community around the world. The R Foundation coordinates it, and for a package to be published there, it needs to pass several tests that ensure the package follows CRAN policies.

CRAN hosts well over 10,000 published packages.

Installing Packages From CRAN

You just need the name of the package and the install.packages() command.

install.packages("wordcloud2")

After a package has been installed successfully, you can load it into your workspace using the library() function.

library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.1.3
wordcloud2(demoFreq)

To check what packages are installed on your computer, you can use installed.packages()

# check what packages are installed
installed.packages()
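The result of installed.packages() is a matrix with one row per package, so it can be used to check for a specific package before installing it. The helper install_if_missing() below is a hypothetical name introduced for this sketch, not a base R function, and the final call is left commented out because it needs a network connection.

```r
# The row names of the matrix are the names of the installed packages.
installed <- rownames(installed.packages())

# "stats" ships with R, so this check should be TRUE on any installation.
"stats" %in% installed

# Hypothetical helper: install a package only when it is missing.
install_if_missing <- function(pkg) {
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg)
  }
  invisible(pkg)
}

# install_if_missing("wordcloud2")
```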

To uninstall a package, use the remove.packages() function with the name of the package:

# remove.packages("package_name")

References