A program in R is made up of three things: variables, comments, and keywords. Variables store data, comments improve code readability, and keywords are reserved words that have a specific meaning to the R interpreter.
An environment can be thought of as a collection of objects (functions, variables, etc.). An environment is created when we first start the R interpreter, and any variable we define is placed in this environment.
We can use the ls() function to show which variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.
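As a minimal sketch of the two functions above, the example below lists the contents of a small environment. The names demo_env, x and square are hypothetical, and new.env() and assign() are base R helpers used here only to keep the listing predictable.

```r
# Create a fresh environment and define two objects inside it
demo_env <- new.env()
assign("x", 10, envir = demo_env)
assign("square", function(n) n^2, envir = demo_env)

# ls() returns the names defined in an environment, sorted alphabetically
ls(demo_env)

# Called at the top level, environment() returns the global environment
environment()
```

Calling ls() without arguments lists the objects of the current environment instead.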
When using R, the first step suggested is to set a working directory. A working directory is a folder that we visit frequently to read and save files and data when we are working on a project.
We can use the getwd() function to return the location of the current working directory. Note that getwd() takes no arguments. You can check your current working directory by running it in the console:
getwd()
## [1] "C:/Users/Just Nick/Desktop/Analysis/Denaco/R"
If the current location is not the right place, we can change the working directory using the setwd() function.
# set working directory to parent folder
setwd("../")
In RStudio, there is an easier way to change the working directory. From the menu bar go to Session; then move to Set Working Directory; and select Choose directory …. RStudio then echoes the equivalent setwd() call in the console.
The dir() function lists all the files in the current working directory.
dir()
If you are working on several projects, it is recommended that you use a separate working directory for each project. Before you start, and before you save or write any files and data, remember to check that the location is the one you intend.
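The workflow above can be sketched as follows. This is only an illustration: tempdir() stands in for a real project folder, and old_wd and example.txt are hypothetical names.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Switch to another folder (a temporary one, purely for illustration)
setwd(tempdir())

# Files written with a relative path now land in the new working directory
writeLines("hello", "example.txt")
"example.txt" %in% dir()

# Go back to where we started
setwd(old_wd)
```

Saving the old location first makes it easy to undo the change once the work in the other folder is finished.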
Variables are containers for storing data values.
In R, variables are created by assigning a value directly to an identifier. Valid variable names may consist of letters, digits, and the special characters dot (.) and underscore (_). Note that a leading dot must not be followed by a digit.
In R, we can use both = and <- as assignment operators. However, <- is preferred in most cases because = is not treated as assignment everywhere: inside a function call, for example, it names an argument instead.
name <- "John Doe"
name
## [1] "John Doe"
A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume).
Rules for R variables are:
A variable name must start with a letter or a period (.) and can be a combination of letters, digits, periods (.) and underscores (_). If it starts with a period, the period cannot be followed by a digit.
A variable name cannot start with a digit or an underscore (_).
Variable names are case-sensitive (age, Age and AGE are three different variables).
Reserved words cannot be used as variable names (TRUE, FALSE, NULL, if…).
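One way to try the rules above is to compare a candidate name with the result of base R's make.names(), which rewrites invalid names into valid ones. The helper is_valid_name is a hypothetical convenience function, not part of R.

```r
# A name is syntactically valid if make.names() leaves it unchanged
is_valid_name <- function(x) x == make.names(x)

is_valid_name("total_volume")  # TRUE: letters and underscore
is_valid_name(".hidden")       # TRUE: period followed by a letter
is_valid_name("2fast")         # FALSE: starts with a digit
is_valid_name("_tmp")          # FALSE: starts with an underscore
is_valid_name(".2x")           # FALSE: period followed by a digit
is_valid_name("TRUE")          # FALSE: reserved word
```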
Variables can store data of different types, and different types can do different things.
The data type of an object can be found using the class() function. Basic classes in R are numeric, integer, complex, character, and logical, also called boolean (TRUE, FALSE).
# numeric
x <- 10.5
class(x)
## [1] "numeric"
# integer
x <- 1000L
class(x)
## [1] "integer"
# complex
x <- 9i + 3
class(x)
## [1] "complex"
# character/string
x <- "R is exciting"
class(x)
## [1] "character"
# logical/boolean
x <- TRUE
class(x)
## [1] "logical"
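Values can also be tested for, or converted between, these basic classes. The sketch below uses the base R is.*() and as.*() families; the variable names are only illustrative.

```r
x <- 10.5

# Test the class of a value
is.numeric(x)    # TRUE
is.character(x)  # FALSE

# Convert between classes
y <- as.integer(x)       # drops the fractional part
class(y)                 # "integer"
as.character(x)          # "10.5"
as.numeric("3.14")       # 3.14
as.logical("TRUE")       # TRUE

# typeof() reports the internal storage type; numeric values are stored as "double"
typeof(x)
```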
Character values, also called strings, are used for storing text. A string is surrounded by either single or double quotation marks.
string1 <- "This is a string"
string2 <- "You can include a single 'quote' string inside a double quote string"
string1
## [1] "This is a string"
string2
## [1] "You can include a single 'quote' string inside a double quote string"
Escape characters
string2 <- "You can include a double \"quote\" string inside a double quote string. Use the escape character \\"
string2
## [1] "You can include a double \"quote\" string inside a double quote string. Use the escape character \\"
Use the paste() function to join strings; by default the pieces are separated by a space.
f_name <- "John"
s_name <- "Doe"
paste(f_name,s_name)
## [1] "John Doe"
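The separator used by paste() can be changed. A short sketch, using the sep and collapse arguments and the paste0() variant from base R:

```r
f_name <- "John"
s_name <- "Doe"

# Choose a different separator
paste(f_name, s_name, sep = "-")        # "John-Doe"

# paste0() joins without any separator
paste0(f_name, s_name)                  # "JohnDoe"

# collapse joins the elements of one vector into a single string
paste(c("a", "b", "c"), collapse = ", ")  # "a, b, c"
```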
A vector is simply a list of items that are of the same type.
To combine items into a vector, use the c() function and separate the items with commas.
x <- c(1,2,3,4)
c() (concatenate)
The c() function is used to create vectors by combining values. We can even combine values of different data types; the vector then takes the most general type among its elements (logical < integer < numeric < complex < character).
x <- c(1,2,3,4)
print(x)
## [1] 1 2 3 4
print(class(x))
## [1] "numeric"
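The type coercion performed by c() can be observed directly. A minimal sketch with made-up values:

```r
# Logical and numeric values combine into a numeric vector
class(c(1, TRUE))        # "numeric"

# Integer and numeric values combine into a numeric vector
class(c(1L, 2.5))        # "numeric"

# As soon as one element is a string, everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)             # "character"
mixed                    # "1" "a" "TRUE"
```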
The : operator creates a sequence of consecutive integers.
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
The seq() function
x <- seq(1,6)
x
## [1] 1 2 3 4 5 6
x <- seq(1,10,by=2)
x
## [1] 1 3 5 7 9
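Besides by, seq() accepts a few other ways of describing a sequence. A short sketch using the length.out argument, a negative step, and the related seq_len() helper from base R:

```r
# Five evenly spaced values between 0 and 1
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# A decreasing sequence needs a negative step
seq(10, 1, by = -3)         # 10 7 4 1

# seq_len(n) gives the integers 1..n
seq_len(4)                  # 1 2 3 4
```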
A list in R can contain many different data types. A list is a collection of data that is ordered and changeable.
Lists are flexible, all-in-one objects. They can store objects of different types: matrices, numbers, vectors and even other lists.
To create a list, use the list() function.
my_list = list("banana","mango","orange")
my_list
## [[1]]
## [1] "banana"
##
## [[2]]
## [1] "mango"
##
## [[3]]
## [1] "orange"
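To illustrate that a list can mix types and be changed after creation, here is a small sketch. The list person and all of its contents are invented for the example.

```r
# A named list holding a string, a number, a vector and a nested list
person <- list(
  name    = "John Doe",
  age     = 30,
  scores  = c(90, 85),
  address = list(city = "Springfield")
)

length(person)          # 4 elements
sapply(person, class)   # class of each element

# Lists are changeable: update an element in place
person$age <- 31
person$age
```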
A matrix is a two-dimensional data set with columns and rows.
A column is a vertical representation of data, while a row is a horizontal representation of data.
A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to set the number of rows and columns. By default the values fill the matrix column by column; pass byrow=TRUE to fill it row by row.
my_matrix = matrix(c(1:16),nrow=4,ncol=4)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
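Once a matrix exists, a few base R functions help inspect and reshape it. A minimal sketch with a small made-up matrix m:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# dim() returns the number of rows and columns
dim(m)       # 2 3

# t() transposes the matrix, swapping rows and columns
tm <- t(m)
dim(tm)      # 3 2

# Rows and columns can be given names
rownames(m) <- c("a", "b")
colnames(m) <- c("x", "y", "z")
m
```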
Factors are used to categorize data. To create a factor, use the factor() function and pass a vector as the argument.
gender <- factor(c("male","female","male","female"))
gender
## [1] male female male female
## Levels: female male
To print only the levels, use the levels() function.
levels(gender)
## [1] "female" "male"
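By default the levels are sorted alphabetically, as in the output above. The sketch below shows how the order can be set explicitly with the levels argument of factor(); the size data is invented for the example.

```r
# Fix the order of the categories instead of relying on alphabetical sorting
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))

levels(size)       # "small" "medium" "large"
nlevels(size)      # 3

# table() counts how often each level occurs
table(size)

# Internally each value is stored as the position of its level
as.integer(size)   # 1 3 2 1
```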
A data frame is data displayed in table format.
A data frame can hold different types of data. The first column can be character while the second and third are numeric or logical. However, all values within a single column must have the same type.
Use the data.frame() function to create a data frame.
countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
capitals = c("Beijing","New Delhi","Brasil","Washington DC","Addis Ababa","Cairo")
density = c(153,464,25,36,115,103)
my_df <- data.frame(countries=countries,
                    capitals=capitals,
                    density=density)
my_df
##   countries      capitals density
## 1     China       Beijing     153
## 2     India     New Delhi     464
## 3    Brazil        Brasil      25
## 4       USA Washington DC      36
## 5  Ethiopia   Addis Ababa     115
## 6     Egypt         Cairo     103
Accessing a specific column of your data frame
my_df$countries
## [1] "China" "India" "Brazil" "USA" "Ethiopia" "Egypt"
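A data frame can be inspected and extended with a few base R calls. The following is a sketch on a small invented data frame; stringsAsFactors = FALSE is passed explicitly so that text columns stay character on older R versions too.

```r
df <- data.frame(city       = c("A", "B", "C"),
                 population = c(100, 250, 80),
                 area       = c(10, 50, 8),
                 stringsAsFactors = FALSE)

# Size of the table
nrow(df)   # 3
ncol(df)   # 3

# Class of each column
sapply(df, class)

# Add a new column computed from existing ones
df$density <- df$population / df$area
df$density   # 10 5 10
```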
There are many ways to subset an object, and R provides three subsetting operators, [ ], [[ ]] and $, for the different data structures.
Subsetting a vector always returns another vector.
You can access vector items by referring to their index number inside brackets []. The first item has index 1, the second item has index 2, and so on.
countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
countries[4]
## [1] "USA"
You can also access multiple elements by passing several index positions with the c() function.
countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
# access both USA and Ethiopia
countries[c(4,5)]
## [1] "USA" "Ethiopia"
If we use a negative number as the index of a vector, R will return every element except for the one specified.
countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
# access all countries except USA and Ethiopia
countries[-c(4,5)]
## [1] "China" "India" "Brazil" "Egypt"
To change the value of a specific item, refer to its index number:
countries[3] = "Brasil"
countries
## [1] "China" "India" "Brasil" "USA" "Ethiopia" "Egypt"
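Vectors can also be subset with a logical condition or by name, which the examples above do not cover. A minimal sketch reusing the density figures from the data frame section:

```r
density <- c(153, 464, 25, 36, 115, 103)
names(density) <- c("China", "India", "Brazil", "USA", "Ethiopia", "Egypt")

# Keep only the elements for which the condition is TRUE
dense <- density[density > 100]
names(dense)             # "China" "India" "Ethiopia" "Egypt"

# which() returns the positions that satisfy the condition
unname(which(density > 100))   # 1 2 5 6

# A named vector can be indexed by name
density["India"]         # 464
```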
Sample Matrix
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
Indexing matrices with [] takes two arguments: the first is applied to the rows, the second to the columns:
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix[c(3,4),c(3,4)]
##      [,1] [,2]
## [1,]   11   12
## [2,]   15   16
To retrieve all rows for specified columns, leave the row index empty:
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
my_matrix[,c(3,4)]
##      [,1] [,2]
## [1,]    3    4
## [2,]    7    8
## [3,]   11   12
## [4,]   15   16
To retrieve all columns for specified rows, leave the column index empty:
my_matrix = matrix(c(1:16),nrow=4,ncol=4,byrow=TRUE)
# all columns for row 1 and 2
my_matrix[1:2,]
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
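One detail worth knowing when indexing matrices: selecting a single row or column returns a plain vector unless drop = FALSE is passed. A short sketch on the same 4 x 4 matrix (the names m, row_vec and row_mat are illustrative):

```r
m <- matrix(1:16, nrow = 4, ncol = 4, byrow = TRUE)

# A single cell: row 2, column 3
m[2, 3]                 # 7

# A single row comes back as a vector
row_vec <- m[1, ]
is.matrix(row_vec)      # FALSE

# drop = FALSE keeps the result as a 1 x 4 matrix
row_mat <- m[1, , drop = FALSE]
dim(row_mat)            # 1 4
```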
We can use element indices and [] to subset lists. Using [] will always return a list.
my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))
my_list[1]
## $fruits
## [1] "Banana" "Orange" "Mango" "Pineapple"
To extract individual elements of a list, we use the double-square bracket function: [[]]
my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))
my_list[[1]]
## [1] "Banana" "Orange" "Mango" "Pineapple"
$ operator
The $ operator is a shorthand for extracting single elements by name:
my_list = list(fruits=c("Banana","Orange","Mango","Pineapple"),veges=c("avocado","cabbage","tomato"))
my_list$veges
## [1] "avocado" "cabbage" "tomato"
To check whether an item is present in a list, use the %in% operator:
thislist <- list("apple", "banana", "cherry")
"apple" %in% thislist
## [1] TRUE
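The difference between [] and [[]] on lists is easiest to see by checking the class of what each returns. A minimal sketch with a shortened version of the list used above:

```r
my_list <- list(fruits = c("Banana", "Orange"),
                veges  = c("avocado", "cabbage"))

# Single brackets return a (smaller) list
class(my_list["fruits"])      # "list"

# Double brackets return the element itself
class(my_list[["fruits"]])    # "character"

# The extracted vector can be indexed further
my_list[["fruits"]][2]        # "Orange"

# %in% also works on the extracted vector
"Orange" %in% my_list$fruits  # TRUE
```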
We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a data frame.
Using the [] operator with one argument acts the same way as for lists, where each list element corresponds to a column. The resulting object will be a data.frame.
countries = c("China","India","Brazil","USA","Ethiopia","Egypt")
capitals = c("Beijing","New Delhi","Brasil","Washington DC","Addis Ababa","Cairo")
density = c(153,464,25,36,115,103)
my_df <- data.frame(countries=countries,
capitals=capitals,
density=density)
my_df[1]
##   countries
## 1     China
## 2     India
## 3    Brazil
## 4       USA
## 5  Ethiopia
## 6     Egypt
my_df["capitals"]
##        capitals
## 1       Beijing
## 2     New Delhi
## 3        Brasil
## 4 Washington DC
## 5   Addis Ababa
## 6         Cairo
With two arguments, [] behaves the same way as for matrices.
my_df[1:2,c("countries","capitals")]
##   countries  capitals
## 1     China   Beijing
## 2     India New Delhi
[[]] extracts a single column as a vector.
my_df[['capitals']]
## [1] "Beijing" "New Delhi" "Brasil" "Washington DC"
## [5] "Addis Ababa" "Cairo"
$ provides a convenient shorthand to extract columns by name.
my_df$capitals
## [1] "Beijing" "New Delhi" "Brasil" "Washington DC"
## [5] "Addis Ababa" "Cairo"
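Because the two-argument form of [] works as it does for matrices, the row position can also hold a logical condition, which gives a simple way to filter rows. A sketch on a reduced copy of the data frame (the names dense_rows and dense_names are illustrative):

```r
my_df <- data.frame(countries = c("China", "India", "Brazil", "USA", "Ethiopia", "Egypt"),
                    density   = c(153, 464, 25, 36, 115, 103),
                    stringsAsFactors = FALSE)

# Keep the rows where density exceeds 100, with all columns
dense_rows <- my_df[my_df$density > 100, ]
nrow(dense_rows)    # 4

# Keep the same rows but only one column; the result is a vector
dense_names <- my_df[my_df$density > 100, "countries"]
dense_names         # "China" "India" "Ethiopia" "Egypt"
```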
R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones.
CRAN (Comprehensive R Archive Network) is the official repository. It is a network of FTP and web servers maintained by the R community around the world. The R Foundation coordinates it, and for a package to be published there, it needs to pass several tests that ensure the package follows CRAN policies.
CRAN hosts more than 10,000 published packages.
To install a package from CRAN, you only need its name and the install.packages() function.
install.packages("wordcloud2")
After successful installation of a package, you can load it into your workspace using the library() function.
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.1.3
wordcloud2(demoFreq)
To check which packages are installed on your computer, you can use installed.packages().
# check what packages are installed
installed.packages()
Uninstalling a package is straightforward with the remove.packages() function:
# remove.packages("package_name")
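A common pattern is to install a package only when it is missing. The sketch below is one possible way to do this with base R's requireNamespace(); the helper is_installed is a hypothetical name, and the install call is left commented out so that running the example does not change your library.

```r
# TRUE if the package can be loaded, FALSE otherwise.
# Note: requireNamespace() loads the package namespace as a side effect.
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# The stats package ships with R, so this is TRUE
is_installed("stats")

# Install wordcloud2 only if it is not available yet
if (!is_installed("wordcloud2")) {
  message("wordcloud2 is not installed")
  # install.packages("wordcloud2")
}
```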