Natural Language Processing
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. Natural language processing (NLP) is the field concerned with the interaction between computers and human (natural) language. The NLP packages used in our project are tm, boilerpipeR, RCurl, and wordcloud. To use a package, one installs it from CRAN together with its dependencies and then loads it into the R session.

In this study, we used these four packages to predict which of the two US presidential candidates, Hillary Clinton or Donald Trump, would win. We compared their speeches with those of two past candidates who went on to win the presidency: Barack Obama and George W. Bush. The study focused on speeches in three categories: foreign policy, the victory speech given on being chosen as the party's nominee, and jobs and the economy.

We extracted the phrases common to the speeches in order to compare Clinton and Trump with Obama and Bush, and then computed the frequency of these key phrases in each of the 12 speeches (four politicians, three speeches each). As a visual aid, the frequencies were plotted as four histograms, one per politician. We also derived the words each of the four politicians used most frequently and displayed them as word clouds. Our conclusion is based on which candidate shared the most key phrases with the two previous presidents.
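The phrase-frequency step can be illustrated with a minimal sketch in base R. This is not the project's actual code: the speech snippets, the speaker labels, and the key phrases below are hypothetical placeholders, and the helper functions normalize and count_phrase are names invented for this example. Base R is used so the sketch runs without installing anything; the tm package offers richer tools for the same job, such as TermDocumentMatrix.

```r
# Hypothetical stand-in texts; the study itself used 12 full speeches.
speeches <- c(
  candidate_a = "We will create jobs. We will create jobs and grow the middle class.",
  reference_b = "Our plan will create jobs and strengthen the middle class."
)

# Hypothetical key phrases; the study derived its own from the speeches.
key_phrases <- c("create jobs", "middle class", "foreign policy")

# Lower-case the text, replace anything that is not a letter or a space,
# and collapse repeated spaces so that matching ignores case and punctuation.
normalize <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z ]", " ", text)
  gsub(" +", " ", trimws(text))
}

# Count non-overlapping occurrences of one phrase in one text.
# Matching is on plain substrings, so it does not respect word boundaries.
count_phrase <- function(text, phrase) {
  hits <- gregexpr(phrase, normalize(text), fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Build a matrix of counts: one row per phrase, one column per speaker.
phrase_counts <- sapply(speeches, function(s) {
  sapply(key_phrases, function(p) count_phrase(s, p))
})

print(phrase_counts)
```

Each column of phrase_counts holds one speaker's frequencies, so a call such as barplot(phrase_counts[, "candidate_a"]) would draw a chart for that speaker, and colSums(phrase_counts > 0) gives the number of distinct key phrases each speaker used.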