R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 1990s. Since then, a great deal of effort has gone into improving R's user interface. The journey from a rudimentary text editor to the interactive RStudio and, more recently, Jupyter Notebooks has engaged data science communities across the world.
This was possible only because of generous contributions by R users globally. The steady addition of packages has made R more and more powerful over time. Packages such as dplyr, tidyr, readr, data.table, SparkR and ggplot2 have made data manipulation, visualization and computation much faster.
But what about machine learning?
My first impression of R was that it was just software for statistical computing. Happily, I was wrong! R has everything needed to implement machine learning algorithms quickly and simply.
This is a complete tutorial to learn data science and machine learning using R. By the end of this tutorial, you will have had good exposure to building predictive models with machine learning on your own.
Note: No prior knowledge of data science / analytics is required. However, prior knowledge of algebra and statistics will be helpful.
Note: The data set used in this article is from Big Mart Sales Prediction.
1. Basics of R Programming
Why learn R?
I don't know if I have a solid reason to convince you, but let me share what got me started. I had no prior coding experience, and computer science was never one of my subjects. I learned that anyone starting out in data science should pick up either R or Python. I chose the former. Here are some benefits I found after using R:
The style of coding is quite easy.
It’s open source. No need to pay any subscription charges.
Instant access to over 7800 packages (at the time of writing) built for a wide range of computation tasks.
The community support is overwhelming. There are numerous forums to help you out.
High-performance computing is available (it requires additional packages).
It is one of the skills most sought after by analytics and data science companies.
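The package ecosystem mentioned in the list above can be explored from the R console. The lines below are a minimal sketch and not part of the original tutorial: dplyr is used only as an example package name, and the variable names (`installed`, `n_installed`, `has_dplyr`) are illustrative assumptions.

```r
# Install a package from CRAN. This is needed only once per machine and
# requires an internet connection, so it is left commented out here.
# install.packages("dplyr")

# Load an installed package for the current session.
# 'stats' ships with R, so this line works on any installation.
library(stats)

# List the packages installed locally and count them.
installed <- rownames(installed.packages())
n_installed <- length(installed)
print(n_installed)

# Check whether a particular package is installed before trying to load it.
has_dplyr <- "dplyr" %in% installed
print(has_dplyr)
```

The count printed will differ from machine to machine, since it depends on what has been installed locally.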
There are many more benefits, but these are the ones that have kept me going. If you find them exciting, stick around and move on to the next section. If you aren't convinced, you may like Complete Python Tutorial from Scratch.
How to install R / RStudio?
You could download and install base R and work with it on its own, but I'd urge you to start with RStudio. It provides a much better coding experience. Note that RStudio is an interface to R, so R itself must also be installed. For Windows users, RStudio is available for Windows Vista and later versions. Follow the steps below to install RStudio:
Go to https://www.rstudio.com/products/rstudio/download/
In the ‘Installers for Supported Platforms’ section, click the RStudio installer that matches your operating system. The download should begin as soon as you click.
To start RStudio, click its desktop icon or use Windows search to find the program.
Let’s quickly understand the interface of RStudio:
R Console: This area shows the output of the code you run. You can also write code directly in the console, but code entered there is not saved, so you cannot retrieve it later. This is where the R script comes in.
R Script: As the name suggests, this is the space for writing code. To run it, select the line(s) of code and press Ctrl + Enter. Alternatively, click the small ‘Run’ button at the top right corner of the R Script pane.
R Environment: This space displays the objects created or loaded in the current session, such as data sets, variables, vectors and functions. To check whether data has been loaded properly in R, always look at this area.
Graphical Output: This space displays the graphs created during exploratory data analysis.
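To see how the four panes work together, you can paste a few lines into an R script and run them with Ctrl + Enter. This is a small sketch with made-up numbers; the object names `sales` and `avg_sales` are illustrative and do not come from the Big Mart data set.

```r
# Create a numeric vector. After running this line, 'sales' appears
# in the Environment pane.
sales <- c(120, 95, 143, 110)

# Compute the average and print it. The result shows up in the Console.
avg_sales <- mean(sales)
print(avg_sales)  # [1] 117

# List the objects in the current environment, which is the same
# information the Environment pane displays.
print(ls())

# Draw a simple line-and-point chart. In RStudio it appears in the
# graphical output (Plots) pane.
plot(sales, type = "b", main = "Example sales", xlab = "Store", ylab = "Units")
```

Because the code lives in a script, it can be saved and rerun later, unlike lines typed straight into the console.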