R is a programming language designed to perform statistics, data visualization, and machine learning. R was created in 1993. It is an open source project and is free to download and use without any restrictions.
R is used in many different fields including:
- Data science
- Information science
- Bioinformatics
- Business analytics
- and more!
RStudio is a sophisticated code editor for the R programming language. View the [quick start]({{ site.baseurl }}/quickstart.html) guide to learn how to install R and RStudio.
By default, RStudio has a four panel layout as depicted below.
Note. Image courtesy of the Carpentries at Edinburgh ![]({{ site.baseurl }}/img/rstudio-layout.png)
- Source: this panel is where you write your R scripts. Any code that you want to save for later goes here. Think about it like a Word document for your R code.
- Console/Terminal: this panel is where you can execute R code that you don't need to save for later. This is useful for installing packages or other one time activates. You can type code in the console and hit enter to execute your command.
- Environment/History: this panel shows you what variables and other objects you have loaded into memory. More simply, it tells you what data you're currently working with.
- Files/Plots/Packages/Help: this panel contains a lot of other information including what packages you have installed, any plots you have created, and a help menu to read documentation.
This is the default layout of RStudio. You can change it in the Global Settings to arrange the panels in anyway you'd like.
R has a simple and easy to learn syntax. It was designed to be accessible to non-programmers and researchers. First we will start with some basic math functions.
# Anything that begins with a hashtag is a comment
# This allows you to add notes to your program
1 + 1 # Addition
2 - 1 # Subtraction
5 * 1 # Multipliation
25 / 5 # Division
5^2 # Exponents
If you want to save the output of an operation for later use, you can store
it in a variable. The assignment operator in R is <-
. You can name variables
whatever you like, but there are a few rules.
- Variables cannot contain spaces
- Variables cannot start with numbers or only contain numbers
- Variables cannot contain special characters ($,^,&, etc.)
Bad | Good |
---|---|
my data | my_data |
25model | model25 |
group1&group2 | group1_and_group2 |
Variable names should be short but descriptive. Other people should be able to read your code and understand what's going on.
# We want to store '5+5' for later use
result<-5+5
# We can also store other datatypes like a string
full_name<-"Jane Doe"
# We can store lists of things
class_members<-c("John Doe","Jane Doe","Jim Smith")
Since R is a statistical programming language, working with large lists is common. R calls lists "vectors." A vector is a mathematical term, but understanding why R calls them vectors is not important right now. Suppose we have a list of first names. We can put them in a vector.
names<-c("Joe","Jane","Jim")
Notice that our vector starts with a 'c'. This tells R that our vector can contain any data type and can be any length. Each item in a vector must be separated by a comma.
Each item in a vector is given an index. In R, this index always starts at 1.
In the example of names
it would look something like this.
1 | 2 | 3 |
---|---|---|
Joe | Jane | Jim |
We can access individual items in names
using the index.
name[1] # Prints "Joe"
name[2] # Prints "Jim"
While most programming languages allow for these basic data types, R makes it much easier to store tabular data. In R, tables are called "data frames." Suppose we had the following table:
Student | Grade |
---|---|
Jane Doe | 92.0 |
John Smith | 88.1 |
Max Mustermann | 82.0 |
We can represent this in R using a data frame.
student_grades<-data.frame (
Student= c("Jane Doe", "John Smith", "Max Mustermann"),
Grade= c(92.0,88.1,82.0)
)
We can access individual columns as vectors using the $
operator.
student_grades$Grade # Gets a vector of all grades
student_grades$Grade[1] # Gets the first grade
mean(student_grades$Grade) # Takes the average of all grades
Over the years, the R community has created many additional features. These
features are bundled together in packages. Packages can be installed easily
using the install.packages
function.
Suppose we wanted to install ggplot2
(a package that allows you to build
nice looking graphs). In the R console, type out the following command
and press enter.
install.packages("ggplot2")
This will download the package from the internet and install the it on your
computer. Then, whenever you want to use it in an R script, you can you can load
it into your environment using the library
function. Suppose we wanted to load
ggplot2
. We could write the following command.
library(ggplot2)
RStudio has a handful of keyboard shortcuts that speed up programming. Here are some of the most common.
Windows/Linux | Mac | Command |
---|---|---|
Ctrl+Enter | Cmd+Return | Executes current line |
Ctrl+Shift+m | Cmd+Shift+m | %>% |
Alt+Minus | Alt+Minus | <- |
Ctrl+Shift+Plus | Cmd+Shift+Plus | Zoom in |
Ctrl+Shift+Minus | Cmd+Shift+Minus | Zoom out |