Skip to content

Latest commit

 

History

History
176 lines (142 loc) · 5.85 KB

r-101.md

File metadata and controls

176 lines (142 loc) · 5.85 KB

R 101

R is a programming language designed to perform statistics, data visualization, and machine learning. R was created in 1993. It is an open source project and is free to download and use without any restrictions.

R is used in many different fields including:

  • Data science
  • Information science
  • Bioinformatics
  • Business analytics
  • and more!

RStudio

RStudio is a sophisticated code editor for the R programming language. View the [quick start]({{ site.baseurl }}/quickstart.html) guide to learn how to install R and RStudio.

By default, RStudio has a four panel layout as depicted below.

Note. Image courtesy of the Carpentries at Edinburgh ![]({{ site.baseurl }}/img/rstudio-layout.png)

  • Source: this panel is where you write your R scripts. Any code that you want to save for later goes here. Think about it like a Word document for your R code.
  • Console/Terminal: this panel is where you can execute R code that you don't need to save for later. This is useful for installing packages or other one time activates. You can type code in the console and hit enter to execute your command.
  • Environment/History: this panel shows you what variables and other objects you have loaded into memory. More simply, it tells you what data you're currently working with.
  • Files/Plots/Packages/Help: this panel contains a lot of other information including what packages you have installed, any plots you have created, and a help menu to read documentation.

This is the default layout of RStudio. You can change it in the Global Settings to arrange the panels in anyway you'd like.

Basic Syntax

R has a simple and easy to learn syntax. It was designed to be accessible to non-programmers and researchers. First we will start with some basic math functions.

# Anything that begins with a hashtag is a comment
# This allows you to add notes to your program 

1 + 1    # Addition
2 - 1    # Subtraction
5 * 1    # Multipliation
25 / 5   # Division
5^2      # Exponents

If you want to save the output of an operation for later use, you can store it in a variable. The assignment operator in R is <-. You can name variables whatever you like, but there are a few rules.

  • Variables cannot contain spaces
  • Variables cannot start with numbers or only contain numbers
  • Variables cannot contain special characters ($,^,&, etc.)
Bad Good
my data my_data
25model model25
group1&group2 group1_and_group2

Variable names should be short but descriptive. Other people should be able to read your code and understand what's going on.

# We want to store '5+5' for later use
result<-5+5

# We can also store other datatypes like a string
full_name<-"Jane Doe"

# We can store lists of things
class_members<-c("John Doe","Jane Doe","Jim Smith")

Since R is a statistical programming language, working with large lists is common. R calls lists "vectors." A vector is a mathematical term, but understanding why R calls them vectors is not important right now. Suppose we have a list of first names. We can put them in a vector.

names<-c("Joe","Jane","Jim")

Notice that our vector starts with a 'c'. This tells R that our vector can contain any data type and can be any length. Each item in a vector must be separated by a comma.

Each item in a vector is given an index. In R, this index always starts at 1. In the example of names it would look something like this.

1 2 3
Joe Jane Jim

We can access individual items in names using the index.

name[1] # Prints "Joe"
name[2] # Prints "Jim"

While most programming languages allow for these basic data types, R makes it much easier to store tabular data. In R, tables are called "data frames." Suppose we had the following table:

Student Grade
Jane Doe 92.0
John Smith 88.1
Max Mustermann 82.0

We can represent this in R using a data frame.

student_grades<-data.frame (
    Student= c("Jane Doe", "John Smith", "Max Mustermann"),
    Grade= c(92.0,88.1,82.0)
)

We can access individual columns as vectors using the $ operator.

student_grades$Grade       # Gets a vector of all grades
student_grades$Grade[1]    # Gets the first grade
mean(student_grades$Grade) # Takes the average of all grades

Installing and Loading Packages

Over the years, the R community has created many additional features. These features are bundled together in packages. Packages can be installed easily using the install.packages function.

Suppose we wanted to install ggplot2 (a package that allows you to build nice looking graphs). In the R console, type out the following command and press enter.

install.packages("ggplot2")

This will download the package from the internet and install the it on your computer. Then, whenever you want to use it in an R script, you can you can load it into your environment using the library function. Suppose we wanted to load ggplot2. We could write the following command.

library(ggplot2)

Keyboard Shortcuts

RStudio has a handful of keyboard shortcuts that speed up programming. Here are some of the most common.

Windows/Linux Mac Command
Ctrl+Enter Cmd+Return Executes current line
Ctrl+Shift+m Cmd+Shift+m %>%
Alt+Minus Alt+Minus <-
Ctrl+Shift+Plus Cmd+Shift+Plus Zoom in
Ctrl+Shift+Minus Cmd+Shift+Minus Zoom out

Resources