This aggregated cheatsheet of our tutorial package Data-projects-with-R-and-GitHub summarizes the individual notes from the provided tutorials and additional information from recommended sources.
- `git` enables collaborative project development (UZH tutorial)
  - keeps track of all changes
  - maintains a local copy of a project on your computer (offline)
  - can synchronize this copy with a remote copy (e.g. online on GitHub)
  - is able to merge multiple (text) file changes in most cases
- always `pull` the most recent version before starting your daily work
- when you want to lock and upload your changes, you have to `commit` the changed files to your local copy along with a useful comment
  - you can do multiple commits with independent comments (but all changes are still offline)
- `push` all committed changes to the remote copy (e.g. GitHub)
  - Note: this might fail if the remote version of your project already has newer changes!
  - Recommended: update your project's status via `pull` before doing a `push`!
- at some point you will get a merge conflict:
  - i.e. `git` finds a newer remote version of a file changed by you, with changes at the same spots
  - thus, `git` does not know which change is the correct one
  - the respective lines are highlighted with marker lines `<<<<<<<`, `=======`, `>>>>>>>`
    - find them
    - decide which lines to keep (and remove the others)
    - remove the `<<<<<<<`, `=======`, `>>>>>>>` marker lines
    - commit and push your manually merged files!
- a `branch` is a "copy" of your project with independent tracking of changes
  - branches are used to isolate code developments until they are stable/working
- a `pull request` aims at merging your changes back into the project
- important data structures
  - `vector` = all elements have the same data type
  - `list` = most general, can contain anything, any size, …
  - `data.frame`/`tibble` = list of vectors (of the same length)
    - ==> if tidy, the column values in one row are associated with one observation
- subsetting can be done using
  - `[]` = reduces the current object to the selected part(s) (same data container)
  - `[[]]` = provides a single element (element-specific data container)
  - `$` = shortcut for `[[]]` with name
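
A minimal sketch of the three containers and the three subsetting operators described above. All variable names and values are made up for illustration.

```r
# a vector: all elements share one data type (here: named numeric values)
v <- c(a = 1, b = 2, c = 3)

# a list: elements may differ in type and length
l <- list(x = "haha", y = 1:3)

# a data.frame: a list of equally long vectors (one row = one observation if tidy)
df <- data.frame(id = 1:3, name = c("A", "B", "C"), stringsAsFactors = FALSE)

# [] reduces the object but keeps the container type
sub_v <- v[c("a", "c")]  # still a named numeric vector
sub_l <- l["y"]          # still a list (of length 1)

# [[]] extracts a single element in its own container
elem <- l[["y"]]         # the integer vector 1:3

# $ is a shortcut for [[]] with a name
same_elem <- l$y
name_column <- df$name   # the character vector behind the column
```

Comparing `class(sub_l)` with `class(elem)` shows the difference between `[]` and `[[]]` directly.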
- programming of workflows often requires
  - iteration of code blocks (`for` or `while` loops)
    - `for( VARIABLE in DATA ) {...}`
    - each element of `DATA` is stored one by one in `VARIABLE` before running `{...}`
    - e.g. `for( d in list( x="haha", y=1:3)) { print(length(d)) }`
  - conditional code execution (`if/else` statements)
    - `if ( CONDITION ) {..T..} else {..F..}`
    - `else {..F..}` is optional
    - `CONDITION` must evaluate to a single `TRUE|FALSE`, which triggers execution of the respective `{..T|F..}` block
      - e.g. `if ( 1:4 == 2 ) {}` does not work, since the check returns four logical values!
    - e.g. `if ( version$os == "mingw32" ) { print("MS Windows user?") }`
  - generalization of code blocks (`function` definition and usage)
    - `myFunction <- function (ARGUMENTS) {...}`
    - `ARGUMENTS` are optional (the first should be the data to work on)
    - run via `myFunction( .. )` using appropriate values for `ARGUMENTS`
    - return via `return()` in `{...}` (default: the value of the last evaluated expression is returned)
    - to return multiple values, aggregate them in a `list()` or a vector `c()`
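
The three building blocks above can be sketched together as follows. The function name `describe`, its `digits` argument and the example values are hypothetical, and `any()` is just one way to reduce several logical values to the single `TRUE`/`FALSE` that `if` expects.

```r
# for loop: each element of the list is stored in `d`, one by one
lengths_seen <- c()
for (d in list(x = "haha", y = 1:3)) {
  lengths_seen <- c(lengths_seen, length(d))
}

# if/else: the condition must be a single TRUE or FALSE,
# so the four comparison results are reduced with any()
values <- 1:4
if (any(values == 2)) {
  found <- "contains 2"
} else {
  found <- "no 2"
}

# function: the first argument is the data to work on;
# multiple results are aggregated in a list
describe <- function(data, digits = 1) {
  return(list(n = length(data), mean = round(mean(data), digits)))
}
result <- describe(values)
```

Individual results are then available by name, e.g. `result$n` or `result[["mean"]]`.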
- redundant code w.r.t. values of a variable ==> `for` loop generalization
- redundant code w.r.t. variables or function calls ==> `function` generalization
- iteration tricks
  - best iterate directly on the elements, if you don't need the element names or index!
  - use `seq_along()` to generate the list of valid indices of a vector or list
  - use `names()` to access the vector of element names of an object
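
A short sketch of the three iteration styles named above, using a made-up list `scores`:

```r
scores <- list(anna = 1:3, ben = c(10, 20))

# iterate directly over the elements (no names or index needed)
totals <- c()
for (s in scores) {
  totals <- c(totals, sum(s))
}

# iterate over the valid indices
for (i in seq_along(scores)) {
  message(i, ": ", length(scores[[i]]), " value(s)")
}

# iterate over the element names
for (n in names(scores)) {
  message(n, ": ", sum(scores[[n]]))
}
```

Unlike `1:length(x)`, `seq_along(x)` also behaves correctly for an empty object, where it yields no indices at all.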
- when a function calls itself it is called a recursive call or recursion
- function names (without brackets) can be stored in and used from variables/arguments to call the functions
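
Both points can be illustrated with a small sketch. The helper names `factorial_rec` and `apply_to` are hypothetical and only serve as examples.

```r
# recursion: the function calls itself on a smaller problem
# until the stop condition (n <= 1) is reached
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  return(n * factorial_rec(n - 1))
}

# a function name (without parentheses) passed as an argument
apply_to <- function(data, fun) {
  return(fun(data))
}
largest <- apply_to(c(3, 7, 5), max)

# a function stored in a variable and called through it
f <- min
smallest <- f(c(3, 7, 5))
```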
- first think about `INPUT` and `OUTPUT` of your function, i.e. what do you need and what do you want to create
  - `INPUT` will be all function arguments, i.e. the needed variables of your code
- create example variables with example content
- write a piece of code that operates on your example variables doing what you want, i.e. creating your `OUTPUT`
- wrap your working code in a `function(){}` construct/body and add a `return()` statement to provide your `OUTPUT`
- test your function with your example content as argument values
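
The steps above can be sketched for a made-up task: rescaling a numeric vector to the range 0 to 1. The names `example_values` and `rescale01` are assumptions chosen for this illustration.

```r
# 1. INPUT: a numeric vector; OUTPUT: the same values rescaled to 0..1

# 2. example variable with example content
example_values <- c(2, 4, 6)

# 3. working code that operates on the example variable
rescaled <- (example_values - min(example_values)) /
  (max(example_values) - min(example_values))

# 4. wrap the working code in a function and return the OUTPUT
rescale01 <- function(values) {
  rescaled <- (values - min(values)) / (max(values) - min(values))
  return(rescaled)
}

# 5. test the function with the example content as argument value
test_result <- rescale01(example_values)
```

Here `test_result` should match the `rescaled` vector computed in step 3, which is a quick check that the wrapping did not change the behavior.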