0. GitHub & RStudio setup and issues
Most common issues with GitHub and RStudio can be resolved with some googling, but sometimes we are unaware that our current setup has an issue at all. For the purposes of our project, everyone should at least be familiar with, and able to use, the following GitHub/RStudio functions:
Having an RStudio project created for this repository (a.k.a. repo)
Committing changes to a GitHub repository from RStudio
Pushing to a GitHub repo from RStudio
Pulling from a GitHub repo to RStudio
Merging files after a push or pull in RStudio
Managing repo files in RStudio
Having email notifications for changes made to this repo
Let me know if any of these are unfamiliar to you.
1. Style guidelines
Here is a well-written guide by Google that accurately reflects the style conventions of the R community. It addresses most of the style "issues" in our script.
1.1. Reminders
Use <- to assign variables instead of =. This is mostly convention, but there is a small technical difference. See https://stackoverflow.com/questions/1741820/assignment-operators-in-r-and for more detail.
Keep lines under 80 characters, with a few exceptions, like our dfr() call.
If other researchers will be poring over our code, it would be helpful to have variable names that are both concise and descriptive. Our variable names are currently concise but also a bit esoteric to any outsider.
Names should use snake_case instead of camelCase, since many of our abstractions are best described with acronyms. Examples: foo_bar, harderian_gland, calculate_id. Acronyms keep the capitalization used by convention in the radiation research literature (e.g. nte_HZE_ider, low_LET_ider).
2. General programming tips:
2.1. Environment Management
To clear the global environment (undefine all variables, functions, data, etc.) you can use:
rm(list = ls())
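One caveat: ls() skips objects whose names start with a dot, so this idiom leaves such hidden objects in place, and it does not detach loaded packages. The sketch below illustrates the first point in a throwaway environment so that it does not touch your real workspace. The names scratch, dose, and .cache are hypothetical and used only for illustration.

```r
# Demonstrate the rm(list = ls()) idiom in a sandbox environment instead of
# the real global environment. `scratch` is a hypothetical name.
scratch <- new.env()
assign("dose", 1.5, envir = scratch)
assign(".cache", "hidden", envir = scratch)  # dot-prefixed, so ls() skips it

# Same idiom as rm(list = ls()), pointed at the sandbox.
rm(list = ls(envir = scratch), envir = scratch)

remaining <- ls(envir = scratch, all.names = TRUE)
print(remaining)  # only ".cache" is left

# To remove hidden objects as well, list them with all.names = TRUE.
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
```

In the global environment, the equivalent of the last line would be rm(list = ls(all.names = TRUE)).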
2.2. Debugging with breakpoints
2.2.1. browser() breakpoints
To better understand why your code is raising error messages, you can add browser() on a new line above the code you suspect is buggy. When you run your code, execution pauses at the browser() line and you are dropped into the current environment (see footnote 1) of the script. This means that everything created or changed by the code up to that line is available to view and call in the console. With browser(), you can easily check or test objects created in function environments. Hint: use str() to check the structure and type of an object.
For example, you can use browser() to check how the value of a variable changes in a loop. Consider the following function:
loop <- function(x) {
  for (i in seq(100)) {
    browser()
    x <- x + 1
  }
  return(x)
}
Let's say you want to examine the behavior of loop. If you run loop without the browser() call in the third line, you simply get the output of loop. With the browser() call, you can closely examine the environment of loop. When loop(0) is called, execution pauses the first time browser() is evaluated and you are dropped into the debugger. If we check the value of x at this point, we can see that it is 0.
> loop(0)
Called from: loop(0)
Browse[1]> x
[1] 0
If we let the debugger continue to the next browser() call (press the Continue button or run c) and check the value of x again, we can see that x is 1 in the second iteration of the loop.
Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 1
We can continue to run the debugger to see how x changes as loop runs.
Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 2
In this particular example, loop is clearly simple, but with more complicated functions or loops, browser() can provide much more insight.
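Stepping through all 100 iterations by hand gets tedious, so it can help to wrap browser() in a condition. The following is a minimal sketch of that idea using the same loop as above. The stop_at argument is a hypothetical addition for illustration, and the interactive() check is there so that non-interactive runs (such as Rscript) are not paused.

```r
# Same loop as above, but the debugger only opens on one chosen iteration.
# `stop_at` is a hypothetical argument added for this illustration.
loop <- function(x, stop_at = 50) {
  for (i in seq(100)) {
    # Only pause in an interactive session, and only on iteration `stop_at`.
    if (interactive() && i == stop_at) {
      browser()
    }
    x <- x + 1
  }
  return(x)
}

result <- loop(0)
```

When the debugger opens, checking x in the console shows the value accumulated over the iterations before stop_at.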
2.2.2. Editor breakpoints
RStudio allows users to set breakpoints without changing the existing code by clicking in the margin directly to the left of a line of code. A red dot should appear. If a red circle appears instead, the breakpoint is deferred. This can happen for a number of reasons, but saving the file or sourcing it, either with source() in the console or from the editor, should change the circle to a dot. Unlike browser() breakpoints, editor breakpoints can only be used with source(). They are generally less versatile than browser().
2.2.3. Debugger console
Running or sourcing code with active breakpoints will halt execution at the first breakpoint encountered. At this point, the console displays several new commands. Each also has a one-letter equivalent, shown in parentheses, that can be typed at the Browse prompt:
Next (n): Executes the next line of code without leaving the debugger.
Step into (down arrow, s): If possible, moves into the source code of the function called on the halted line. For example, if execution is halted at foo(x), the point of execution moves to the source code of foo.
Step out (right arrow, f): If possible, finishes the current function or loop and moves back out of it. For example, this leaves a function entered with a preceding Step into command.
Continue (c): Runs code until the next breakpoint or until all code has been executed.
Stop (Q): Exits the debugger.
The console can also run most R code, which is useful for checking the values of variables or writing test functions within the debugger.
2.2.4. Additional resources
RStudio documentation for debugging resources can be found here.
2.3. Locating source code
Try getAnywhere("function_name"), replacing function_name with the name of the function you are looking for. However, it's usually more useful to step into the source code with breakpoints and the debugger, especially for complicated functions.
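As a small illustration, the sketch below looks up median.default from the stats package. The choice of function is arbitrary, and any function name can be substituted.

```r
# getAnywhere() searches the search path, registered S3 methods, and package
# namespaces, so it also finds functions that are not visible by name alone.
found <- utils::getAnywhere("median.default")

print(found$where)       # the places where a matching object was found
print(found$objs[[1]])   # the function itself; printing it shows the source

# For functions that are visible on the search path, typing the name without
# parentheses also prints the source.
print(stats::median)
```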
2.4. Reducing runtime
Sometimes we would like our programs to run faster. Here are some methods for locating slow code and rewriting it to be more efficient.
2.4.1. Finding slow code
We can use proc.time() to examine whole code blocks or individual lines in a function if we suspect that part of our code runs much slower than the rest. Calling proc.time() immediately before and after a code block lets us "time" it: the runtime is the difference between the two proc.time() calls. As a simple example:
> start_time <- proc.time()
> n <- 0
> for (i in seq(100)) n <- n + i
> end_time <- proc.time()
> end_time - start_time
   user  system elapsed
  0.005   0.001   0.041
Note that the results are given in seconds.
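Base R also provides system.time(), which wraps the same before-and-after measurement around a single expression. The sketch below times the same toy loop. The variable name timing is an arbitrary choice.

```r
# system.time() evaluates the expression and returns the same three timings
# (user, system, elapsed) as the difference of two proc.time() calls.
timing <- system.time({
  n <- 0
  for (i in seq(100)) n <- n + i
})

print(timing)                  # user, system, and elapsed time in seconds
print(timing[["elapsed"]])     # wall-clock time only
```

Because the expression is evaluated in the calling environment, n is still available afterwards.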
2.4.2. Writing faster code
Most of our inefficient code results from bad design. Make sure that your higher-order functions and algorithms are not making unnecessary calls and that you thoroughly understand what your code is doing. Where possible, preallocate: set up the objects that will hold your results before a loop starts instead of growing them inside it.
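To make the preallocation advice concrete, here is a minimal sketch that fills a vector with squares in three ways. All function and variable names are hypothetical, the task is a toy one, and the actual timings will depend on your machine.

```r
# Hypothetical example: compute the squares of 1..n in three ways.
n_values <- 10000

# Slow pattern: the vector is grown (and copied) on every iteration.
grow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocated: create the full-length vector once, then fill it in place.
prealloc_squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized: let a single vector operation replace the loop entirely.
vector_squares <- function(n) {
  seq_len(n)^2
}

print(system.time(grow_squares(n_values)))
print(system.time(prealloc_squares(n_values)))
print(system.time(vector_squares(n_values)))
```

All three functions return the same values, so the comparison isolates the cost of how the result vector is built.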
For more details and other issues, this Stack Overflow post puts it better than I can.
3. Footnotes
1 An environment is essentially a space in which objects such as variables and functions are defined. The global environment is the default environment and the outermost environment we work in. Anything defined or loaded outside of a function call exists in the global environment. Each time a function is called, a new environment called a "frame" is opened. Objects created inside a function call, including other functions, are defined in the new frame. The environment in which the function was defined is the new frame's "parent environment"; for a function written at the top level of a script, this is the global environment. Objects defined in the parent environment can be used in the child frame, but ordinary assignment with <- in a child frame does not redefine variables in the parent environment. It creates a new local variable instead. The code below demonstrates what happens when one attempts to redefine a variable in the parent environment.
> a <- 1  # Not in a function call: defined in the global environment.
> foo <- function() {  # Calling foo() creates a frame F1 inside the global environment. F1 can use anything defined in the global environment.
+   a <- a + 1  # Defines the new variable a inside F1, not in the parent environment.
+   return(a)
+ }
> foo()
[1] 2
> a  # Note that a is not changed in the global environment.
[1] 1
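Strictly speaking, this behavior belongs to ordinary assignment with <-. R also has a superassignment operator, <<-, which modifies an existing variable in an enclosing environment instead of creating a local one. The sketch below contrasts the two. The names counter, bump_local, and bump_enclosing are hypothetical.

```r
counter <- 0

# Ordinary assignment: creates a local `counter` inside the function's frame.
bump_local <- function() {
  counter <- counter + 1
  counter
}

# Superassignment: modifies the `counter` found in an enclosing environment.
bump_enclosing <- function() {
  counter <<- counter + 1
  counter
}

local_result <- bump_local()
counter_after_local <- counter      # still 0: only the local copy changed

enclosing_result <- bump_enclosing()
counter_after_enclosing <- counter  # now 1: the outer variable was modified
```

Because <<- changes state outside the function, it makes code harder to follow and is usually best avoided unless there is a clear reason for it.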
If another function is defined and called inside the first function body, then calling it creates a second frame whose parent environment is the first frame. This means that the second function has access to any objects created in the first frame or the global environment. Further nested functions behave similarly.
a <- 1  # a is defined in the global environment.
foo <- function() {  # Calling foo() creates a frame F1 inside of the global environment.
  b <- a + 1  # b is defined in F1. foo can use variables defined in the global environment.
  foobar <- function() {  # Calling foobar() creates a frame F2 inside of F1.
    c <- a + b  # c is defined in F2. foobar can use variables in both F1 and the global environment.
    return(c)
  }
  return(foobar())
}
When the function call terminates, the frame is closed and the objects defined within it are discarded. Only the output of the function call (the value passed to return()) is passed from the child frame back to its caller. In the example above, b and c are discarded after calling foo(). However, c is the output of foobar(), and foo() returns that output, so a call to foo() returns the value of c, which is 3.
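One subtlety is that what a function can see depends on where it was defined, not on where it is called from. The sketch below shows a function defined at the top level and called from inside another function. The names show_a and caller are hypothetical.

```r
a <- 1

# Defined at the top level, so its parent environment is the one that holds
# the top-level `a`, regardless of where it is later called from.
show_a <- function() {
  a
}

caller <- function() {
  a <- 100   # local to caller's frame
  show_a()   # show_a does not look inside caller's frame
}

caller_result <- caller()
print(caller_result)  # 1, not 100
```

This is why foobar in the example above can see b: it is defined inside foo, so foo's frame is its parent environment.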