Style and General Tips #4

Open
eghuang opened this issue Aug 22, 2017 · 2 comments

@eghuang
Member

eghuang commented Aug 22, 2017

0. GitHub & RStudio setup and issues

Most common issues with GitHub and RStudio can be resolved with some googling, but sometimes we are unaware that our current setup has a problem. For the purposes of our project, each of us should be familiar with, and able to do, everything on the following list of GitHub/RStudio functions.

  • Having an RStudio project created for this repository (a.k.a. repo)
  • Committing changes to a GitHub repository from RStudio
  • Pushing to a GitHub repo from RStudio
  • Pulling from a GitHub repo to RStudio
  • Merging files after a push or pull on RStudio
  • Managing repo files on RStudio
  • Having email notifications for changes made to this repo

Let me know if any of these are unfamiliar to you.

1. Style guidelines

Here is a well-written guide by Google that accurately reflects the style conventions of the R community. It addresses most of the style "issues" in our script.

1.1. Reminders

  • Use <- to assign variables instead of =. This is mostly convention, but there is a small technical difference. See: https://stackoverflow.com/questions/1741820/assignment-operators-in-r-and for more detail.

  • Keep lines under 80 characters, with a few exceptions, like our dfr() call.

  • If other researchers will be poring over our code, it would be helpful to have variable names that are both concise and descriptive. Our variable names are currently concise but also a bit esoteric to any outsider.

  • Names should use snake_case instead of camelCase, since many of our abstractions are best described with acronyms. Examples: foo_bar, harderian_gland, calculate_id. Acronyms keep the capitalization that is conventional in the radiation research literature (e.g. nte_HZE_ider, low_LET_ider).
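As a rough illustration of the reminders above, the sketch below uses hypothetical names (low_LET_doses, calculate_mean_dose, compare_assignment) that are not part of our script. The second half shows the small technical difference between the two assignment operators: inside a function call, = only names an argument, while <- also creates a variable in the calling frame.

```r
# All names below are hypothetical and only illustrate the conventions.
low_LET_doses <- c(0.1, 0.2, 0.4)  # snake_case with a capitalized acronym

calculate_mean_dose <- function(doses) {
  mean(doses)
}
mean_dose <- calculate_mean_dose(low_LET_doses)

compare_assignment <- function() {
  median(x = 1:10)   # `=` only names the argument; no variable x is created
  x_after_equals <- exists("x", envir = environment(), inherits = FALSE)
  median(x <- 1:10)  # `<-` creates x in this frame, then passes it on
  x_after_arrow <- exists("x", envir = environment(), inherits = FALSE)
  c(after_equals = x_after_equals, after_arrow = x_after_arrow)
}
assignment_result <- compare_assignment()
# assignment_result is c(after_equals = FALSE, after_arrow = TRUE)
```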

2. General programming tips:

2.1. Environment Management

To clear the global environment (remove all visible variables, functions, data, etc.), you can use:

rm(list = ls())
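One caveat worth knowing: by default ls() skips objects whose names start with a dot, so they survive the command above. The sketch below demonstrates this on a throwaway environment (the hypothetical scratch) rather than on the global environment, so it is safe to run anywhere.

```r
# `scratch` is a hypothetical stand-in for the global environment.
scratch <- new.env()
assign("visible_value", 1, envir = scratch)
assign(".hidden_value", 2, envir = scratch)

# Same idea as rm(list = ls()), but pointed at the scratch environment.
rm(list = ls(scratch), envir = scratch)
remaining <- ls(scratch, all.names = TRUE)  # the dot-prefixed object is left

# all.names = TRUE makes ls() include dot-prefixed objects as well.
rm(list = ls(scratch, all.names = TRUE), envir = scratch)
remaining_after_full_clear <- ls(scratch, all.names = TRUE)
```

For the global environment the equivalent would be rm(list = ls(all.names = TRUE)). Neither form detaches loaded packages.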

2.2. Debugging with breakpoints

2.2.1. browser() breakpoints

To better understand why your code may be raising error messages, you may add browser() on a new line above the code you suspect is buggy. When you run your code, the debugger will open at the line with browser() and drop you into the current environment (see footnote 1) of the script. This means that everything that has been created or changed by the code up to that line will be available to you to view and call in the console. With browser(), you can easily check or test objects created in function environments. Hint: use str() to check the type and structure of an object.

For example, you can use browser() to check how the value of a variable changes in a loop. Consider the following function:

loop <- function(x) {
  for (i in seq(100)) {
    browser()
    x <- x + 1
  }
  return(x)
}

Let's say you want to examine the behavior of loop. If you run loop without the browser() call on the third line, you simply get the output of loop. With the browser() call, execution pauses each time browser() is evaluated and you are dropped into the debugger, where you can closely examine the environment of loop. If we check the value of x at the first pause, we can see that it is 0.

> loop(0)
Called from: loop(0)
Browse[1]> x
[1] 0

If we let the debugger continue to the next browser() (press the continue button or run c) and check the value of x, then we can see that x is 1 in the second iteration of the loop.

Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 1

We can continue to run the debugger to see how x changes as loop runs.

Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 2

In this particular example, loop is clearly simple, but with more complicated functions or loops, browser() can shed much more insight.
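Pausing on all 100 iterations quickly gets tedious, so a common variation is to pause on a single iteration only. The sketch below assumes a hypothetical break_at argument and guards browser() with interactive() so that the function still runs to completion when it is not called from a console.

```r
# Hypothetical variant of loop() that pauses on one chosen iteration.
loop_with_conditional_break <- function(x, break_at = 50) {
  for (i in seq(100)) {
    # Only open the debugger in an interactive session, on one iteration.
    if (interactive() && i == break_at) browser()
    x <- x + 1
  }
  return(x)
}

final_value <- loop_with_conditional_break(0)
```

Called from the RStudio console, this drops into the debugger once, at the start of iteration 50, where x should be 49.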

2.2.2. Editor breakpoints

RStudio allows users to set breakpoints without changing the existing code by clicking directly to the left of a line of code. A red dot should appear. If a red circle appears instead, then the breakpoint is deferred. This can happen for a number of reasons, but saving the file or running source() in the console or with the editor should change the circle to a dot. Editor breakpoints, unlike browser() breakpoints, can only be used with source(). They are generally less versatile than browser().

2.2.3. Debugger console

Running or sourcing code with active breakpoints will halt execution at the first encountered breakpoint. At this point, the console will display several new commands:

  • Next: Executes the next line of code without leaving the debugger.
  • Step into (down arrow): If possible, moves into the source code of the frozen line. For example, if execution is halted at foo(x) then the halted point of execution would be moved to the source code of 'foo'.
  • Step out (right arrow): If possible, moves out of the current function. For example, this would undo a preceding Step into command.
  • Continue: Run code until the next breakpoint or until all code is executed.
  • Stop: Exit the debugger

The console also can run most R code, which is useful for checking the values of variables or writing test functions within the debugger.
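The buttons above also have keyboard equivalents that can be typed at the Browse prompt, as listed in the help page for browser(). The sketch below uses a hypothetical function, rescale, and base R's debugonce() simply to have something to step through; debugonce() is an assumption made for illustration and is only one of several ways to enter the debugger.

```r
# Hypothetical function used only to have something to step through.
rescale <- function(values) {
  centered <- values - mean(values)
  centered / max(abs(centered))
}

# Only enter the debugger when running from a console.
if (interactive()) {
  debugonce(rescale)  # pause at the start of the next call to rescale()
}
rescaled <- rescale(c(1, 2, 3))

# Commands accepted at the Browse[1]> prompt:
#   n  evaluate the next statement, stepping over function calls (Next)
#   s  evaluate the next statement, stepping into function calls (Step into)
#   f  finish the current loop or function (Step out)
#   c  continue until the next breakpoint or the end (Continue)
#   Q  leave the debugger and return to the top-level prompt (Stop)
```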

2.2.4. Additional resources

RStudio documentation for debugging resources can be found here.

2.3. Locating source code

Try getAnywhere(name), where name is the name of the function you are looking for. However, it's usually more useful to step into source code when using breakpoints and the debugger, especially for complicated functions.
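As a minimal sketch, here is how getAnywhere() might be used to look up median.default from the stats package. The function being looked up is chosen only for illustration.

```r
# Look up a function by name, including in package namespaces.
found <- utils::getAnywhere("median.default")

found$where                        # where the matching objects were found
median_source <- found$objs[[1]]   # the first match, as a function object
is_a_function <- is.function(median_source)
```

Printing median_source at the console shows the function's source code.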

2.4. Reducing runtime

Sometimes we find that we would like our programs to run faster. Here are various methods to locate and rewrite slow code to be more efficient.

2.4.1. Finding slow code

We can use proc.time() to time whole code blocks or individual lines in a function if we suspect that part of our code runs much slower than the rest. Call proc.time() before and after the code in question; the difference between the two results is the runtime of that code. As a simple example:

> start_time <- proc.time()
> n <- 0
> for (i in seq(100, .01)) n <- n + i
> end_time <- proc.time()
> end_time - start_time

   user  system elapsed 
  0.005   0.001   0.041 

Note that the results are given in seconds.
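Base R also provides system.time(), which wraps the same before-and-after measurement around a single expression. The sketch below times a hypothetical function, sum_to, which is not part of our script.

```r
# Hypothetical function to time.
sum_to <- function(limit) {
  total <- 0
  for (i in seq_len(limit)) total <- total + i
  total
}

# system.time() evaluates the expression and returns the time it took.
timing <- system.time(total <- sum_to(1e5))
elapsed_seconds <- timing[["elapsed"]]  # wall-clock time in seconds
```

The actual timings depend on the machine, so no expected values are shown here.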

2.4.2. Writing faster code

Most of our inefficient code results from bad design. Make sure that your higher-order functions and algorithms are not making unnecessary calls and that you thoroughly understand what your code is doing. Where possible, preallocate vectors to their final size instead of growing them inside a loop, and compute values that do not change only once, outside the loop.

For more details and other issues, this Stack Overflow post puts it better than I can.
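To make the preallocation advice concrete, here is a small sketch with three hypothetical functions that compute the same vector of squares: one grows the result inside the loop, one preallocates it, and one avoids the loop entirely. None of these names come from our script.

```r
# Grows the result one element at a time; each c() call copies the vector.
grow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Allocates the full result once, then fills it in place.
prealloc_squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized form: no explicit loop at all.
vectorized_squares <- function(n) {
  seq_len(n)^2
}

n_values <- 2000
grown <- grow_squares(n_values)
preallocated <- prealloc_squares(n_values)
vectorized <- vectorized_squares(n_values)
```

Wrapping each call in proc.time() measurements, as in the previous section, is a simple way to compare them on your own machine.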

3. Footnotes

1 An environment is essentially a space in which objects such as variables and functions are defined. The global environment is the default environment and the outermost environment we work in. Anything defined or loaded outside of a function call exists in the global environment. Each time a function is called, a new environment called a "frame" is opened. Objects created inside a function call, including other functions, are defined in the new frame. The environment in which the function was defined is the new frame's "parent environment"; in the examples here, that is also the environment the function is called from. Objects defined in the parent environment can be used in their child frames, but an ordinary assignment with <- in a child frame creates a local variable and does not change the parent environment. The code below demonstrates what happens when one attempts to redefine a variable from the parent environment inside a function.

> a <- 1 #  Not in a function call, so a is defined in the global environment.
> foo <- function() { #  Each call to foo creates a frame F1 whose parent is the global environment.
+   a <- a + 1 #  Reads the global a, then defines a new variable a inside F1, not in the parent environment.
+   return(a)
+ }

> foo() 
[1] 2

> a #  Note that a is not changed in the global environment.
[1] 1
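For contrast, R does provide the <<- operator, which assigns to an existing variable in an enclosing environment instead of creating a local one. The sketch below uses hypothetical names (counter, increment_local, increment_parent) to show the two behaviors side by side; it is an illustration, not a recommendation to use <<- in our script.

```r
counter <- 0

# `<-` creates a local counter inside the frame; the outer one is untouched.
increment_local <- function() {
  counter <- counter + 1
  counter
}

# `<<-` assigns to the counter found in the enclosing environment.
increment_parent <- function() {
  counter <<- counter + 1
  counter
}

local_result <- increment_local()    # returns 1
counter_after_local <- counter       # still 0
parent_result <- increment_parent()  # returns 1
counter_after_parent <- counter      # now 1
```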

If another function is defined and called inside of the first function body, then a second frame is created such that the parent environment of the second frame is the first frame. This means that the second function has access to any objects created in the first frame or the global environment. Further nested functions behave similarly.

a <- 1 #  a is defined in global environment
foo <- function() { #  Creates a frame F1 inside of the global environment.
  b <- a + 1 #  b is defined in F1. foo can use variables defined in the global environment.
  foobar <- function() { #  Creates a frame F2 inside of F1. 
    c <- a + b #  c is defined in F2. foobar can use variables in both F1 and the global environment.
    return(c)
  }
  return(foobar())
}

When the function call terminates, the frame is closed and all the objects defined within it are discarded. Only the output of the function call (the return() call in a frame) is passed from the child frame to the parent environment. In the example above, b and c are discarded after calling foo. However, c is the output of foobar(), so a call to foo() would return the value of c, or 3.
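The parent relationship can also be inspected directly with environment() and parent.env() from base R. In the sketch below, all function names are hypothetical. The first function checks that a nested function's frame has the outer frame as its parent. The second pair checks that the parent is determined by where a function is defined, even when it is called from inside another function.

```r
# A function defined inside another function: its frame's parent is the
# frame of the function it was defined in.
outer_fn <- function() {
  outer_frame <- environment()
  inner_fn <- function() parent.env(environment())
  identical(inner_fn(), outer_frame)
}
nested_parent_is_outer <- outer_fn()

# A function defined at the top level but called from inside another
# function: its frame's parent is still the place where it was defined.
defined_outside <- function() parent.env(environment())
caller <- function() defined_outside()
parent_is_definition_site <- identical(caller(), environment(defined_outside))
```

Both checks should come out TRUE.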

@rainersachs
Collaborator

thanks Edward. I read the Google guide and your comments; I started to implement them. But in some cases it didn't work yet.

@eghuang
Member Author

eghuang commented Sep 6, 2017

September 6, 2017:

  • added explanation of browser() and examples
  • added brief review of environments and frames

Other Updates (most recent at bottom):

  • added explanation of proc.time() for timing code
  • linked guide to speeding up R code
  • added list of basic GitHub/RStudio functions we need to know
  • added bullet to reflect new naming conventions (snake_case)
    Let me know if you have any specific questions or code issues.
  • Added naming convention exception for common acronyms (HZE, LET).

May 29, 2019:

  • added editor breakpoints
  • added debugger console
  • added source code
  • minor changes in formatting and wording to other sections

@eghuang eghuang changed the title style and general tips Style and General Tips Feb 26, 2018
@eghuang eghuang added the reference Point of information for collaborators. label Feb 26, 2018
@eghuang eghuang self-assigned this Feb 26, 2018
@eghuang eghuang pinned this issue May 31, 2019