<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="generator" content="pandoc">
  <meta name="author" content="Eric Denovellis">
  <title>Better Science Code</title>
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
  <link rel="stylesheet" href="revealjs/css/reveal.css">
  <style type="text/css">code{white-space: pre;}</style>
  <style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
  margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; background-color: #303030; color: #cccccc; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; }
td.sourceCode { padding-left: 5px; }
pre, code { color: #cccccc; background-color: #303030; }
code > span.kw { color: #f0dfaf; } /* Keyword */
code > span.dt { color: #dfdfbf; } /* DataType */
code > span.dv { color: #dcdccc; } /* DecVal */
code > span.bn { color: #dca3a3; } /* BaseN */
code > span.fl { color: #c0bed1; } /* Float */
code > span.ch { color: #dca3a3; } /* Char */
code > span.st { color: #cc9393; } /* String */
code > span.co { color: #7f9f7f; } /* Comment */
code > span.ot { color: #efef8f; } /* Other */
code > span.al { color: #ffcfaf; } /* Alert */
code > span.fu { color: #efef8f; } /* Function */
code > span.er { color: #c3bf9f; } /* Error */
code > span.wa { color: #7f9f7f; font-weight: bold; } /* Warning */
code > span.cn { color: #dca3a3; font-weight: bold; } /* Constant */
code > span.sc { color: #dca3a3; } /* SpecialChar */
code > span.vs { color: #cc9393; } /* VerbatimString */
code > span.ss { color: #cc9393; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { } /* Variable */
code > span.cf { color: #f0dfaf; } /* ControlFlow */
code > span.op { color: #f0efd0; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #ffcfaf; font-weight: bold; } /* Preprocessor */
code > span.at { } /* Attribute */
code > span.do { color: #7f9f7f; } /* Documentation */
code > span.an { color: #7f9f7f; font-weight: bold; } /* Annotation */
code > span.cv { color: #7f9f7f; font-weight: bold; } /* CommentVar */
code > span.in { color: #7f9f7f; font-weight: bold; } /* Information */
  </style>
  <link rel="stylesheet" href="revealjs/css/theme/black.css" id="theme">
  <link rel="stylesheet" href="css/custom.css"/>
  <!-- Printing and PDF exports -->
  <script>
    var link = document.createElement( 'link' );
    link.rel = 'stylesheet';
    link.type = 'text/css';
    link.href = window.location.search.match( /print-pdf/gi ) ? 'revealjs/css/print/pdf.css' : 'revealjs/css/print/paper.css';
    document.getElementsByTagName( 'head' )[0].appendChild( link );
  </script>
  <!--[if lt IE 9]>
  <script src="revealjs/lib/js/html5shiv.js"></script>
  <![endif]-->
</head>
<body>
  <div class="reveal">
    <div class="slides">

<section>
  <h1 class="title">Better Science Code</h1>
  <p class="author">Eric Denovellis</p>
</section>

<section class="slide level6">

<p>Presentation: <a href="http://edeno.github.io/Better-Science-Code">https://edeno.github.io/Better-Science-Code</a></p>
</section>
<section class="slide level6">

<p>Repository: <a href="https://github.com/edeno/Better-Science-Code" class="uri">https://github.com/edeno/Better-Science-Code</a></p>
</section>
<section class="slide level6">

<p>Google Doc for Group Note Taking / Discussion:</p>
<p><a href="https://docs.google.com/document/d/1LDR8eF6rggOST7IuyM0qcXJhoLI6UwHaiwcwS1-RpPw/edit?usp=sharing" class="uri">https://docs.google.com/document/d/1LDR8eF6rggOST7IuyM0qcXJhoLI6UwHaiwcwS1-RpPw/edit?usp=sharing</a></p>
</section>
<section class="slide level6">

<p>Why should you care about producing good code</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 1. Doing good science!</p>
<aside class="notes">
Nearly all modern science depends on computing (data collection, analysis, computational modeling). We spend a lot of time designing and performing experiments. Why waste that effort by writing code with errors?
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>We want code that <span class="highlight">works</span> (it does what you say it does) and is <span class="highlight">reproducible</span> (you get the same result every time you run the same code on the same data).</p>
</section>
<section class="slide level6">

<p>You don’t want to have to retract a paper because your code had bugs</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>Following good coding practices reduces the chance of making mistakes.</p>
</section>
<section class="slide level6">

<p>IT’S TOO EASY TO MAKE MISTAKES</p>
</section>
<section class="slide level6">

<blockquote>
<p>“As the complexity of a software program increases, the likelihood of undiscovered bugs quickly reaches certainty” – <cite>Poldrack et al. 2017</cite></p>
</blockquote>
</section>
<section class="slide level6">

<p>We are writing <em>complex code</em></p>
<aside class="notes">
Good code should reduce your anxiety about making mistakes
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 2. You want to remember what your code does months later</p>
</section>
<section class="slide level6">

<blockquote>
<p>“The single biggest reason you should write nice code is so that your future self can understand it.” – <cite>Greg Wilson</cite></p>
</blockquote>
<blockquote>
<p>“All code has at least one collaborator and that is future you.” – <cite>Hadley Wickham</cite></p>
</blockquote>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 3. You want to be able to share your code with other people</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code </span></p>
<p>REASON 4. Avoid introducing new errors</p>
<aside class="notes">
We’ll talk about how writing good code (in particular, testing your code) helps you avoid introducing new errors when you change it
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Why should you care about producing good code </span></p>
<p>REASON 5. Can serve as a resume for future employers</p>
</section>
<section class="slide level6">

<p>How to write good code???</p>
</section>
<section class="slide level6">

<p>Writing good code is an exercise in managing complexity:</p>
<ul>
<li>break problems down into smaller components</li>
<li>eliminate unnecessary dependencies</li>
<li>keep track of what you did (be organized)</li>
</ul>
</section>
<section class="slide level6">

<p>Goal: form good habits</p>
</section>
<section class="slide level6">

<p>Don’t be so overwhelmed <em>that you don’t do any of these things</em></p>
</section>
<section class="slide level6">

<p>Don’t beat yourself up <em>if you don’t do all these things all the time</em></p>
<aside class="notes">
<ul>
<li>just try to remember them and gradually incorporate them into your process</li>
<li>it will slow down your coding at first, but you will gain precision and readability</li>
<li>some of these take more up-front effort to adopt (such as version control)</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">How to write good code???</span></p>
<p>STEP 1. Decompose programs into small, well-defined functions</p>
<aside class="notes">
The biggest mistakes I see in scientific code: 1. not writing functions at all; 2. not writing small enough functions.
</aside>
</section>
<section class="slide level6">

<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">&#39;/tmp/123.npy&#39;</span>, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)</code></pre></div>
<aside class="notes">
<ul>
<li><code>def</code>: defines a function in Python</li>
</ul>
</aside>
</section>
<section class="slide level6">

<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">&#39;/tmp/123.npy&#39;</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Try to keep functions to fewer than 60 lines (small)</p>
<aside class="notes">
Seeing a whole function on screen helps you keep it in your working memory.
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Try to keep what the function does as simple as possible (well-defined)</p>
<aside class="notes">
<p>atomic = a function should do one “thing”</p>
<p>If you came back to the function later, how long would it take you to understand what it does? You should be able to explain what it does in one sentence.</p>
<p>pure = as few implicit contexts and side effects as possible</p>
</aside>
</section>
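<section class="slide level6">

<p>The notes on the previous slide describe a pure function as one with as few implicit contexts and side effects as possible. Here is a minimal sketch of that idea using hypothetical functions written only for this example. <code>scale_in_place</code> reads a global variable and modifies the array it is given. <code>scale</code> depends only on its arguments and returns a new array.</p>

```python
import numpy as np

# Hypothetical impure function: it depends on a global variable
# (implicit context) and modifies its argument (side effect).
SCALE_FACTOR = 2.0

def scale_in_place(values):
    values *= SCALE_FACTOR  # changes the caller's array
    return values

# Pure alternative: the output depends only on the inputs,
# and the input array is left unchanged.
def scale(values, factor):
    return values * factor

measurements = np.array([1.0, 2.0, 3.0])
scaled = scale(measurements, 2.0)  # measurements is still [1, 2, 3]
```

<p>Calling <code>scale_in_place</code> twice on the same array gives a different result each time. That kind of hidden state makes code harder to understand and to test.</p>
</section>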
<section class="slide level6">

<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Be ruthless about eliminating duplication of code.</p>
<aside class="notes">
<ul>
<li>turn duplicated code into functions</li>
<li>that way, fixing a bug in the function fixes it everywhere the function is used, instead of in every separate copy of the code</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Small, well-defined, without duplicates</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">&#39;/tmp/123.npy&#39;</span>, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Small, well-defined, without duplicates</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">&#39;/tmp/123.npy&#39;</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p>Small, well-defined functions are more <em>maintainable</em></p>
<aside class="notes">
<ul>
<li>breaks hard problems down into smaller problems</li>
<li>limits the scope of your code</li>
<li>makes it easier to debug or change (with unit testing)</li>
<li>separation of concerns</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>Small, well-defined functions are more <em>composable</em></p>
<aside class="notes">
<ul>
<li>can reuse functions in other programs</li>
<li>can pass functions to other functions (higher-order functions) and combine them (function composition)</li>
<li>makes you more efficient because you don’t have to rewrite code</li>
<li>makes you more precise because you can focus on fixing bugs for one function, not many similar functions</li>
</ul>
</aside>
</section>
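<section class="slide level6">

<p>Here is a sketch of what composable can look like in Python, reusing the <code>zscore</code> function from the earlier slides. <code>apply_to_columns</code> and <code>compose</code> are hypothetical helpers written for this example. Because <code>zscore</code> is small and does one thing, it can be passed to another function or combined with other functions without modification.</p>

```python
import numpy as np

def zscore(x):
    '''Number of standard deviations from the mean'''
    return (x - x.mean()) / x.std()

def apply_to_columns(function, matrix):
    '''Apply a one-argument function to each column of a 2-D array.'''
    return np.column_stack([function(column) for column in matrix.T])

def compose(outer, inner):
    '''Return a new function that computes outer(inner(x)).'''
    return lambda x: outer(inner(x))

measurements = np.array([[1.0, 10.0],
                         [3.0, 30.0]])
standardized = apply_to_columns(zscore, measurements)  # z-score each column
distance_from_mean = compose(np.abs, zscore)  # |z| for each value
```
</section>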
<section class="slide level6">

<p>Small, well-defined functions are more <em>readable</em></p>
<p>* if you give them good names</p>
</section>
<section class="slide level6">

<p>STEP 2. Use good variable/function names to clarify what things do</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">&#39;/tmp/123.npy&#39;</span>, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">&#39;/tmp/123.npy&#39;</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    response, design_matrix1, design_matrix2 <span class="op">=</span> load_data(
        <span class="st">&#39;/tmp/123.npy&#39;</span>)
    coefficient1 <span class="op">=</span> linear_regression(
        zscore(design_matrix1), response)
    coefficient2 <span class="op">=</span> linear_regression(
        zscore(design_matrix2), response)
    coefficient_difference <span class="op">=</span> coefficient1 <span class="op">-</span> coefficient2
    np.save(<span class="st">&#39;ans.npy&#39;</span>, coefficient_difference)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">&#39;r&#39;</span>)
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p>You don’t need comments if the variable or function name already tells you what it does (self-documenting)</p>
<aside class="notes">
<ul>
<li>People have been taught to use comments in their code</li>
<li>Modern practice is to use commenting sparingly within the body of the code</li>
<li>Use comments at the beginning of a function to document what the function does (we will come back to this)</li>
<li>This doesn’t mean never use comments, but don’t use them to restate what the code already says.</li>
<li>“If your code needs a comment to explain it, you’ve probably written confusing code.”</li>
<li>Good names make the code easier to read</li>
<li>If it is difficult to come up with a meaningful name for a function, it is probably doing too much</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>Use the naming conventions of your language of choice (<code>snake_case</code> or <code>camelCase</code>) and <span class="highlight">be consistent</span></p>
</section>
<section class="slide level6">

<p>Avoid using abbreviations that are not commonly used</p>
<p>(<code>sw</code> vs. <code>spike_width</code>)</p>
</section>
<section class="slide level6">

<p>Prefer whole words</p>
<p>(<code>elec_poten</code> vs. <code>electric_potential</code>)</p>
</section>
<section class="slide level6">

<p>STEP 3. Document your functions</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<p>Easiest step: a brief sentence describing what the function does, without using the function’s name*</p>
<p>*<em>this is the most important part</em></p>
<aside class="notes">
<ul>
<li>second line of defense in remembering what a function does</li>
<li>The more important the function, the more it should be documented</li>
<li>if using Python, use the NumPy docstring format</li>
<li>if using MATLAB, use the MATLAB help-text format</li>
<li>documentation is often longer than the code itself</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="co">&#39;&#39;&#39;Number of standard deviations from the mean&#39;&#39;&#39;</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="co">&#39;&#39;&#39;Number of standard deviations from the mean&#39;&#39;&#39;</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    <span class="co">&#39;&#39;&#39;Calculate a linear least-squares regression for</span>
<span class="co">    two sets of measurements&#39;&#39;&#39;</span>
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<ul>
<li>additional detail about what the function does or the method it implements</li>
<li>description of the parameters</li>
<li>description of the outputs</li>
<li>examples if you can</li>
</ul>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> linear_regression(design_matrix, response):
    <span class="co">&#39;&#39;&#39;Calculate a linear least-squares regression for</span>
<span class="co">    two sets of measurements</span>

<span class="co">    Uses the QR decomposition to avoid the numerical instability</span>
<span class="co">    of explicitly inverting design_matrix.T @ design_matrix.</span>

<span class="co">    Parameters</span>
<span class="co">    ----------</span>
<span class="co">    design_matrix : ndarray, shape (n_samples, n_features)</span>
<span class="co">        Predictor measurements. Must be two-dimensional.</span>
<span class="co">    response : ndarray, shape (n_samples,)</span>
<span class="co">        Measurements to be predicted.</span>

<span class="co">    Returns</span>
<span class="co">    -------</span>
<span class="co">    coefficients : ndarray, shape (n_features,)</span>
<span class="co">        Parameters estimated from the model.</span>

<span class="co">    Examples</span>
<span class="co">    --------</span>
<span class="co">    &gt;&gt;&gt; design_matrix = np.random.random((10, 1))</span>
<span class="co">    &gt;&gt;&gt; response = np.random.random(10)</span>
<span class="co">    &gt;&gt;&gt; coefficients = linear_regression(design_matrix, response)</span>

<span class="co">    &#39;&#39;&#39;</span>
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">&#39;reduced&#39;</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
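<section class="slide level6">

<p>The Examples section of a NumPy-style docstring can also be executed. Below is a minimal sketch using Python’s built-in <code>doctest</code> module on <code>zscore</code>; the example values are chosen only for illustration. The <code>&gt;&gt;&gt;</code> line is run, and its output is compared with the line below it.</p>

```python
import doctest

import numpy as np

def zscore(x):
    '''Number of standard deviations from the mean

    Examples
    --------
    >>> zscore(np.array([1.0, 3.0])).tolist()
    [-1.0, 1.0]

    '''
    return (x - x.mean()) / x.std()

# Collect the >>> examples from the docstring and run them as a test.
parser = doctest.DocTestParser()
docstring_test = parser.get_doctest(
    zscore.__doc__, {'np': np, 'zscore': zscore}, 'zscore', None, 0)
results = doctest.DocTestRunner().run(docstring_test)
print(results.failed, 'failed of', results.attempted, 'examples')
```

<p>pytest can also find and run these examples when called with its <code>--doctest-modules</code> option, so the examples in your documentation double as tests.</p>
</section>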
<section class="slide level6">

<p>STEP 4. Test your code</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<p>Make sure your code works like you think it does</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<p>Think about how your code can fail</p>
</section>
<section class="slide level6">

<p>Small, well-defined, well-named functions are easy to test!</p>
</section>
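<section class="slide level6">

<p>Because <code>linear_regression</code> is small and well-defined, it is easy to test. Below is a sketch of two possible tests; the random seeds, array sizes, and coefficient values are arbitrary choices for illustration. The first simulates noise-free data from known coefficients and checks that they are recovered. The second compares the result against NumPy’s own least-squares solver, <code>np.linalg.lstsq</code>.</p>

```python
import numpy as np

def linear_regression(design_matrix, response):
    '''Calculate a linear least-squares regression for
    two sets of measurements'''
    Q, R = np.linalg.qr(design_matrix, mode='reduced')
    return np.linalg.solve(R, np.dot(Q.T, response))

def test_linear_regression_recovers_known_coefficients():
    # Noise-free data generated from known coefficients:
    # least squares should recover them up to floating-point error.
    rng = np.random.default_rng(0)
    design_matrix = rng.normal(size=(50, 3))
    true_coefficients = np.array([1.5, -2.0, 0.5])
    response = design_matrix @ true_coefficients
    np.testing.assert_allclose(
        linear_regression(design_matrix, response), true_coefficients)

def test_linear_regression_matches_lstsq():
    # Compare against NumPy's own least-squares solver on noisy data.
    rng = np.random.default_rng(1)
    design_matrix = rng.normal(size=(20, 2))
    response = rng.normal(size=20)
    expected, *_ = np.linalg.lstsq(design_matrix, response, rcond=None)
    np.testing.assert_allclose(
        linear_regression(design_matrix, response), expected, atol=1e-10)
```
</section>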
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> zscore(x):
    <span class="co">&#39;&#39;&#39;Number of standard deviations from the mean&#39;&#39;&#39;</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> test_zscore():
    <span class="cf">pass</span></code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> zscore(x):
    <span class="co">&#39;&#39;&#39;Number of standard deviations from the mean&#39;&#39;&#39;</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> test_zscore():
    test_values <span class="op">=</span> np.asarray([<span class="dv">1</span>, <span class="dv">3</span>])
    expected_values <span class="op">=</span> np.asarray([<span class="op">-</span><span class="dv">1</span>, <span class="dv">1</span>])

    <span class="cf">assert</span> np.allclose(zscore(test_values), expected_values)</code></pre></div>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<p><span class="highlight">Unit tests</span> test a small component of your code (usually a small function) and make sure it works the way you think it does</p>
<aside class="notes">
<ul>
<li>Isolate small components of a program and make sure they are correct</li>
<li>doesn’t ensure that combinations of these functions work together (that is what integration testing is for)</li>
</ul>
</aside>
</section>
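<section class="slide level6">

<p>Unit tests don’t check that functions work together. Below is a sketch of an <em>integration test</em> for the pipeline of <code>load_data</code>, <code>zscore</code>, and <code>linear_regression</code> from the earlier slides. It uses a version of <code>load_data</code> that returns each predictor as a 2-D column, which <code>np.linalg.qr</code> requires. The test writes a tiny made-up data set to a temporary file. The data has two orthogonal predictors that are already standardized, so the expected coefficients (3 and 1) can be worked out by hand.</p>

```python
import os
import tempfile

import numpy as np

def load_data(data_name):
    X = np.load(data_name, mmap_mode='r')
    return X[:, 0], X[:, [1]], X[:, [2]]

def zscore(x):
    return (x - x.mean()) / x.std()

def linear_regression(design_matrix, response):
    Q, R = np.linalg.qr(design_matrix, mode='reduced')
    return np.linalg.solve(R, np.dot(Q.T, response))

def test_pipeline_recovers_known_coefficients():
    # Orthogonal predictors with mean 0 and standard deviation 1,
    # so zscore leaves them unchanged.
    predictor1 = np.array([1.0, -1.0, 1.0, -1.0])
    predictor2 = np.array([1.0, 1.0, -1.0, -1.0])
    response = 3.0 * predictor1 + 1.0 * predictor2
    with tempfile.TemporaryDirectory() as directory:
        data_path = os.path.join(directory, 'data.npy')
        np.save(data_path,
                np.column_stack([response, predictor1, predictor2]))
        y, x1, x2 = load_data(data_path)
        coefficient1 = linear_regression(zscore(x1), y)
        coefficient2 = linear_regression(zscore(x2), y)
        # Release the memory-mapped file before the directory is removed
        # (some platforms, e.g. Windows, refuse to delete open files).
        del y, x1, x2
    np.testing.assert_allclose(coefficient1, [3.0])
    np.testing.assert_allclose(coefficient2, [1.0])
```
</section>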
<section class="slide level6">

<p><span class="highlight">Unit tests</span> prevent regression of your code</p>
</section>
<section class="slide level6">

<p>If you change your code, you want to know what still works and what has broken (regression)</p>
</section>
<section class="slide level6">

<p>Functions should be simple to test</p>
<aside class="notes">
<ul>
<li>if the number of test cases is uncomfortably large, start looking for smaller units to test.</li>
<li>your function is probably too complex</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>If you find a bug, write a test.</p>
<aside class="notes">
After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.
</aside>
</section>
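<section class="slide level6">

<p>Here is a sketch of this workflow with a hypothetical bug. The <code>zscore</code> from the earlier slides divides zero by zero when every value is the same, so it returns <code>nan</code>. The test for constant input is written first and fails against the original function. The fix shown, returning zeros for constant input, is one possible design choice, not the only one.</p>

```python
import numpy as np

def zscore(x):
    '''Number of standard deviations from the mean.

    Returns zeros for constant input instead of dividing by zero
    (a hypothetical fix, chosen here for illustration).
    '''
    x = np.asarray(x, dtype=float)
    standard_deviation = x.std()
    if standard_deviation == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / standard_deviation

def test_zscore_constant_input():
    # Written *before* the fix: the original zscore returned nan here
    # (0 / 0), so this test failed and documented the bug.
    assert np.allclose(zscore(np.ones(3)), 0.0)

def test_zscore():
    # The existing test still passes after the fix (no regression).
    assert np.allclose(zscore(np.asarray([1, 3])), [-1, 1])
```
</section>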
<section class="slide level6">

<p>Use unit tests to define the requirements of your code</p>
<aside class="notes">
<ul>
<li>ensure that your function is well-defined</li>
<li>some people even write unit tests before writing a function (test-driven development)</li>
<li>also a form of documentation: examples for how you think your code should work</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>You can use programs called <span class="highlight">test runners</span> to run a group of unit tests automatically.</p>
</section>
<section class="slide level6">

<p>MATLAB, Python, and R all have unit testing packages</p>
<ul>
<li><a href="https://www.mathworks.com/help/matlab/matlab-unit-test-framework.html">MATLAB unit test framework</a></li>
<li><a href="https://docs.python.org/3.4/library/unittest.html">Python: unittest</a></li>
<li><a href="http://doc.pytest.org/en/latest/">Python: pytest</a></li>
<li><a href="https://github.com/hadley/testthat">R: testthat</a></li>
</ul>
</section>
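<section class="slide level6">

<p>The <code>test_zscore</code> function shown earlier is already in the style that pytest collects automatically: a function whose name starts with <code>test_</code>. For comparison, here is a sketch of the same check written with Python’s built-in <code>unittest</code> framework. It adds one extra test, chosen for illustration, which checks that z-scores have a mean of zero.</p>

```python
import unittest

import numpy as np

def zscore(x):
    '''Number of standard deviations from the mean'''
    return (x - x.mean()) / x.std()

class TestZscore(unittest.TestCase):

    def test_two_values(self):
        np.testing.assert_allclose(zscore(np.asarray([1, 3])), [-1, 1])

    def test_mean_is_zero(self):
        result = zscore(np.asarray([2.0, 4.0, 9.0]))
        self.assertAlmostEqual(result.mean(), 0.0)
```

<p>If you save this as a file such as <code>test_zscore.py</code>, you can run it with <code>python -m unittest test_zscore</code>. You can also run it with <code>pytest</code>, which runs <code>unittest</code>-style test classes too.</p>
</section>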
<section class="slide level6">

<p><span class="deemphasized-title">Test your code</span></p>
<p>There are also tools and services that work with your version control system to run these tests automatically every time you commit new code (<span class="highlight">continuous integration</span>)</p>
<aside class="notes">
<ul>
<li>This may all seem complicated, but while developing code you should already be writing tests to make sure it works. This process just formalizes those tests so you can rerun them later, which gives you peace of mind.</li>
<li>yields more predictable code</li>
<li>in order to write a test, you have to know what the function does</li>
<li>people can look at your tests to understand your code (form of documentation)</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>STEP 5. Use version control</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>A sophisticated way to track changes in your code over time</p>
<aside class="notes">
<ul>
<li>Dropbox is a form of this (but not very sophisticated)</li>
<li>Microsoft Word is also a form of this (but not very sophisticated)</li>
<li>snapshots of all the files in a folder (repository)</li>
<li>Git is the most popular (it takes some time to learn, but its social and collaborative features and its popularity make it worth it)</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<figure>
<img src="img/github-desktop.png" alt="GitHub Desktop" /><figcaption>GitHub Desktop</figcaption>
</figure>
</section>
<section class="slide level6">

<p>Version control stores the whole history of your project</p>
</section>
<section class="slide level6">

<figure>
<img src="img/commit-history.png" />
</figure>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Helps you back up your work</p>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Go back to previous versions of your code</p>
</section>
<section class="slide level6">

<figure>
<img src="img/commit-history.png" />
</figure>
</section>
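<section class="slide level6">

<p>A toy model in Python of what “stores the whole history” and “go back to previous versions” mean. It is not how Git is actually implemented, only an illustration of the shared idea: each commit is a full snapshot of the files plus a message and a pointer to the previous commit, identified by a hash of its contents. The <code>Commit</code> and <code>Repository</code> names are hypothetical.</p>
```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    files: Dict[str, str]   # snapshot: file name -> file contents
    message: str
    parent: Optional[str]   # id of the previous commit (None for the first one)

    @property
    def commit_id(self) -> str:
        # Content-addressed id: the same files, message and parent give the same id.
        payload = json.dumps([self.files, self.message, self.parent], sort_keys=True)
        return hashlib.sha1(payload.encode()).hexdigest()[:8]


class Repository:
    """Toy version history: a table of commits plus the id of the latest one."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None

    def commit(self, files: Dict[str, str], message: str) -> str:
        # Copy the files so later edits cannot change the stored snapshot.
        snapshot = Commit(dict(files), message, self.head)
        self.commits[snapshot.commit_id] = snapshot
        self.head = snapshot.commit_id
        return snapshot.commit_id

    def log(self) -> List[str]:
        """Commit messages from newest to oldest, following parent pointers."""
        messages, current = [], self.head
        while current is not None:
            messages.append(self.commits[current].message)
            current = self.commits[current].parent
        return messages

    def checkout(self, commit_id: str) -> Dict[str, str]:
        """Return a copy of the files exactly as they were in an earlier commit."""
        return dict(self.commits[commit_id].files)
```
<p>Committing twice and then calling <code>checkout</code> with the first commit’s id returns the files exactly as they were at that point, while <code>log</code> lists the commit messages newest first.</p>
</section>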
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Reduce code clutter and confusion</p>
<aside class="notes">
<ul>
<li>no more code_v1.m, code_v2.m</li>
<li>which version of code was I using???</li>
<li>which version of code worked???</li>
<li>how is this different from other code I wrote???</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Experiment with different versions of code (branches)</p>
</section>
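<section class="slide level6">

<p>A toy sketch in Python of what a branch is, loosely modelled on how Git treats branches: a branch is just a name pointing at one commit, so new commits on an experimental branch do not move <code>main</code>. The class and method names, and the branch names, are assumptions for illustration.</p>
```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    message: str
    parent: Optional[int]  # index of the previous commit, None for the first one


class BranchingHistory:
    """Toy model: commits form chains via parents; a branch is a name pointing at one commit."""

    def __init__(self, first_message: str) -> None:
        self.commits: List[Commit] = [Commit(first_message, None)]
        self.branches: Dict[str, int] = {"main": 0}
        self.current = "main"

    def create_branch(self, name: str) -> None:
        # A new branch starts at the current branch's commit; no files are copied.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        self.current = name

    def commit(self, message: str) -> None:
        # Only the current branch moves forward; other branches stay where they were.
        self.commits.append(Commit(message, self.branches[self.current]))
        self.branches[self.current] = len(self.commits) - 1

    def log(self, branch: str) -> List[str]:
        """Messages reachable from a branch, newest first."""
        messages: List[str] = []
        index: Optional[int] = self.branches[branch]
        while index is not None:
            messages.append(self.commits[index].message)
            index = self.commits[index].parent
        return messages
```
<p>Because creating a branch only adds a pointer, trying an idea on a branch is cheap. If the idea fails, <code>main</code> still points at the last working version.</p>
</section>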
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Makes it easier to work with others</p>
<aside class="notes">
<ul>
<li>a standardized way to avoid unintentionally overwriting each other’s code</li>
<li>easy to share code (GitHub, Bitbucket, etc.)</li>
<li>makes it easier to document issues with code or data</li>
<li>Use example from this presentation</li>
</ul>
</aside>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Use version control</span></p>
<p>Commit early and often (take a lot of snapshots of your code)</p>
<aside class="notes">
<ul>
<li>when you get a piece of code working, commit it (take a snapshot)</li>
<li>Leave a short, informative commit message (document what the commit is)</li>
<li>don’t comment out old code, just delete it; you can always get it back from the history</li>
<li>I personally use GitHub Desktop
<ul>
<li>easy-to-use user interface</li>
</ul></li>
</ul>
</aside>
</section>
<section class="slide level6">

<p>STEP 6. Refactor your code</p>
</section>
<section class="slide level6">

<blockquote>
<p>“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.” – <cite>Martin Fowler, Refactoring: Improving the Design of Existing Code</cite></p>
</blockquote>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">Refactor your code</span></p>
<p>Always leave the code in a better state than when you first found it.</p>
<aside class="notes">
<p>Your code isn’t going to be perfect the first time.</p>
<p>Just like in writing, your code will get better as you revise it. You wouldn’t expect a first draft to be perfect.</p>
<p>Each time you look at your code, ask:</p>
<ul>
<li>Do my variable/function names make sense?</li>
<li>Do I know what this function is doing?</li>
<li>Can I turn things into functions?</li>
<li>Can I generalize this function?</li>
</ul>
<p>There is some tradeoff between tinkering with your code and getting things done.</p>
<p>Also, don’t throw everything out and rewrite from scratch unless you absolutely have to:</p>
<ul>
<li>“When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes.”</li>
<li>If this tutorial tempts you to do this to your existing codebase, don’t.</li>
</ul>
</aside>
</section>
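<section class="slide level6">

<p>A small before-and-after sketch of refactoring in Python. The z-scoring task and all names are hypothetical. The first version repeats the same calculation for each channel with uninformative names; the refactored version pulls the calculation into one well-named function. The behaviour is unchanged, which is what makes it a refactoring, and a quick comparison confirms the outputs match.</p>
```python
from statistics import mean, pstdev


# Before: the same computation is copy-pasted, and names like `a`, `b`, `x` hide its purpose.
def process_before(ch1, ch2):
    a = mean(ch1)
    b = pstdev(ch1)
    x = [(v - a) / b for v in ch1]
    a = mean(ch2)
    b = pstdev(ch2)
    y = [(v - a) / b for v in ch2]
    return x, y


# After: one small, well-named function that can be tested once and reused.
def z_score(signal):
    """Subtract the mean and divide by the population standard deviation."""
    center, spread = mean(signal), pstdev(signal)
    return [(value - center) / spread for value in signal]


def process_after(*channels):
    # Works for any number of channels, not just two.
    return tuple(z_score(channel) for channel in channels)
```
<p>Because refactoring should not change behaviour, comparing old and new outputs (ideally with the unit tests you already have) is a cheap way to check each step.</p>
</section>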
<section class="slide level6">

<p>STEP 7. Always search for well-maintained software libraries that do what you need.</p>
</section>
<section class="slide level6">

<p>Don’t rewrite functions that are already implemented as part of the core language.</p>
</section>
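<section class="slide level6">

<p>For example, in Python a hand-written loop for the mean of a list does the same job as one line using the built-in <code>sum</code> and <code>len</code>, or the standard-library <code>statistics.mean</code>. The function names here are hypothetical.</p>
```python
import statistics


def mean_by_hand(values):
    # Hand-rolled: more lines to read, test, and maintain.
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count


def mean_with_built_ins(values):
    # Same idea using functions built into the language.
    return sum(values) / len(values)


def mean_with_library(values):
    # Standard library: raises statistics.StatisticsError for empty input.
    return statistics.mean(values)
```
<p>On an empty list the hand-rolled version fails with a bare <code>ZeroDivisionError</code>, while <code>statistics.mean</code> raises a <code>StatisticsError</code> that names the actual problem.</p>
</section>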
<section class="slide level6">

<p>Use other software libraries if they are well-maintained</p>
<aside class="notes">
<p>Why:</p>
<ul>
<li>more users means bugs are more likely to have been found and fixed</li>
<li>better tested</li>
</ul>
<p>A little tricky: you still need to take time to vet the code to make sure it does what you think it does.</p>
</aside>
</section>
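<section class="slide level6">

<p>A small Python illustration of why a widely used library usually beats a quick home-made version: splitting lines of a CSV file on commas breaks as soon as a field contains a quoted comma, while the standard-library <code>csv</code> module handles quoting correctly. The sample data and function names are made up.</p>
```python
import csv
import io

# Made-up sample: the "notes" field of the data row contains a comma inside quotes.
SAMPLE = 'subject,notes,score\ns01,"tired, distracted",7\n'


def parse_naive(text):
    # Home-made parser: splits on every comma, including the one inside quotes.
    return [line.split(",") for line in text.splitlines()]


def parse_with_csv(text):
    # csv.reader understands quoted fields.
    # When reading from a file instead, open it with newline="" as the csv docs recommend.
    return list(csv.reader(io.StringIO(text)))
```
<p>The naive parser turns the data row into four fields instead of three, silently shifting the score into the wrong column.</p>
</section>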
<section class="slide level6">

<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Exercise in managing complexity:</p>
<ul>
<li>break problems down into smaller components</li>
<li>eliminate unnecessary dependencies</li>
<li>keep track of what you did (be organized)</li>
</ul>
</section>
<section class="slide level6">

<p>Summary:</p>
<ol type="1">
<li>Write small, well-defined, well-named functions</li>
<li>Use good function and variable names</li>
<li>Document your functions</li>
<li>Test your code</li>
<li>Use version control</li>
<li>Refactor your code</li>
<li>Always search for well-maintained software libraries that do what you need.</li>
</ol>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">break problems down into smaller components</span></p>
<ol type="1">
<li>Write small, well-defined, well-named functions</li>
<li><span class="dim">Use good function and variable names</span></li>
<li><span class="dim">Document your functions</span></li>
<li><span class="dim">Test your code</span></li>
<li><span class="dim">Use version control</span></li>
<li>Refactor your code</li>
<li>Always search for well-maintained software libraries that do what you need.</li>
</ol>
</section>
<section class="slide level6">

<p><span class="deemphasized-title">keep track of what you did (be organized)</span></p>
<ol type="1">
<li><span class="dim">Write small, well-defined, well-named functions</span></li>
<li>Use good function and variable names</li>
<li>Document your functions</li>
<li>Test your code</li>
<li>Use version control</li>
<li><span class="dim">Refactor your code</span></li>
<li><span class="dim">Always search for well-maintained software libraries that do what you need.</span></li>
</ol>
</section>
<section class="slide level6">

<p>Conclusion: Writing good code takes work</p>
</section>
<section class="slide level6">

<p>We have a scientific obligation to ensure the correctness of our programs.</p>
<aside class="notes">
<p>I think it is a mistake to believe that only “programmers” working for companies need to bother with writing good code.</p>
<p>You are a programmer dealing with complex programs.</p>
<p>Writing good code deserves the same amount of effort as performing the experiment or writing the paper.</p>
</aside>
</section>
<section class="slide level6">

<p>Exercises</p>
<ul>
<li><p>Go to <a href="https://github.com/edeno/Better-Science-Code" class="uri">https://github.com/edeno/Better-Science-Code</a></p></li>
<li><p>Copy either <a href="https://raw.githubusercontent.com/edeno/Better-Science-Code/master/exercises/exercises.py">exercises.py</a> or <a href="https://raw.githubusercontent.com/edeno/Better-Science-Code/master/exercises/exercises.m">exercises.m</a></p></li>
<li><p>Work on them for 30 minutes (solo or in groups).</p></li>
<li><p>Code Review: We will discuss what people came up with</p></li>
</ul>
</section>
<section class="slide level6">

<p>Exercise Objectives</p>
</section>
<section class="slide level6">

<p>Bonus: Data Management</p>
</section>
<section class="slide level6">

<p>Put different projects in different folders/repositories</p>
</section>
<section class="slide level6">

<p>Use relative paths</p>
</section>
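<section class="slide level6">

<p>A minimal sketch of relative paths with Python’s <code>pathlib</code>. The folder layout (<code>data/raw</code> inside the project folder) and the helper name <code>raw_data_file</code> are assumptions for illustration. Because the path is relative, the same code works on any computer where the project folder is the working directory, unlike a hard-coded absolute path into one person’s home folder.</p>
```python
from pathlib import Path

# Hypothetical project layout, relative to the project folder:
#   data/raw/        original recordings
#   data/processed/  outputs of analysis scripts
RAW_DATA_DIR = Path("data") / "raw"


def raw_data_file(name: str) -> Path:
    """Path to a raw data file, relative to the project folder (no user-specific prefix)."""
    return RAW_DATA_DIR / name


# Avoid: an absolute path that only exists on one machine, e.g.
# Path("/Users/someone/Desktop/project/data/raw/session_001.csv")
```
<p>Relative paths are resolved against the current working directory, so run scripts from the project folder (or build paths from a known project root).</p>
</section>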
<section class="slide level6">

<p>Separate the data from the code</p>
</section>
<section class="slide level6">

<p>Keep processed data separate from raw data to avoid accidentally changing the raw data</p>
</section>
<section class="slide level6">

<p>Tidy Data:</p>
<ul>
<li>Each variable forms a column.</li>
<li>Each observation forms a row.</li>
<li>Each type of observational unit forms a table.</li>
<li>Flat is better than nested.</li>
</ul>
</section>
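<section class="slide level6">

<p>A sketch in plain Python of turning nested data into tidy rows. The subject ids, condition names, and reaction times are made up. In the nested form each subject holds a dictionary of conditions; in the tidy form every observation is one row, with one column per variable (<code>subject</code>, <code>condition</code>, <code>reaction_time_ms</code>).</p>
```python
from typing import Dict, List

# Made-up nested data: subject -> condition -> reaction time (ms).
nested = {
    "s01": {"congruent": 512, "incongruent": 580},
    "s02": {"congruent": 498, "incongruent": 601},
}


def to_tidy(data: Dict[str, Dict[str, float]]) -> List[dict]:
    """Flatten subject -> condition -> value into one row per observation."""
    return [
        {"subject": subject, "condition": condition, "reaction_time_ms": value}
        for subject, conditions in data.items()
        for condition, value in conditions.items()
    ]
```
<p>A list of dictionaries like this can be written out with <code>csv.DictWriter</code> or, if pandas is installed, turned into a table with <code>pandas.DataFrame(rows)</code>.</p>
</section>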
<section class="slide level6">

<p>If the original data is not in a good form, convert it to a good form (but don’t overwrite the original data)</p>
</section>
<section class="slide level6">

<p>Don’t hand-edit data files.</p>
</section>
<section class="slide level6">

<p>All aspects of data cleaning should be in scripts</p>
</section>
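<section class="slide level6">

<p>A minimal sketch of a cleaning step done in a script rather than by hand: it reads a raw CSV file, drops rows with an empty score, and writes the result to a separate processed folder, refusing to write into the raw folder. The function name, the <code>score</code> column, and the cleaning rule are hypothetical.</p>
```python
import csv
from pathlib import Path


def clean_scores(raw_file: Path, processed_dir: Path) -> Path:
    """Copy raw_file into processed_dir without rows whose 'score' is empty.

    The raw file is only read, never modified.
    """
    if processed_dir.resolve() == raw_file.parent.resolve():
        raise ValueError("processed output must not go into the raw data folder")
    processed_dir.mkdir(parents=True, exist_ok=True)
    output_file = processed_dir / raw_file.name

    with raw_file.open(newline="") as source:
        reader = csv.DictReader(source)
        # Short rows give None for missing columns, so treat None as empty.
        rows = [row for row in reader if (row.get("score") or "").strip() != ""]
        fieldnames = reader.fieldnames

    with output_file.open("w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return output_file
```
<p>Re-running the script regenerates the processed file from the untouched raw file, so every cleaning step stays documented and repeatable.</p>
</section>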
<section class="slide level6">

<p>File naming:</p>
<ul>
<li>Don’t use spaces in file names</li>
<li>Use leading zeros (001 vs. 1)</li>
</ul>
</section>
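<section class="slide level6">

<p>A quick Python illustration of why leading zeros help: file names are usually sorted as text, character by character, so <code>trial_10</code> comes before <code>trial_2</code> unless the numbers are zero-padded to a fixed width. The <code>trial_</code> prefix and the three-digit width are arbitrary choices for illustration.</p>
```python
def trial_file_name(trial_number: int, width: int = 3) -> str:
    """Hypothetical naming scheme: no spaces, zero-padded trial number."""
    return f"trial_{trial_number:0{width}d}.csv"


unpadded = sorted(f"trial_{n}.csv" for n in (1, 2, 10))
# ['trial_1.csv', 'trial_10.csv', 'trial_2.csv']

padded = sorted(trial_file_name(n) for n in (1, 2, 10))
# ['trial_001.csv', 'trial_002.csv', 'trial_010.csv']
```
</section>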
    </div>
  </div>

  <script src="revealjs/lib/js/head.min.js"></script>
  <script src="revealjs/js/reveal.js"></script>

  <script>

      // Full list of configuration options available at:
      // https://github.com/hakimel/reveal.js#configuration
      Reveal.initialize({
        // Display controls in the bottom right corner
        controls: false,
        // Display the page number of the current slide
        slideNumber: "c/t",
        // Push each slide change to the browser history
        history: true,
        // Transition style
        transition: 'none', // none/fade/slide/convex/concave/zoom

        // Optional reveal.js plugins
        dependencies: [
          { src: 'revealjs/lib/js/classList.js', condition: function() { return !document.body.classList; } },
          { src: 'revealjs/plugin/zoom-js/zoom.js', async: true },
          { src: 'revealjs/plugin/notes/notes.js', async: true }
        ]
      });
    </script>
    </body>
</html>