<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="author" content="Eric Denovellis">
<title>Better Science Code</title>
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="revealjs/css/reveal.css">
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; background-color: #303030; color: #cccccc; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; }
td.sourceCode { padding-left: 5px; }
pre, code { color: #cccccc; background-color: #303030; }
code > span.kw { color: #f0dfaf; } /* Keyword */
code > span.dt { color: #dfdfbf; } /* DataType */
code > span.dv { color: #dcdccc; } /* DecVal */
code > span.bn { color: #dca3a3; } /* BaseN */
code > span.fl { color: #c0bed1; } /* Float */
code > span.ch { color: #dca3a3; } /* Char */
code > span.st { color: #cc9393; } /* String */
code > span.co { color: #7f9f7f; } /* Comment */
code > span.ot { color: #efef8f; } /* Other */
code > span.al { color: #ffcfaf; } /* Alert */
code > span.fu { color: #efef8f; } /* Function */
code > span.er { color: #c3bf9f; } /* Error */
code > span.wa { color: #7f9f7f; font-weight: bold; } /* Warning */
code > span.cn { color: #dca3a3; font-weight: bold; } /* Constant */
code > span.sc { color: #dca3a3; } /* SpecialChar */
code > span.vs { color: #cc9393; } /* VerbatimString */
code > span.ss { color: #cc9393; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { } /* Variable */
code > span.cf { color: #f0dfaf; } /* ControlFlow */
code > span.op { color: #f0efd0; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #ffcfaf; font-weight: bold; } /* Preprocessor */
code > span.at { } /* Attribute */
code > span.do { color: #7f9f7f; } /* Documentation */
code > span.an { color: #7f9f7f; font-weight: bold; } /* Annotation */
code > span.cv { color: #7f9f7f; font-weight: bold; } /* CommentVar */
code > span.in { color: #7f9f7f; font-weight: bold; } /* Information */
</style>
<link rel="stylesheet" href="revealjs/css/theme/black.css" id="theme">
<link rel="stylesheet" href="css/custom.css"/>
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'revealjs/css/print/pdf.css' : 'revealjs/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="revealjs/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h1 class="title">Better Science Code</h1>
<p class="author">Eric Denovellis</p>
</section>
<section class="slide level6">
<p>Presentation: <a href="https://edeno.github.io/Better-Science-Code">https://edeno.github.io/Better-Science-Code</a></p>
</section>
<section class="slide level6">
<p>Repository: <a href="https://github.com/edeno/Better-Science-Code" class="uri">https://github.com/edeno/Better-Science-Code</a></p>
</section>
<section class="slide level6">
<p>Google Doc for Group Note Taking / Discussion:</p>
<p><a href="https://docs.google.com/document/d/1LDR8eF6rggOST7IuyM0qcXJhoLI6UwHaiwcwS1-RpPw/edit?usp=sharing" class="uri">https://docs.google.com/document/d/1LDR8eF6rggOST7IuyM0qcXJhoLI6UwHaiwcwS1-RpPw/edit?usp=sharing</a></p>
</section>
<section class="slide level6">
<p>Why should you care about producing good code</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 1. Doing good science!</p>
<aside class="notes">
All modern science depends on computing (data collection, analysis, computational modeling). We spend a lot of time designing and performing experiments. Why waste that effort by writing code with errors?
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>We want code that <span class="highlight">works</span> (it does what you say it does) and is <span class="highlight">reproducible</span> (you can get to the same result every time using the same data and code).</p>
</section>
<section class="slide level6">
<p>Don’t want to have to retract papers because the code had bugs</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>Following good coding practices reduces the chance of making mistakes.</p>
</section>
<section class="slide level6">
<p>IT’S TOO EASY TO MAKE MISTAKES</p>
</section>
<section class="slide level6">
<blockquote>
<p>“As the complexity of a software program increases, the likelihood of undiscovered bugs quickly reaches certainty” – <cite>Poldrack et al. 2017</cite></p>
</blockquote>
</section>
<section class="slide level6">
<p>We are writing <em>complex code</em></p>
<aside class="notes">
Good code should reduce your anxiety about making mistakes
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 2. Want to remember what the code does months later</p>
</section>
<section class="slide level6">
<blockquote>
<p>“The single biggest reason you should write nice code is so that your future self can understand it.” – <cite>Greg Wilson</cite></p>
</blockquote>
<blockquote>
<p>“All code has at least one collaborator and that is future you.” – <cite>Hadley Wickham</cite></p>
</blockquote>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code</span></p>
<p>REASON 3. Want to be able to share it with other people</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code </span></p>
<p>REASON 4. Avoid introducing new errors</p>
<aside class="notes">
We’ll talk about how writing good code (in particular testing your code) helps you avoid introducing new errors into your code
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Why should you care about producing good code </span></p>
<p>REASON 5. Can serve as a resume for future employers</p>
</section>
<section class="slide level6">
<p>How to write good code???</p>
</section>
<section class="slide level6">
<p>Exercise in managing complexity:</p>
<ul>
<li>break problems down into smaller components</li>
<li>eliminate unnecessary dependencies</li>
<li>keep track of what you did (be organized)</li>
</ul>
</section>
<section class="slide level6">
<p>Goal: Want to form good habits</p>
</section>
<section class="slide level6">
<p>Don’t be overwhelmed <em>and not do any of these things</em></p>
</section>
<section class="slide level6">
<p>Don’t beat yourself up <em>if you don’t do all these things all the time</em></p>
<aside class="notes">
<ul>
<li>just try to remember them and incorporate them gradually into your process</li>
<li>it will slow your coding process initially, but you will gain precision and readability</li>
<li>some of these take more effort to get started with (such as version control)</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>STEP 1. Decompose programs into small, well-defined functions</p>
<aside class="notes">
The biggest mistakes I see in scientific code: (1) not writing functions at all; (2) not writing small enough functions.
</aside>
</section>
<section class="slide level6">
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">'/tmp/123.npy'</span>, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)</code></pre></div>
<aside class="notes">
<ul>
<li><code>def</code>: defines a function in Python</li>
</ul>
</aside>
</section>
<section class="slide level6">
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">'/tmp/123.npy'</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    <span class="co"># Keep each predictor as a 2-D column; np.linalg.qr rejects 1-D input</span>
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Try to keep functions to fewer than 60 lines (small)</p>
<aside class="notes">
Seeing a whole function on screen helps you keep it in your working memory.
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Try to keep what the function does as simple as possible (well-defined)</p>
<aside class="notes">
<p>atomic = a function should do one “thing”</p>
<p>If you came back to the function later, how long would it take you to understand what it does? You should be able to explain what it does in one sentence.</p>
<p>pure = as few implicit contexts and side effects as possible.</p>
</aside>
</section>
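<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>A minimal sketch of what “pure” can mean in practice. The names <code>SCALE</code>, <code>rescale_impure</code>, and <code>rescale_pure</code> are hypothetical and not part of the talk's code. The first function relies on a global and changes its input in place; the second receives everything it needs as arguments and returns a new list.</p>

```python
SCALE = 2.0  # hypothetical module-level setting (an implicit context)


def rescale_impure(values):
    """Depends on the global SCALE and modifies `values` in place."""
    for i in range(len(values)):
        values[i] = values[i] * SCALE


def rescale_pure(values, scale):
    """Return a new list; the input and all globals are left untouched."""
    return [value * scale for value in values]


data = [1.0, 2.0, 3.0]
scaled = rescale_pure(data, scale=2.0)
print(data)    # the original list is unchanged
print(scaled)
```

<p>The pure version can be understood and tested by looking only at its arguments and return value.</p>
</section>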
<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Be ruthless about eliminating duplication of code.</p>
<aside class="notes">
<ul>
<li>turn duplicated code into functions</li>
<li>that way, fixing a bug in the function fixes it everywhere the function is used, instead of in every separate copy</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Small, well-defined, without duplicates</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">'/tmp/123.npy'</span>, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Small, well-defined, without duplicates</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">'/tmp/123.npy'</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    <span class="co"># Keep each predictor as a 2-D column; np.linalg.qr rejects 1-D input</span>
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p>Small, well-defined functions are more <em>maintainable</em></p>
<aside class="notes">
<ul>
<li>breaks hard problems down into smaller problems</li>
<li>limits the scope of your code</li>
<li>makes it easier to debug or change (with unit testing)</li>
<li>separation of concerns</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>Small, well-defined functions are more <em>composable</em></p>
<aside class="notes">
<ul>
<li>can reuse function in other programs</li>
<li>can pass functions to other functions (function composition)</li>
<li>makes you more efficient because you don’t have to rewrite code</li>
<li>makes you more precise because you can focus on fixing bugs for one function, not many similar functions</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>Small, well-defined functions are more <em>readable</em></p>
<p>* if you give them good names</p>
</section>
<section class="slide level6">
<p>STEP 2. Use good variable/function names to clarify what things do</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> bad_function():
    X <span class="op">=</span> np.load(<span class="st">'/tmp/123.npy'</span>, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    y, x1, x2 <span class="op">=</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]
    z1 <span class="op">=</span> (x1 <span class="op">-</span> x1.mean()) <span class="op">/</span> x1.std()
    Q1, R1 <span class="op">=</span> np.linalg.qr(z1, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b1 <span class="op">=</span> np.linalg.solve(R1, np.dot(Q1.T, y))
    z2 <span class="op">=</span> (x2 <span class="op">-</span> x2.mean()) <span class="op">/</span> x2.std()
    Q2, R2 <span class="op">=</span> np.linalg.qr(z2, mode<span class="op">=</span><span class="st">'reduced'</span>)
    b2 <span class="op">=</span> np.linalg.solve(R2, np.dot(Q2.T, y))
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    y, x1, x2 <span class="op">=</span> load_data(<span class="st">'/tmp/123.npy'</span>)
    b1 <span class="op">=</span> linear_regression(zscore(x1), y)
    b2 <span class="op">=</span> linear_regression(zscore(x2), y)
    b <span class="op">=</span> b1 <span class="op">-</span> b2
    np.save(<span class="st">'ans.npy'</span>, b)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    <span class="co"># Keep each predictor as a 2-D column; np.linalg.qr rejects 1-D input</span>
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use good variable/function names</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> better_function():
    response, design_matrix1, design_matrix2 <span class="op">=</span> load_data(
        <span class="st">'/tmp/123.npy'</span>)
    coefficient1 <span class="op">=</span> linear_regression(
        zscore(design_matrix1), response)
    coefficient2 <span class="op">=</span> linear_regression(
        zscore(design_matrix2), response)
    coefficient_difference <span class="op">=</span> coefficient1 <span class="op">-</span> coefficient2
    np.save(<span class="st">'ans.npy'</span>, coefficient_difference)

<span class="kw">def</span> load_data(data_name):
    X <span class="op">=</span> np.load(data_name, mmap_mode<span class="op">=</span><span class="st">'r'</span>)
    <span class="co"># Keep each predictor as a 2-D column; np.linalg.qr rejects 1-D input</span>
    <span class="cf">return</span> X[:, <span class="dv">0</span>], X[:, [<span class="dv">1</span>]], X[:, [<span class="dv">2</span>]]

<span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p>You don’t need comments if the variable or function name already tells you what it does (self-documenting)</p>
<aside class="notes">
<ul>
<li>People have been taught to use comments in their code</li>
<li>Modern practice is to use commenting sparingly within the body of the code</li>
<li>Use comments to document what the function does at the beginning of the function (will come back to this)</li>
<li>Doesn’t mean never use comments, but don’t use them to restate what the code already says.</li>
<li>“If your code needs a comment to explain it, you’ve probably written confusing code.”</li>
<li>Makes it easier to read</li>
<li>When it is difficult to come up with a meaningful name for a function, it is probably doing too much</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>Use the naming conventions of your language of choice (<code>snake_case</code> or <code>camelCase</code>) and <span class="highlight">be consistent</span></p>
</section>
<section class="slide level6">
<p>Avoid using abbreviations that are not commonly used</p>
<p>(<code>sw</code> vs. <code>spike_width</code>)</p>
</section>
<section class="slide level6">
<p>Prefer whole words</p>
<p>(<code>elec_poten</code> vs. <code>electric_potential</code>)</p>
</section>
<section class="slide level6">
<p>STEP 3. Document your functions</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<p>Easy thing: brief sentence describing the function without using the name of the function*</p>
<p>*<em>this is the most important</em></p>
<aside class="notes">
<ul>
<li>second line of defense in remembering what a function does</li>
<li>The more important the function, the more it should be documented</li>
<li>if using Python, use the NumPy docstring format</li>
<li>if using MATLAB, use the MATLAB help format</li>
<li>documentation is often longer than the code itself</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="co">'''Number of standard deviations from the mean'''</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> zscore(x):
    <span class="co">'''Number of standard deviations from the mean'''</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> linear_regression(design_matrix, response):
    <span class="co">'''Calculate a linear least-squares regression for</span>
<span class="co">    two sets of measurements'''</span>
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<ul>
<li>additional detail about what the function does or method it implements</li>
<li>description of the parameters</li>
<li>description of the outputs</li>
<li>examples if you can</li>
</ul>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> linear_regression(design_matrix, response):
    <span class="co">'''Calculate a linear least-squares regression for</span>
<span class="co">    two sets of measurements</span>

<span class="co">    Uses the QR decomposition to avoid numerical instability</span>
<span class="co">    in taking the inverse.</span>

<span class="co">    Parameters</span>
<span class="co">    ----------</span>
<span class="co">    design_matrix, response : array_like</span>
<span class="co">        Two sets of measurements with the same number of rows.</span>
<span class="co">        design_matrix must be 2-D: (n_samples, n_predictors).</span>

<span class="co">    Returns</span>
<span class="co">    -------</span>
<span class="co">    coefficients : array_like</span>
<span class="co">        Parameters estimated from the model.</span>

<span class="co">    Examples</span>
<span class="co">    --------</span>
<span class="co">    &gt;&gt;&gt; design_matrix = np.random.random((10, 1))</span>
<span class="co">    &gt;&gt;&gt; response = np.random.random(10)</span>
<span class="co">    &gt;&gt;&gt; coefficients = linear_regression(design_matrix, response)</span>

<span class="co">    '''</span>
    Q, R <span class="op">=</span> np.linalg.qr(design_matrix, mode<span class="op">=</span><span class="st">'reduced'</span>)
    <span class="cf">return</span> np.linalg.solve(R, np.dot(Q.T, response))</code></pre></div>
</section>
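<section class="slide level6">
<p><span class="deemphasized-title">Document your functions</span></p>
<p>Examples in a docstring can also be executed as tests with Python's standard-library <code>doctest</code> module. The sketch below is an illustration, not code from the talk: it assumes a list-based <code>zscore</code> built on the <code>statistics</code> module so that it runs without NumPy.</p>

```python
import doctest
import statistics


def zscore(values):
    """Number of standard deviations from the mean.

    Examples
    --------
    >>> zscore([1, 3])
    [-1.0, 1.0]
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, like NumPy's default
    return [(value - mean) / std for value in values]


# Find the examples in zscore's docstring and run them as tests.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(zscore, name="zscore", globs={"zscore": zscore}):
    runner.run(test)
results = runner.summarize()
print(results)
```

<p>For a whole file, running <code>python -m doctest your_file.py</code> checks every docstring example in it; if an example no longer matches what the code returns, the mismatch is reported.</p>
</section>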
<section class="slide level6">
<p>STEP 4. Test your code</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<p>Make sure your code works like you think it does</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<p>Think about how your code can fail</p>
</section>
<section class="slide level6">
<p>Small, well-defined, well-named functions are easy to test!</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> zscore(x):
    <span class="co">'''Number of standard deviations from the mean'''</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> test_zscore():
    <span class="cf">pass</span></code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> zscore(x):
    <span class="co">'''Number of standard deviations from the mean'''</span>
    <span class="cf">return</span> (x <span class="op">-</span> x.mean()) <span class="op">/</span> x.std()

<span class="kw">def</span> test_zscore():
    test_values <span class="op">=</span> np.asarray([<span class="dv">1</span>, <span class="dv">3</span>])
    expected_values <span class="op">=</span> np.asarray([<span class="op">-</span><span class="dv">1</span>, <span class="dv">1</span>])
    <span class="cf">assert</span> np.allclose(zscore(test_values), expected_values)</code></pre></div>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<p><span class="highlight">Unit tests</span> test a small component of your code (usually a small function) and make sure it works like you think it works</p>
<aside class="notes">
<ul>
<li>Isolate small components of program and make sure they are correct</li>
<li>doesn’t ensure that combinations of these functions work (integration testing)</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="highlight">Unit tests</span> prevent regression of your code</p>
</section>
<section class="slide level6">
<p>If you change your code, you want to know what still works and what has broken (Regression)</p>
</section>
<section class="slide level6">
<p>Functions should be simple to test</p>
<aside class="notes">
<ul>
<li>If the number of test cases is uncomfortably large, your function is probably too complex.</li>
<li>Start looking for smaller units to test.</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>If you find a bug, write a test.</p>
<aside class="notes">
After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.
</aside>
</section>
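<section class="slide level6">
<p>A minimal sketch of this workflow, assuming a bug report that a z-score function crashes on constant input. The function is a pure-Python stand-in for the earlier <code>zscore</code> example; the test names and the choice to return zeros for constant input are assumptions made for illustration. The regression test was written first to reproduce the <code>ZeroDivisionError</code>, and it passes once the guard is added.</p>

```python
import statistics


def zscore(values):
    """Number of standard deviations from the mean (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Bug fix: constant input used to raise ZeroDivisionError.
        # Assumed convention for this sketch: return all zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


def test_zscore_constant_input():
    # Regression test: written to reproduce the bug before fixing it.
    assert zscore([5, 5, 5]) == [0.0, 0.0, 0.0]


def test_zscore_known_values():
    # Same check as the earlier slide: [1, 3] maps to [-1, 1].
    assert zscore([1, 3]) == [-1.0, 1.0]


test_zscore_constant_input()
test_zscore_known_values()
```

<p>Keeping the regression test in the test suite means the same bug cannot quietly come back later.</p>
</section>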
<section class="slide level6">
<p>Use unit tests to define the requirements of your code</p>
<aside class="notes">
<ul>
<li>ensure that your function is well-defined</li>
<li>some people even write unit tests before writing a function (test-driven development)</li>
<li>also a form of documentation: examples for how you think your code should work</li>
</ul>
</aside>
</section>
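<section class="slide level6">
<p>One possible illustration of tests as requirements: the three tests below were conceived before the function and each states one requirement. The function <code>moving_average</code>, its <code>window</code> parameter, and the decision to raise <code>ValueError</code> are hypothetical choices for this sketch, not something taken from the slides.</p>

```python
def test_output_length():
    # Requirement 1: one output per full window.
    assert len(moving_average([1, 2, 3, 4], window=2)) == 3


def test_known_values():
    # Requirement 2: each output is the mean of its window.
    assert moving_average([1, 2, 3, 4], window=2) == [1.5, 2.5, 3.5]


def test_rejects_bad_window():
    # Requirement 3: an impossible window is an error, not a silent result.
    try:
        moving_average([1, 2, 3], window=0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


test_output_length()
test_known_values()
test_rejects_bad_window()
```

<p>Anyone reading the tests can see what the function promises without reading its body.</p>
</section>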
<section class="slide level6">
<p>You can use programs called <span class="highlight">test runners</span> to run a group of unit tests automatically.</p>
</section>
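<section class="slide level6">
<p>As a rough sketch of what a test runner does, the example below uses Python's built-in <code>unittest</code> module to collect a group of tests and run them together. The function <code>celsius_to_kelvin</code> and the test class are made up for this illustration.</p>

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)


# Collect every test method in the class and run them as one group.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCelsiusToKelvin)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

<p>In day-to-day use you would usually run <code>python -m unittest</code> from the project folder, which finds and runs the tests in files named <code>test*.py</code>.</p>
</section>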
<section class="slide level6">
<p>MATLAB, Python, and R all have unit test packages</p>
<ul>
<li><a href="https://www.mathworks.com/help/matlab/matlab-unit-test-framework.html">MATLAB: unit test framework</a></li>
<li><a href="https://docs.python.org/3.4/library/unittest.html">Python: unittest</a></li>
<li><a href="http://doc.pytest.org/en/latest/">Python: pytest</a></li>
<li><a href="https://github.com/hadley/testthat">R: testthat</a></li>
</ul>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Test your code</span></p>
<p>There are also tools that work with your version control system to run these tests every time you commit a new piece of code (<span class="highlight">continuous integration</span>)</p>
<aside class="notes">
<ul>
<li>This all seems complicated but in the process of developing code, you should be writing tests to make sure it works. This process just formalizes the writing of tests and allows you to run them at a later time, ensuring peace of mind.</li>
<li>yields more predictable code</li>
<li>in order to write a test, you have to know what the function does</li>
<li>people can look at your tests to understand your code (form of documentation)</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>STEP 5. Use version control</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>A sophisticated way to track changes in your code over time</p>
<aside class="notes">
<ul>
<li>Dropbox is a form of this (but not very sophisticated)</li>
<li>Microsoft Word is also a form of this (but not very sophisticated)</li>
<li>Version control takes snapshots of all the files in a folder (repository)</li>
<li>Git is the most popular (it takes some time to learn, but its popularity and its social/collaborative features make it worth it)</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<figure>
<img src="img/github-desktop.png" alt="GitHub Desktop" /><figcaption>GitHub Desktop</figcaption>
</figure>
</section>
<section class="slide level6">
<p>Version control stores the whole history of your project</p>
</section>
<section class="slide level6">
<figure>
<img src="img/commit-history.png" />
</figure>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Helps you back up your work</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Go back to previous versions of your code</p>
</section>
<section class="slide level6">
<figure>
<img src="img/commit-history.png" />
</figure>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Reduce code clutter and confusion</p>
<aside class="notes">
<ul>
<li>no more code_v1.m, code_v2.m</li>
<li>which version of code was I using???</li>
<li>which version of code worked???</li>
<li>how is this different from other code I wrote???</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Experiment with different versions of code (branches)</p>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Makes it easier to work with others</p>
<aside class="notes">
<ul>
<li>standardized way of not unintentionally overwriting each other’s code</li>
<li>easy to share code (GitHub, Bitbucket, etc)</li>
<li>makes it easier to document issues with code or data</li>
<li>Use example from this presentation</li>
</ul>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Use version control</span></p>
<p>Commit early and often (take a lot of snapshots of your code)</p>
<aside class="notes">
<ul>
<li>When you get a piece of code working, commit it (take a snapshot)</li>
<li>Leave a short, informative commit message (document what the commit is)</li>
<li>Don’t comment out code; just remove it. You can get it back from the history.</li>
<li>I personally use GitHub Desktop
<ul>
<li>easy-to-use user interface</li>
</ul></li>
</ul>
</aside>
</section>
<section class="slide level6">
<p>STEP 6. Refactor your code</p>
</section>
<section class="slide level6">
<blockquote>
<p>“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.” – <cite>Martin Fowler, Refactoring: Improving the Design of Existing Code</cite></p>
</blockquote>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">Refactor your code</span></p>
<p>Always leave the code in a better state than when you first found it.</p>
<aside class="notes">
<p>Your code isn’t going to be perfect the first time</p>
<p>Just like in writing, your code will get better as you revise it.</p>
<p>You wouldn’t expect a first draft to be perfect.</p>
<p>Each time you look at your code, ask:</p>
<ul>
<li>Do my variable/function names make sense?</li>
<li>Do I know what this function is doing?</li>
<li>Can I turn things into functions?</li>
<li>Can I generalize this function?</li>
</ul>
<p>There is some tradeoff between tinkering with your code and getting things done.</p>
<p>Also, don’t throw everything out and rewrite from scratch unless you absolutely cannot avoid it:</p>
<ul>
<li>“When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes.”</li>
</ul>
<p>If this tutorial tempts you to do that to your existing codebase, don’t.</p>
</aside>
</section>
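<section class="slide level6">
<p>A small, hypothetical before-and-after to make the idea concrete. Every name here is invented for illustration. The behavior stays the same; only the readability changes, and a quick check confirms that the refactor did not alter the result.</p>

```python
# Before: works, but the reader has to decode what d, r, and 1000 mean.
def f(d):
    r = []
    for i in range(len(d)):
        if d[i] > 0:
            r.append(d[i] * 1000)
    return r


# After: same behavior, with names that explain the intent.
MILLISECONDS_PER_SECOND = 1000


def seconds_to_milliseconds(seconds):
    return seconds * MILLISECONDS_PER_SECOND


def positive_durations_in_milliseconds(durations_in_seconds):
    return [
        seconds_to_milliseconds(duration)
        for duration in durations_in_seconds
        if duration > 0
    ]


# A quick check that the refactor did not change the result.
example_durations = [0.5, -1, 2]
assert f(example_durations) == positive_durations_in_milliseconds(example_durations)
```

<p>Having a test like the final line in place before you refactor is what makes the change safe to attempt.</p>
</section>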
<section class="slide level6">
<p>STEP 7. Always search for well-maintained software libraries that do what you need.</p>
</section>
<section class="slide level6">
<p>Don’t rewrite functions that are already implemented as part of the core language.</p>
</section>
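<section class="slide level6">
<p>For instance, here is a sketch comparing a hand-written median with the one in Python's standard <code>statistics</code> module. The function <code>handwritten_median</code> is a deliberately flawed, made-up example: it forgets that an even-length list needs the average of the two middle values.</p>

```python
import statistics


def handwritten_median(values):
    # Hand-rolled version: easy to get subtly wrong.
    # This one ignores the even-length case.
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


data = [4, 1, 3, 2]

# Picks a single middle element, which is not the median here.
print(handwritten_median(data))

# The standard library averages the two middle values (2 and 3).
print(statistics.median(data))
```

<p>The built-in version has already been through the edge cases that a quick rewrite tends to miss.</p>
</section>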
<section class="slide level6">
<p>Use other software libraries if they are well-maintained</p>
<aside class="notes">
<p>Why:</p>
<ul>
<li>more users mean fewer bugs</li>
<li>better tested</li>
</ul>
<p>A little tricky: you still need to take time to vet the code to make sure it does what you think it does.</p>
</aside>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">How to write good code???</span></p>
<p>Exercise in managing complexity:</p>
<ul>
<li>break problems down into smaller components</li>
<li>eliminate unnecessary dependencies</li>
<li>keep track of what you did (be organized)</li>
</ul>
</section>
<section class="slide level6">
<p>Summary:</p>
<ol type="1">
<li>Write small well-defined, well-named functions</li>
<li>Use good function and variable names</li>
<li>Document your functions</li>
<li>Test your code</li>
<li>Use version control</li>
<li>Refactor your code</li>
<li>Always search for well-maintained software libraries that do what you need.</li>
</ol>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">break problems down into smaller components</span></p>
<ol type="1">
<li>Write small well-defined, well-named functions</li>
<li><span class="dim">Use good function and variable names</span></li>
<li><span class="dim">Document your functions</span></li>
<li><span class="dim">Test your code</span></li>
<li><span class="dim">Use version control</span></li>
<li>Refactor your code</li>
<li>Always search for well-maintained software libraries that do what you need.</li>
</ol>
</section>
<section class="slide level6">
<p><span class="deemphasized-title">keep track of what you did (be organized)</span></p>
<ol type="1">
<li><span class="dim">Write small well-defined, well-named functions</span></li>
<li>Use good function and variable names</li>
<li>Document your functions</li>
<li>Test your code</li>
<li>Use version control</li>
<li><span class="dim">Refactor your code</span></li>
<li><span class="dim">Always search for well-maintained software libraries that do what you need.</span></li>
</ol>
</section>
<section class="slide level6">
<p>Conclusion: Writing good code takes work</p>
</section>
<section class="slide level6">
<p>We have a scientific obligation to ensure the correctness of our programs.</p>
<aside class="notes">
<p>I think it is a mistake to think that only “programmers” working for companies need to bother with writing good code.</p>
<p>You are a programmer dealing with complex programs.</p>
<p>We need to put the same amount of effort into our code as into performing the experiment or writing the paper.</p>
</aside>
</section>
<section class="slide level6">
<p>Exercises</p>
<ul>
<li><p>Go to <a href="https://github.com/edeno/Better-Science-Code" class="uri">https://github.com/edeno/Better-Science-Code</a></p></li>
<li><p>Copy either <a href="https://raw.githubusercontent.com/edeno/Better-Science-Code/master/exercises/exercises.py">exercises.py</a> or <a href="https://raw.githubusercontent.com/edeno/Better-Science-Code/master/exercises/exercises.m">exercises.m</a></p></li>
<li><p>Work on it for 30 minutes (either solo or in groups).</p></li>
<li><p>Code Review: We will discuss what people came up with</p></li>
</ul>
</section>
<section class="slide level6">
<p>Exercise Objectives</p>
</section>
<section class="slide level6">
<p>Bonus: Data Management</p>
</section>
<section class="slide level6">
<p>Put different projects in different folders/repositories</p>
</section>
<section class="slide level6">
<p>Use relative paths</p>
</section>
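<section class="slide level6">
<p>A minimal sketch of the idea using Python's <code>pathlib</code>. The helper <code>project_path</code>, the folder layout, and the file name are assumptions for illustration. The point is that the path is built relative to the project folder, so the code keeps working when the project moves to another machine.</p>

```python
from pathlib import Path


def project_path(project_root, *parts):
    """Build a path inside the project from pieces relative to its root."""
    return Path(project_root).joinpath(*parts)


# Fragile: an absolute path only works on one machine.
# raw_file = Path("/Users/alice/projects/spikes/data/raw/session_001.csv")

# Portable: relative to wherever the project folder lives.
raw_file = project_path(".", "data", "raw", "session_001.csv")

print(raw_file.is_absolute())
print(raw_file.as_posix())
```

<p>A common variation is to derive the project root from the location of the script itself rather than from the current working directory.</p>
</section>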
<section class="slide level6">
<p>Separate the data from the code</p>
</section>
<section class="slide level6">
<p>Keep processed data separate from raw data to avoid accidentally changing the raw data</p>
</section>
<section class="slide level6">
<p>Tidy Data:</p>
<ul>
<li>Each variable forms a column.</li>
<li>Each observation forms a row.</li>
<li>Each type of observational unit forms a table.</li>
<li>Flat is better than nested.</li>
</ul>
</section>
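<section class="slide level6">
<p>To illustrate, here is a sketch that reshapes a "wide" table (one column per trial) into tidy form (one row per observation) using plain Python. The data values, the column names, and the helper <code>to_tidy</code> are invented for this example.</p>

```python
# Wide form: each trial is its own column, so "trial" is hidden in the header.
wide_rows = [
    {"subject": "s01", "trial_1": 0.31, "trial_2": 0.29},
    {"subject": "s02", "trial_1": 0.45, "trial_2": 0.41},
]


def to_tidy(rows, id_column, value_name):
    """Return one row per (id, variable) pair instead of one column per variable."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {id_column: row[id_column], "variable": column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="subject", value_name="reaction_time")
for row in tidy_rows:
    print(row)
```

<p>Data-frame libraries offer this reshaping directly (for example, <code>melt</code> in pandas), which is usually preferable to writing it yourself.</p>
</section>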
<section class="slide level6">
<p>If the original data is not in a good form, convert it to a good form (but don’t overwrite the original data)</p>
</section>
<section class="slide level6">
<p>Don’t hand-edit data files.</p>
</section>
<section class="slide level6">
<p>All aspects of data cleaning should be in scripts</p>
</section>
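<section class="slide level6">
<p>A hedged sketch of what a scripted cleaning step might look like. The column names, the cleaning rules, and the folder names <code>raw</code> and <code>processed</code> are assumptions for illustration. The script reads the raw file, writes a cleaned copy elsewhere, and never modifies the original; the demo runs in a temporary folder so it works anywhere.</p>

```python
import csv
import tempfile
from pathlib import Path


def clean_rows(rows):
    """Drop rows with a missing measurement and convert values to float."""
    cleaned = []
    for row in rows:
        if row["value"].strip() == "":
            continue
        cleaned.append(
            {"subject": row["subject"].strip(), "value": float(row["value"])}
        )
    return cleaned


def clean_file(raw_path, processed_path):
    """Read the raw file and write a cleaned copy; the raw file is untouched."""
    with open(raw_path, newline="") as raw_file:
        rows = list(csv.DictReader(raw_file))
    cleaned = clean_rows(rows)
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with open(processed_path, "w", newline="") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=["subject", "value"])
        writer.writeheader()
        writer.writerows(cleaned)
    return cleaned


RAW_TEXT = "subject,value\ns01,1.5\ns02,\n s03 ,2.0\n"

with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "session_001.csv"
    raw.parent.mkdir(parents=True)
    raw.write_text(RAW_TEXT)
    processed = Path(tmp) / "processed" / "session_001.csv"
    cleaned = clean_file(raw, processed)
    raw_unchanged = raw.read_text() == RAW_TEXT
```

<p>Because every cleaning decision lives in the script, the processed data can be regenerated from the raw data at any time.</p>
</section>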
<section class="slide level6">
<p>File naming:</p>
<ul>
<li>Don’t use spaces in file names</li>
<li>Use leading zeros (001 vs. 1)</li>
</ul>
</section>
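<section class="slide level6">
<p>A short sketch of why leading zeros matter: file names are sorted as text, so without padding, file 10 lands before file 2. The <code>session_</code> naming pattern is made up for illustration.</p>

```python
session_numbers = (1, 2, 10)

unpadded = [f"session_{n}.csv" for n in session_numbers]
padded = [f"session_{n:03d}.csv" for n in session_numbers]

# Text sorting puts session_10 before session_2.
print(sorted(unpadded))

# With leading zeros, text order matches numeric order.
print(sorted(padded))
```

<p>The format code <code>03d</code> pads to three digits; choose a width that covers the largest number you expect.</p>
</section>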
</div>
</div>
<script src="revealjs/lib/js/head.min.js"></script>
<script src="revealjs/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
// Display controls in the bottom right corner
controls: false,
// Display the page number of the current slide
slideNumber: "c/t",
// Push each slide change to the browser history
history: true,
// Transition style
transition: 'none', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: 'revealjs/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'revealjs/plugin/zoom-js/zoom.js', async: true },
{ src: 'revealjs/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>