-
Notifications
You must be signed in to change notification settings - Fork 32
/
Copy pathcallbacks.tex
682 lines (541 loc) · 20.2 KB
/
callbacks.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
\chapter{Callbacks}\label{s:callbacks}
JavaScript relies heavily on \grefdex{g:callback-function}{callback functions}{callback function}:
Instead of a function giving us a result immediately,
we give it another function that tells it what to do next.
Many other languages use them as well,
but JavaScript is often the first place that programmers with data science backgrounds encounter them.
In order to understand how they work and how to use them,
we must first understand what actually happens when functions are defined and called.
\section{The Call Stack}\label{s:callbacks-callstack}
When JavaScript \grefdex{g:parse}{parses}{parsing} the expression \texttt{let\ name\ =\ "text"},
it allocates a block of memory big enough for four characters
and stores a reference to that block of characters in the variable \texttt{name}.
We can show this by drawing a \gref{g:memory-diagram}{memory diagram}
like the one in \figref{f:callbacks-name-value}.
\figpdf{figures/callbacks-name-value.pdf}{Name and Value}{f:callbacks-name-value}
When we write:
\begin{minted}{js}
oneMore = (x) => {
return x + 1
}
\end{minted}
\noindent
JavaScript allocates a block of memory big enough to store several instructions,
translates the text of the function into instructions,
and stores a reference to those instructions in the variable \texttt{oneMore}
(\figref{f:callbacks-one-more}).
\figpdf{figures/callbacks-one-more.pdf}{Functions in Memory}{f:callbacks-one-more}
The only difference between these two cases is what's on the other end of the reference:
four characters or a bunch of instructions that add one to a number.
This means that we can assign the function to another variable,
just as we would assign a number:
\begin{minted}{js}
// assign the function oneMore to the variable anotherName
const anotherName = oneMore
// instead of calling the function oneMore we can call the function anotherName
console.log(anotherName(5))
\end{minted}
\begin{minted}{text}
6
\end{minted}
Doing this does \emph{not} call the function:
as \figref{f:callbacks-alias-function} shows,
it creates a second name,
or \gref{g:alias}{alias},
that refers to the same block of instructions.
\figpdf{figures/callbacks-alias-function.pdf}{Aliasing a Function}{f:callbacks-alias-function}
As explained in \chapref{s:basics},
when JavaScript calls a function it assigns the arguments in the call to the function's parameters.
In order for this to be safe,
we need to ensure that there are no \grefdex{g:name-collision}{name collisions}{name collision},
i.e.,
that if there is a variable called \texttt{something} and one of the function's parameters is also called \texttt{something},
the function will use the right one.
The way every modern language implements this is to use a \gref{g:call-stack}{call stack}.
Instead of putting all our variables in one big table,
we have one table for global variables
and one extra table for each function call.
This means that if we assign 100 to \texttt{x},
call \texttt{oneMore(2\ *\ x\ +\ 1)},
and look at memory in the middle of that call,
we will see what's in \figref{f:callbacks-call-stack}.
\figpdf{figures/callbacks-call-stack.pdf}{The Call Stack}{f:callbacks-call-stack}
\section{Functions of Functions}\label{s:callbacks-func}
The call stack allows us to write and call functions
without worrying about whether we're accidentally going to refer to the wrong variable.
And since functions are just another kind of data,
we can pass one function into another.
For example,
we can write a function called \texttt{doTwice} that calls some other function two times:
\begin{minted}{js}
const doTwice = (action) => {
action()
action()
}
const hello = () => {
console.log('hello')
}
doTwice(hello)
\end{minted}
\begin{minted}{text}
hello
hello
\end{minted}
\noindent
Again,
this is clearer when we look at the state of memory while \texttt{doTwice} is running
(\figref{f:callbacks-do-twice}).
\figpdf{figures/callbacks-do-twice.pdf}{Functions of Functions}{f:callbacks-do-twice}
This becomes more useful when the function or functions passed in have parameters of their own.
For example,
the function \texttt{pipeline} passes a value to one function,
then takes that function's result and passes it to a second,
and returns the final result:
\begin{minted}{js}
const pipeline = (initial, first, second) => {
return second(first(initial))
}
\end{minted}
Let's use this to combine
a function that trims blanks off the starts and ends of strings
and another function that uses a \gref{g:regular-expression}{regular expression} (\appref{s:regexp})
to replace spaces with dots:
\begin{minted}{js}
const trim = (text) => { return text.trim() }
const dot = (text) => { return text.replace(/ /g, '.') }
const original = ' this example uses text '
const trimThenDot = pipeline(original, trim, dot)
console.log(`trim then dot: |${trimThenDot}|`)
\end{minted}
\begin{minted}{text}
trim then dot: |this.example.uses.text|
\end{minted}
During the call to \texttt{temp\ =\ first(initial)},
but before a value has been returned to be assigned to \texttt{temp},
memory looks like \figref{f:callbacks-pipeline}.
\figpdf{figures/callbacks-pipeline.pdf}{Implementing a Pipeline}{f:callbacks-pipeline}
\noindent
Reversing the order of the functions changes the result:
\begin{minted}{js}
const dotThenTrim = pipeline(original, dot, trim)
console.log(`dot then trim: |${dotThenTrim}|`)
\end{minted}
\begin{minted}{text}
dot then trim: |..this.example.uses.text..|
\end{minted}
We can make a more general pipeline by passing an array of functions:
\begin{minted}{js}
const pipeline = (initial, operations) => {
let current = initial
for (let op of operations) {
current = op(current)
}
return current
}
\end{minted}
\noindent
Going through this line by line:
\begin{itemize}
\item
The function \texttt{pipeline} takes an initial input value and array of functions.
Each of those functions must take one value as an input and produce one value as output.
\item
We initialize a variable called \texttt{current} to hold the current value.
We have to use \texttt{let} for this rather than \texttt{const}
because we want to update it after each step of the pipeline runs.
\item
We then call each of the functions in the pipeline in turn,
passing in the current value and storing the result to be passed into the next function.
\item
The final result is whatever came out of the last step in the pipeline.
\end{itemize}
We don't actually have to create a separate variable for the current value;
since parameters are always variables rather than constants,
we could just overwrite the parameter \texttt{initial} over and over again like this:
\begin{minted}{js}
const pipeline = (current, operations) => {
for (let op of operations) {
current = op(current)
}
return current
}
\end{minted}
Let's add a function \texttt{double} to our suite of text manglers:
\begin{minted}{js}
const double = (text) => { return text + text }
\end{minted}
\noindent
and then try it out:
\begin{minted}{js}
const original = ' some text '
const final = pipeline(original, [double, trim, dot])
console.log(`|${original}| -> |${final}|`)
\end{minted}
\begin{minted}{text}
| some text | -> |some.text..some.text|
\end{minted}
\noindent
The order of operations is:
\begin{enumerate}
\item
\texttt{current} is assigned \texttt{' some text '} (with spaces around each word).
\item
It is then assigned \texttt{double(' some text ')}, or \texttt{' some text some text '}.
\item
That value is then passed to \texttt{trim},
so \texttt{'some text some text'} (without leading or trailing spaces)
is assigned to \texttt{current}.
\item
Finally, the current value is passed to \texttt{dot}
and the result \texttt{some.text..some.text.} is returned.
\end{enumerate}
\section{Anonymous Functions}\label{s:callbacks-anonymous}
Remember the function \texttt{oneMore}?
We can pass it a value that we have calculated on the fly:
\begin{minted}{js}
oneMore = (x) => {
return x + 1
}
console.log(oneMore(3 * 2))
\end{minted}
\begin{minted}{text}
7
\end{minted}
Behind the scenes,
JavaScript allocates a nameless temporary variable to hold the value of \texttt{3\ *\ 2},
then passes a reference to that temporary variable into \texttt{oneMore}.
We can do the same thing with functions,
i.e., create one on the fly without giving it a name as we're passing it into some other function.
For example,
suppose that instead of pushing one value through a pipeline of functions,
we want to call a function once for each value in an array:
\begin{minted}{js}
const transform = (values, operation) => {
let result = []
for (let v of values) {
result.push(operation(v))
}
return result
}
const data = ['one', 'two', 'three']
const upper = transform(data, (x) => { return x.toUpperCase() })
console.log(`upper: ${upper}`)
\end{minted}
\begin{minted}{text}
upper: ONE,TWO,THREE
\end{minted}
Taking the first letter of a word is so simple that it's hardly worth giving the function a name,
so let's move the definition into the call to \texttt{transform}:
\begin{minted}{js}
const first = transform(data, (x) => { return x[0] })
console.log(`first: ${first}`)
\end{minted}
\begin{minted}{text}
first: o,t,t
\end{minted}
\noindent
A function that is defined where it is used and isn't assigned a name
is called an \gref{g:anonymous-function}{anonymous function}.\index{function!anonymous}
Most callback functions in JavaScript are written this way:
the function with the given parameters and body (in this case, \texttt{x} and \texttt{return x[0]})
is passed directly to something else (in this case, \texttt{transform}) to be called.
\section{Functional Programming}\label{s:callbacks-functional}
\grefdex{g:functional-programming}{Functional programming}{functional programming}\index{programming!functional}
is a style of programming
that relies heavily on \grefdex{g:higher-order-function}{higher-order functions}{higher-order function}\index{function!higher-order}
like \texttt{pipeline}
that take other functions as parameters.
In addition,
functional programming expects that functions won't modify data in place,
but will instead create new data from old.
For example,
a true believer in functional programming would be saddened by this:
\begin{minted}{js}
// Create test array
const test = [1,2,3]
const impure = (values) => {
for (let i in values) {
values[i] += 1
}
}
// Run function
impure(test)
// Original array has been modified
console.log(`test: ${test}`)
\end{minted}
\begin{minted}{text}
test: 2,3,4
\end{minted}
\noindent
and would politely but firmly suggest that it be rewritten like this:
\begin{minted}{js}
const test = [1,2,3]
const pure = (values) => {
result = []
for (let v of values) {
result.push(v + 1)
}
return result
}
// Rather than modify test, we create a new array for the results
const newArray = pure(test)
console.log(`newArray: ${newArray}$`)
console.log(`test: ${test}`)
\end{minted}
\begin{minted}{text}
newArray: 2,3,4
test: 1,2,3
\end{minted}
JavaScript arrays provide several methods to support functional programming.
For example,\index{array methods!\texttt{some}}
\texttt{Array.some} returns \texttt{true} if \emph{any} element in an array passes a test,
while \texttt{Array.every} returns \texttt{true}\index{array methods!\texttt{every}}
if \emph{all} elements in an array pass a test.
Here's how they work:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('some longer than 3:',
data.some((x) => { return x.length > 3 }))
console.log('all longer than 3:',
data.every((x) => { return x.length > 3 }))
\end{minted}
\begin{minted}{text}
some longer than 3: true
all longer than 3: false
\end{minted}
\texttt{Array.filter}\index{array methods!\texttt{filter}}
creates a new array containing only values that pass a test:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('those longer than 3:',
data.filter((x) => { return x.length > 3 }))
\end{minted}
\begin{minted}{text}
those longer than 3: [ 'this', 'test' ]
\end{minted}
\noindent
So do all of the elements with more than 3 characters start with a `t'?
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
const result = data
.filter((x) => { return x.length > 3 })
.every((x) => { return x[0] === 't' })
console.log(`all longer than 3 start with t: ${result}`)
\end{minted}
\begin{minted}{text}
all longer than 3 start with t: true
\end{minted}
\texttt{Array.map}\index{array methods!\texttt{map}}
creates a new array by calling a function for each element of an existing array:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('shortened', data.map((x) => { return x.slice(0, 2) }))
\end{minted}
\begin{minted}{text}
shortened [ 'th', 'is', 'a', 'te' ]
\end{minted}
Finally,
\texttt{Array.reduce} reduces an array to a single value\index{array methods!\texttt{reduce}}
using a combining function and a starting value.
The combining function must take two values,
which are the current running total and the next value from the array;
if the array is empty,
\texttt{Array.reduce} returns the starting value.
An example will make this clearer.
To start,
let's create an acronym using a loop:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
let acronym = ''
for (let value of data) {
acronym = acronym + value[0]
}
console.log(`acronym of ${data} is ${acronym}`)
\end{minted}
\begin{minted}{text}
acronym of this,is,a,test is tiat
\end{minted}
The three key elements in the short program above are the input data,
the initial value of the variable \texttt{acronym},
and the way that variable is updated.
When \texttt{Array.reduce} is used,
the array is the data and the initial value and update function
are passed as parameters:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
const concatFirst = (accumulator, nextValue) => {
return accumulator + nextValue[0]
}
let acronym = data.reduce(concatFirst, '')
console.log(`acronym of ${data} is ${acronym}`)
\end{minted}
\begin{minted}{text}
acronym of this,is,a,test is tiat
\end{minted}
As elsewhere,
we can define the function where we use it:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
acronym = data.reduce((accum, next) => {
return accum + next[0]
}, '')
console.log('all in one step:', acronym)
\end{minted}
\begin{minted}{text}
all in one step: tiat
\end{minted}
The indentation of the anonymous function defined inside \texttt{reduce} may look a little odd,
but this is the style the JavaScript community has settled on.
\section{Closures}\label{s:callbacks-closures}
The last tool we need to introduce is an extremely useful side-effect of the way memory is handled.
The easiest way to explain it is by example.
We have already defined a function called \texttt{pipeline} that chains any number of other functions together:
\begin{minted}{js}
const pipeline = (initial, operations) => {
let current = initial
for (let op of operations) {
current = op(current)
}
return current
}
\end{minted}
However,
\texttt{pipeline} only works if each function in the array \texttt{operations} has a single parameter.
If we want to be able to add 1,
add 2,
and so on,
we have to write separate functions,
which is annoying.
A better option is to write a function that creates the function we want:
\begin{minted}{js}
const adder = (increment) => {
const f = (value) => {
return value + increment
}
return f
}
const add_1 = adder(1)
const add_2 = adder(2)
console.log(`add_1(100) is ${add_1(100)}, add_2(100) is ${add_2(100)}`)
\end{minted}
\begin{minted}{text}
add_1(100) is 101, add_2(100) is 102
\end{minted}
The best way to understand what's going on is to draw a step-by-step memory diagram.
In step 1, we call \texttt{adder(1)}
(\figref{f:callbacks-adder-1}).
\texttt{adder} creates a new function that includes a reference to that 1 we just passed in
(\figref{f:callbacks-adder-2}).
In step 3,
\texttt{adder} returns that function, which is assigned to \texttt{add\_1}
(\figref{f:callbacks-adder-3}).
Crucially,
the function that \texttt{add\_1} refers to still has a reference to the value 1,
even though that value isn't referred to any longer by anyone else.
\figpdf{figures/callbacks-adder-1.pdf}{Creating an Adder (Step 1)}{f:callbacks-adder-1}
\figpdf{figures/callbacks-adder-2.pdf}{Creating an Adder (Step 2)}{f:callbacks-adder-2}
\figpdf{figures/callbacks-adder-3.pdf}{Creating an Adder (Step 3)}{f:callbacks-adder-3}
In steps 4--6,
we repeat these three steps to create another function that has a reference to the value 2,
and assign that function to \texttt{add\_2}
(\figref{f:callbacks-adder-4}).
\figpdf{figures/callbacks-adder-4.pdf}{Creating an Adder (Steps 4-6)}{f:callbacks-adder-4}
\noindent
When we now call \texttt{add\_1} or \texttt{add\_2},
they add the value passed in and the value they've kept a reference to.
This trick of capturing a reference to a value inside something else
is called a \gref{g:closure}{closure}.
It works because JavaScript holds on to values as long as anything,
anywhere,
still refers to them.
Closures solve our pipeline problem by letting us define little functions
on the fly
and give them extra data to work with:
\begin{minted}{js}
const result = pipeline(100, [adder(1), adder(2)])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}
\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}
\noindent
Again, \texttt{adder(1)} and \texttt{adder(2)} do not add anything to anything:
they define new (unnamed) functions that add 1 and 2 respectively when called.
Programmers often go one step further and define little functions like this inline:
\begin{minted}{js}
const result = pipeline(100, [(x) => x + 1, (x) => x + 2])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}
\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}
As this example shows,
if the body of a function is a single expression,
it doesn't have to be enclosed in \texttt{\{...\}} and \texttt{return} doesn't need to be used.
\section{Exercises}\label{s:callbacks-exercises}
\exercise{Side Effects With \texttt{forEach}}
JavaScript arrays have a method called \texttt{forEach},\index{array methods!\texttt{forEach}}
which calls a callback function once for each element of the array.
Unlike \texttt{map},
\texttt{forEach} does \emph{not} save the values returned by these calls
or return an array of results.
The full syntax is:
\begin{minted}{js}
someArray.forEach((value, location, container) => {
// 'value' is the value in 'someArray'
// 'location' is the index of that value
// 'container' is the containing array (in this case, 'someArray')
})
\end{minted}
If you only need the value,
you can provide a callback that only takes one parameter;
if you only need the value and its location,
you can provide a callback that takes two.
Use this to write a function \texttt{doubleInPlace}
that doubles all the values in an array in place:
\begin{minted}{js}
const vals = [1, 2, 3]
doubleInPlace(vals)
console.log(`vals after change: ${vals}`)
\end{minted}
\begin{minted}{text}
vals after change: 2,4,6
\end{minted}
\exercise{Annotating Data}
Given an array of objects representing observations of wild animals:
\begin{minted}{js}
data = [
{'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
{'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
{'date': '1977-7-16', 'sex': 'F', 'species': 'DM'},
{'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
{'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
{'date': '1977-7-16', 'sex': 'M', 'species': 'PF'},
{'date': '1977-7-16', 'sex': 'F', 'species': 'PE'},
{'date': '1977-7-16', 'sex': 'M', 'species': 'DM'}
]
\end{minted}
\noindent
write a function that returns a new array of objects like this:
\begin{minted}{js}
newData = [
{'seq': 3, 'year': '1977', 'sex': 'F', 'species': 'DM'},
{'seq': 7, 'year': '1977', 'sex': 'F', 'species': 'PE'}
]
\end{minted}
\noindent
\emph{without} using any loops.
The changes are:
\begin{itemize}
\item
The \texttt{date} field is replaced with just the year.
\item
Only observations of female animals are retained.
\item
The retained records are given sequence numbers to relate them back to the original data.
(These sequence numbers are 1-based rather than 0-based.)
\end{itemize}
\noindent
You will probably want to use \texttt{Array.reduce} to generate the sequence numbers.
\section*{Key Points}
\input{keypoints/callbacks}