-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy path04-container.html
221 lines (216 loc) · 11.6 KB
/
04-container.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Advanced NumPy</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Advanced NumPy</h1></a>
<h2 class="subtitle">Array container</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<p>After the lesson learner:</p>
<ul>
<li>Can list some of the object types which can be contained in an array.</li>
<li>Understands the concept of <code>dtype</code> and can select <code>dtype</code> best for the data at hand.</li>
<li>Knows what is an object array and when it is created.</li>
<li>Can explain what are <code>ndim</code>, <code>shape</code> and <code>stride</code> properties of an array.</li>
<li>Understand the layout of an array in memory and knows how to use it for best array performance.</li>
<li>Can explain the difference between Fortran- and C-based order. Knows the default.</li>
</ul>
</div>
</section>
<h3 id="data-type">Data type</h3>
<p>In contrast to built-in Python containers (like lists) NumPy arrays can store elements of pre-determined type only. To see the type of array contents you can use the <code>dtype</code> attribute. Let’s look at two examples:</p>
<pre><code>>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')</code></pre>
<p>In the first case the numbers are 64-bit (8-byte) integers and in the second 64-bit floating point (real) numbers. Note that NumPy auto-detects the data-type from the input. Specialised data types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers.</p>
<p>Note that all of the elements of an array must be of the same type. If we construct an array with different elements it will be <strong>cast</strong> to the “most general” type that can represent all elements. For example, array constructed from real numbers and integers will have a floating point data type:</p>
<pre><code>>>> a = np.array([1., 2])
dtype('float64')</code></pre>
<p>In case it is impossible, NumPy will use an <code>object</code> type (also represented by capital <code>'O'</code>) which can represent any Python object – even a function:</p>
<pre><code>>>> def f(): pass
>>> a = np.array([f, f])
>>> a.dtype
dtype('O')</code></pre>
<p>Some of NumPy features (like element-wise functions, <code>np.abs</code>, <code>np.sqrt</code>, etc., or reductions, <code>np.sum</code>, <code>np.max</code> etc.) won’t work with object arrays, but all types of indexing still work.</p>
<p><code>object</code> type is most commonly encountered when constructing an array from multiple lists of different lengths:</p>
<pre><code>>>> np.array([[1], [2, 3]])
array([[1], [2, 3]], dtype=object)</code></pre>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="integer-or-real-number"><span class="glyphicon glyphicon-pencil"></span>Integer or real number?</h2>
</div>
<div class="panel-body">
<p>Construct the array <code>x = np.array([0, 1, 2, 255], dtype=np.uint8)</code> (here, <code>uint8</code> represents a single byte in memory, an unsigned integer between 0 and 255). Can you explain the results obtained by x + 1 and x / 2? Also try <code>x.astype(float) + 1</code> and <code>x.astype(float) / 2</code>.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="data-types"><span class="glyphicon glyphicon-pencil"></span>Data types</h2>
</div>
<div class="panel-body">
<p>Try to guess the data type of the following arrays. Then test your prediction by constructing the arrays and check their dtype attribute.</p>
<pre><code>a = np.array([[1, 2],
[2, 3]])
b = np.array(['a', 'b', 'c'])
c = np.array([1, 2, 'a'])
d = np.array([np.dot, np.array])
e = np.random.randn(5) > 0
f = np.arange(5)</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="complex-data-types"><span class="glyphicon glyphicon-pencil"></span>Complex data types</h2>
</div>
<div class="panel-body">
<p>Imagine you have a list of names and scores, which you want to store in numpy array. Construct a dtype such that the following works. Look at documentation of <code>np.dtype</code>.</p>
<pre><code>dtype = ?
np.array([('Bartosz', 5), ('Nelle', 4)], dtype=dtype)</code></pre>
</div>
</section>
<h3 id="memory-layout">Memory layout</h3>
<p>NumPy array is just a memory block with extra information how to interpret its contents. Since memory has only linear address space, NumPy arrays need extra information how to lay out this block into multiple dimensions. This is done by means of <code>shape</code> and <code>strides</code> attributes:</p>
<div class="figure">
<img src="fig/strides.svg" alt="Shape and strides" />
<p class="caption">Shape and strides</p>
</div>
<p>Lets try to reproduce this example. We first generate a 1D NumPy array of 8 elements:</p>
<pre><code>>>> a = np.arange(8, dtype=np.uint8)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=uint8)
>>> a.strides
(1,)
>>> a.shape
(8,)</code></pre>
<p><code>shape</code> and <code>strides</code> attributes are read-only, so we can not modify them directly. However, we my use <code>as_strided</code> function from NumPy library module:</p>
<pre><code>>>> a1 = np.lib.stride_tricks.as_strided(a, strides=(4, 1), shape=(2,4))
>>> a1
array([[0, 1, 2, 3],
[4, 5, 6, 7]], dtype=uint8)</code></pre>
<p>Similarly, we can obtain the second example:</p>
<pre><code>>>> a2 = np.lib.stride_tricks.as_strided(a, strides=(2, 1), shape=(3,4))
>>> a2
array([[0, 1, 2, 3],
[2, 3, 4, 5],
[4, 5, 6, 7]], dtype=uint8)</code></pre>
<p>Note that in the second case the same data appears twice. However, it does not consume extra memory – all three arrays share the same memory block:</p>
<pre><code>>>> a[2] = 100
>>> a1
array([[ 0, 1, 100, 3],
[ 4, 5, 6, 7]], dtype=uint8)
>>> a2
array([[ 0, 1, 100, 3],
[100, 3, 4, 5],
[ 4, 5, 6, 7]], dtype=uint8)</code></pre>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="transpose"><span class="glyphicon glyphicon-pencil"></span>Transpose</h2>
</div>
<div class="panel-body">
<p>Create 3x4 random array. Have a look at its different properties: <code>x.shape</code>, <code>x.ndim</code>, <code>x.dtype</code>, <code>x.strides</code>. What does each property tell you about the array?</p>
<p>Compare the strides property of x.T to the above. What is x.T and can you explain the new strides?</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="fastest-changing-index"><span class="glyphicon glyphicon-pencil"></span>Fastest changing index</h2>
</div>
<div class="panel-body">
<p>Compare the time of summing over rows and columns of an array <code>A = np.random.rand(10, 100000)</code>. How would you explain the differences? (<em>Hint</em>: To measure evaluation time you can use <code>%timeit</code> of ipython)</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="sliding-window"><span class="glyphicon glyphicon-pencil"></span>Sliding window</h2>
</div>
<div class="panel-body">
<p>Use <code>as_strided</code> to produce a sliding-window view of a 1D array.</p>
<pre><code>def sliding_window(arr, size=2):
"""Produce an array of sliding window views of `arr`
Parameters
----------
arr : 1D array, shape (N,)
The input array.
size : int, optional
The size of the sliding window.
Returns
-------
arr_slide : 2D array, shape (N - size - 1, size)
The sliding windows of size `size` of `arr`.
Examples
--------
>>> a = np.array([0, 1, 2, 3])
>>> sliding_window(a, 2)
array([[0, 1],
[1, 2],
[2, 3]])
"""
return arr # fix this</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="fortran-or-c-ordering"><span class="glyphicon glyphicon-pencil"></span>Fortran or C-ordering?</h2>
</div>
<div class="panel-body">
<p>The <code>order</code> keyword of some <code>numpy</code> functions determines how two- or more dimensional arrays are laid out in the memory. If order is ‘C’, then the array will be in C-contiguous order (last-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (first-index varies the fastest). In what order will be the 2D array stored in memory? (<em>Hint</em>: You can use <code>np.ravel(x, order='A')</code> to test it).</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="broadcasting-revisited"><span class="glyphicon glyphicon-pencil"></span>Broadcasting revisited</h2>
</div>
<div class="panel-body">
<p>Explain how broadcasting works internally using the example below. What will be shapes and strides of <code>x</code> and <code>y</code> after broadcasting. Test it using <code>np.broadcast_arrays</code> in the following example and look at <code>strides</code> and <code>shape</code> properties of both arrays.</p>
<pre><code>x = np.random.rand(5, 10)
y = np.random.rand(10)
z = x + y
xb, yb = np.broadcast_arrays(x, y)</code></pre>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/paris-swc/advanced-numpy-lesson">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
<script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
</body>
</html>