<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Variable input and output — PyGeode 1.4.1-rc2 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/pygtheme.css" />
<link rel="stylesheet" type="text/css" href="_static/plot_directive.css" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery.css" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-binder.css" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-dataframe.css" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-rendered-html.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<link rel="shortcut icon" href="_static/pygeode_icon.ico"/>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="tut_plot.html" />
<link rel="prev" title="Basic Operations" href="tut_basics.html" />
<link href="http://fonts.googleapis.com/css?family=Ubuntu:300,300italic,regular,italic,500,500italic,bold,bolditalic" rel="stylesheet" type="text/css">
<link href='http://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700' rel='stylesheet' type='text/css'>
</head><body>
<div class="header" role="banner"><img class="logo" src="_static/pygeode_logo.png" width=79px alt="Logo"/>
<h1 class="heading"><a href="index.html">
<span>PyGeode 1.4.1-rc2 documentation</span></a></h1>
<h2 class="heading"><span>Variable input and output</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
<p>
«  <a href="tut_basics.html">Basic Operations</a>
  ::  
<a class="uplink" href="reference.html">Reference</a>
  ::  
<a class="uplink" href="tutorial.html">Tutorial</a>
  ::  
<a class="uplink" href="gallery/index.html">Gallery</a>
  ::  
<a href="tut_plot.html">Plotting</a>  »
</p>
</div>
<div class="content">
<div class="section" id="variable-input-and-output">
<h1>Variable input and output<a class="headerlink" href="#variable-input-and-output" title="Permalink to this headline">¶</a></h1>
<div class="section" id="reading-from-a-single-file">
<h2>Reading from a single file<a class="headerlink" href="#reading-from-a-single-file" title="Permalink to this headline">¶</a></h2>
<p>PyGeode is designed for working with large gridded datasets. Such datasets are nearly always
stored on disk, sometimes in a single file, but often
spread over many files. PyGeode supports a number of different <a class="reference internal" href="fileio.html#formats"><span class="std std-ref">formats</span></a>. The commands for loading and saving data are largely
independent of the format used, and PyGeode attempts to detect automatically
which format you are working with. That said, NetCDF files are the
most completely supported. We’ll start by looking at reading a single file.</p>
<div class="section" id="the-basics">
<h3>The basics<a class="headerlink" href="#the-basics" title="Permalink to this headline">¶</a></h3>
<p>The most basic form of reading a single file from disk is to simply call
<a class="reference internal" href="fileio.html#pygeode.open" title="pygeode.open"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open()</span></code></a>. We’ll open the file written out at the end of the <a class="reference internal" href="tut_gettingstarted.html#tutsavefile"><span class="std std-ref">Getting
Started</span></a> section of the tutorial:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pygeode</span> <span class="k">as</span> <span class="nn">pyg</span><span class="o">,</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">In [2]: </span><span class="n">ds</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'sample_data/file.nc'</span><span class="p">)</span>
<span class="gp">In [3]: </span><span class="nb">print</span><span class="p">(</span><span class="n">ds</span><span class="p">)</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (lat,lon) (31,60)</span>
<span class="go">Axes:</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go"> lon <Lon> : 0 E to 354 E (60 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>As you can see, this returns a <a class="reference internal" href="dataset.html#pygeode.Dataset" title="pygeode.Dataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">Dataset</span></code></a> object with the contents of the
file. The format of the file has been automatically detected. In this case the
NetCDF file was generated by an earlier call to <a class="reference internal" href="fileio.html#pygeode.save" title="pygeode.save"><code class="xref py py-meth docutils literal notranslate"><span class="pre">save()</span></code></a>, which fills in
metadata recognized by PyGeode that indicate, for instance, what kind of
PyGeode axis each NetCDF dimension corresponds to. Let’s take a look at the
contents of this particular NetCDF file to get a quick sense of what metadata
PyGeode recognizes and how it affects the dataset returned.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ncdump -h file.nc
netcdf file {
dimensions:
lat = 31 ;
lon = 60 ;
variables:
double lat(lat) ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
lat:ancillary_variables = "lat_weights" ;
lat:axis = "Y" ;
double lon(lon) ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
lon:axis = "X" ;
double Temp(lat, lon) ;
double lat_weights(lat) ;
// global attributes:
:history = "Synthetic Temperature data generated by pygeode" ;
}
</pre></div>
</div>
<p>We can see there is a set of metadata (NetCDF attributes) which PyGeode has
interpreted, recognizing, for instance, that the NetCDF variables <cite>lat</cite> and
<cite>lon</cite> represent coordinate axes, and should be treated as PyGeode <a class="reference internal" href="horizontalaxes.html#pygeode.Lat" title="pygeode.Lat"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lat</span></code></a>
and <a class="reference internal" href="horizontalaxes.html#pygeode.Lon" title="pygeode.Lon"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lon</span></code></a> axes, respectively. Because these are coordinate axes, they
are treated as (possibly child classes of) <a class="reference internal" href="axes.html#pygeode.Axis" title="pygeode.Axis"><code class="xref py py-class docutils literal notranslate"><span class="pre">Axis</span></code></a> objects, and not the
more generic <a class="reference internal" href="var.html#pygeode.Var" title="pygeode.Var"><code class="xref py py-class docutils literal notranslate"><span class="pre">Var</span></code></a> objects. The NetCDF variable <cite>Temp</cite> becomes a PyGeode
variable, defined on the axes <cite>lat</cite> and <cite>lon</cite>.</p>
<p>There is one final NetCDF variable, <cite>lat_weights</cite>. This is referenced in the
metadata of the <cite>lat</cite> coordinate variable, as an ancillary variable defining a
set of weights for the latitude axis. PyGeode recognizes this and loads it into
the <cite>weights</cite> member of the <a class="reference internal" href="horizontalaxes.html#pygeode.Lat" title="pygeode.Lat"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lat</span></code></a> axis, which is used when, for
instance, taking averages (<code class="xref py py-meth docutils literal notranslate"><span class="pre">mean()</span></code>) over the latitude axis.</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [4]: </span><span class="nb">print</span><span class="p">(</span><span class="n">ds</span><span class="o">.</span><span class="n">lat</span><span class="o">.</span><span class="n">weights</span><span class="p">)</span>
<span class="go">[6.12323400e-17 1.04528463e-01 2.07911691e-01 3.09016994e-01</span>
<span class="go"> 4.06736643e-01 5.00000000e-01 5.87785252e-01 6.69130606e-01</span>
<span class="go"> 7.43144825e-01 8.09016994e-01 8.66025404e-01 9.13545458e-01</span>
<span class="go"> 9.51056516e-01 9.78147601e-01 9.94521895e-01 1.00000000e+00</span>
<span class="go"> 9.94521895e-01 9.78147601e-01 9.51056516e-01 9.13545458e-01</span>
<span class="go"> 8.66025404e-01 8.09016994e-01 7.43144825e-01 6.69130606e-01</span>
<span class="go"> 5.87785252e-01 5.00000000e-01 4.06736643e-01 3.09016994e-01</span>
<span class="go"> 2.07911691e-01 1.04528463e-01 6.12323400e-17]</span>
</pre></div>
</div>
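<p>The printed values are consistent with the cosine of the latitude (for example, cos(84°) ≈ 0.1045). The following is a minimal sketch in plain Python, not PyGeode's implementation, of how such weights could enter an area-weighted mean. The function names <code class="docutils literal notranslate"><span class="pre">cosine_weights</span></code> and <code class="docutils literal notranslate"><span class="pre">weighted_mean</span></code> and the temperature values are hypothetical and used for illustration only.</p>

```python
import math

def cosine_weights(lats_deg):
    """Return cos(latitude) for each latitude given in degrees."""
    return [math.cos(math.radians(lat)) for lat in lats_deg]

def weighted_mean(values, weights):
    """Weighted average: sum(w * v) / sum(w)."""
    total = sum(w * v for w, v in zip(weights, values))
    return total / sum(weights)

# 31 latitudes from 90 S to 90 N in 6 degree steps, as in the sample file
lats = [-90 + 6 * i for i in range(31)]
weights = cosine_weights(lats)

# A made-up field that is warm at the equator and cold at the poles
temps = [300.0 - 40.0 * abs(lat) / 90.0 for lat in lats]

plain = sum(temps) / len(temps)          # every latitude counts equally
weighted = weighted_mean(temps, weights)  # polar rows count for less
print(plain, weighted)
```

<p>Because the polar rows receive almost no weight, the weighted mean of this made-up field is warmer than the unweighted one.</p>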
</div>
<div class="section" id="overriding-metadata">
<h3>Overriding metadata<a class="headerlink" href="#overriding-metadata" title="Permalink to this headline">¶</a></h3>
<p>All of this is convenient when one has a well-formed NetCDF file. However, one often
encounters rather bare NetCDF files in the wild, with either no or (even worse) incorrect
metadata. In this case PyGeode will not be able to recognize what kind of
<a class="reference internal" href="axes.html#pygeode.Axis" title="pygeode.Axis"><code class="xref py py-class docutils literal notranslate"><span class="pre">Axis</span></code></a> it should associate with each dimension:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [5]: </span><span class="kn">from</span> <span class="nn">pygeode.tutorial</span> <span class="kn">import</span> <span class="n">t1</span>
<span class="gp">In [6]: </span><span class="n">pyg</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s1">'sample_data/file_nometa.nc'</span><span class="p">,</span> <span class="n">t1</span><span class="p">,</span> <span class="n">cfmeta</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="gp">In [7]: </span><span class="n">ds2</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'sample_data/file_nometa.nc'</span><span class="p">)</span>
<span class="gp">In [8]: </span><span class="nb">print</span><span class="p">(</span><span class="n">ds2</span><span class="p">)</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (lat,lon) (31,60)</span>
<span class="go">Axes:</span>
<span class="go"> lat <NamedAxis 'lat'>: -90 to 90 (31 values)</span>
<span class="go"> lon <NamedAxis 'lon'>: 0 to 354 (60 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
<span class="gp">In [9]: </span><span class="n">pyg</span><span class="o">.</span><span class="n">showvar</span><span class="p">(</span><span class="n">ds2</span><span class="o">.</span><span class="n">Temp</span><span class="p">)</span>
<span class="gh">Out[9]: </span><span class="go"><pygeode.plot.wrappers.AxesWrapper at 0x7f37340173a0></span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/t1Temp_nometadata.png"><img alt="_images/t1Temp_nometadata.png" src="_images/t1Temp_nometadata.png" style="width: 4in;" /></a>
<p>The <code class="docutils literal notranslate"><span class="pre">lat</span></code> and <code class="docutils literal notranslate"><span class="pre">lon</span></code> axes are now of the default type <code class="xref py py-class docutils literal notranslate"><span class="pre">NamedAxis</span></code>. While
this is still a perfectly usable PyGeode dataset, PyGeode no longer knows how
to properly annotate and format plots.</p>
<p>However, if you know what the contents of the file represent, you can provide
this information to PyGeode when opening it by passing special keyword
arguments to <a class="reference internal" href="fileio.html#pygeode.open" title="pygeode.open"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open()</span></code></a>. The most important of these is the <code class="docutils literal notranslate"><span class="pre">dimtypes</span></code>
argument, which takes a dictionary whose keys are the names of the axes in the
file being opened, and whose values tell PyGeode what kind of Axis class to
associate with that dimension. This is perhaps best illustrated by example:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="n">dt</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">lat</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">Lat</span><span class="p">,</span> <span class="n">lon</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">regularlon</span><span class="p">(</span><span class="mi">60</span><span class="p">))</span>
<span class="gp">In [11]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'sample_data/file_nometa.nc'</span><span class="p">,</span> <span class="n">dimtypes</span><span class="o">=</span><span class="n">dt</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (lat,lon) (31,60)</span>
<span class="go">Axes:</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go"> lon <Lon> : 0 E to 354 E (60 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>We can see that now the <a class="reference internal" href="namedaxis.html#pygeode.NamedAxis" title="pygeode.NamedAxis"><code class="xref py py-class docutils literal notranslate"><span class="pre">NamedAxis</span></code></a> objects have been replaced by the
appropriate axis types, <a class="reference internal" href="horizontalaxes.html#pygeode.Lat" title="pygeode.Lat"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lat</span></code></a> and <a class="reference internal" href="horizontalaxes.html#pygeode.Lon" title="pygeode.Lon"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lon</span></code></a>, respectively. You will
notice we have used two different ways of specifying the axis types. In the case
of <code class="docutils literal notranslate"><span class="pre">lat</span></code>, we have simply passed the PyGeode class itself. In this case,
PyGeode reads the values for the <code class="docutils literal notranslate"><span class="pre">lat</span></code> dimension from the file and uses these
to create a new instance of <a class="reference internal" href="horizontalaxes.html#pygeode.Lat" title="pygeode.Lat"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lat</span></code></a>.</p>
<p>In contrast, for the <code class="docutils literal notranslate"><span class="pre">lon</span></code> axis, we have created an instance of the
<a class="reference internal" href="horizontalaxes.html#pygeode.Lon" title="pygeode.Lon"><code class="xref py py-class docutils literal notranslate"><span class="pre">Lon</span></code></a> class (with a helper function <a class="reference internal" href="horizontalaxes.html#pygeode.regularlon" title="pygeode.regularlon"><code class="xref py py-meth docutils literal notranslate"><span class="pre">regularlon()</span></code></a>) prior to opening
the file. In this case, PyGeode simply uses the already-created axis, and avoids
reading in the coordinates for the axis being overridden. Of course, the length
of the axis must match the length of the dimension in the file. This is useful
when the coordinate data in the file is not correct. In addition, for
very large files, depending on the details of how the file data has been
structured, even just reading the coordinate metadata can be quite slow.
If you regularly work with such a file, passing in an already-created
axis instance can speed up the call to open the file considerably.</p>
<p>There is one more form that can be used in the <code class="docutils literal notranslate"><span class="pre">dimtypes</span></code> dictionary, for
when one wants to construct an axis using data from the file, but when the axis
type in question requires additional arguments to be properly constructed. To
give a contrived example, consider the following code (which is unfortunately a
bit opaque):</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [12]: </span><span class="n">kwargs</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="s1">'days'</span><span class="p">,</span> <span class="n">startdate</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">year</span><span class="o">=</span><span class="mi">2001</span><span class="p">,</span> <span class="n">month</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">day</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span>
<span class="gp">In [13]: </span><span class="n">dt2</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">lon</span> <span class="o">=</span> <span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">StandardTime</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">))</span>
<span class="gp">In [14]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'sample_data/file_nometa.nc'</span><span class="p">,</span> <span class="n">dimtypes</span><span class="o">=</span><span class="n">dt2</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (lat,time) (31,60)</span>
<span class="go">Axes:</span>
<span class="go"> lat <NamedAxis 'lat'>: -90 to 90 (31 values)</span>
<span class="go"> time <StandardTime>: Jan 1, 2001 00:00:00 to Dec 21, 2001 00:00:00 (60 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>Here we have decided to treat the <code class="docutils literal notranslate"><span class="pre">lon</span></code> axis as a time axis and, for
some reason, we wish to use the coordinate data as the actual time values. As
will be explained in more detail in a later section (<a class="reference internal" href="tut_axes.html"><span class="doc">Working with Axes</span></a>),
time axes generally need a bit more information when they are constructed to
define the calendar. We supply these additional arguments (the <code class="docutils literal notranslate"><span class="pre">units</span></code> and
<code class="docutils literal notranslate"><span class="pre">startdate</span></code> items in the <code class="docutils literal notranslate"><span class="pre">kwargs</span></code> dictionary) by passing them in along with
the class object itself. PyGeode then calls the constructor of
<a class="reference internal" href="timeaxes.html#pygeode.StandardTime" title="pygeode.StandardTime"><code class="xref py py-class docutils literal notranslate"><span class="pre">StandardTime</span></code></a> with the values read from the file and with the additional
keyword arguments to create a new axis object, which is then assigned to the
<code class="docutils literal notranslate"><span class="pre">lon</span></code> dimension of any variable.</p>
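<p>The three forms accepted in <code class="docutils literal notranslate"><span class="pre">dimtypes</span></code> (a class, an existing instance, and a class paired with keyword arguments) can be pictured with the sketch below. It is an illustration in plain Python only: <code class="docutils literal notranslate"><span class="pre">ToyAxis</span></code> and <code class="docutils literal notranslate"><span class="pre">resolve_axis</span></code> are hypothetical names, and this is not how PyGeode itself is implemented.</p>

```python
class ToyAxis:
    """Stand-in for an axis class; not a PyGeode type."""
    def __init__(self, values, **kwargs):
        self.values = list(values)
        self.options = kwargs

def resolve_axis(spec, file_values):
    """Build an axis from one dimtypes-style entry (hypothetical helper)."""
    if isinstance(spec, tuple):
        # (class, kwargs): construct from file values plus extra arguments
        cls, kwargs = spec
        return cls(file_values, **kwargs)
    if isinstance(spec, type):
        # Bare class: construct from the values stored in the file
        return spec(file_values)
    # Existing instance: use it as-is, but its length must match the file
    if len(spec.values) != len(file_values):
        raise ValueError("axis length does not match the file dimension")
    return spec

file_values = [0, 6, 12, 18]
from_class = resolve_axis(ToyAxis, file_values)
from_instance = resolve_axis(ToyAxis([1, 2, 3, 4]), file_values)
from_tuple = resolve_axis((ToyAxis, {"units": "days"}), file_values)
print(from_class.values, from_instance.values, from_tuple.options)
```

<p>Only the instance form ignores the coordinate values in the file, which is why it is the one that can avoid reading them.</p>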
<p>There are a few other arguments recognized by <a class="reference internal" href="fileio.html#pygeode.open" title="pygeode.open"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open()</span></code></a>, but we’ll just
mention one more, perhaps the second most useful: <code class="docutils literal notranslate"><span class="pre">namemap</span></code>. This allows you
to simply rename variables and axes:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [15]: </span><span class="n">nm</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">lon</span> <span class="o">=</span> <span class="s1">'Longitude'</span><span class="p">,</span> <span class="n">Temp</span><span class="o">=</span><span class="s1">'Temperature'</span><span class="p">)</span>
<span class="gp">In [16]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">'sample_data/file_nometa.nc'</span><span class="p">,</span> <span class="n">namemap</span><span class="o">=</span><span class="n">nm</span><span class="p">,</span> <span class="n">dimtypes</span><span class="o">=</span><span class="n">dt</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temperature (lat,Longitude) (31,60)</span>
<span class="go">Axes:</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go"> Longitude <Lon>: 0 E to 354 E (60 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>As you can see, the original axis names as defined in the file are still used as
the keys of the <code class="docutils literal notranslate"><span class="pre">dimtypes</span></code> dictionary; the axis is renamed after it
has been replaced by the newly specified axis class.</p>
</div>
</div>
<div class="section" id="reading-from-multiple-files">
<h2>Reading from multiple files<a class="headerlink" href="#reading-from-multiple-files" title="Permalink to this headline">¶</a></h2>
<p>PyGeode is intended for use with very large datasets, which are often
distributed across many files. It has two additional methods, <a class="reference internal" href="fileio.html#pygeode.openall" title="pygeode.openall"><code class="xref py py-meth docutils literal notranslate"><span class="pre">openall()</span></code></a>
and <a class="reference internal" href="fileio.html#pygeode.open_multi" title="pygeode.open_multi"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open_multi()</span></code></a>, which are intended for use with such datasets. The
first, which is the simpler of the two, is better suited to moderately large
datasets which consist of multiple files, but not a large number of files. It
works by opening each file explicitly, then merging the contents of each file
into a single PyGeode dataset. Since each file is explicitly opened and its
metadata read in, the time taken can add up for datasets consisting of a large number of
files. A subtler issue is that the concatenation PyGeode uses in this case is
also, at present, somewhat less efficient than it could be. As a result,
accessing data opened with this method is not as fast as with the second
method.</p>
<p>The second, <a class="reference internal" href="fileio.html#pygeode.open_multi" title="pygeode.open_multi"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open_multi()</span></code></a>, is somewhat more complicated to use, but is
better suited to very large datasets, particularly those that are
split along the time axis. In this case PyGeode simply opens the first and
last file in the dataset, reads in their metadata, then infers the contents of
the rest of the files given an additional (user-defined) function that maps
filenames to dates. Since only two files are opened, this is a very efficient
operation even for extremely large datasets. As was just mentioned, the data is
also loaded more efficiently than from datasets opened with the first method.</p>
<p>To demonstrate these two methods, we’ll need a sample dataset to work with. To
keep the dataset small, we’ll write out the zonal mean of the temperature field
from the second tutorial dataset, one year per file:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [17]: </span><span class="kn">from</span> <span class="nn">pygeode.tutorial</span> <span class="kn">import</span> <span class="n">t2</span>
<span class="gp">In [18]: </span><span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2011</span><span class="p">,</span> <span class="mi">2021</span><span class="p">):</span>
<span class="gp"> ....: </span> <span class="n">pyg</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_y</span><span class="si">%d</span><span class="s1">.nc'</span> <span class="o">%</span> <span class="n">y</span><span class="p">,</span> <span class="n">t2</span><span class="o">.</span><span class="n">Temp</span><span class="p">(</span><span class="n">year</span><span class="o">=</span><span class="n">y</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="s1">'lon'</span><span class="p">))</span>
<span class="gp"> ....: </span>
</pre></div>
</div>
<p>This produces ten files, each containing one year of data.</p>
<p>To demonstrate <a class="reference internal" href="fileio.html#pygeode.openall" title="pygeode.openall"><code class="xref py py-meth docutils literal notranslate"><span class="pre">openall()</span></code></a>, we can simply provide a wildcard filename which matches
the filenames:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [19]: </span><span class="n">ds</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">openall</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_*.nc'</span><span class="p">,</span> <span class="n">namemap</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span><span class="n">Temp</span><span class="o">=</span><span class="s1">'T'</span><span class="p">))</span>
<span class="gp">In [20]: </span><span class="nb">print</span><span class="p">(</span><span class="n">ds</span><span class="o">.</span><span class="n">T</span><span class="p">)</span>
<span class="go"><Var 'T'>:</span>
<span class="go"> Shape: (time,pres,lat) (3650,20,31)</span>
<span class="go"> Axes:</span>
<span class="go"> time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)</span>
<span class="go"> pres <Pres> : 1000 hPa to 50 hPa (20 values)</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go"> Attributes:</span>
<span class="go"> {}</span>
<span class="go"> Type: SortedVar (dtype="float64")</span>
</pre></div>
</div>
<p>PyGeode expands the wildcard, opens each of the files, then concatenates the
datasets. The result is a single dataset object with 10 years of temperature data that
you can use without worrying further about how the data is distributed on disk. The filenames
can alternatively be specified as a list; in fact in this case PyGeode expands each item
in the list if there are wildcards (using <a class="reference external" href="https://docs.python.org/3/library/glob.html#glob.glob" title="(in Python v3.10)"><code class="xref py py-func docutils literal notranslate"><span class="pre">glob.glob()</span></code></a>).</p>
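<p>The wildcard expansion described above can be sketched with the standard library alone. The helper name <code class="docutils literal notranslate"><span class="pre">expand_patterns</span></code> is hypothetical, and the de-duplication and sorting steps are assumptions made here for a reproducible order rather than documented PyGeode behaviour.</p>

```python
import glob
import os
import tempfile

def expand_patterns(patterns):
    """Expand a string or list of wildcard patterns into a sorted file list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    files = []
    for pattern in patterns:
        files.extend(glob.glob(pattern))
    # Remove duplicates and sort so the order is reproducible
    return sorted(set(files))

# Create a few empty files in a temporary directory to match against
tmpdir = tempfile.mkdtemp()
for year in (2011, 2012, 2013):
    open(os.path.join(tmpdir, "temp_zm_y%d.nc" % year), "w").close()

# A list may mix wildcard patterns and plain filenames
matched = expand_patterns([os.path.join(tmpdir, "temp_zm_y201[12].nc"),
                           os.path.join(tmpdir, "temp_zm_y2013.nc")])
print([os.path.basename(f) for f in matched])
```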
<p>The same arguments (<code class="docutils literal notranslate"><span class="pre">dimtypes</span></code> and <code class="docutils literal notranslate"><span class="pre">namemap</span></code>) apply; here we’ve renamed the
variable <code class="docutils literal notranslate"><span class="pre">Temp</span></code> to <code class="docutils literal notranslate"><span class="pre">T</span></code>. In addition, if you need to do more sophisticated
manipulations of the data in each file before PyGeode concatenates the
individual datasets, you can provide a function that takes the filename (as a
string) as a single argument, and returns the modified dataset. This can be
useful, for instance, for correcting issues in individual files:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [21]: </span><span class="k">def</span> <span class="nf">opener</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
<span class="gp"> ....: </span> <span class="n">ds</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="gp"> ....: </span> <span class="k">if</span> <span class="s1">'2016'</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span> <span class="c1"># Replace the data in this file with a dummy value</span>
<span class="gp"> ....: </span> <span class="k">return</span> <span class="n">pyg</span><span class="o">.</span><span class="n">Dataset</span><span class="p">([</span><span class="n">ds</span><span class="o">.</span><span class="n">Temp</span> <span class="o">*</span> <span class="mi">0</span> <span class="o">+</span> <span class="mf">250.</span><span class="p">])</span>
<span class="gp"> ....: </span> <span class="k">else</span><span class="p">:</span>
<span class="gp"> ....: </span> <span class="k">return</span> <span class="n">ds</span>
<span class="gp"> ....: </span>
<span class="gp">In [22]: </span><span class="n">ds</span> <span class="o">=</span> <span class="n">pyg</span><span class="o">.</span><span class="n">openall</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_*.nc'</span><span class="p">,</span> <span class="n">opener</span><span class="o">=</span><span class="n">opener</span><span class="p">)</span>
<span class="gp">In [23]: </span><span class="n">pyg</span><span class="o">.</span><span class="n">showvar</span><span class="p">(</span><span class="n">ds</span><span class="o">.</span><span class="n">Temp</span><span class="p">(</span><span class="n">lat</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">pres</span><span class="o">=</span><span class="mi">500</span><span class="p">))</span>
<span class="gh">Out[23]: </span><span class="go"><pygeode.plot.wrappers.AxesWrapper at 0x7f373411aca0></span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/t1Temp_modified.png"><img alt="_images/t1Temp_modified.png" src="_images/t1Temp_modified.png" style="width: 4in;" /></a>
<p>Using <a class="reference internal" href="fileio.html#pygeode.open_multi" title="pygeode.open_multi"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open_multi()</span></code></a> is very similar in most respects. The main difference is that
this function assumes that the dataset is divided along the time axis, so in
addition to specifying the filenames you need to provide a way for PyGeode to
map each filename to a date.</p>
<p>In simple cases this can be done by specifying a regular expression whose named groups match
the components of the date in the filename. For instance:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [24]: </span><span class="n">patt</span> <span class="o">=</span> <span class="s1">'temp_zm_y(?P<year>[0-9]</span><span class="si">{4}</span><span class="s1">).nc'</span>
<span class="gp">In [25]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open_multi</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_*.nc'</span><span class="p">,</span> <span class="n">pattern</span><span class="o">=</span><span class="n">patt</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (time,pres,lat) (3650,20,31)</span>
<span class="go">Axes:</span>
<span class="go"> time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)</span>
<span class="go"> pres <Pres> : 1000 hPa to 50 hPa (20 values)</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>The named group <code class="docutils literal notranslate"><span class="pre">year</span></code> in the regular expression matches the four-digit year in the filenames in a way
that PyGeode can understand. Since this four-digit format is commonly encountered,
there is an abbreviation for it, which saves you the trouble of remembering
how Python regular expressions work:</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [26]: </span><span class="n">patt</span> <span class="o">=</span> <span class="s1">'temp_zm_y$Y.nc'</span>
<span class="gp">In [27]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open_multi</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_*.nc'</span><span class="p">,</span> <span class="n">pattern</span><span class="o">=</span><span class="n">patt</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (time,pres,lat) (3650,20,31)</span>
<span class="go">Axes:</span>
<span class="go"> time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)</span>
<span class="go"> pres <Pres> : 1000 hPa to 50 hPa (20 values)</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
<p>Similar abbreviations exist for 2-digit month, day, hour, and minute fields
(see <a class="reference internal" href="fileio.html#pygeode.open_multi" title="pygeode.open_multi"><code class="xref py py-meth docutils literal notranslate"><span class="pre">open_multi()</span></code></a> for details), though if the filenames you’re working
with aren’t in the appropriate format, you’ll have to provide the full regular
expression.</p>
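<p>To make the mechanics concrete, the sketch below uses only Python’s <code class="docutils literal notranslate"><span class="pre">re</span></code> module to show how named groups pull date fields out of a filename. The helpers <code class="docutils literal notranslate"><span class="pre">expand_abbreviations</span></code> and <code class="docutils literal notranslate"><span class="pre">parse_date</span></code>, and the abbreviation table itself, are hypothetical illustrations of the idea rather than PyGeode’s implementation; the field widths follow the description above.</p>

```python
import re

# Hypothetical abbreviation table, for illustration only. See the
# open_multi() reference for the abbreviations PyGeode actually accepts.
ABBREVIATIONS = {
    "$Y": r"(?P<year>[0-9]{4})",
    "$m": r"(?P<month>[0-9]{2})",
    "$d": r"(?P<day>[0-9]{2})",
}

def expand_abbreviations(pattern):
    """Replace each abbreviation with its named-group regular expression."""
    for short, regex in ABBREVIATIONS.items():
        pattern = pattern.replace(short, regex)
    return pattern

def parse_date(pattern, filename):
    """Return the date fields found in filename as a dict of integers."""
    match = re.search(expand_abbreviations(pattern), filename)
    if match is None:
        raise ValueError("no date found in %r" % filename)
    return {name: int(value) for name, value in match.groupdict().items()}

# The dot before 'nc' is escaped so that it only matches a literal dot.
fields = parse_date(r"temp_zm_y$Y\.nc", "sample_data/temp_zm_y2016.nc")
print(fields)
```

<p>Each named group becomes one key of the resulting dictionary, which is the kind of filename-to-date mapping the text describes.</p>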
<p>In some cases the filenames themselves don’t contain enough information, or
it’s difficult to write a regular expression that parses out the
right information. As an alternative, you can pass a Python function as the <code class="docutils literal notranslate"><span class="pre">file2date</span></code> argument; it
takes the filename as an argument and returns the corresponding date as a dictionary.
As a simple example,</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [28]: </span><span class="k">def</span> <span class="nf">f2d</span><span class="p">(</span><span class="n">fn</span><span class="p">):</span>
<span class="gp"> ....: </span> <span class="n">date</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">month</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">day</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gp"> ....: </span> <span class="n">date</span><span class="p">[</span><span class="s1">'year'</span><span class="p">]</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">fn</span><span class="p">[</span><span class="o">-</span><span class="mi">7</span><span class="p">:</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span> <span class="c1"># Extract the year from the filename and convert to an integer</span>
<span class="gp"> ....: </span> <span class="k">return</span> <span class="n">date</span>
<span class="gp"> ....: </span>
<span class="gp">In [29]: </span><span class="nb">print</span><span class="p">(</span><span class="n">pyg</span><span class="o">.</span><span class="n">open_multi</span><span class="p">(</span><span class="s1">'sample_data/temp_zm_*.nc'</span><span class="p">,</span> <span class="n">file2date</span> <span class="o">=</span> <span class="n">f2d</span><span class="p">))</span>
<span class="go"><Dataset>:</span>
<span class="go">Vars:</span>
<span class="go"> Temp (time,pres,lat) (3650,20,31)</span>
<span class="go">Axes:</span>
<span class="go"> time <ModelTime365>: Jan 1, 2011 00:00:00 to Dec 31, 2020 00:00:00 (3650 values)</span>
<span class="go"> pres <Pres> : 1000 hPa to 50 hPa (20 values)</span>
<span class="go"> lat <Lat> : 90 S to 90 N (31 values)</span>
<span class="go">Global Attributes:</span>
<span class="go">{}</span>
</pre></div>
</div>
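<p>A function is most useful when the date cannot be read off the filename directly. The sketch below assumes a hypothetical naming scheme, <code class="docutils literal notranslate"><span class="pre">run_YYYY_DDD.nc</span></code>, in which the last field is the day of the year, and converts it to a month and day using the standard calendar. The function name and the naming scheme are assumptions made for illustration; data on a 365-day model calendar would need a different conversion in leap years.</p>

```python
import datetime
import os
import re

def file2date_doy(filename):
    """Map a name such as 'run_2016_046.nc' (year, day of year) to a date dict."""
    match = re.search(r"_([0-9]{4})_([0-9]{3})\.nc$", os.path.basename(filename))
    if match is None:
        raise ValueError("unrecognised filename: %r" % filename)
    year, day_of_year = int(match.group(1)), int(match.group(2))
    # Convert the day of year to a month and day (standard calendar assumed).
    date = datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)
    return dict(year=date.year, month=date.month, day=date.day)

print(file2date_doy("output/run_2016_046.nc"))
```

<p>A function like this could then be passed as the <code class="docutils literal notranslate"><span class="pre">file2date</span></code> argument in the same way as <code class="docutils literal notranslate"><span class="pre">f2d</span></code> above.</p>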
<p>Note that in the interests of speed, PyGeode only opens the first and last file
in the dataset, and then infers the contents of the remainder of the files. The
assumption is made that the contents of each file are identical, only translated
in time. If there are errors or missing data in the interior files, these will
not be detected when opening the dataset. It is good practice when opening such
a dataset for the first time to explicitly load in at least some of the data
from each file in the dataset to confirm that all the files are well-formed.
There is a helper function for this purpose,
<a class="reference internal" href="fileio.html#pygeode.formats.multifile.check_multi" title="pygeode.formats.multifile.check_multi"><code class="xref py py-func docutils literal notranslate"><span class="pre">pygeode.formats.multifile.check_multi()</span></code></a>.</p>
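<p>The idea behind such a check can be sketched generically: visit every file, try to read something from it, and record the files that fail. The helper <code class="docutils literal notranslate"><span class="pre">find_bad_files</span></code> below is a hypothetical illustration and is not the <code class="docutils literal notranslate"><span class="pre">check_multi()</span></code> helper; the probe is a stand-in so that the sketch runs without any data files.</p>

```python
def find_bad_files(filenames, probe):
    """Call probe(filename) on every file and collect the ones that raise."""
    failures = {}
    for name in filenames:
        try:
            probe(name)
        except Exception as err:  # record the problem rather than stopping
            failures[name] = err
    return failures

# Stand-in probe for demonstration. With PyGeode one might instead pass
# something like: lambda f: pyg.open(f).Temp[:]  (assumed usage).
def demo_probe(name):
    if "2016" in name:
        raise IOError("truncated file")

bad = find_bad_files(["temp_zm_y2015.nc", "temp_zm_y2016.nc"], demo_probe)
print(sorted(bad))
```

<p>Collecting the failures instead of stopping at the first one gives a complete list of files to repair in a single pass.</p>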
</div>
<div class="section" id="saving-to-files">
<h2>Saving to files<a class="headerlink" href="#saving-to-files" title="Permalink to this headline">¶</a></h2>
<p>In contrast to reading files, saving PyGeode variables and datasets to disk is
typically a much simpler operation, and we’ve seen examples of this already in
a couple of places in this tutorial. The <a class="reference internal" href="fileio.html#pygeode.save" title="pygeode.save"><code class="xref py py-meth docutils literal notranslate"><span class="pre">save()</span></code></a> function takes a filename and
either a single <a class="reference internal" href="var.html#pygeode.Var" title="pygeode.Var"><code class="xref py py-class docutils literal notranslate"><span class="pre">Var</span></code></a> object, or a <a class="reference internal" href="dataset.html#pygeode.Dataset" title="pygeode.Dataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">Dataset</span></code></a> object (in fact it
uses the helper function <a class="reference internal" href="genops.html#pygeode.asdataset" title="pygeode.asdataset"><code class="xref py py-meth docutils literal notranslate"><span class="pre">asdataset()</span></code></a> which recognizes various other
kinds of collections of variables as well). As with reading files, writing to
NetCDF files is the best supported option. By default, PyGeode writes NetCDF
version 3 format files for greater compatibility, but this version has a number
of limitations, including a maximum file size. Alternatively, version 4 format
files can be written by specifying</p>
<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [30]: </span><span class="n">pyg</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s1">'sample_data/file_v4.nc'</span><span class="p">,</span> <span class="n">t1</span><span class="p">,</span> <span class="n">version</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
</pre></div>
</div>
<p>which permits larger files to be written. There are several other options,
including built-in compression (this is a feature of the NetCDF 4 library), and
packing, which can be done in any format. See <a class="reference internal" href="fileio.html#pygeode.save" title="pygeode.save"><code class="xref py py-meth docutils literal notranslate"><span class="pre">save()</span></code></a> for details.</p>
<p>If you have a large dataset you wish to write out in multiple files, you can
select each subset you want and save it to its own file, one subset at a
time.</p>
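<p>As a minimal sketch of writing one file per year, the hypothetical helper <code class="docutils literal notranslate"><span class="pre">save_by_year</span></code> below selects each year in turn and hands the subset to a save function. It assumes that the dataset can be called with a <code class="docutils literal notranslate"><span class="pre">year</span></code> keyword to select a single year; the stand-in objects only exist so that the sketch runs without PyGeode.</p>

```python
def save_by_year(dataset, years, save, template="temp_zm_y{year}.nc"):
    """Write one file per year by selecting each year's subset before saving."""
    written = []
    for year in years:
        subset = dataset(year=year)  # select a single year (assumed keyword)
        filename = template.format(year=year)
        save(filename, subset)
        written.append(filename)
    return written

# Stand-ins so the sketch runs on its own; in practice one would pass a
# PyGeode dataset and pyg.save instead.
saved = {}
fake_dataset = lambda year: "data for %d" % year
files = save_by_year(fake_dataset, [2011, 2012], saved.__setitem__)
print(files)
```

<p>With real data the call might look like <code class="docutils literal notranslate"><span class="pre">save_by_year(ds,</span> <span class="pre">range(2011,</span> <span class="pre">2021),</span> <span class="pre">pyg.save)</span></code>, which would produce files named in the same pattern as the sample data.</p>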
</div>
</div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2020, Mike Neish, Peter Hitchcock.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.3.2.
</div>
</body>
</html>