Feature request: Python tools for obs sequences #742

hkershaw-brown · 2024-09-26T19:52:57Z

Use case

For CROCODILE: Python-based tools for observation space diagnostics. These might be useful more generally for DART, so adding this issue to track.

Is your feature request related to a problem?

Originally for CROCODILE obs space diagnostic plotting in the Python ecosystem, but the ability to examine obs sequences in a dataframe in a Jupyter notebook (or Python tool of your choice) is quite helpful, e.g. for finding duplicates in obs sequences, looking at output from obs converters, subsetting observations (in space, time, or by X), and splitting and joining obs sequences.
There is no need to run obs_diag to bin observations; you can read the obs_sequence into a dataframe directly.
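
As a rough sketch of what working on the dataframe directly can look like, the snippet below subsets a toy table by latitude and bins it by pressure level using plain pandas. The column names (`latitude`, `vertical`, `observation`, `prior_mean`), the bin edges, and the values are made up for illustration; they are not taken from pyDARTdiags.

```python
import pandas as pd

# Toy stand-in for an obs sequence dataframe; columns and values are hypothetical.
df = pd.DataFrame({
    "latitude": [-30.0, 10.0, 45.0, 60.0],
    "vertical": [85000.0, 50000.0, 50000.0, 20000.0],  # pressure in Pa
    "observation": [280.0, 255.0, 250.0, 220.0],
    "prior_mean": [281.0, 254.0, 252.0, 221.0],
})

# Subset in space: keep only northern-hemisphere observations.
north = df[df["latitude"] > 0.0]

# Bin by pressure level, then compute the RMSE of (prior mean - observation) per bin.
edges = [0.0, 30000.0, 70000.0, 110000.0]
sq_err = (north["prior_mean"] - north["observation"]) ** 2
levels = pd.cut(north["vertical"], bins=edges)
rmse_by_level = sq_err.groupby(levels, observed=True).mean() ** 0.5
print(rmse_by_level)
```

Only bins that actually contain observations appear in the result, because of `observed=True`.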

Example finding duplicates:
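
A minimal sketch of such a check with plain pandas, assuming the obs sequence is already in a dataframe. The column names and values here are hypothetical, not the actual pyDARTdiags schema, and which columns identify "the same observation" is a choice you would need to make for your data.

```python
import pandas as pd

# Toy table; column names and values are hypothetical.
df = pd.DataFrame({
    "type": ["TEMP", "TEMP", "SALT", "TEMP"],
    "longitude": [210.5, 210.5, 211.0, 212.0],
    "latitude": [-5.0, -5.0, -5.0, -4.0],
    "time": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02"]),
    "observation": [28.1, 28.1, 35.2, 27.9],
})

# Columns that together identify an observation (an assumption for this sketch).
key = ["type", "longitude", "latitude", "time", "observation"]

# keep=False marks every member of a duplicate group, not only the repeats.
dupes = df[df.duplicated(subset=key, keep=False)]
print(dupes)

# Drop the repeats, keeping the first occurrence of each observation.
deduped = df.drop_duplicates(subset=key, keep="first")
```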

Describe your preferred solution

https://github.com/NCAR/pyDARTdiags. See issues for various notes and docs for documentation.
https://pypi.org/project/pydartdiags/ (but recommend you do a local editable pip install if you are developing this or playing with it)
BUYER BEWARE, this is bleeding edge.

Describe any alternatives you have considered

Currently using pandas, which seems OK (I tried naively loading 20GB obs sequences one after the other, and it actually worked on my Mac). Probably need to think about big-big data tools for going larger (and maybe faster).
Also keeping notes on other observation tools (NCAR/pyDARTdiags#4).
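
For the pandas approach, one way to combine several dataframes and keep an eye on memory could look like the sketch below. The helper `load_all` and the toy frames are hypothetical stand-ins; in practice each frame would come from reading one obs sequence file.

```python
import pandas as pd

def load_all(frames):
    """Concatenate an iterable of dataframes into one, in order."""
    return pd.concat(frames, ignore_index=True)

# Stand-ins for dataframes read from separate obs sequence files (hypothetical data).
parts = [
    pd.DataFrame({"observation": [1.0, 2.0], "source": "file_a"}),
    pd.DataFrame({"observation": [3.0], "source": "file_b"}),
]
combined = load_all(parts)

# Rough in-memory footprint in bytes; deep=True also counts string contents.
n_bytes = int(combined.memory_usage(deep=True).sum())
print(len(combined), n_bytes)
```

Checking `memory_usage` on a small sample before loading everything gives a crude estimate of whether the full set will fit in memory.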

kdraeder · 2024-11-21T23:05:27Z

I worked through the Quickstart guide as a fairly naive xxPyxx user.
Here are some of the hurdles I dealt with and suggestions.

I don't have python on my laptop, so I had to decide whether to install it or find it somewhere else.
Looking for how to install it on my laptop led to too many choices, about which I know almost nothing.
I opted for Derecho and/or Casper.

Summary of the CLI efforts; partially successful.

[These instructions would have been more helpful to me.]

Convert your obs_seq file(s?) to ASCII, if needed.
   Will this always be necessary?
   Multiple files?
Run on supercomputers (or where python3 is installed).
> bash (as instructed in dartdiags/bin/activate)

Other python instructions I've seen say to load the conda module,
to manage python packages.
> module load conda
`module` is not a command in my bash environment.
But that environment has access to the conda command, and python -> python3.10.
Proceeding without conda.

  > python3 -m venv dartdiags
  > source dartdiags/bin/activate
  > pip install pydartdiags
  > python -  (Run interactively, apparently a "basic" session.)
>>> from pydartdiags.obs_sequence import obs_sequence as obsq
>>> from pydartdiags.plots import plots

Now it's ready to run diagnostics.
>>> obs_seq = obsq.obs_sequence('$your_path/obs_seq.final.ascii')
I wasn't aware of how to find a list of functions.  Once I learned that they
are in `plots`, I could see them with
>>> help (plots)
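
A shorter listing than `help()` can be had with the standard `inspect` module. This is a generic sketch, not something from the Quickstart; it uses the stdlib `json` module as a stand-in, and `public_functions` is a hypothetical helper name. In the session above you would pass `plots` instead.

```python
import inspect
import json  # stand-in module; in the session above this would be `plots`

def public_functions(module):
    """Return the sorted names of public functions available at a module's top level."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )

print(public_functions(json))
```

From there, `help(module.some_function)` shows the docstring for a single function.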

>>> obs_seq.df.head()
Shows what I expect.

>>> df_qc0 = obsq.select_by_dart_qc(obs_seq.df, 0)
>>> plots.plot_rank_histogram(df_qc0)
Gives a window that is blank except for '[][][] Viewing<>' at the bottom left.
It turns out that this is a vi window with no contents in the file.
I had to kill the vi, wm3, and sh processes in order to get to a place
where I could proceed.
Helen says that "The plots are using Plotly, which outputs to a browser.
Use a Jupyter notebook, e.g. use JupyterHub on Derecho."

Adapting Quickstart to NCAR's Jupyterhub.

Make soft links, from any directories or files in /scratch which I want to use,
into ~$user.
https://jupyterhub.hpc.ucar.edu/stable/hub/home  is the place to start.
I chose Default.  It opened a new tab.
I chose 'casper login' to use that resource (for this low intensity activity).
It took me to a page showing me lots of apps in a "Notebooks" section,
the same apps in ">_ Console" section, and a few in "$_ Other"
such as "Terminal" and a bunch of different file formats; text, julia, R, ...
I chose the Python3 notebook (based on it being the first listed).

Jupyterhub complaint:
I didn't know how to load a module, so I opened a Help:Jupyterlab Reference.
I wanted to search for "load module", but I couldn't enter text in the Search bubble,
and the <ctrl>-k it suggests doesn't do anything.
Get Started has no 'Search' or 'module' entries.
User Guide: has Searching(!)
It says use <cmd>-f for Macs to use the built-in shortcut.
That makes the Firefox Edit button flash, but nothing appears which would
let me enter text.
The <ctrl>-f box is still visible in the lower left,
but that's (?) the browser page search, which is supposedly different.

[]: import pydartdiags        This enabled the `from` commands to work.
[]: from pydartdiags.obs_sequence import obs_sequence as obsq
[]: from pydartdiags.plots import plots
The file needs to be an ASCII obs_seq.final file.
[]: obs_seq = obsq.obs_sequence('/glade/derecho/scratch/raeder/OSSE_BNRH_3mem/run/OSSE_BNRH_3mem.dart.e.cam_obs_seq_final.2018-01-01-21600')
[]: obs_seq.df.head()     works
[]: df_qc0 = obsq.select_by_dart_qc(obs_seq.df, 0)
[]: plots.plot_rank_histogram(df_qc0)
No visible errors, but nothing shows in the space below the command.
Then I get a new command line.
Same for
[]: df_profile, figrmse, figbias = plots.plot_profile(df_qc0, plevels)

I saved the notebook into '~raeder/from_works_no_pix.ipynb'.

Another jupyterhub complaint:
the file browser on the left is stuck on the status from when I first
started the jupyterhub tab. The refresh button doesn't help.
I found the file by searching for its name.

I opened 'from_works_no_pix.ipynb' and the pictures appeared (!).
The rank histogram seemed to show no data.
It turned out that I needed to click on the variable I wanted to see.
Then I stumbled on a way to compact the count axis to see the tops near 5000.
Choosing both vars stacks one on top of the other, which was surprising.

A third jupyterhub complaint:
"Magic" commands, such as "%load", are not found using help(%load)
or any variant of that I could think of. I had to look for the commands online.

hkershaw-brown · 2025-01-13T14:51:44Z

For future developers:
obs_sequences don't have units so watch out for converters changing units: e.g. #509
Extra metadata, e.g. for radiance obs, is not described in the file, only in the Fortran obs_def_X_mod.f90
Binary and ASCII versions of the file do not have the same information, e.g. location #286

hkershaw-brown added the Python label Sep 26, 2024

hkershaw-brown added the obs_diag label Oct 30, 2024
