-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get rid of the second basin area variable #16
Comments
From
These are both in Know that
@lcunha0118 or @mikejohnson51 - can you speak to these variables at all, and perhaps why are the values different? -thanks! |
Hey @madMatchstick, See section 3 here for the detailed answer to this. Bottom line is they come from different delineation methods. Most critically:
I'll add that most geographic variables were summarized over the geospatial fabric mentioned above, the hydrologic data was computed for gages at the outlet of the gagesII dataset. So they are saying that in cases were these diverge there may not be an accurate accounting between the landscape and the hydrology in the included data. |
Thank you @mikejohnson51 for clarifying. And for the linked article; a much needed read on my end! Per conversation, let's stick with |
@madMatchstick, @mikejohnson51, @jmframe Following up on this old issue, it seems like knowing the difference between the two delineated areas is important, but it may not be necessary to keep both in the forcing file. In other words, can we remove |
It makes sense to me given how the characteristics are determined! |
This isn't a forcing file, this is a model configuration file. These attributes come from Camels. The answer to this question depends on what attributes the model was trained on. These config files are examples. Are we talking about how to turn these examples into an operational version? There is no reason to keep anything in the config file that won't be used in the model. But removing a variable that is used by the model would be bad news. As for the question of using a basin area in general, the best approach is to train the LSTM with the best estimate of basin areas, and use the best estimates of basin area during run time. It doesn't matter what the static attribute is called. Camels just happens to have two estimates of basin area, and to know what any particular trained LSTM used, you should look at the original training config, which, for this example, is here: https://github.com/NOAA-OWP/lstm/blob/master/trained_neuralhydrology_models/hourly_all_attributes_and_forcings/config_original.yml. It shows that the camels attribute that was used in training is "area_gages2". Now, the basin area used at runtime won't come from the "Gauges 2" dataset, obviously. In an ideal world, we will train an LSTM on basin area that is calculated from the exact same method as the basin area calculation from the hydrofabric. Does that make sense? |
Thanks for the info, @jmframe! To confirm:
|
My recommendation is to completely delete the examples "hourly_all_attributes_and_forcings" because as I understand it most of the attributes are not available in the hydrofabric anyway, and we have had mondo problems trying to recreate them. So we can't use that trained model anyway. Does this solve the problem? |
The nuclear option works for me. @madMatchstick, any objections? If not, I'll make the edits and submit a PR. |
Nice, it is settled. By the way, tests by @EmilyDeardorff et al, at Summer Institute, 2021 showed that including the basin area as a static input did not help the LSTM performance, particularly in small basins, (experiments done specifically for 3-15km^2 basins). That information is already included in the model during pre- and post- processing, going to and from area normalized runoff [L T^-1] to volume flux [L^3 T^-1]. So I guess the best thing to do is avoid any area estimates as LSTM inputs. |
@jmframe I reckon you could write a good, useful LSTM state of practice review paper (if it doesn't exist already). |
That might be a great thing to do under the CIROH deep learning project, specifically for the application of NextGen, if that ever gets off the ground! |
No objections on my end. Thanks you two! |
One quick comment on the area variables ... I think keeping one estimate
even if not currently useful might be worthwhile. Of those, I'd take the
area_gages2 over anything derived from the hydrofabrics we have. We've
just done some work looking at GIS-based catchment areas and I trust the
USGS estimates in most cases over all the others we looked at (NHD+, USGS
GF, MERIT, TDX-Hydro). We were looking at CAMELS and about 85% of the
estimates are fine but some of the original USGS GF-based shapes are way
off, and couldn't be fixed well using the other fabrics.
…On Wed, Feb 7, 2024 at 11:06 AM JessicaGarrett-NOAA < ***@***.***> wrote:
The nuclear option works for me. @madMatchstick
<https://github.com/madMatchstick>, any objections? If not, I'll make the
edits and submit a PR.
No objections on my end. Thanks you two!
—
Reply to this email directly, view it on GitHub
<#16 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKARPMRLS4MKNSKPT7YMLYSO7DTAVCNFSM5F6FC2W2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJTGI3DAMJTHE2A>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
That said I might misunderstand the context -- if so, please ignore. Also,
if the areas are in other attribute or parameter files rather than part of
the HF that's even better.
…On Wed, Feb 7, 2024 at 1:54 PM Andrew Wood ***@***.***> wrote:
One quick comment on the area variables ... I think keeping one estimate
even if not currently useful might be worthwhile. Of those, I'd take the
area_gages2 over anything derived from the hydrofabrics we have. We've
just done some work looking at GIS-based catchment areas and I trust the
USGS estimates in most cases over all the others we looked at (NHD+, USGS
GF, MERIT, TDX-Hydro). We were looking at CAMELS and about 85% of the
estimates are fine but some of the original USGS GF-based shapes are way
off, and couldn't be fixed well using the other fabrics.
On Wed, Feb 7, 2024 at 11:06 AM JessicaGarrett-NOAA <
***@***.***> wrote:
> The nuclear option works for me. @madMatchstick
> <https://github.com/madMatchstick>, any objections? If not, I'll make
> the edits and submit a PR.
>
> No objections on my end. Thanks you two!
>
> —
> Reply to this email directly, view it on GitHub
> <#16 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ABIKARPMRLS4MKNSKPT7YMLYSO7DTAVCNFSM5F6FC2W2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJTGI3DAMJTHE2A>
> .
> You are receiving this because you are subscribed to this thread.Message
> ID: ***@***.***>
>
|
@andywood I am hoping that we can completely disentangle the area calculations, and their influence on 1) model training, and 2) model runtime. the calculations used to train the model should be representative of the "true area", and the calculations used to run the model should also be representative of the "true area". It is up to the LSTM trainer (us hopefully) to make sure they use a realistic basin area for training, and it is up to the NextGen Hydrofabric team to make sure that the areas used at runtime are realistic. We should definitely continue this conversation, and make sure as best we can that we are using a similar method to train the model as is done to run it in NextGen. We should coordinate that with @mikejohnson51. By removing the example LSTM with basin area as a static attribute, hopefully we'll just avoid confusion over variable names for now. When the CIROH project "officially" starts, our first task should be to train an operational ready LSTM model, as opposed to these example models. And we'll do all we can to make sure the areas are calculated correctly then. Sound good? |
@jframe -- definitely, thanks for the perspective. The hydrofabric will
naturally come with certain geographic attributes like catchment area, and
it will be up to everyone to have eyes wide open about whether that's the
exact representative area they want to use for applications (training, run
time, analysis, etc.), and as you say to be consistent. And as you
suggest, if an HF area is hardwired somewhere into a model run and not
obvious, it may need disentangling. And seeing some of the mismatches in
the very widely used CAMELS shapefiles, eg the 'Christmas Tree basin' in
Utah, kind of highlights the potential for inconsistencies to go beneath
the radar. Fingers crossed on remaining CIROH paperwork, right?
…On Wed, Feb 7, 2024 at 3:52 PM Jonathan Frame ***@***.***> wrote:
@andywood <https://github.com/andywood> I am hoping that we can
completely disentangle the area calculations, and their influence on 1)
model training, and 2) model runtime. the calculations used to train the
model should be representative of the "true area", and the calculations
used to run the model should also be representative of the "true area". It
is up to the LSTM trainer (us hopefully) to make sure they use a realistic
basin area for training, and it is up to the NextGen Hydrofabric team to
make sure that the areas used at runtime are realistic. We should
definitely continue this conversation, and make sure as best we can that we
are using a similar method to train the model as is done to run it in
NextGen. We should coordinate that with @mikejohnson51
<https://github.com/mikejohnson51>. By removing the example LSTM with
basin area as a static attribute, hopefully we'll just avoid confusion over
variable names for now. When the CIROH project "officially" starts, our
first task should be to train an operational ready LSTM model, as opposed
to these example models. And we'll do all we can to make sure the areas are
calculated correctly then. Sound good?
—
Reply to this email directly, view it on GitHub
<#16 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKARMZI6GVIZQP2Y2DVW3YSQATPAVCNFSM5F6FC2W2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJTGMYDMNRXGEZQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Short description explaining the high-level reason for the new issue.
Current behavior
There are two values for basin area
Expected behavior
There should only be one value for basin area.
The text was updated successfully, but these errors were encountered: