MAINT: review functionality and use of CouplingAnalysis.get_nearest_neighbors() and add a specific test #199

Open
fkuehlein opened this issue Sep 28, 2023 · 0 comments
Labels
maintenance something should be improved or is outdated

Comments

@fkuehlein
Collaborator

fkuehlein commented Sep 28, 2023

CouplingAnalysis.get_nearest_neighbors() is currently only used as a helper method for CouplingAnalysis.mutual_information() and CouplingAnalysis.information_transfer(), and is only tested indirectly through those. After porting the underlying C method to Cython in #195, it seemed sensible to gain more confidence in its correctness by giving it a test of its own.

To create a test fixture and an expected result to compare against, it is essential to understand what the method is actually supposed to do. While trying to work that out, I found that it has redundant loops and variables defined in several places, which at the very least make it hard to read (see my comments here; the code appears to be adapted from a more generally applicable algorithm, but the adaptations have removed that wider applicability anyway).

I have mostly grasped its functionality by now, but I still don't really understand the special role of the z dimension that the above-mentioned calling methods give it. Other than that, here is what I have found CouplingAnalysis.get_nearest_neighbors() to be doing so far:

given:
$X = (x(t), y(t), z(t))$: an array of 3 timeseries with length $T$
$d_{xyz}$: an array indicating where each timeseries is located within $X$ (depending on each timeseries' dimension)
$k$: number of nearest neighbors to look for

NOTE: the dimension of $X_i$ is 1 for all use cases within CouplingAnalysis, except for $z(t)$, which is either left empty in mutual_information() or can have dimension > 1 in information_transfer()

For all times $t = 1,...,T$:

  • find the $k$ times $t' \in \{1,...,T\}$ at which $X_i(t')$ is closest to $X_i(t)$ across all timeseries,
  • among those $k$ times $t'$, take the largest distance occurring in any single timeseries as $\epsilon_{max}$,
  • then, for each timeseries $X_i = x,y,z$:
    • count how many times $t'$ that timeseries on its own lies within $\epsilon_{max}$ of $X_i(t)$ (this may happen more often, or possibly also less often, than $k$ times),
      although neighbors $t'$ within $x$ and $y$ are only counted if $z$ also has a neighbor at the same time $t'$
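
The steps listed above can be sketched in plain NumPy, which might serve as a slow reference when writing the test. This only mirrors my reading of the method and is not the Cython implementation. The function name `nearest_neighbor_counts`, the parameters `dim_x` and `dim_y` (standing in for $d_{xyz}$), the row ordering x, y, z, the exclusion of $t' = t$, and the strict `<` comparison are all assumptions made for illustration.

```python
import numpy as np


def nearest_neighbor_counts(X, dim_x, dim_y, k):
    """Hypothetical reference for the steps described above.

    X has shape (dim_x + dim_y + dim_z, T), with rows ordered x, y, z.
    Returns integer arrays (k_xz, k_yz, k_z), each of length T.
    """
    X = np.asarray(X, dtype=float)
    dim, T = X.shape
    x_rows = slice(0, dim_x)
    y_rows = slice(dim_x, dim_x + dim_y)
    z_rows = slice(dim_x + dim_y, dim)

    def max_norm(rows, t):
        # Maximum-norm distance from time t to every time t' within one subspace.
        diff = np.abs(X[rows] - X[rows, t][:, None])
        # Assumption: an empty z subspace puts every point at distance 0.
        return diff.max(axis=0) if diff.shape[0] else np.zeros(T)

    k_xz = np.empty(T, dtype=int)
    k_yz = np.empty(T, dtype=int)
    k_z = np.empty(T, dtype=int)
    for t in range(T):
        dx, dy, dz = max_norm(x_rows, t), max_norm(y_rows, t), max_norm(z_rows, t)
        joint = np.maximum(np.maximum(dx, dy), dz)
        joint[t] = np.inf  # assumption: a point is not its own neighbor
        # Distance to the k-th nearest neighbor in the joint space.
        eps_max = np.partition(joint, k - 1)[k - 1]
        others = np.ones(T, dtype=bool)
        others[t] = False
        in_z = (dz < eps_max) & others  # assumption: strict inequality
        k_z[t] = in_z.sum()
        # x and y neighbors only count where z has a neighbor at the same t'.
        k_xz[t] = (in_z & (dx < eps_max)).sum()
        k_yz[t] = (in_z & (dy < eps_max)).sum()
    return k_xz, k_yz, k_z
```

This requires $k \le T - 1$. With the strict comparison assumed here, the $k$-th neighbor is not counted in a timeseries whenever its distance there equals $\epsilon_{max}$, so counts below $k$ are possible. With `<=` instead, every count would be at least $k$. Which of the two the actual method uses still needs to be checked.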

Still not sure if that's what it is supposed to be doing though. Probably the referenced papers Kraskov (2004) and Runge (2012b) should be consulted.
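
For orientation when consulting the papers: to my knowledge, the first estimator in Kraskov (2004) for two variables has the form below. It should be checked against the paper before relying on it. Here $N$ is the number of samples (our $T$), $\psi$ is the digamma function, $\epsilon(i)/2$ is the maximum-norm distance from sample $i$ to its $k$-th neighbor in the joint space, and $n_x(i)$, $n_y(i)$ count the samples whose distance to sample $i$ within $x$ or $y$ alone is strictly smaller than $\epsilon(i)/2$.

$$I^{(1)}(X;Y) = \psi(k) - \left\langle \psi(n_x + 1) + \psi(n_y + 1) \right\rangle + \psi(N)$$

If get_nearest_neighbors() follows this scheme (an assumption, not verified against the code), $\epsilon_{max}$ would play the role of $\epsilon(i)/2$ and the per-timeseries counts the role of $n_x$ and $n_y$. The formula does not cover the conditioning on $z$ used in information_transfer().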

@fkuehlein fkuehlein added the maintenance something should be improved or is outdated label Sep 28, 2023
@fkuehlein fkuehlein added this to the New Release milestone Oct 12, 2023
@fkuehlein fkuehlein removed this from the Release 0.7 milestone Jan 25, 2024