WIP: reviewing new api #95

kahaaga · 2022-09-19T11:28:40Z

No description provided.

codecov · 2022-09-19T11:46:37Z

Codecov Report

❗ No coverage uploaded for pull request base (refactor_api@608a804). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             refactor_api      #95   +/-   ##
===============================================
  Coverage                ?   71.46%           
===============================================
  Files                   ?       30           
  Lines                   ?      729           
  Branches                ?        0           
===============================================
  Hits                    ?      521           
  Misses                  ?      208           
  Partials                ?        0

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

kahaaga · 2022-09-19T12:02:53Z

@Datseris I think I'm done here. I only did a minor language changes, but API-wise, it looks good.

Documentation

I really like the documentation now, and the extended clarification you added on the difference between probability and entropy estimation, and the non-consensus in the literature on terminology.

I particularly like how the landing page turned out. It's an excellent idea to use the module documentation as the introductory paragraph.

CI

I specified that CI should be run for PRs that are not directly to the main branch too. This way, we can develop to side branches and see how the documentation looks without committing to a PR to main first.

It's a matter of preference if we should do this, but for me personally, it would greatly simplify the workflow, because I wouldn't need to install and uninstall packages such as Documenter locally every time I want to propose a change to the codebase that affects the documentation.

API

Citing from #89

I don't think it makes sense for us to define a convenience function entropy_specificname for every single probability estimator we have. We should only do it for exceptionally well established names such as permutation entropy. If the name is in the title of at least one paper, hopefully more then it warrants its name. I don't think there is any paper that explicitly uses the words "the transfer operator entropy", and hence no reason to have functions like entropy_transferoperator. Similarly, no reason to have functions such as entropy_permutationweighted, as the entropy_permutation docstring should discuss alternative versions and how to obtain them the "normal" way.
I don't think packages should provide such incredible level of convenience, it is bad for maintanance and there is a point where the hit to the learning curve is just stronger than the convenience. When I write a 2 lines of code covenience function for using in my code, that only uses Julia Base functions, I don't think "oh this should have been a function already defined in Julia", and I don't expect it to. The users should think the same. It is better to know "Oh I know how to create the convenience function that gives me the amplitude weighted permutation entropy in 2 lines of code", than to think "hm, does this package export a function that computes the ...".

I think we've struck a good balance between convenience and clutter in the current state of the code. No need for any further changes regarding this matter, if you ask me.

Okay I've refactored this with my changes. There are things missing, and some files are incomplete, and I'm not sure where the new methods go. They also need to go into the changelog. Anyways, feel free to go ahead and review/critique my changes.

I'll refactor the PRs for the new complexity measures once we merge this to master.

I'll need your help to fix the tests. They are about the in-place entropy form that I admit, I never really learned... BTW, it is not useful to test interface tests more than once. If we test entropy_renyi!(...) isa Real, we only have to do it once, not for every estimator. This is an interface test but the interface doesn't change with estimators.

The issue with renyi_entropy! is that depending on what the estimator is, the pre-allocated container can be different things, so the input container does in fact change with estimators. Currently, the pre-allocation is only used for the symbolic estimators, for which the length of the pre-allocated container varies depending on the dimension of the input data. This pre-allocation kind of ties in with the generator approach I proposed in some other PRs. Maybe it's best to not export renyi_entropy! until we've decided what to do about the pre-allocation approaches. In a different PR, maybe?

I use renyi_entropy! extensively in my own work, and I use it in CausalityTools, but I'm not sure whether it should be exported here. I would maybe propose to not export it, and just call Entropies.renyi_entropy! in upstream packages when needed. This can be done without breaking changes in CausalityTools and friends (depends on Entropies v1) if we upgrade to v2.0 for Entropies.jl.

Summary

This is good to be merged with main. Feel free to squash and merge if you're happy.

Tests can be organized better in a separate PR
Re-thinking and potentially exporting renyi_entropy! can be done in a separate PR.
The outstanding PRs can be reworked once this is merged.

Datseris · 2022-09-19T12:38:32Z

Maybe it's best to not export renyi_entropy! until we've decided what to do about the pre-allocation approaches. In a different PR, maybe?

Hm, I would disagree here especially if this is used in downstream packages. Downstream packages should be using only the exported API, after all. It makes the code clearer, and in the end, we shouldn't rely on the fact that we develop both. Maybe in the future it will not be the case!

The first thing to first check if there is actually any substantial increase in performance for having this in-place methods. For the permutation entropy, I would imagine that counting the permutations and assigning their frequencies would be so much more than allocating one new vector. But maybe I am wrong. If there aren't substantial performance benefits, maybe it isn't even worthy to do this.

In any case, if indeed we need the in place version, we should document it and export it. I guess we:

Define entropy_renyi!.
In its docstring say "the first argument, the pre-allocated container, depends on the estimator. Estimators that can actually utilize the in-place version discuss in their own documentation strings what this argument is. Estimators that do not at all use this in-place version do not discuss anything."
Add this information to all estimators that actually use this interface.
Test it.

Does this sound reasonable? Yes, I also agree to do this in a separate PR btw.

Thanks for the docs feedback. I am not done with the docs. I've structured it in such a way that I can just "include it immediatelly" in DynamicalSystems.jl. Once I am really done with this I'll try it out and tell you how it works so you can do the same in e.g., CausalityTools.jl

Datseris · 2022-09-19T12:41:43Z

julia> @btime ones(1000);
  604.762 ns (1 allocation: 7.94 KiB)

Okay, I guess if you do this 1000 times in surrogate testing you get about 600 μicroseconds, but this would need to be compared to how much time it takes to do the permutation counting 1000 times with everything pre-allocated.

kahaaga · 2022-09-19T12:45:27Z

Hm, I would disagree here especially if this is used in downstream packages. Downstream packages should be using only the exported API, after all. It makes the code clearer, and in the end, we shouldn't rely on the fact that we develop both. Maybe in the future it will not be the case!
The first thing to first check if there is actually any substantial increase in performance for having this in-place methods. For the permutation entropy, I would imagine that counting the permutations and assigning their frequencies would be so much more than allocating one new vector. But maybe I am wrong. If there aren't substantial performance benefits, maybe it isn't even worthy to do this.
In any case, if indeed we need the in place version, we should document it and export it. I guess we:
Define entropy_renyi!.>
In its docstring say "the first argument, the pre-allocated container, depends on the estimator. Estimators that can actually utilize the in-place version discuss in their own documentation strings what this argument is. Estimators that do not at all use this in-place version do not discuss anything."
Add this information to all estimators that actually use this interface.
Test it.
Does this sound reasonable? Yes, I also agree to do this in a separate PR btw.

Yes, that sounds reasonable. Let's revisit it when the refactoring has been merged with main.

Okay, I guess if you do this 1000 times in surrogate testing you get about 600 μicroseconds, but this would need to be compared to how much time it takes to do the permutation counting 1000 times with everything pre-allocated.

We can do some testing in a separate PR.

@show

* Remove @show * Refactor nearest neighbors methods * Typos * Remove duplicate function * Reorganize docs * API renaming * Switch order of docs * Refactoring * Unnecessary capitalization * Further refactoring * Missing part of equation * Use new name * Add deprecations * Add note for single-symbol encoding in docstring * Add note for single-symbol encoding in docstring * Remove redundant note * Use @deprecate for genentropy * Add (preliminary) update message * [wip] reusable docs * re-write intro page * improve entropy wording * Complete re-organization of source code folders * correct includes state * finish first draft of docs new way (many TODOs left_) * Fix wavelet entropy docs * add docstrings to convenience calls * final touches * add documentation build phase * Add power spectrum probability estimator (#94) * add code and docstring for spectral entropy * add docs * even better test reorganization * add one more depth level * WIP: reviewing new api (#95) * Equivalent to, not equivalent with * Include dispersion entropy * Clearer wording * Symbolization routines need to be imported before estimators using them * Export in-place version. * Follow Julia's linting conventions * Run CI on all pull requests * Don't export in-place version * Missing a letter Co-authored-by: Datseris <[email protected]>

kahaaga added 2 commits September 19, 2022 13:26

Equivalent to, not equivalent with

f2e3c9e

Include dispersion entropy

52335d6

kahaaga changed the title ~~Kah review~~ WIP: reviewing new api Sep 19, 2022

kahaaga added 5 commits September 19, 2022 13:32

Clearer wording

b5891b7

Symbolization routines need to be imported before estimators using them

ae410c3

Export in-place version.

1c7555f

Follow Julia's linting conventions

0f8164e

Run CI on all pull requests

3c54e1a

kahaaga added 2 commits September 19, 2022 14:06

Don't export in-place version

14ed42c

Missing a letter

bd69b66

Datseris marked this pull request as ready for review September 19, 2022 12:37

Datseris merged commit aace207 into refactor_api Sep 19, 2022

Datseris deleted the kah_review branch September 19, 2022 12:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: reviewing new api #95

WIP: reviewing new api #95

kahaaga commented Sep 19, 2022

codecov bot commented Sep 19, 2022 •

edited

Loading

kahaaga commented Sep 19, 2022 •

edited

Loading

Datseris commented Sep 19, 2022

Datseris commented Sep 19, 2022

kahaaga commented Sep 19, 2022

WIP: reviewing new api #95

WIP: reviewing new api #95

Conversation

kahaaga commented Sep 19, 2022

codecov bot commented Sep 19, 2022 • edited Loading

Codecov Report

kahaaga commented Sep 19, 2022 • edited Loading

Documentation

CI

API

Summary

Datseris commented Sep 19, 2022

Datseris commented Sep 19, 2022

kahaaga commented Sep 19, 2022

codecov bot commented Sep 19, 2022 •

edited

Loading

kahaaga commented Sep 19, 2022 •

edited

Loading