Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: reviewing new api #95

Merged
merged 9 commits into from
Sep 19, 2022
Merged

WIP: reviewing new api #95

merged 9 commits into from
Sep 19, 2022

Conversation

kahaaga
Copy link
Member

@kahaaga kahaaga commented Sep 19, 2022

No description provided.

@kahaaga kahaaga changed the title Kah review WIP: reviewing new api Sep 19, 2022
@codecov
Copy link

codecov bot commented Sep 19, 2022

Codecov Report

❗ No coverage uploaded for pull request base (refactor_api@608a804). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             refactor_api      #95   +/-   ##
===============================================
  Coverage                ?   71.46%           
===============================================
  Files                   ?       30           
  Lines                   ?      729           
  Branches                ?        0           
===============================================
  Hits                    ?      521           
  Misses                  ?      208           
  Partials                ?        0           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kahaaga
Copy link
Member Author

kahaaga commented Sep 19, 2022

@Datseris I think I'm done here. I only did a minor language changes, but API-wise, it looks good.

Documentation

I really like the documentation now, and the extended clarification you added on the difference between probability and entropy estimation, and the non-consensus in the literature on terminology.

I particularly like how the landing page turned out. It's an excellent idea to use the module documentation as the introductory paragraph.

Screenshot 2022-09-19 at 13 27 33

CI

I specified that CI should be run for PRs that are not directly to the main branch too. This way, we can develop to side branches and see how the documentation looks without committing to a PR to main first.

It's a matter of preference if we should do this, but for me personally, it would greatly simplify the workflow, because I wouldn't need to install and uninstall packages such as Documenter locally every time I want to propose a change to the codebase that affects the documentation.

API

Citing from #89

I don't think it makes sense for us to define a convenience function entropy_specificname for every single probability estimator we have. We should only do it for exceptionally well established names such as permutation entropy. If the name is in the title of at least one paper, hopefully more then it warrants its name. I don't think there is any paper that explicitly uses the words "the transfer operator entropy", and hence no reason to have functions like entropy_transferoperator. Similarly, no reason to have functions such as entropy_permutationweighted, as the entropy_permutation docstring should discuss alternative versions and how to obtain them the "normal" way.
I don't think packages should provide such incredible level of convenience, it is bad for maintanance and there is a point where the hit to the learning curve is just stronger than the convenience. When I write a 2 lines of code covenience function for using in my code, that only uses Julia Base functions, I don't think "oh this should have been a function already defined in Julia", and I don't expect it to. The users should think the same. It is better to know "Oh I know how to create the convenience function that gives me the amplitude weighted permutation entropy in 2 lines of code", than to think "hm, does this package export a function that computes the ...".

I think we've struck a good balance between convenience and clutter in the current state of the code. No need for any further changes regarding this matter, if you ask me.

Okay I've refactored this with my changes. There are things missing, and some files are incomplete, and I'm not sure where the new methods go. They also need to go into the changelog. Anyways, feel free to go ahead and review/critique my changes.

I'll refactor the PRs for the new complexity measures once we merge this to master.

I'll need your help to fix the tests. They are about the in-place entropy form that I admit, I never really learned... BTW, it is not useful to test interface tests more than once. If we test entropy_renyi!(...) isa Real, we only have to do it once, not for every estimator. This is an interface test but the interface doesn't change with estimators.

The issue with renyi_entropy! is that depending on what the estimator is, the pre-allocated container can be different things, so the input container does in fact change with estimators. Currently, the pre-allocation is only used for the symbolic estimators, for which the length of the pre-allocated container varies depending on the dimension of the input data. This pre-allocation kind of ties in with the generator approach I proposed in some other PRs. Maybe it's best to not export renyi_entropy! until we've decided what to do about the pre-allocation approaches. In a different PR, maybe?

I use renyi_entropy! extensively in my own work, and I use it in CausalityTools, but I'm not sure whether it should be exported here. I would maybe propose to not export it, and just call Entropies.renyi_entropy! in upstream packages when needed. This can be done without breaking changes in CausalityTools and friends (depends on Entropies v1) if we upgrade to v2.0 for Entropies.jl.

Summary

This is good to be merged with main. Feel free to squash and merge if you're happy.

  • Tests can be organized better in a separate PR
  • Re-thinking and potentially exporting renyi_entropy! can be done in a separate PR.
  • The outstanding PRs can be reworked once this is merged.

@Datseris Datseris marked this pull request as ready for review September 19, 2022 12:37
@Datseris
Copy link
Member

Maybe it's best to not export renyi_entropy! until we've decided what to do about the pre-allocation approaches. In a different PR, maybe?

Hm, I would disagree here especially if this is used in downstream packages. Downstream packages should be using only the exported API, after all. It makes the code clearer, and in the end, we shouldn't rely on the fact that we develop both. Maybe in the future it will not be the case!

The first thing to first check if there is actually any substantial increase in performance for having this in-place methods. For the permutation entropy, I would imagine that counting the permutations and assigning their frequencies would be so much more than allocating one new vector. But maybe I am wrong. If there aren't substantial performance benefits, maybe it isn't even worthy to do this.

In any case, if indeed we need the in place version, we should document it and export it. I guess we:

  1. Define entropy_renyi!.
  2. In its docstring say "the first argument, the pre-allocated container, depends on the estimator. Estimators that can actually utilize the in-place version discuss in their own documentation strings what this argument is. Estimators that do not at all use this in-place version do not discuss anything."
  3. Add this information to all estimators that actually use this interface.
  4. Test it.

Does this sound reasonable? Yes, I also agree to do this in a separate PR btw.


Thanks for the docs feedback. I am not done with the docs. I've structured it in such a way that I can just "include it immediatelly" in DynamicalSystems.jl. Once I am really done with this I'll try it out and tell you how it works so you can do the same in e.g., CausalityTools.jl

@Datseris Datseris merged commit aace207 into refactor_api Sep 19, 2022
@Datseris Datseris deleted the kah_review branch September 19, 2022 12:38
@Datseris
Copy link
Member

julia> @btime ones(1000);
  604.762 ns (1 allocation: 7.94 KiB)

Okay, I guess if you do this 1000 times in surrogate testing you get about 600 μicroseconds, but this would need to be compared to how much time it takes to do the permutation counting 1000 times with everything pre-allocated.

@kahaaga
Copy link
Member Author

kahaaga commented Sep 19, 2022

Hm, I would disagree here especially if this is used in downstream packages. Downstream packages should be using only the exported API, after all. It makes the code clearer, and in the end, we shouldn't rely on the fact that we develop both. Maybe in the future it will not be the case!
The first thing to first check if there is actually any substantial increase in performance for having this in-place methods. For the permutation entropy, I would imagine that counting the permutations and assigning their frequencies would be so much more than allocating one new vector. But maybe I am wrong. If there aren't substantial performance benefits, maybe it isn't even worthy to do this.
In any case, if indeed we need the in place version, we should document it and export it. I guess we:
Define entropy_renyi!.>
In its docstring say "the first argument, the pre-allocated container, depends on the estimator. Estimators that can actually utilize the in-place version discuss in their own documentation strings what this argument is. Estimators that do not at all use this in-place version do not discuss anything."
Add this information to all estimators that actually use this interface.
Test it.
Does this sound reasonable? Yes, I also agree to do this in a separate PR btw.

Yes, that sounds reasonable. Let's revisit it when the refactoring has been merged with main.

Okay, I guess if you do this 1000 times in surrogate testing you get about 600 μicroseconds, but this would need to be compared to how much time it takes to do the permutation counting 1000 times with everything pre-allocated.

We can do some testing in a separate PR.

Datseris added a commit that referenced this pull request Sep 19, 2022
* Remove @show

* Refactor nearest neighbors methods

* Typos

* Remove duplicate function

* Reorganize docs

* API renaming

* Switch order of docs

* Refactoring

* Unnecessary capitalization

* Further refactoring

* Missing part of equation

* Use new name

* Add deprecations

* Add note for single-symbol encoding in docstring

* Add note for single-symbol encoding in docstring

* Remove redundant note

* Use @deprecate for genentropy

* Add (preliminary) update message

* [wip] reusable docs

* re-write intro page

* improve entropy wording

* Complete re-organization of source code folders

* correct includes state

* finish first draft of docs new way (many TODOs left_)

* Fix wavelet entropy docs

* add docstrings to convenience calls

* final touches

* add documentation build phase

* Add power spectrum probability estimator (#94)

* add code and docstring for spectral entropy

* add docs

* even better test reorganization

* add one more depth level

* WIP: reviewing new api (#95)

* Equivalent to, not equivalent with

* Include dispersion entropy

* Clearer wording

* Symbolization routines need to be imported before estimators using them

* Export in-place version.

* Follow Julia's linting conventions

* Run CI on all pull requests

* Don't export in-place version

* Missing a letter

Co-authored-by: Datseris <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants