-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Current status of API is unclear #184
Comments
My opinion: we make all functions follow the first order. That is: type first, then the argument. |
ok I do this in a new PR |
@kahaaga and what do we do about the I have to admit I find it more natural to have Which also makes me think, why did we change the On the other hand, it would be good to have an idea of how in general the broader Julia community does this. Is the "algorithm deciding argument" before or after the input data? In JuliaDynamics overall it is first input data and then algorithm deciding argument. Eg. that's how it is done in RecurrenceAnalysis.jl or TimeseriesSurrogates.jl: |
Good idea on consulting with the community!
Well, it depends on how you phrase it. Take I don't feel strongly about either option beyond the vararg-argument, but the vargarg-argument weights pretty strongly in my opinion. I've gone through several rounds of design proposal for the CausalityTools methods without coming up with anything reasonably developer-friendly, because without it, we're back to writing a bunch of specialized methods for all input argument combinations. This is a pain, is error-prone, hard to document and hard to debug. A third option is to have the types as the first argument to methods that will never accept vargargs as the last argument. But how do we know if a method won't be extended with varargs later? In some cases it is obvious that it will never happen, but in other cases it is not so clear.
I think the reasoning was just "follow the same style everywhere". But since there is no established guidelines on the topic, we could also just decide on a style guide ourselves. Anyways, we should see what the community thinks too. I'll try to find some examples from well-established packages in the meanwhile. |
Some examples from other packages with functionality for estimating some statistic from data (I'll fill this out as I investigate more) StatsBase.jlImplements different algorithms as a different function, and parameters as keyword arguments. They do Distributions.jlUses a similar type-based approach, where each distributions is its own type. They dispatch on type-first, then input. For example, Distances.jlPerhaps the best example of using the same approach as us: distance type first, then input data, and return type is not the same as the distance type. For example, |
Ok, a radically different suggestion, regardless of argument order: use functors instead, like we do in TimeseriesSurrogates. For motivation, see the last comment in https://discourse.julialang.org/t/how-are-functors-useful/24110/28. What if we instead of having struct EntropyShannon; params...; end
struct EntropyRenyi; params...; end
struct EntropyTsallis; params...; end
...
(e::EntropyShannon)(est, x...)
(e::EntropyRenyi)(est, x...)
(e::EntropyTsallis)(est, x...) This would also simplify stuff in CausalityTools, where on the v2 branch one would do struct MIShannon <: MutualInformation; params...; end
struct MITsallis <: MutualInformation; params...; end
....
mutualinfo(def::MITsallisTypeA, measure::MITsallis, est::SomeEstimator, x, y)
mutualinfo(def::MITsallisTypeB, measure::MITsallis, est::SomeEstimator, x, y)
.... With a functor approach, this would be the much less verbose (mi::MITsallis)(def::MITsallisTypeA, est, x, y, ....) # seamlessly compute multi-information using varargs
(mi::MITsallis)(def::MITsallisTypeA, est, x, y, ...) I actually like this a lot more than the current approach. Moreover, implementing generator-type structs, as desired in #70, would come very naturally. One would have to think a bit about how to implement the |
It seems that the community generally leans towards type first, input afterwards. So let's do this everywhere. I don't see any immediate benefit for functors for entropies. At the moment it appears that you have to create exactly the same amount of structs and define the same amount of new functions. And a user would have to learn exactly the same amount of new things plus one more, the concept of functors, which I don't think is at all as a basic of a concept as a function. So why use something conceptiually more complicated than functions? mutualinfo(def::MITsallisTypeA, measure::MITsallis, est::SomeEstimator, x, y) off-topic, but I think that this is too much for a user. I wouldn't define |
Ok I read now the discourse post you linked that argues in favor of functors for establishing a 1-argument-function interface. Hence, this would be only valid if we could achieve that. We can't; we will always have a probabilities estimator and a data input. The alternative of passing the probability estimator as a "parameter" to the entropy functor goes again what we want to achieve here on an educational level: that the entropy and the probability extraction are two genuinely different things. |
Good point.
The advantage would be pre-allocation for speed ups during repeated computations (i.e. surrogate testing). But this is a deeper issue that we should address later, when all related packages have a stable release. That is, there are probably advantages to user a similar approach for
Ok, fine with me! Let's do that.
Yes, let's stick to our current approach. |
I opened an issue for this in CausalityTools. See the linked issue above. |
This is possible just as well with the struct. The pre-allocation happens once you initialize okay, I'll finish #187 tomorrow then with the final decided upon API which is: types are always first argument, but no functors. |
Ah, yes. Let's keep this in mind for later.
Sounds good. |
@kahaaga I've merged
main
(which is #168 ) into my branch that defines the encoding API and also implements it for histograms. Things remain unclear to me, mainly because discussions were spread across many different issues and PRs and I no longer remember where the information is supposed to be.Is it correct that all main API functions start with the estimator types? This means that we have:
entropy(e::Entropy, p::ProbEst, x, ...)
probabilities(p::ProbEst, x, ...)
probabilities_and_outcomes(p::ProbEst, x, ...)
But as far as I can tell, these are the only two functions that follow this appraoch of having first the type. The following functions have the different approach of having first input
x
, and then the probabilities type:outcome_space
total_outcomes
missing_outcomes
outcomes
Can you please explain what is the overall plan? Should we make all functions follow the first or the second version in the order of arguments?
The text was updated successfully, but these errors were encountered: