
Refactor PCA and docs #163

Merged
merged 6 commits into master from wa/pca_rewrite on Oct 28, 2021

Conversation

@wildart (Collaborator) commented Oct 20, 2021

Towards #109

@wildart merged commit c0f74de into master on Oct 28, 2021
@wildart deleted the wa/pca_rewrite branch on October 28, 2021 at 20:43

@eliascarv commented:

Hi. My name is Elias Carvalho.
I think the PCA struct can be improved by turning its abstract field types into type parameters. Going from this:

struct PCA{T<:Real} <: LinearDimensionalityReduction
    mean::AbstractVector{T}
    proj::AbstractMatrix{T}
    prinvars::AbstractVector{T}
    tprinvar::T
    tvar::T
end

To this:

struct PCA{T<:Real, V<:AbstractVector{T}, M<:AbstractMatrix{T}} <: LinearDimensionalityReduction
    mean::V
    proj::M
    prinvars::V
    tprinvar::T
    tvar::T
end

The idea is that struct fields with abstract types are bad for performance, as explained in the Julia documentation: https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-fields-with-abstract-type

@wildart (Collaborator, Author) commented Dec 11, 2021

Could you provide an example where the current type definition affects performance? I suppose that if you need the concrete array type of the projection field, resolving it could require some run-time type checking.

@eliascarv commented Dec 11, 2021

Yes, I can. Consider the following code:

[screenshot: example code]

Now notice the performance difference:

[screenshot: benchmark results]

The performance problem happens because accessing a field whose declared type is abstract is not efficient.
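
The screenshots above are not preserved; a minimal sketch of the kind of comparison being described might look like the following (the struct and function names here are illustrative, not from the original code):

using BenchmarkTools

# Field declared with an abstract type: the concrete matrix type is unknown to the compiler.
struct PCAAbstract{T<:Real}
    proj::AbstractMatrix{T}
end

# Field type lifted into a parameter: concrete once an instance is constructed.
struct PCAConcrete{T<:Real, M<:AbstractMatrix{T}}
    proj::M
end

# Hand-written loop over the field. With the abstract field every iteration goes
# through dynamic dispatch; with the concrete field it compiles to a tight loop.
function sumsq(p)
    s = 0.0
    for x in p.proj
        s += x * x
    end
    return s
end

P = rand(100, 10)
pa = PCAAbstract(P)
pc = PCAConcrete(P)

@btime sumsq($pa)   # slower and allocating: the element type cannot be inferred
@btime sumsq($pc)   # fast and allocation-free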

@wildart (Collaborator, Author) commented Dec 11, 2021

This is not typical usage of the PCA object; you would not iterate over the mean or the projection matrix. Usually it is a matrix multiplication, as here:

reconstruct(M::PCA, y::AbstractVecOrMat{T}) where {T<:Real} = decentralize(M.proj * y, M.mean)
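
For context, a rough sketch of a typical workflow, assuming the MultivariateStats API (the projection call is predict in newer releases, transform in older ones), where the projection matrix only ever appears inside matrix products:

using MultivariateStats

X = rand(10, 500)               # 10 variables, 500 observations in columns
M = fit(PCA, X; maxoutdim=3)    # fit a PCA model

Y = predict(M, X)               # project observations onto the principal subspace
Xr = reconstruct(M, Y)          # approximate reconstruction in the original space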

So, if we test the matrix-multiplication performance as follows:

using BenchmarkTools

struct S1{T}                        # abstract field type
    a::AbstractMatrix{T}
end
struct S2{T, M<:AbstractMatrix{T}}  # concrete field type via parameter
    a::M
end

test(t::S1{T}, v::V) where {T, V<:AbstractVector{T}} = t.a*v
test(t::S2{T, M}, v::V) where {T, M<:AbstractMatrix{T}, V<:AbstractVector{T}} = t.a*v

v = collect(1.0:10.0)
t1 = S1(v*v')
t2 = S2(v*v')
@benchmark test(t1, v)
@benchmark test(t2, v)

The difference for a small 10×10 matrix is about 50 ns:

julia> @benchmark test(t1,v)
BenchmarkTools.Trial: 10000 samples with 413 evaluations.
 Range (min … max):  238.850 ns …  2.484 μs  ┊ GC (min … max): 0.00% … 85.37%
 Time  (median):     244.431 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   253.411 ns ±  79.018 ns  ┊ GC (mean ± σ):  1.12% ±  3.29%

   ▁▇██▇▄               ▁         ▁             ▂▂▂▁▁     ▁    ▂
  ▇███████▅▁▁▃▁▃▁▁▁▄█▇██████▆▅▄▆█████▆▅▄▄▆▄▄▄▅▇███████▇▆▇▇██▇▇ █
  239 ns        Histogram: log(frequency) by time       317 ns <

 Memory estimate: 160 bytes, allocs estimate: 1.

julia> @benchmark test(t2,v)
BenchmarkTools.Trial: 10000 samples with 636 evaluations.
 Range (min … max):  193.121 ns …  1.787 μs  ┊ GC (min … max): 0.00% … 83.13%
 Time  (median):     196.686 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   205.346 ns ±  64.292 ns  ┊ GC (mean ± σ):  1.37% ±  4.01%

  ▁▇█▆▁           ▁▁   ▁▁                 ▁▁▁                  ▂
  █████▅▁▄▁▄▄▆▇██████▇▇██▇▇▅▄▅▆▆▅▃▄▃▄▅▃▄▅██████████▆▅▄▄▅▅▃▄▃▄▄ █
  193 ns        Histogram: log(frequency) by time       283 ns <

 Memory estimate: 160 bytes, allocs estimate: 1.

And for a larger 100×100 projection matrix there is almost no difference, since the fixed ~50 ns dispatch overhead is negligible next to the ~5 μs cost of the multiplication itself:

julia> @benchmark test(t1,v)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.187 μs … 256.233 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     4.890 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.014 μs ±   3.968 μs  ┊ GC (mean ± σ):  0.50% ±  0.97%

       ▃▆█▆▃         ▃▃▂▁      ▁▂▃▃▂▁                          
  ▁▁▂▃▇█████▇▄▂▁▂▂▃▆█████▇▅▅▄▅███████▇▅▄▃▃▂▂▂▂▃▂▂▃▂▂▂▂▁▂▁▁▁▂▁ ▃
  4.19 μs         Histogram: frequency by time        6.07 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark test(t2,v)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.120 μs … 255.155 μs  ┊ GC (min … max): 0.00% … 97.01%
 Time  (median):     4.832 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.966 μs ±   3.712 μs  ┊ GC (mean ± σ):  0.50% ±  0.97%

      ▃▇█▅        ▂▃▂       ▂▃▂▂                               
  ▁▁▂▆█████▄▂▁▁▂▃▆████▆▄▄▄▅██████▆▅▃▃▂▂▂▂▃▃▃▂▂▂▂▁▁▂▂▂▁▁▁▁▁▁▁▁ ▃
  4.12 μs         Histogram: frequency by time        6.32 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

I am not posting this to discourage you; if you would like to contribute the above modification to the package, your PR is always welcome.
