misc
The following functions use OpenBLAS in Julia to enable parallel computing:
- matrix * matrix (*)
- matrix * vector (*)
- dot (dot())
- The maximum number of cores OpenBLAS allows is #.
- distributed for loop
- dot() is sped up by BLAS.dot(), which is a function in BLAS.
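As a minimal sketch of the points above, the snippet below sets the BLAS thread count and compares the generic dot() with the direct BLAS.dot() wrapper. The vectors x and y are made-up inputs, and BLAS.get_num_threads is assumed to be available (Julia 1.6 or later). The thread count that actually takes effect depends on the machine, so it is only queried here.

```julia
using LinearAlgebra

# Ask the BLAS backend to use two threads, then read back what it reports.
# (BLAS.get_num_threads is assumed available: Julia 1.6 or later.)
BLAS.set_num_threads(2)
nthreads = BLAS.get_num_threads()

# Hypothetical example vectors.
x = collect(1.0:5.0)
y = fill(2.0, 5)

# The generic dot() and the direct BLAS wrapper compute the same value.
d1 = dot(x, y)
d2 = BLAS.dot(x, y)
```

For vectors this small, threading is unlikely to matter; the sketch only shows which calls are involved.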
Progress:
- Changing . to @. gives almost the same speed.
- test:
  X'y
  BLAS.gemv('T',X,y) #<-faster
- test:
  Xa*[mu;α]: when Xa is an UpperTriangular array.
  BLAS.trmv('U', 'N', 'N', Xa, [mu;α]): when Xa is a normal Array.
  Result: same speed (because Julia uses BLAS.trmv for UpperTriangular matrices)
- test:
  Xa'ya: when Xa is an UpperTriangular array.
  BLAS.trmv('U', 'T', 'N', Xa, ya): when Xa is a normal Array.
  Result: same speed (because Julia uses BLAS.trmv for UpperTriangular matrices)
- speed up Cholesky decomposition by BLAS/LAPACK:
  LAPACK.potrf!('U', BB)
  Xa = UpperTriangular(BB)
- speed up deriving the max eigenvalue by BLAS/LAPACK:
  tmp = muX'muX
  LAPACK.syev!('N', 'U', tmp)[end]
- test BLAS on Windows (IIBLMM_BLAS.jl).
  BLAS.vendor()
  -> openblas64
  set_num_threads(1) : 60s
  set_num_threads(2) : 57s
  set_num_threads(3) : 57s
  set_num_threads(4) : 62s
  set_num_threads(8) : 57s
- test BLAS on server (farm).
  BLAS.vendor()
  ->
- test BLAS on server (Gausi).
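The gemv and trmv tests listed under Progress can be checked for correctness (not speed) with a small sketch like the one below. The matrices X and A and the vectors y and b are made-up stand-ins for X, Xa, y, and [mu;α]; they are not data from the actual analysis.

```julia
using LinearAlgebra

# Hypothetical small inputs standing in for X and y.
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [1.0, 1.0, 1.0]

# X'y through the generic operator and through the BLAS wrapper.
r1 = X'y
r2 = BLAS.gemv('T', X, y)

# A plain Array whose upper triangle holds the data (stand-in for Xa).
A = [2.0 1.0 3.0; 0.0 4.0 5.0; 0.0 0.0 6.0]
b = [1.0, 2.0, 3.0]

# Upper-triangular product: wrapper type versus direct trmv call.
p1 = UpperTriangular(A) * b
p2 = BLAS.trmv('U', 'N', 'N', A, b)

# Transposed product, matching the Xa'ya test.
q1 = UpperTriangular(A)' * b
q2 = BLAS.trmv('U', 'T', 'N', A, b)
```

Each pair (r1 and r2, p1 and p2, q1 and q2) should agree, so the only difference left to measure is speed.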
Next step:
- BLAS functions are multithreaded; how can BLAS be sped up by setting more threads?
  On Windows, setting different numbers of BLAS threads seems to make no difference. This may be a problem with my laptop. I need to test on a server (try both the Gausi and farm servers).
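One way to run the same comparison on each machine is a small timing loop like the sketch below. The function name time_gemm, the matrix size, and the thread counts are all hypothetical choices for illustration. It times a plain matrix product, not the full IIBLMM_BLAS.jl run, so its numbers are not comparable to the timings recorded above.

```julia
using LinearAlgebra

# Hypothetical benchmark: time one matrix product per BLAS thread count.
function time_gemm(n::Int, thread_counts)
    A = rand(n, n)
    B = rand(n, n)
    A * B  # warm-up run so that compilation is not included in the timings
    timings = Dict{Int,Float64}()
    for t in thread_counts
        BLAS.set_num_threads(t)
        timings[t] = @elapsed A * B
    end
    return timings
end

timings = time_gemm(500, (1, 2, 4))
```

A single @elapsed measurement is noisy; repeating each measurement several times and taking the minimum would give steadier numbers.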
Ideas on paper:
- Xa is an UpperTriangular matrix, so we can use a BLAS function to get faster speed than with a normal matrix. In fact, Julia will use BLAS if the input matrix is UpperTriangular.
  If the type of Xa is normal matrix, time: 93s
  If the type of Xa is UpperTriangular, time: 58s
  If the type of Xa is normal matrix + BLAS, time: 57s
- Correct typo in paper.
- BLAS/LAPACK functions can also be used to get the max eigenvalue and do the Cholesky decomposition. (In the paper, you only mentioned the GPU as a way to speed these up.)
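The Cholesky and max-eigenvalue calls can be checked against Julia's high-level functions with a sketch like the following. The 3×3 matrix BB is a made-up symmetric positive definite example, and lambda_ref, lambda_max, and U_ref are names introduced here for illustration. Both LAPACK calls overwrite their input, which is why the reference values are computed first and syev! is given a copy.

```julia
using LinearAlgebra

# Hypothetical symmetric positive definite matrix standing in for BB.
BB = [4.0 2.0 0.0; 2.0 5.0 1.0; 0.0 1.0 3.0]

# Reference results from the high-level API, computed before BB is overwritten.
U_ref = cholesky(Symmetric(BB)).U
lambda_ref = eigmax(Symmetric(BB))

# syev! overwrites its input, so it works on a copy here. With jobz = 'N'
# only the eigenvalues are returned, in ascending order, so the last entry
# is the largest one.
lambda_max = LAPACK.syev!('N', 'U', copy(BB))[end]

# potrf! overwrites the upper triangle of BB with the Cholesky factor.
LAPACK.potrf!('U', BB)
Xa = UpperTriangular(BB)
```

After potrf!, the lower triangle of BB is left as it was, so wrapping BB in UpperTriangular is what makes Xa a valid factor.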
BLAS:
1.8 Q Is number of thread limited?
Basically, there is no limitation about number of threads. You
can specify number of threads as many as you want, but larger
number of threads will consume extra resource. I recommend you to
specify minimum number of threads.