Factor flops 0.0000e+00 Mflops with v9.0.0 #163

Open
edwardnjust opened this issue May 11, 2024 · 4 comments

Comments

@edwardnjust

I ran the EXAMPLE pddrive3d, and the output shows Factor flops as 0, while Solve flops has a nonzero value. Is this a bug?
Uploading PixPin_2024-05-11_11-37-05.png…

@liuyangzhuan
Collaborator

I cannot see your figure. Can you upload it again?

@edwardnjust
Author

The detailed output is:


.. blocking parameters from sp_ienv():
** relaxation : 60
** max supernode : 256
** estimated fill ratio : 5
** min GEMM mkn to use GPU : 5000
.. parallel environment:
** OpenMP threads : 16
** GPU enable? : 1



.. options:
** Fact : 0
** Equil : 1
** DiagInv : 0
** ParSymbFact : 0
** ColPerm : 4
** RowPerm : 1
** ReplaceTinyPivot : 0
** IterRefine : 0
** Trans : 0
** num_lookaheads : 10
** batchCount : 0
** SymPattern : 0
** lookahead_etree : 0
** Use_TensorCore : 0
** Use 3D algorithm : 1
** parameters that can be altered by environment variables:
** superlu_relax : 60
** superlu_maxsup : 256
** min GEMM mkn to use GPU : 5000
** GPU buffer size : 256000000
** GPU streams : 8
** estimated fill ratio : 5


first gpufree time: 0.2370
first blas create time: 1.8609
MPI_Query_thread with MPI_THREAD_MULTIPLE
STDC_VERSION 199901
Library version: 9.0.0
Input matrix file: ../../../Matrix/pangulu_matrix/apache2/apache2.rb
3D process grid: 1 X 1 X 1
GHS_psdef/apache2; 2006; ; ed: N. Gould et al. |1423
FormFullA: new_nnz = 4817870, k = 4817870
Time to read and distribute matrix 0.44
Matrix size min_mn 715176
Nonzeros in L 157313555
Nonzeros in U 157313555
nonzeros in L+U 313911934
nonzeros in LSUB 36278952

** Memory Usage **********************************
** Total highmark (MB):
Sum-of-all : 3426.29 | Avg : 3426.29 | Max : 3426.29
Max at rank 0, different stages (MB):
. symbfact 264.60
. distribution 3426.29
. numfact 2668.18
** NUMfact space (MB): (sum-of-all-processes)
L\U : 2668.18 | Total : 2668.18
. max at rank 0, max L+U memory (MB): 2668.18
. max at rank 0, peak buffer (MB): 0.00


** number of Tiny Pivots: 0

.. Sol 0: ||X - Xtrue|| / ||X|| = 3.790301e-13 max_i |x - xtrue|_i / |x|_i = 3.790301e-13


**** Time (seconds) ****
EQUIL time 0.021
ROWPERM time 0.107
COLPERM time 5.630
SYMBFACT time 0.708
DISTRIBUTE time 3.099
FACTOR time 14.094
Factor flops 0.000000e+00 Mflops 0.00
SOLVE time 0.453
Solve flops 6.278243e+08 Mflops 1385.81


@liuyangzhuan
Collaborator

Ah, good catch. The latest C++ GPU factorization code doesn't compute the factor flops yet. We will work on a fix.
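
For context, the Mflops column in the log above looks like the flop count divided by the elapsed time, scaled by 1e-6. Under that reading, a code path that never adds to the flop counter would print 0.00 no matter how long the factorization took. A minimal sketch of that arithmetic, assuming hypothetical names (`PhaseStats`, `mflops`) that are for illustration only and are not SuperLU_DIST's actual API:

```cpp
#include <cstdio>

// Hypothetical per-phase record (an assumption for illustration,
// not the library's real statistics struct).
struct PhaseStats {
    double flops;    // floating-point operations counted during the phase
    double seconds;  // elapsed time of the phase
};

// Mflops = flops * 1e-6 / seconds. Returns 0 when no time was recorded,
// and also when the flop counter was never incremented.
double mflops(const PhaseStats& s) {
    return s.seconds > 0.0 ? s.flops * 1e-6 / s.seconds : 0.0;
}

// Print one line in a style similar to the timing summary in the log.
void print_phase(const char* name, const PhaseStats& s) {
    std::printf("%s flops %e Mflops %.2f\n", name, s.flops, mflops(s));
}
```

With the values from the log, `mflops(PhaseStats{6.278243e+08, 0.453})` comes out at about 1386, close to the 1385.81 printed for the solve (the printed time is rounded), while `mflops(PhaseStats{0.0, 14.094})` is exactly 0. That would be consistent with the factor time being measured correctly and only the flop counter being left untouched.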

@edwardnjust
Author

edwardnjust commented Jul 15, 2024

When will this bug be fixed? I have been trying to fix it myself recently. Could you give me some advice on how to approach it? Which cpp files are relevant?
