Difference in (partitioned) generated loopy kernels on multiple runs #498

Open
matthiasdiener opened this issue May 14, 2024 · 2 comments · May be fixed by #529
@matthiasdiener
Collaborator

matthiasdiener commented May 14, 2024

Even with #465, generated loopy kernels often differ across different runs of the same multi-rank application.

Consider mpirun -n 4 python -m mpi4py mirgecom/examples/wave.py --lazy, which produces partially different kernels on most runs.

As an example, one of the post-partition rhs kernels for rank 1 is:

---------------------------------------------------------------------------
KERNEL: rhs
---------------------------------------------------------------------------
ARGUMENTS:
_actx_in_1_2_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
from_el_indices: type: np:dtype('int32'), shape: (4, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
dof_pick_lists: type: np:dtype('int16'), shape: (3, 4), dim_tags: (N1:stride:4, N0:stride:1) in aspace: global
dof_pick_list_indices: type: np:dtype('int8'), shape: (4), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
_pt_dist_id_9: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_actx_in_1_1_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
_pt_dist_id_8: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_actx_in_1_0_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
from_el_indices_0: type: np:dtype('int32'), shape: (10, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
dof_pick_lists_0: type: np:dtype('int16'), shape: (3, 4), dim_tags: (N1:stride:4, N0:stride:1) in aspace: global
dof_pick_list_indices_0: type: np:dtype('int8'), shape: (10), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
_pt_dist_id_7: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_pt_dist_id_6: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_pt_dist_id_5: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_pt_dist_id_12: type: np:dtype('int16'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_pt_dist_id_11: type: np:dtype('int16'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
_pt_dist_id_10: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
---------------------------------------------------------------------------
DOMAINS:
{ [iel_ensm0_outer, iel_ensm0_inner, idof_ensm0_outer, idof_ensm0_inner] : iel_ensm0_inner >= 0 and -16iel_ensm0_outer <= iel_ensm0_inner <= 9 - 16iel_ensm0_outer and iel_ensm0_inner <= 15 and idof_ensm0_inner >= 0 and -4idof_ensm0_outer <= idof_ensm0_inner <= 3 - 4idof_ensm0_outer and idof_ensm0_inner <= 3 }
{ [iel_ensm1_outer, iel_ensm1_inner, idof_ensm1_outer, idof_ensm1_inner] : iel_ensm1_inner >= 0 and -16iel_ensm1_outer <= iel_ensm1_inner <= 3 - 16iel_ensm1_outer and iel_ensm1_inner <= 15 and idof_ensm1_inner >= 0 and -4idof_ensm1_outer <= idof_ensm1_inner <= 3 - 4idof_ensm1_outer and idof_ensm1_inner <= 3 }
---------------------------------------------------------------------------
INAME TAGS:
idof_ensm0_inner: l.0
idof_ensm0_outer: unr
idof_ensm1_inner: l.0
idof_ensm1_outer: unr
iel_ensm0_inner: l.1
iel_ensm0_outer: g.0
iel_ensm1_inner: l.1
iel_ensm1_outer: g.0
---------------------------------------------------------------------------
TEMPORARIES:
cse_0_subst_0: type: np:dtype('int16'), shape: () aspace: private
cse_subst_0: type: np:dtype('int16'), shape: () aspace: private
---------------------------------------------------------------------------
INSTRUCTIONS:
       for iel_ensm0_outer, idof_ensm0_outer, iel_ensm0_inner, idof_ensm0_inner
↱              cse_0_subst_0 = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_mm_contract_cse_0, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
├↱             _pt_dist_id_5[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_1_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_5_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
├│↱            _pt_dist_id_6[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_2_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_6_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
├││↱           _pt_dist_id_7[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_0_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_7_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
││││↱          _pt_dist_id_11[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_pt_dist_id_11_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
│││││  end iel_ensm0_outer, idof_ensm0_outer, iel_ensm0_inner, idof_ensm0_inner
└└└└└↱ ... gbarrier  {id=g_barrier}
     │ for iel_ensm1_inner, iel_ensm1_outer, idof_ensm1_inner, idof_ensm1_outer
↱    ├         cse_subst_0 = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_mm_contract_cse, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
├    ├         _pt_dist_id_9[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_2_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_9_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
├    ├         _pt_dist_id_8[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_1_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_8_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
│    ├         _pt_dist_id_12[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_pt_dist_id_12_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
└    └         _pt_dist_id_10[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_0_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_10_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
       end iel_ensm1_inner, iel_ensm1_outer, idof_ensm1_inner, idof_ensm1_outer
---------------------------------------------------------------------------
LINEARIZATION:
   0: CALL KERNEL rhs
   1:     for idof_ensm0_outer
   2:         cse_0_subst_0 = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_mm_contract_cse_0}
   3:         _pt_dist_id_11[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_pt_dist_id_11_store}
   4:         _pt_dist_id_5[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_1_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_5_store}
   5:         _pt_dist_id_6[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_2_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_6_store}
   6:         _pt_dist_id_7[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_0_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_7_store}
   7:     end idof_ensm0_outer
   8: RETURN FROM KERNEL rhs
   9: ... gbarrier
  10: CALL KERNEL rhs_0
  11:     for idof_ensm1_outer
  12:         cse_subst_0 = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_mm_contract_cse}
  13:         _pt_dist_id_10[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_0_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_10_store}
  14:         _pt_dist_id_12[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_pt_dist_id_12_store}
  15:         _pt_dist_id_8[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_1_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_8_store}
  16:         _pt_dist_id_9[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_2_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_9_store}
  17:     end idof_ensm1_outer
  18: RETURN FROM KERNEL rhs_0
---------------------------------------------------------------------------

On a subsequent run, the same kernel is generated with the following diff:

--- old.txt	2024-05-14 16:58:29
+++ new.txt	2024-05-14 16:58:40
@@ -3,26 +3,26 @@
 ---------------------------------------------------------------------------
 ARGUMENTS:
 _actx_in_1_2_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
-from_el_indices: type: np:dtype('int32'), shape: (4, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
+from_el_indices: type: np:dtype('int32'), shape: (10, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
 dof_pick_lists: type: np:dtype('int16'), shape: (3, 4), dim_tags: (N1:stride:4, N0:stride:1) in aspace: global
-dof_pick_list_indices: type: np:dtype('int8'), shape: (4), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
-_pt_dist_id_9: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+dof_pick_list_indices: type: np:dtype('int8'), shape: (10), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
+_pt_dist_id_9: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
 _actx_in_1_1_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
-_pt_dist_id_8: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_8: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
 _actx_in_1_0_0: type: np:dtype('float64'), shape: (128, 10), dim_tags: (N1:stride:10, N0:stride:1) in aspace: global
-from_el_indices_0: type: np:dtype('int32'), shape: (10, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
+from_el_indices_0: type: np:dtype('int32'), shape: (4, 1), dim_tags: (N1:stride:1, N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
 dof_pick_lists_0: type: np:dtype('int16'), shape: (3, 4), dim_tags: (N1:stride:4, N0:stride:1) in aspace: global
-dof_pick_list_indices_0: type: np:dtype('int8'), shape: (10), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
-_pt_dist_id_7: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
-_pt_dist_id_6: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
-_pt_dist_id_5: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
-_pt_dist_id_12: type: np:dtype('int16'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
-_pt_dist_id_11: type: np:dtype('int16'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
-_pt_dist_id_10: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+dof_pick_list_indices_0: type: np:dtype('int8'), shape: (4), dim_tags: (N0:stride:1), tags: {AssumeNonNegative()} in aspace: global
+_pt_dist_id_7: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_6: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_5: type: np:dtype('float64'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_12: type: np:dtype('int16'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_11: type: np:dtype('int16'), shape: (4, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {AssumeNonNegative(), FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=4), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
+_pt_dist_id_10: type: np:dtype('float64'), shape: (10, 4), dim_tags: (N1:stride:4, N0:stride:1), tags: {FEMEinsumTag(indices=((FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4)), (FreeEinsumIndex(discr_entity=DiscretizationElementAxisTag(), length=10), FreeEinsumIndex(discr_entity=DiscretizationDOFAxisTag(), length=4))))} out aspace: global
 ---------------------------------------------------------------------------
 DOMAINS:
-{ [iel_ensm0_outer, iel_ensm0_inner, idof_ensm0_outer, idof_ensm0_inner] : iel_ensm0_inner >= 0 and -16iel_ensm0_outer <= iel_ensm0_inner <= 9 - 16iel_ensm0_outer and iel_ensm0_inner <= 15 and idof_ensm0_inner >= 0 and -4idof_ensm0_outer <= idof_ensm0_inner <= 3 - 4idof_ensm0_outer and idof_ensm0_inner <= 3 }
-{ [iel_ensm1_outer, iel_ensm1_inner, idof_ensm1_outer, idof_ensm1_inner] : iel_ensm1_inner >= 0 and -16iel_ensm1_outer <= iel_ensm1_inner <= 3 - 16iel_ensm1_outer and iel_ensm1_inner <= 15 and idof_ensm1_inner >= 0 and -4idof_ensm1_outer <= idof_ensm1_inner <= 3 - 4idof_ensm1_outer and idof_ensm1_inner <= 3 }
+{ [iel_ensm0_outer, iel_ensm0_inner, idof_ensm0_outer, idof_ensm0_inner] : iel_ensm0_inner >= 0 and -16iel_ensm0_outer <= iel_ensm0_inner <= 3 - 16iel_ensm0_outer and iel_ensm0_inner <= 15 and idof_ensm0_inner >= 0 and -4idof_ensm0_outer <= idof_ensm0_inner <= 3 - 4idof_ensm0_outer and idof_ensm0_inner <= 3 }
+{ [iel_ensm1_outer, iel_ensm1_inner, idof_ensm1_outer, idof_ensm1_inner] : iel_ensm1_inner >= 0 and -16iel_ensm1_outer <= iel_ensm1_inner <= 9 - 16iel_ensm1_outer and iel_ensm1_inner <= 15 and idof_ensm1_inner >= 0 and -4idof_ensm1_outer <= idof_ensm1_inner <= 3 - 4idof_ensm1_outer and idof_ensm1_inner <= 3 }
 ---------------------------------------------------------------------------
 INAME TAGS:
 idof_ensm0_inner: l.0
@@ -40,19 +40,19 @@
 ---------------------------------------------------------------------------
 INSTRUCTIONS:
        for iel_ensm0_outer, idof_ensm0_outer, iel_ensm0_inner, idof_ensm0_inner
-↱              cse_0_subst_0 = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_mm_contract_cse_0, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
-├↱             _pt_dist_id_5[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_1_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_5_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
-├│↱            _pt_dist_id_6[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_2_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_6_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
-├││↱           _pt_dist_id_7[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_0_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_7_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
-││││↱          _pt_dist_id_11[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_pt_dist_id_11_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_1_'}))}
+↱              cse_0_subst_0 = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_mm_contract_cse_0, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_0_'}))}
+├↱             _pt_dist_id_5[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_1_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_5_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_0_'}))}
+├│↱            _pt_dist_id_6[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_2_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_6_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_0_'}))}
+├││↱           _pt_dist_id_7[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = 0 + _actx_in_1_0_0[from_el_indices_0[iel_ensm0_inner + iel_ensm0_outer*16, 0], cse_0_subst_0]  {id=_pt_dist_id_7_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_0_'}))}
+││││↱          _pt_dist_id_11[iel_ensm0_inner + iel_ensm0_outer*16, idof_ensm0_inner + idof_ensm0_outer*4] = dof_pick_lists_0[dof_pick_list_indices_0[iel_ensm0_inner + iel_ensm0_outer*16], idof_ensm0_inner + idof_ensm0_outer*4]  {id=_pt_dist_id_11_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0_', 'iel_0_0_'}))}
 │││││  end iel_ensm0_outer, idof_ensm0_outer, iel_ensm0_inner, idof_ensm0_inner
 └└└└└↱ ... gbarrier  {id=g_barrier}
      │ for iel_ensm1_inner, iel_ensm1_outer, idof_ensm1_inner, idof_ensm1_outer
-↱    ├         cse_subst_0 = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_mm_contract_cse, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
-├    ├         _pt_dist_id_9[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_2_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_9_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
-├    ├         _pt_dist_id_8[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_1_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_8_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
-│    ├         _pt_dist_id_12[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_pt_dist_id_12_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
-└    └         _pt_dist_id_10[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_0_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_10_store, tags=EinsumTag(orig_loop_nest=frozenset({'idof_0_0__0', 'iel_0_0_'}))}
+↱    ├         cse_subst_0 = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_mm_contract_cse, tags=EinsumTag(orig_loop_nest=frozenset({'iel_0_1_', 'idof_0_0__0'}))}
+├    ├         _pt_dist_id_9[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_2_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_9_store, tags=EinsumTag(orig_loop_nest=frozenset({'iel_0_1_', 'idof_0_0__0'}))}
+├    ├         _pt_dist_id_8[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_1_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_8_store, tags=EinsumTag(orig_loop_nest=frozenset({'iel_0_1_', 'idof_0_0__0'}))}
+│    ├         _pt_dist_id_12[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = dof_pick_lists[dof_pick_list_indices[iel_ensm1_inner + iel_ensm1_outer*16], idof_ensm1_inner + idof_ensm1_outer*4]  {id=_pt_dist_id_12_store, tags=EinsumTag(orig_loop_nest=frozenset({'iel_0_1_', 'idof_0_0__0'}))}
+└    └         _pt_dist_id_10[iel_ensm1_inner + iel_ensm1_outer*16, idof_ensm1_inner + idof_ensm1_outer*4] = 0 + _actx_in_1_0_0[from_el_indices[iel_ensm1_inner + iel_ensm1_outer*16, 0], cse_subst_0]  {id=_pt_dist_id_10_store, tags=EinsumTag(orig_loop_nest=frozenset({'iel_0_1_', 'idof_0_0__0'}))}
        end iel_ensm1_inner, iel_ensm1_outer, idof_ensm1_inner, idof_ensm1_outer
 ---------------------------------------------------------------------------
 LINEARIZATION:

Note that the LINEARIZATION sections of the two generated kernels are identical, but the kernels differ in their ARGUMENTS, DOMAINS and INSTRUCTIONS sections.
I am not sure at the moment how to debug this further.
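One way to narrow this down is to compare two kernel dumps section by section rather than line by line. The following is only a rough sketch: it assumes the dumps are available as plain strings in the format shown above, and the helper names (split_sections, differing_sections) and the abbreviated toy dumps are made up for illustration, not part of loopy.

```python
import re

# A section header is an all-caps label at the start of a line, e.g. "INAME TAGS:".
HEADER = re.compile(r"^([A-Z][A-Z ]*):")


def split_sections(dump: str) -> dict:
    """Map each section label to the list of lines that belong to it."""
    sections = {}
    current = None
    for line in dump.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # skip blank lines and dashed separators
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            rest = line[match.end():].strip()
            sections[current] = [rest] if rest else []
        elif current is not None:
            sections[current].append(line)
    return sections


def differing_sections(old: str, new: str) -> list:
    """Return the labels of all sections whose contents differ."""
    a, b = split_sections(old), split_sections(new)
    labels = list(a) + [k for k in b if k not in a]
    return [k for k in labels if a.get(k) != b.get(k)]


# Heavily abbreviated toy dumps (hypothetical, only mimicking the format above).
OLD = """\
KERNEL: rhs
ARGUMENTS:
from_el_indices: shape: (4, 1)
from_el_indices_0: shape: (10, 1)
DOMAINS:
{ [iel_ensm0_inner] : 0 <= iel_ensm0_inner <= 9 }
LINEARIZATION:
   0: CALL KERNEL rhs
"""
NEW = """\
KERNEL: rhs
ARGUMENTS:
from_el_indices: shape: (10, 1)
from_el_indices_0: shape: (4, 1)
DOMAINS:
{ [iel_ensm0_inner] : 0 <= iel_ensm0_inner <= 3 }
LINEARIZATION:
   0: CALL KERNEL rhs
"""

print(differing_sections(OLD, NEW))
```

On the toy input this reports the ARGUMENTS and DOMAINS sections and leaves LINEARIZATION out, which mirrors the pattern in the diff above.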

This happens even with PYTHONHASHSEED set (i.e., it is not caused by the hash-dependent iteration order of strings stored in sets).
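For context, this is the effect that PYTHONHASHSEED rules out. The snippet below is a minimal sketch, independent of loopy and pytato: it launches fresh interpreters with a chosen seed and prints the iteration order of a set of strings. The strings are borrowed from the kernel dump above purely as sample data, and set_order is a hypothetical helper.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of strings.
CHILD = "print(list({'idof_0_0_', 'iel_0_0_', 'iel_0_1_', 'idof_0_0__0'}))"


def set_order(seed: str) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Same seed: the order is reproducible across interpreter launches.
same_seed_is_stable = set_order("0") == set_order("0")
print("stable with fixed seed:", same_seed_is_stable)

# Different seeds: the order may (but need not) change.
print("seed 1:", set_order("1"))
print("seed 2:", set_order("2"))
```

Since the kernels still differ with the seed fixed, the nondeterminism presumably has another source, such as ordering that depends on object identity or on the timing of communication between ranks. That is a guess, not something established in this issue.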

It is possible that this issue is caused by an earlier stage in the pipeline (e.g., in meshmode), but it seems to be restricted to multi-rank runs.

@inducer
Owner

inducer commented May 15, 2024

Using (perhaps) persistent hashes of DAGs and/or Loopy kernels, what is the first point at which the IRs diverge?
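A rough sketch of that idea, assuming each run can record its intermediate representations stage by stage. The stage names and toy IRs are hypothetical, and SHA-256 over repr() is only a stand-in for a proper persistent hash; it is meaningful only when repr() itself is deterministic.

```python
import hashlib
from typing import Optional, Sequence, Tuple

Stage = Tuple[str, object]  # (stage name, intermediate representation)


def digest(obj: object) -> str:
    """Stand-in hash: SHA-256 of repr(obj)."""
    return hashlib.sha256(repr(obj).encode()).hexdigest()


def first_divergence(run_a: Sequence[Stage], run_b: Sequence[Stage]) -> Optional[str]:
    """Return the name of the first stage whose digests differ, else None."""
    for (name_a, ir_a), (name_b, ir_b) in zip(run_a, run_b):
        if name_a != name_b:
            raise ValueError(f"stage mismatch: {name_a!r} vs {name_b!r}")
        if digest(ir_a) != digest(ir_b):
            return name_a
    return None


# Hypothetical stage names and toy IRs, for illustration only.
run_1 = [("dag", ("a", "b")), ("partitioned", ("a", "b")), ("kernel", ("x", 4, 10))]
run_2 = [("dag", ("a", "b")), ("partitioned", ("b", "a")), ("kernel", ("x", 10, 4))]

print(first_divergence(run_1, run_2))  # partitioned
```

Walking the pipeline from the front like this points at the earliest stage that needs attention, instead of at the final kernel where the difference first becomes visible.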

@matthiasdiener
Collaborator Author

matthiasdiener commented Jun 17, 2024

Small update:

With #459, #465, and #505, the rhs kernels already differ in the (unpartitioned) input dict_of_named_arrays to _DistributedLazilyPyOpenCLCompilingFunctionCaller._dag_to_compiled_func:

Run 1:

rank=0 rhs pkb(dict_of_named_arrays)='28206eae1e8c14a3df9f574fa750962ece57b52c2e78678da71d12eb30564df3'
rank=1 rhs pkb(dict_of_named_arrays)='d960b9dafce7a026018969cbaf29ad41c2214e52d6aced090cb577608958375e'
rank=2 rhs pkb(dict_of_named_arrays)='6db16c4423774e0741ccb5d0967187ccdeb2d7ffab595a174056f7d73b81d500'
rank=3 rhs pkb(dict_of_named_arrays)='b1c3d00ca776b402f94cee5e9e1a15a85bc23f7ac3e77e631b9a5298f22567cd'

Run 2:

rank=0 rhs pkb(dict_of_named_arrays)='ba25c68c48b4e34b70ae1ac76a8c79970c832ac622a4d23c2c8c6289e0e7dfd4'
rank=1 rhs pkb(dict_of_named_arrays)='d960b9dafce7a026018969cbaf29ad41c2214e52d6aced090cb577608958375e'
rank=2 rhs pkb(dict_of_named_arrays)='6db16c4423774e0741ccb5d0967187ccdeb2d7ffab595a174056f7d73b81d500'
rank=3 rhs pkb(dict_of_named_arrays)='bc2acb0b426777500a518e8800bab80b60b7f9c933c855050441af53d350ebc9'

Note that in this example, ranks 0 and 3 have different hashes between runs, but ranks 1 and 2 match.
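The per-rank comparison can be automated. The sketch below parses the log lines quoted above and lists the ranks whose hashes changed between the two runs. The line format is taken from this comment; the helper names are hypothetical.

```python
import re

RUN_1 = """\
rank=0 rhs pkb(dict_of_named_arrays)='28206eae1e8c14a3df9f574fa750962ece57b52c2e78678da71d12eb30564df3'
rank=1 rhs pkb(dict_of_named_arrays)='d960b9dafce7a026018969cbaf29ad41c2214e52d6aced090cb577608958375e'
rank=2 rhs pkb(dict_of_named_arrays)='6db16c4423774e0741ccb5d0967187ccdeb2d7ffab595a174056f7d73b81d500'
rank=3 rhs pkb(dict_of_named_arrays)='b1c3d00ca776b402f94cee5e9e1a15a85bc23f7ac3e77e631b9a5298f22567cd'
"""

RUN_2 = """\
rank=0 rhs pkb(dict_of_named_arrays)='ba25c68c48b4e34b70ae1ac76a8c79970c832ac622a4d23c2c8c6289e0e7dfd4'
rank=1 rhs pkb(dict_of_named_arrays)='d960b9dafce7a026018969cbaf29ad41c2214e52d6aced090cb577608958375e'
rank=2 rhs pkb(dict_of_named_arrays)='6db16c4423774e0741ccb5d0967187ccdeb2d7ffab595a174056f7d73b81d500'
rank=3 rhs pkb(dict_of_named_arrays)='bc2acb0b426777500a518e8800bab80b60b7f9c933c855050441af53d350ebc9'
"""

# Matches lines in the format shown above and captures (rank, hash).
PATTERN = re.compile(r"rank=(\d+) rhs pkb\(dict_of_named_arrays\)='([0-9a-f]+)'")


def hashes_by_rank(log: str) -> dict:
    """Map rank number to the hash reported in the log."""
    return {int(rank): h for rank, h in PATTERN.findall(log)}


def differing_ranks(log_a: str, log_b: str) -> list:
    """Return the sorted ranks whose hashes differ (or are missing) between logs."""
    a, b = hashes_by_rank(log_a), hashes_by_rank(log_b)
    return sorted(r for r in a.keys() | b.keys() if a.get(r) != b.get(r))


print(differing_ranks(RUN_1, RUN_2))  # [0, 3]
```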

https://gist.github.com/matthiasdiener/96066e1e61125fe0e6c8b9e9514a69a8 has more investigation regarding this test.
