Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to derive NoAlias for non-aliasing stores in a loop #123349

Open
bjope opened this issue Jan 17, 2025 · 0 comments
Open

Failing to derive NoAlias for non-aliasing stores in a loop #123349

bjope opened this issue Jan 17, 2025 · 0 comments

Comments

@bjope
Copy link
Collaborator

bjope commented Jan 17, 2025

Consider this IR:

define dso_local void @bar(ptr noalias nocapture noundef %p, i16 noundef %nr) {
entry:
  %cmp45 = icmp sgt i16 %nr, 0
  tail call void @llvm.assume(i1 %cmp45)
  br label %for.body

for.body:
  %a = phi i16 [ 0, %entry ], [ %add, %for.body ]
  %mul = mul nsw nuw i16 %a, %nr
  %gep = getelementptr inbounds i32, ptr %p, i16 %mul
  ; mul2 = (a + 1) * nr
  %or = or disjoint i16 %a, 1
  %mul2 = mul nsw nuw i16 %or, %nr
  %gep2 = getelementptr inbounds i32, ptr %p, i16 %mul2
  store i32 7, ptr %gep
  store i32 9, ptr %gep2
  %add = add nuw nsw i16 %a, 4
  %cmp = icmp samesign ult i16 %a, 60
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ret void
}

If running opt -passes='aa-eval' -print-all-alias-modref-info we see that %gep and %gep2 id detected as "MayAlias".
(This also happens if replacing the "or disjoint" by "add nsw nuw".)

I did try to manually rewrite the IR a bit. Given that a>=0 and nr>0 I think we can rewrite

  ; mul2 = (a + 1) * nr
  %or = or disjoint i16 %a, 1
  %mul2 = mul nsw nuw i16 %or, %nr

as

  ; mul2 = (a * nr) + nr
  %mul = mul nsw nuw i16 %a, %nr
  %mul2 = add nsw nuw i16 %mul, %nr

And then it looks like we would derive "NoAlias" when the IR looks like this instead:

define dso_local void @foo(ptr noalias nocapture noundef %p, i16 noundef %nr) {
entry:
  %cmp45 = icmp sgt i16 %nr, 0
  tail call void @llvm.assume(i1 %cmp45)
  br label %for.body

for.body:
  %a = phi i16 [ 0, %entry ], [ %add, %for.body ]
  %mul = mul nsw nuw i16 %a, %nr
  %gep = getelementptr inbounds i32, ptr %p, i16 %mul
  ; mul2 = (a * nr) + nr
  %mul2 = add nsw nuw i16 %mul, %nr
  %gep2 = getelementptr inbounds i32, ptr %p, i16 %mul2
  store i32 7, ptr %gep
  store i32 9, ptr %gep2
  %add = add nuw nsw i16 %a, 4
  %cmp = icmp samesign ult i16 %a, 60
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ret void
}

So there seems to be a bit of inconsistency in the analysis, making it sensitive to what the IR looks like.

Here is a C level reproducer:

void large(int *p, int nr) {
    __builtin_assume(nr > 0);
    #pragma clang loop unroll(disable)
    for (int a = 0; a < 64; a+=2) {
        p[a * nr] = 7;
        p[(a + 1) * nr] = 9;
    }
}

Compiled with -emit-llvm -O3 -g0 we get IR similar to @bar above:

define dso_local void @csource(ptr nocapture noundef writeonly %p, i32 noundef %nr) local_unnamed_addr #0 {
entry:
  %cmp = icmp sgt i32 %nr, 0
  tail call void @llvm.assume(i1 %cmp)
  %0 = zext nneg i32 %nr to i64
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %1 = mul nuw nsw i64 %indvars.iv, %0
  %arrayidx = getelementptr inbounds nuw i32, ptr %p, i64 %1
  store i32 7, ptr %arrayidx, align 4
  %2 = or disjoint i64 %indvars.iv, 1
  %3 = mul nuw nsw i64 %2, %0
  %arrayidx4 = getelementptr inbounds nuw i32, ptr %p, i64 %3
  store i32 9, ptr %arrayidx4, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp1 = icmp samesign ult i64 %indvars.iv, 62
  br i1 %cmp1, label %for.body, label %for.cond.cleanup
}

declare void @llvm.assume(i1 noundef) #1

attributes #0 = { nofree norecurse nosync nounwind memory(argmem: write, inaccessiblemem: write) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }

Which according to aa-eval is identified as:

Function: csource: 2 pointers, 1 call sites
  MayAlias:	i32* %arrayidx, i32* %arrayidx4

Given that it is possible to derive NoAlias for @foo above then I think it should be possible for @bar as well. And if that is solved I guess we would be able to see that the stores in the @csource test case as well.

It is at least a bit unfortunate that the C code is lowered into IR that doesn't satisfy alias analysis in the sense that we can derive that the stores are safe to reorder (or execute in parallel). Btw, if unrolling the loop we do get 64 pointers that has NoAlias, so there isn't even aliasing between iterations.

PS. Context for this problem is that I've been analysing downstream regressions after #119365 . This might be one part of the puzzle, but real problem seem to be that we drop knowledge about "nsw" after LoopStrenghtReduce. I suspect that if we can't derive NoAlias before loop-reduce, then it is even harder for loop-reduce/scev-expander to reason about "nsw" when generating reassociations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant