[Tripy] Return the shape immediately if it is statically known instead of producing a trace operator. #379
base: main
Conversation
When benchmarking informally, however, I did not find a large difference between the current implementation and this change. Perhaps the issue is how the benchmark was measured.

Informal benchmark. The function to test:

```py
def func(t1, t2):
    return tp.Tensor(list(t1.shape) != list(t2.shape), dtype=tp.bool)
```

I wrote it this way because the …

Comparison: after warming up, both complete in about 25 microseconds. Should we do this comparison without compiling instead? I could also vary the input values.
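For reference, a minimal sketch of how the uncompiled (eager) comparison might be timed. The use of `tp.ones` and the particular shapes are assumptions for illustration, not part of the original benchmark:

```py
import time

import tripy as tp

def func(t1, t2):
    return tp.Tensor(list(t1.shape) != list(t2.shape), dtype=tp.bool)

# Arbitrary static-shape inputs; varying these would exercise both the
# equal-shape and unequal-shape branches of the comparison.
t1 = tp.ones((2, 3))
t2 = tp.ones((2, 4))

func(t1, t2)  # warm-up

iters = 1000
start = time.perf_counter()
for _ in range(iters):
    func(t1, t2)
print(f"avg per call: {(time.perf_counter() - start) / iters * 1e6:.1f} us")
```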
```py
self.shape_memo = [DimensionSize(dim) for dim in self.trace_tensor.shape]
# ...
return self.shape_memo or [
    GetDimensionSize.build([self], dim=index, always_cast_to_dimension_size=True) for index in range(self.rank)
]
```
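For context, these fragments appear to come from a `shape` property along the following lines. This is a reconstruction pieced together from the diff hunks in this thread, not the verbatim source:

```py
@property
def shape(self):
    # Cache DimensionSize objects once all extents are statically known
    # (no -1 placeholders); otherwise emit one GetDimensionSize trace op
    # per dimension on every call.
    if self.shape_memo is None and all(dim >= 0 for dim in self.trace_tensor.shape):
        self.shape_memo = [DimensionSize(dim) for dim in self.trace_tensor.shape]
    return self.shape_memo or [
        GetDimensionSize.build([self], dim=index, always_cast_to_dimension_size=True)
        for index in range(self.rank)
    ]
```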
Memoizing the `GetDimensionSize` ops led to an MLIR error upon repeated calls, and I'm not entirely sure why.
The case that fails (the documentation example for `repeat`):

```py
inp = tp.reshape(tp.arange(4, dtype=tp.int32), (2, 2))
out0 = tp.repeat(inp, 2, dim=0)
out1 = tp.repeat(inp, 2, dim=1)

np_inp = np.from_dlpack(tp.copy(inp, device=tp.device("cpu")))  # doc: omit

ref_out0 = np.repeat(np_inp, 2, 0)
assert np.array_equal(ref_out0, np.from_dlpack(tp.copy(out0, device=tp.device("cpu"))))  # <- succeeds

ref_out1 = np.repeat(np_inp, 2, 1)
assert np.array_equal(ref_out1, np.from_dlpack(tp.copy(out1, device=tp.device("cpu"))))  # <- fails
```
Error:

```
(t347) error: number of output elements (8) doesn't match expected number of elements (4)

This error occured while trying to compile the following FlatIR expression:
      |
      | t347: [rank=(3), shape=((-1, -1, -1)), dtype=(int32), loc=(gpu:0)] = DynamicReshapeOp(t305, t346)
      |

Note: This originated from the following expression:

--> /tripy/docs/../tripy/frontend/ops/unsqueeze.py:56 in unsqueeze()
      |
   56 |     return reshape(input, result_shape)
      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

--> /tripy/docs/../tripy/frontend/ops/repeat.py:92 in repeat()
      |
   92 |     out = unsqueeze(input, dim + 1)
      |           ^^^^^^^^^^^^^^^^^^^^^^^^^ --- required from here
```
The `unsqueeze` gets the shape of the input and computes a new shape as an argument to `reshape`. I will see if I can figure out how that results in an error when lowering to MLIR.
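For reference, the pattern the stack trace points at looks roughly like the following. This is a simplified sketch of a reshape-based `unsqueeze`; the names `reshape` and `input.shape` come from the trace above, and `unsqueeze_sketch` is a hypothetical stand-in, not the actual Tripy source:

```py
def unsqueeze_sketch(input, dim):
    # Build the result shape from the input's shape, then reshape; repeat()
    # calls this at dim + 1 per the trace above. If `input.shape` returned
    # memoized GetDimensionSize ops created during an earlier trace, the
    # reshape target could be wired to stale graph nodes, which would be
    # consistent with the element-count mismatch in the error.
    result_shape = list(input.shape)
    result_shape.insert(dim, 1)  # insert the new unit dimension
    return reshape(input, result_shape)
```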
```py
if self.shape_memo is None and all(dim >= 0 for dim in self.trace_tensor.shape):
    self.shape_memo = [DimensionSize(dim) for dim in self.trace_tensor.shape]
```
I'm not sure there's actually a benefit to storing `shape_memo`. Can't we simplify this to:

```py
if all(dim >= 0 for dim in self.trace_tensor.shape):
    return list(self.trace_tensor.shape)
```

? I don't think we even need to return `DimensionSize` in this case.
It's the signature of the `shape` operator.
Sure, but we could always change the signature. Ideally, `DimensionSize` and `int` behave basically the same from the user's perspective. I'm assuming you have `shape_memo` here because it's expensive to construct `DimensionSize`s?
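Putting the thread together, the simplification under discussion would look something like the sketch below. The names (`trace_tensor`, `DimensionSize`, `GetDimensionSize`, `rank`) are taken from the diff above; this is an illustration of the suggestion, not a committed design:

```py
@property
def shape(self):
    if all(dim >= 0 for dim in self.trace_tensor.shape):
        # Every extent is statically known: return plain Python ints and
        # skip constructing DimensionSize objects or trace ops entirely.
        return list(self.trace_tensor.shape)
    # Otherwise, emit one GetDimensionSize trace op per dimension.
    return [
        GetDimensionSize.build([self], dim=index, always_cast_to_dimension_size=True)
        for index in range(self.rank)
    ]
```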
@slyubomirsky most of the impact will be on the compile time, I think.

Ah, okay. I'll profile compile time then.
Addresses issue #360, though a benchmark has yet to be added.