
Ensure explicit output dtype for pad_across_processes #3219

Open · wants to merge 2 commits into main

Conversation

mariusarvinte (Contributor)

What does this PR do?

Fixes #3218.

The current implementation casts torch.bool tensors to torch.int64 because of the + pad_index operation, where pad_index defaults to 0:

new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index

Adds a test case checking that a torch.bool input is returned with the same dtype.
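The promotion can be reproduced in isolation (a minimal sketch; the tensor name is illustrative):

```python
import torch

t = torch.ones(2, dtype=torch.bool)

# new_zeros preserves the input dtype, but adding the Python int
# pad_index triggers type promotion: bool + int -> int64
promoted = t.new_zeros((4,)) + 0
print(promoted.dtype)  # torch.int64

# the proposed fix casts the result back to the input dtype
fixed = (t.new_zeros((4,)) + 0).to(t.dtype)
print(fixed.dtype)  # torch.bool
```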

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr @BenjaminBossan @SunMarc

@@ -669,7 +669,7 @@ def _pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
old_size = tensor.shape
new_size = list(old_size)
new_size[dim] = max_size
- new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index
+ new_tensor = (tensor.new_zeros(tuple(new_size)) + pad_index).to(tensor.dtype)
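For context, the local padding step can be sketched as a standalone function with the fix applied (pad_along_dim is a hypothetical name; the real _pad_across_processes also gathers max_size across processes, which is omitted here):

```python
import torch

def pad_along_dim(tensor, max_size, dim=0, pad_index=0, pad_first=False):
    # hypothetical standalone version of the padding logic
    old_size = tensor.shape
    new_size = list(old_size)
    new_size[dim] = max_size
    # the fix: cast back, since `+ pad_index` promotes bool tensors to int64
    new_tensor = (tensor.new_zeros(tuple(new_size)) + pad_index).to(tensor.dtype)
    if pad_first:
        indices = tuple(
            slice(max_size - old_size[dim], max_size) if i == dim else slice(None)
            for i in range(len(new_size))
        )
    else:
        indices = tuple(
            slice(0, old_size[dim]) if i == dim else slice(None)
            for i in range(len(new_size))
        )
    # the original data is copied back in, so only the pad region
    # ever holds pad_index
    new_tensor[indices] = tensor
    return new_tensor
```

With a bool input and the default pad_index = 0, the output stays bool and is padded with False.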
Member

Hmm, I'm wondering how safe this is in case the tensor dtype cannot represent the new data. E.g., when pad_index is not 0 or 1, casting to bool will result in a loss of information.

Member

+1, pad_index in the context of LLMs is usually -100.

Contributor Author — @mariusarvinte, Nov 5, 2024

I don't think there's any loss of useful information per se, given that the original data is retained downstream:

new_tensor[indices] = tensor

What this does change though is the actual pad value. Any non-zero pad_index (e.g., -100) will result in padding with True for bool.

Does it make sense to always pad with False for bool? In our use case, we directly manipulated bool tensors across devices and left pad_index = 0 by default. Not sure if bool actually appears in LLMs.
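The pad-value change described above can be verified directly (a minimal sketch):

```python
import torch

t = torch.zeros(2, dtype=torch.bool)

# with the fix, a non-zero pad_index is cast to bool: -100 becomes True,
# so the pad region is padded with True instead of the literal pad_index
padded = (t.new_zeros((4,)) + -100).to(t.dtype)
print(padded)  # tensor([True, True, True, True])
```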

Development

Successfully merging this pull request may close these issues.

Incorrect type in output of utils.pad_across_processes when input is torch.bool
4 participants