[TT-MLIR Uplift] toHost API required to untilize the tensor #1176
Comments
@jnie-TT Thanks for opening the issue; I'll work on it tomorrow. We still have some uplift issues to resolve from before tenstorrent/tt-mlir#1744. Once that is complete, I'll unblock this as well.
@jnie-TT It looks like this won't go down so easily... 😄 We are hitting OOM on the device for some of the model tests. For the one I was looking at, it fails on moving an input with the new defaults, but not with the old ones.
Note: in the program, this weight tensor is moved back to the host and then passed into conv2d. Here are the generated MLIRs:
@pilkicTT Yes, we are seeing the same on tt-torch; sorry I didn't notify you earlier. I'm currently adding a patch that overrides the input layout of conv weights to host (since they need to be on host initially anyway), and hopefully that fixes it. The issue arises because tilizing these weights can increase their size by 1024x in many cases. This should fix most issues, but if other errors pop up it might also be that the model simply doesn't fit on a single device (we're seeing this for Llama 7B on tt-torch). I'll keep you posted. Thanks for looking into this!
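The 1024x blow-up mentioned above comes from tile padding: Tenstorrent device tensors are tilized into 32x32 tiles, so a conv weight with a 1x1 spatial footprint gets its last two dimensions padded up to a full tile. The helper below is an illustrative sketch of that arithmetic only, not the tt-mlir API:

```python
TILE_H, TILE_W = 32, 32  # Tenstorrent tile dimensions


def tilized_elements(shape):
    """Element count after padding the last two dims up to whole tiles."""
    *batch, h, w = shape
    padded_h = ((h + TILE_H - 1) // TILE_H) * TILE_H
    padded_w = ((w + TILE_W - 1) // TILE_W) * TILE_W
    n = padded_h * padded_w
    for d in batch:
        n *= d
    return n


# A 1x1 conv weight: each 1x1 spatial slice pads to a full 32x32 tile.
shape = [256, 256, 1, 1]           # hypothetical OIHW weight shape
row_major = 256 * 256 * 1 * 1
tiled = tilized_elements(shape)
print(tiled // row_major)           # → 1024
```

Keeping such weights on the host in row-major layout until they are consumed avoids allocating the padded tiles in device DRAM.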
Datapoint: Predrag ran with Jackson's cherry-pick, which was merged to tt-mlir a few hours ago at tenstorrent/tt-mlir@6578419, and it solved the OOM errors; CI passed. I re-kicked off the uplift in a PR pointing to that commit alongside the needed memcpy change (now running) and enabled auto-merge, which will close this ticket if it merges.
- Use 6578419 for the "default conv weights on host" fix.
- This uplift contains a change which sets new defaults for inputs/outputs (tilized and on device) and needs proper handling on our side.
- The outputs now need to be moved to the host before issuing the `memcpy` call.

Closes #1176
With change ttmlir#1744 on tt-mlir main, the output tensor layout is now dram-interleaved and tiled by default, so it is up to the user to untilize the tensor when needed. Thus, prior to memcpying the runtime tensor, we need to call the `toHost` API to untilize the tensor.
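Untilizing means reordering a tile-major buffer (where each 32x32 tile is stored contiguously) back into ordinary row-major order, which is what a plain `memcpy` into a host buffer expects. The sketch below illustrates that reordering in pure Python with a configurable tile size; it is a conceptual model, not the tt-mlir runtime implementation:

```python
def untilize(tiled, rows, cols, tile=32):
    """Reorder a tile-major flat buffer back to row-major order.

    Assumes rows and cols are multiples of the tile size (i.e. the
    buffer is already padded to whole tiles).
    """
    out = [0] * (rows * cols)
    tiles_per_row = cols // tile
    for idx, v in enumerate(tiled):
        t, off = divmod(idx, tile * tile)   # which tile, offset within it
        tr, tc = divmod(t, tiles_per_row)   # tile's row/col in the tile grid
        r, c = divmod(off, tile)            # element's row/col inside the tile
        out[(tr * tile + r) * cols + (tc * tile + c)] = v
    return out


# 4x4 matrix stored as four 2x2 tiles: recover row-major 0..15.
tiled = [0, 1, 4, 5,  2, 3, 6, 7,  8, 9, 12, 13,  10, 11, 14, 15]
print(untilize(tiled, 4, 4, tile=2))  # → [0, 1, 2, ..., 15]
```

In the runtime, calling `toHost` with untilization enabled performs this step (plus any padding removal) before the raw bytes are safe to `memcpy` into user memory.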