rocm: add known problems with some events to README #114

Merged: 1 commit merged on Nov 27, 2023

Conversation

gcongiu (Contributor) commented on Nov 9, 2023

Pull Request Description

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

gcongiu requested a review from dbarry9 on November 9, 2023 10:12
@@ -85,6 +85,20 @@ setting the ROCP\_TOOL\_LIB to the PAPI library as follows:

The binary image of a `double` is intact; but users must recast to `double` for display purposes.

* The following ROCm events are known to cause an error when the rocm component is used in sampling mode

TA_BUSY_{sum,avr,min,max}
Contributor

A similar memory access fault occurs with the event `TCP_TCC_NC_ATOMIC_REQ_sum`. Perhaps reword the README so that it does not name only the TA_BUSY_* events.

Contributor Author

I changed the wording. Can you also create an issue that we can reference in the PR?

gcongiu force-pushed the 2023.11.09_rocm-known-problems branch from 6570cdf to 15ddb92 on November 9, 2023 14:14
gcongiu changed the title from "rocm: add known problems with TA_BUSY_* events to README" to "rocm: add known problems with some events to README" on Nov 9, 2023
gcongiu added this to the PAPI 7.1.0 release milestone on Nov 9, 2023
gcongiu force-pushed the 2023.11.09_rocm-known-problems branch from 15ddb92 to 37bc995 on November 9, 2023 14:56
@bertwesarg
Contributor

The rendering is off with these changes:

Screenshot

See here: https://github.com/icl-utk-edu/papi/blob/37bc995884dbd1388a7eef21dafd8ad9bf255295/src/components/rocm/README.md

gcongiu force-pushed the 2023.11.09_rocm-known-problems branch from 37bc995 to 54bf0af on November 10, 2023 07:42
Comment on lines +92 to +100
$ papi_command_line TA_BUSY_avr

This utility lets you add events from the command line interface to see if they work.

Successfully added: rocm:::TA_BUSY_avr:device=0

Memory access fault by GPU node-4 (Agent handle: 0x46d6d10) on address 0x7ffed888c000. Reason: Unknown.
Aborted
Contributor

I assume this is meant to be terminal input and output, but that's not how it will render.

Suggested change
$ papi_command_line TA_BUSY_avr
This utility lets you add events from the command line interface to see if they work.
Successfully added: rocm:::TA_BUSY_avr:device=0
Memory access fault by GPU node-4 (Agent handle: 0x46d6d10) on address 0x7ffed888c000. Reason: Unknown.
Aborted
```console
$ papi_command_line TA_BUSY_avr
This utility lets you add events from the command line interface to see if they work.
Successfully added: rocm:::TA_BUSY_avr:device=0
Memory access fault by GPU node-4 (Agent handle: 0x46d6d10) on address 0x7ffed888c000. Reason: Unknown.
Aborted
```

@@ -85,6 +85,21 @@ setting the ROCP\_TOOL\_LIB to the PAPI library as follows:

The binary image of a `double` is intact; but users must recast to `double` for display purposes.

* Some of the ROCm events are known to cause an error when the rocm component is used in sampling mode

For example TA_BUSY_avr
Contributor

Suggested change
For example TA_BUSY_avr
For example `TA_BUSY_avr`

gcongiu force-pushed the 2023.11.09_rocm-known-problems branch from 54bf0af to c092407 on November 13, 2023 09:44
gcongiu force-pushed the 2023.11.09_rocm-known-problems branch from c092407 to e4cac74 on November 27, 2023 18:39
gcongiu merged commit 0d697fb into icl-utk-edu:master on Nov 27, 2023
14 checks passed
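
A note on the README context line quoted in both hunks ("The binary image of a `double` is intact; but users must recast to `double` for display purposes"): the following is a minimal sketch of what such a recast could look like, assuming the value arrives in a `long long` slot, as PAPI counter arrays do. The helper names `counter_bits_as_double`, `double_as_counter_bits`, and `print_counter` are hypothetical and are not part of PAPI; which events actually carry `double` values is not stated in this conversation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reinterpreting the bits only makes sense if both types have the same size. */
static_assert(sizeof(long long) == sizeof(double),
              "long long and double must have the same size");

/* Hypothetical helper (not a PAPI function): reinterpret the bits stored in a
 * long long counter slot as a double. memcpy copies the bit pattern unchanged
 * and avoids the undefined behavior of dereferencing a cast pointer. */
static double counter_bits_as_double(long long raw)
{
    double value;
    memcpy(&value, &raw, sizeof value);
    return value;
}

/* Hypothetical inverse, useful only to fabricate a sample slot for testing. */
static long long double_as_counter_bits(double value)
{
    long long raw;
    memcpy(&raw, &value, sizeof raw);
    return raw;
}

/* Print one counter slot both as an integer and as a reinterpreted double. */
static void print_counter(const char *name, long long raw)
{
    printf("%s as long long: %lld\n", name, raw);
    printf("%s as double:    %f\n", name, counter_bits_as_double(raw));
}
```

For instance, `print_counter("some_event", values[0])` would show the same slot both ways, where `values` stands for the `long long` array filled by a PAPI read call and `some_event` is a placeholder name. A plain value cast such as `(double)values[0]` would convert the integer instead of reinterpreting its bits, which is why the sketch copies the bytes.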
3 participants