[Project Proposal] HPCToolkit #18

Open
8 tasks
blue42u opened this issue Nov 10, 2024 · 0 comments
Labels: Project Proposal (label for project proposals to HPSF)

1. Name of Project

HPCToolkit

2. Project Description

HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to GPU-accelerated supercomputers. By using statistical sampling of timers and hardware performance counters on CPUs, HPCToolkit collects accurate measurements of a program's CPU work, resource consumption, and inefficiency and attributes them to the full calling context in which they occur. By monitoring GPU operations, gathering instruction-level metrics within GPU kernels, and attributing the costs of GPU work to heterogeneous calling contexts, HPCToolkit provides insight into the performance of GPU-accelerated codes. HPCToolkit works with multilingual, fully optimized applications that are dynamically linked. HPCToolkit is designed for use on large parallel systems. HPCToolkit's presentation tools enable rapid analysis of a program's execution costs, inefficiency, and scaling characteristics both within and across nodes of a parallel system. HPCToolkit supports measurement and analysis of serial codes, threaded codes (e.g. pthreads, OpenMP), MPI, and hybrid (MPI+threads) parallel codes, as well as GPU-accelerated codes that offload computation to AMD, Intel, or NVIDIA GPUs.
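
As a rough illustration of what attributing samples "to the full calling context" means, the sketch below folds a handful of sampled call stacks into a calling context tree. It is a toy model under stated assumptions, not HPCToolkit's implementation or data format: the `CCTNode` class, the `attribute` function, and the sample stacks are all hypothetical names invented for this example.

```python
class CCTNode:
    """One node of a toy calling context tree (hypothetical structure)."""

    def __init__(self, frame):
        self.frame = frame      # function name for this context
        self.exclusive = 0      # samples that landed exactly in this context
        self.children = {}      # callee name -> CCTNode

    def child(self, frame):
        # Reuse the existing child for this callee, or create it on first use.
        if frame not in self.children:
            self.children[frame] = CCTNode(frame)
        return self.children[frame]

    def inclusive(self):
        # Samples in this context plus everything called from it.
        return self.exclusive + sum(c.inclusive() for c in self.children.values())


def attribute(samples):
    """Fold sampled call stacks (outermost frame first) into one tree."""
    root = CCTNode("<root>")
    for stack in samples:
        node = root
        for frame in stack:
            node = node.child(frame)
        node.exclusive += 1     # charge the sample to the innermost frame
    return root


# Made-up samples: each tuple is the call stack observed at one sample.
samples = [
    ("main", "solve", "dot"),
    ("main", "solve", "dot"),
    ("main", "solve", "axpy"),
    ("main", "io", "dot"),
]
root = attribute(samples)
main = root.children["main"]
solve = main.children["solve"]
```

In this toy tree, `dot` called from `solve` and `dot` called from `io` are separate nodes, so their costs stay distinct instead of being merged into a single flat entry per function.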

3. Statement on Alignment with High Performance Software Foundation's Mission

HPCToolkit is, and aims to remain, a best-in-class performance tool for leadership supercomputers. It is one of the few performance tools able to run at leadership scales with detailed instruction-level performance attribution. Its functionality rivals the performance tools provided by NVIDIA, AMD, Intel, and Cray on their own hardware. These features make HPCToolkit a necessary piece of a future HPC ecosystem dominated by cloud and AI at scale.

HPCToolkit is committed to providing quality performance analysis for a wide range of languages and platforms, particularly targeting developers of large-scale HPC applications. HPSF provides HPCToolkit with a neutral home and safe stewardship for our stakeholders in government and academia, and opens HPCToolkit to future collaboration opportunities.

4. Project Website (please provide a link)

Project Website

5. Open Source License (please provide a link)

SPDX Identifier: BSD-3-Clause (considering a relicense to Apache-2.0)
LICENSE.md

Data artifacts are licensed under the CDLA Permissive 2.0 license (SPDX: CDLA-Permissive-2.0).

6. Code of Conduct (please provide a link)

We adopt the generic LF Code of Conduct.

7. Governance Practices (please provide a link)

Project Governance

8. Two Sponsors from the High Performance Software Foundation's Technical Advisory Committee

Todd Gamblin and Christian Trott

9. What is the project's solution for source control?

Git repositories hosted on GitLab.com under the @hpctoolkit group (e.g. hpctoolkit/hpctoolkit).

10. What is the project's solution for issue tracking?

GitLab issues

11. Please list all external dependencies and their license

C/C++:

Java:

12. Please describe your release methodology and mechanics

HPCToolkit is released roughly twice a year (summer and winter), although the schedule is often adjusted to meet customer needs. Releases are made as Git tags with corresponding GitLab releases and subsequently published as Spack package versions. Binary artifacts are produced automatically using Continuous Deployment practices (with minimal exceptions).

13. Please describe Software Quality efforts (CI, security, auditing)

All changes to the mainline must pass a series of automated tests and linter-style checks, run via GitLab CI. These tests cover major releases of four common Linux distributions (Ubuntu, RHEL, Fedora, SUSE Leap), multiple CPU architectures (amd64, aarch64, ppc64le), and multiple GPU platforms (CUDA/NVIDIA, HIP/AMD). Builds include multiple GCC and Clang compiler versions.

We do not have security screening in place. This is an area we would like to improve under HPSF.

14. Please list the project's leadership team

The HPCToolkit Technical Steering Committee (@hpctoolkit/tsc) consists of the following members:

15. Please list the project members with access to commit to the mainline of the project

16. Please describe the project's decision-making process

We implement consensus-based decisions among our maintainers/committers, and we will resort to a fair vote of the TSC when consensus is not reached. These discussions happen primarily in GitLab issues/MRs or internally among the team.

Merge requests (MRs) must be approved and merged by a committer with sufficient access, although the review itself may be delegated to another contributor or conducted in an informal meeting.

17. What is the maturity level of your project?

We aim to join the HPSF as an Established stage project.

The Established stage characterizes our project well. We are looking to create a plan for continued support for our users. We have a very small developer community and wish to expand it by leveraging the experience of LF and HPSF. Our eventual goal is to achieve Core project status.

18. Please list the project's official communication channels

  • GitLab issues and MRs
  • Email mailing list (hpctoolkit-forum =at= rice.edu, archive)

19. Please list the project's social media accounts

N/A

20. Please describe any existing financial sponsorships

Development on HPCToolkit is primarily funded from DOE grants and industry collaboration contracts via Rice University. The full list of sponsors is available on our website.

21. Please describe the project's infrastructure needs or requests

  • We are interested in expanding our CI system, to enable continuous testing on exotic compute hardware such as late-model AMD and Intel GPUs. We have a representative attending the CI working group to eventually facilitate this.
  • We are interested in assistance refreshing our website to meet modern design expectations. We are additionally interested in assistance creating/maintaining our public communication channels, such as a public chatroom for users (Slack/Discord/etc.).
  • We are interested in strengthening our contributor base within the HPC ecosystem. This is an area where we are immature and wish to leverage the experience of LF and the HPSF, as well as users' meetings and hackathons in HPC.

Criteria for Sandbox Stage

  • Meet all requirements to be a Linux Foundation project
    • Evidence: Rice University signed HPCToolkit paperwork and became an LF project
  • Have 2 TAC sponsors to champion the project & provide mentorship as needed
  • Submit a proposal for membership and present it at a meeting of the TAC
    • Evidence: This proposal
  • Have a charter document with an intellectual property policy that leverages open licenses, including, in the case of contributions of code, the use of one or more licenses approved as “open” by the Open Source Initiative. The staff of the High Performance Software Foundation can assist projects in preparing a technical charter following the High Performance Software Foundation’s standard template.
  • Have a code of conduct (part of default governance for LF – there is a template)

Criteria for Established Stage

  • Document that it is being used successfully in production by at least three independent end users which, in the TAC’s judgment, are of adequate quality and scope.
    • Evidence: HPCToolkit is deployed on supercomputers at LLNL, OLCF, and NERSC as part of E4S. We also have contracts with LLNL, ANL, and TotalEnergies to develop HPCToolkit for their needs.
  • Demonstrate development processes (e.g., use of pull requests, code review, testing, CI) that lower barriers to contribution and ensure software quality necessary for increased adoption.
    • Evidence: See software quality efforts and decision-making processes above.
  • Demonstrate a substantial ongoing flow of commits and merged contributions.

slandath added the Project Proposal label on Nov 12, 2024.