Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hidden version requirements to liblikwid #56

Open
jonas-schulze opened this issue Dec 19, 2023 · 3 comments
Open

Hidden version requirements to liblikwid #56

jonas-schulze opened this issue Dec 19, 2023 · 3 comments

Comments

@jonas-schulze
Copy link

I just saw your talk from JuliaCon 2023 that also mentioned LIKWID.jl and I was eager to try it out. 🚀

Unfortunately, trying the very first tutorial, I got a segfault. After some digging, I found the culprit: the definitions of CpuTopology between LIKWID.jl and the underlying liblikwid differed. My likwid.h version 5.1 (installed via apt) defines the CpuTopology struct without the numDies field, while LIKWID.jl version 0.4.4 defines its CpuTopology struct having numDies in position 4. Consequently, the fields are shifted when transferring the data from C to Julia: Julia's numDies contains C's numCoresPerSocket, ..., numCacheLevels contains the value of threadPool, which is a pointer. Wrapping cacheLevels (C's topologyTree) into an array of length numCacheLevels (C's threadPool) leads to the malformed memory accesses. 💥

likwid.h: https://github.com/RRZE-HPC/likwid/blob/v5.1/src/includes/likwid.h#L370-L380

typedef struct {
    uint32_t numHWThreads; /*!< \brief Amount of active HW threads in the system (e.g. in cpuset) */
    uint32_t activeHWThreads; /*!< \brief Amount of HW threads in the system and length of \a threadPool */
    uint32_t numSockets; /*!< \brief Amount of CPU sockets/packages in the system */
    uint32_t numCoresPerSocket; /*!< \brief Amount of physical cores in one CPU socket/package */
    uint32_t numThreadsPerCore; /*!< \brief Amount of HW threads in one physical CPU core */
    uint32_t numCacheLevels; /*!< \brief Amount of caches for each HW thread and length of \a cacheLevels */
    HWThread* threadPool; /*!< \brief List of all HW thread descriptions */
    CacheLevel*  cacheLevels; /*!< \brief List of all caches in the hierarchy */
    struct treeNode* topologyTree; /*!< \brief Anchor for a tree structure describing the system topology */
} CpuTopology;

Liblikwid.jl: https://github.com/JuliaPerf/LIKWID.jl/blob/v0.4.4/src/LibLikwid.jl#L640-L651

struct CpuTopology
    numHWThreads::UInt32
    activeHWThreads::UInt32
    numSockets::UInt32
    numDies::UInt32
    numCoresPerSocket::UInt32
    numThreadsPerCore::UInt32
    numCacheLevels::UInt32
    threadPool::Ptr{HWThread}
    cacheLevels::Ptr{CacheLevel}
    topologyTree::Ptr{treeNode}
end

The numDies field has been added in RRZE-HPC/likwid@a0ac14d, which according to GitHub shipped in likwid version 5.2 and onwards. Moving the numDies field to the end of the C struct is not an option, I guess, and removing numDies from LIKWID.jl as its not used anyways. Therefore, I would recommend:

  • Document the version requirements for liblikwid
  • Let LIKWID.jl fail gracefully, if it recognizes an unsupported version of liblikwid
  • Let LIKWID.jl bundle its own liblikwid
@jonas-schulze
Copy link
Author

Follow-up question: are the .init_fileTopology field and the mentioned static void initTopologyFile(FILE* file); function dead code?

@carstenbauer
Copy link
Member

I'm sorry that you had such a bad first experience. I made the choice to only support LIKWID >= 5.2 but didn't document it properly. My bad.

  • Document the version requirements for liblikwid
  • Let LIKWID.jl fail gracefully, if it recognizes an unsupported version of liblikwid

Agreed 👍

  • Let LIKWID.jl bundle its own liblikwid

That'd be great but it's difficult because LIKWID itself isn't portable yet. See JuliaPackaging/Yggdrasil#4913 and the links over there.

@TomTheBear
Copy link
Collaborator

Follow-up question: are the .init_fileTopology field and the mentioned static void initTopologyFile(FILE* file); function dead code?

Hi, this is not dead code. LIKWID provides likwid-getTopoCfg to create a topology file than is then read in with initTopologyFile(). Extremely helpful on systems where the topology lookup takes minutes like HPE/SGI UltraViolet (>1000 CPU sockets).

For the CUPTI stuff in LIKWID, I use versioned structs based on the CUDA/CUPTI version.

It can be expected that with every x.y.0 release there will be changes in the API, so with 5.2.0 or 5.3.0 new struct members and functions were added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants