
Packages with large number of files in spack-stack #1524

Open
mathomp4 opened this issue Feb 19, 2025 · 6 comments

Comments

@mathomp4
Collaborator

This isn't really a bug since nothing is broken, but while looking at porting spack-stack to NAS, I checked how many inodes are needed. At NCCS, I saw that the new 1.9 install by @ashley314 uses ~560k inodes! Ack!

On the telecon, @climbfuji (I think) mentioned that packages sometimes install unneeded things, so I decided to dig in. The inodes are nearly all in envs/:

> files_per_directory | sort -g
31 util
40 doc
171 spack-ext
276 configs
565 cache
38155 spack
521673 envs
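files_per_directory is a local helper whose definition isn't shown here. A rough Python equivalent, assuming it counts every entry (files, directories, symlinks) below each immediate subdirectory, which is roughly what inode usage tracks, might look like this (both function names are hypothetical):

```python
import os


def count_entries(root: str) -> int:
    """Count every entry (files, directories, symlinks) below root.

    Each entry normally uses one inode, so this approximates inode usage.
    Hard links are counted once per name, so it can overcount slightly.
    """
    total = 0
    # os.walk does not descend into symlinked directories by default,
    # but such symlinks still appear in dirnames and are counted once.
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def files_per_directory(path: str = ".") -> list[tuple[int, str]]:
    """Return (count, name) for each immediate subdirectory, smallest first."""
    results = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                results.append((count_entries(entry.path), entry.name))
    # Sorting the tuples orders by count, mimicking "| sort -g".
    return sorted(results)
```

For example, looping over files_per_directory("envs") and printing each count and name gives the same two-column, smallest-first layout as the output above.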

Inside that, it's pretty even:

> files_per_directory | sort -g
165075 ue-oneapi-2024.2.0
165194 ue-intel-2021.10.0
191401 ue-gcc-12.3.0

So if I go into the gcc install dir, the biggest ones are:

3068 ncurses-6.5-k76uy7m
3232 py-pandas-2.2.3-ofnkme3
3413 py-sympy-1.13.0-m4q4e37
4565 py-pythran-0.16.1-dzzea5r
8250 python-3.11.7-jtlap6x
13327 go-bootstrap-1.20.6-s4jmylr
13518 py-torch-2.5.1-ihc6zx7
14718 go-1.23.2-icng7pf
17143 boost-1.84.0-oe4f3a2
24924 eccodes-2.33.0-fjjbv5h
26895 gh-2.58.0-mw5wyqg

So I see a lot of Python and Go here.

The gh one seems excessively big. If we look inside:

> files_per_directory | sort -g
1 bin
9 share
26882 pkg

and in pkg/:

> \ls pkg/mod/
cache  github.com  go.mongodb.org  go.opentelemetry.io	go.uber.org  golang.org  google.golang.org  gopkg.in  k8s.io

I might call on @andyfeller (of gh fame) to ask: is that pkg/mod directory needed once you've built gh? I often just grab a prebuilt binary on my machines.

If not, we could clean that up in a spack post-install step(?) and save ~26,000 inodes per install.

@mathomp4
Collaborator Author

In the go install, it looks like we have an entire src tree?

> files_per_directory | sort -g
2 bin
5 lib
18 doc
26 api
27 pkg
34 misc
3631 test
10957 src

Seems a bit...odd. But maybe in go you need that to compile stuff?

@andyfeller

@mathomp4 : 👋 your summoning ritual was successful 😆

Cleaning out my local go workspace and rebuilding gh locally, I can verify the concern you're bringing up here. The following is my quick attempt at a per-module examination of the files involved, to see where they are coming from:

$ find . -maxdepth 2 -type d -print -execdir sh -c 'find "$1" -type f | wc -l' sh {} \;
.
   33748
./gopkg.in
      65
./gopkg.in/[email protected]
      24
./gopkg.in/[email protected]
      41
./cache
     877
./cache/download
     877
./go.opentelemetry.io
     500
./go.opentelemetry.io/[email protected]
     418
./go.opentelemetry.io/auto
      32
./go.opentelemetry.io/otel
      50
./google.golang.org
    1590
./google.golang.org/[email protected]
     905
./google.golang.org/[email protected]
     629
./google.golang.org/genproto
      56
./k8s.io
      86
./k8s.io/klog
      86
./go.mongodb.org
    2628
./go.mongodb.org/[email protected]
    2628
./go.uber.org
     164
./go.uber.org/[email protected]
     145
./go.uber.org/[email protected]
      19
./golang.org
   12755
./golang.org/[email protected]
    9885
./golang.org/x
    2870
./github.com
   15083
./github.com/gorilla
      64
./github.com/mitchellh
      18
./github.com/pelletier
      97
./github.com/!alec!aivazis
      85
./github.com/opentracing
      36
./github.com/davecgh
      24
./github.com/muhammadmuzzammil1998
      12
./github.com/go-chi
      72
./github.com/blang
      16
./github.com/letsencrypt
     806
./github.com/sassoftware
     282
./github.com/klauspost
     428
./github.com/shurcoo!l
      29
./github.com/docker
    1609
./github.com/titanous
       4
./github.com/pmezard
       4
./github.com/!make!now!just
       8
./github.com/vbatts
      96
./github.com/transparency-dev
      70
./github.com/fsnotify
      54
./github.com/gdamore
     155
./github.com/microcosm-cc
      23
./github.com/itchyny
     113
./github.com/hashicorp
     232
./github.com/golang
      23
./github.com/stretchr
      98
./github.com/in-toto
     261
./github.com/google
     943
./github.com/go-logr
      67
./github.com/microsoft
     437
./github.com/shibumi
       8
./github.com/containerd
      17
./github.com/aymerick
      18
./github.com/distribution
      28
./github.com/alecthomas
     953
./github.com/cli
     139
./github.com/digitorus
      35
./github.com/go-openapi
    1383
./github.com/cenkalti
      19
./github.com/briandowns
      57
./github.com/magiconair
      24
./github.com/secure-systems-lab
      40
./github.com/fatih
      10
./github.com/muesli
      77
./github.com/sirupsen
      59
./github.com/jedisct1
       9
./github.com/thlib
      22
./github.com/lucasb-eyer
      38
./github.com/spf13
     285
./github.com/kballard
       8
./github.com/charmbracelet
     295
./github.com/go-jose
      82
./github.com/sigstore
    1014
./github.com/aymanbagabas
       9
./github.com/opencontainers
      89
./github.com/sagikazarmark
      28
./github.com/asaskevich
      37
./github.com/oklog
      12
./github.com/henvic
      42
./github.com/josharian
       5
./github.com/subosito
      21
./github.com/dlclark
    1913
./github.com/zalando
      20
./github.com/rodaine
       7
./github.com/theupdateframework
    1508
./github.com/mailru
      81
./github.com/gabriel-vasile
      59
./github.com/joho
      20
./github.com/rivo
     145
./github.com/mattn
      48
./github.com/yuin
     104
./github.com/alessio
      18
./github.com/mgutz
       8
./github.com/cyberphone
     107
./github.com/pkg
      16
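The odd-looking names in this listing, such as !alec!aivazis and shurcoo!l, come from the Go module cache's case-encoding: each uppercase letter in a module path is stored as "!" followed by its lowercase form, so the cache works on case-insensitive file systems. A small Python sketch to decode them (the function name is our own, not part of any Go tooling):

```python
def unescape_module_path(escaped: str) -> str:
    """Undo the Go module cache case-encoding ("!x" -> "X")."""
    out = []
    bang = False
    for ch in escaped:
        if bang:
            # Only a lowercase ASCII letter may follow "!".
            if not ("a" <= ch <= "z"):
                raise ValueError(f"invalid escape '!{ch}' in {escaped!r}")
            out.append(ch.upper())
            bang = False
        elif ch == "!":
            bang = True
        elif "A" <= ch <= "Z":
            # Escaped paths never contain raw uppercase letters.
            raise ValueError(f"unexpected uppercase {ch!r} in {escaped!r}")
        else:
            out.append(ch)
    if bang:
        raise ValueError(f"trailing '!' in {escaped!r}")
    return "".join(out)
```

So github.com/!alec!aivazis is github.com/AlecAivazis and github.com/!make!now!just is github.com/MakeNowJust. Version suffixes after "@" use the same escaping, so the function can be applied to a whole directory name.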

I'm unsure what measures can be taken, given that many of these provide capabilities not natively available within Go.

Suggestions

I think it is fair to raise this up in cli/cli for discussion.

Again, I'm unsure what is realistic given the nature of Go and the capabilities we support.

@mathomp4
Collaborator Author

@andyfeller I guess my question is: is that directory needed? I mean, I can see it might be needed at build time. But is it needed at run time? Because, well, I can go to cli/cli and just pick up a release without all those.

Likewise, with Brew I just get some manpages and the gh binary, but I suppose it might just be grabbing that release?

Now, I do see that in Brew, they run:

  def install
    gh_version = if build.stable?
      version.to_s
    else
      Utils.safe_popen_read("git", "describe", "--tags", "--dirty").chomp
    end

    with_env(
      "GH_VERSION" => gh_version,
      "GO_LDFLAGS" => "-s -w -X main.updaterEnabled=cli/cli",
    ) do
      system "make", "bin/gh", "manpages"
    end
    bin.install "bin/gh"
    man1.install Dir["share/man/man1/gh*.1"]
    generate_completions_from_executable(bin/"gh", "completion", "-s")
  end

which I read as make bin/gh manpages

But in spack, there isn't an explicit call to make bin/gh manpages. Is the default make target maybe more comprehensive?

Weirdly, the cli/cli Makefile doesn't even mention pkg/.

I might try removing the pkg/ directory (with shutil.rmtree, since os.remove only deletes single files) and see what happens...
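The go command writes its module cache read-only, so even shutil.rmtree can fail with PermissionError unless write permission is restored on the directories first. A minimal Python sketch of such a cleanup, assuming the cache sits in <prefix>/pkg as in the Discover install and relying on the fact that a compiled Go binary does not read module sources at run time (remove_go_module_cache is a hypothetical helper name):

```python
import os
import shutil
import stat


def remove_go_module_cache(prefix: str) -> bool:
    """Delete <prefix>/pkg (assumed to be a Go module cache) if present.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    target = os.path.join(prefix, "pkg")
    # Skip if absent, and never follow a symlink out of the prefix.
    if os.path.islink(target) or not os.path.isdir(target):
        return False
    # Go creates module cache directories without write permission, and
    # unlinking an entry requires write permission on its parent directory.
    # Restore the owner-write bit on every directory before deleting.
    for dirpath, _dirnames, _filenames in os.walk(target):
        mode = os.stat(dirpath).st_mode
        os.chmod(dirpath, mode | stat.S_IWUSR)
    shutil.rmtree(target)
    return True
```

In a spack package.py, something like this could presumably be called with self.prefix from a method decorated with @run_after("install"); that wiring is untested here. The Go-native alternative is go clean -modcache, which only works if GOMODCACHE/GOPATH point at the same directory when it runs.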

@mathomp4
Collaborator Author

Okay. This is....weird.

On my local Mac laptop, I just did a spack install gh, and in there:

❯ tree $(spack location -i gh)
/Users/mathomp4/spack-mathomp4/opt/spack/darwin-sonoma-m2/apple-clang-16.0.0/gh-2.63.2-kssg4ng7fj5q73iyocpxebglmb564tyi
├── bin
│   └── gh
└── share
    ├── bash-completion
    │   └── completions
    │       └── gh
    ├── fish
    │   └── vendor_completions.d
    │       └── gh.fish
    └── zsh
        └── site-functions
            └── _gh

9 directories, 4 files

I do not see the pkg/ directory that I see on Discover.

So, fine, maybe it's different on Linux. I fired up ye olde AWS and:

$ tree $(spack location -i gh)
/home/ubuntu/spack/opt/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_pat/linux-ubuntu24.04-x86_64_v3/gcc-14.2.0/gh-2.63.2-hx56nfrsfdoogpanwdjyyjv2xlhkzrtq
├── bin
│   └── gh
└── share
    ├── bash-completion
    │   └── completions
    │       └── gh
    ├── fish
    │   └── vendor_completions.d
    │       └── gh.fish
    └── zsh
        └── site-functions
            └── _gh

9 directories, 4 files

Huh.

I then went to Orion and looked there and, yep, I see pkg/ there. To be doubly sure: Orion (and Discover) have 2.58.0, so I installed that version locally and still got no pkg/.

So it seems like spack-stack is somehow doing this?

@climbfuji @AlexanderRichert-NOAA, as the spack-stack experts I go to: any idea why spack-stack installs gh differently than "mainline" spack does? The JCSDA gh/package.py file is roughly the same as the mainline gh/package.py (modulo some updates for newer tags and test fixes).
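One way to probe the spack-stack difference: Go stores downloaded modules in GOMODCACHE if it is set, otherwise in pkg/mod under the first GOPATH entry, and GOPATH itself defaults to $HOME/go. A pkg/mod inside the gh prefix would therefore be consistent with GOPATH (or GOMODCACHE) pointing at the install prefix during the spack-stack build. This is a hypothesis, not a confirmed cause. The sketch below encodes those rules; the helper names are hypothetical, and the real values can be checked by running go env GOPATH GOMODCACHE in the build environment.

```python
import os


def go_module_cache_dir(env: dict[str, str], home: str) -> str:
    """Where `go` would put downloaded modules, per its documented defaults."""
    # An explicit GOMODCACHE always wins.
    if env.get("GOMODCACHE"):
        return env["GOMODCACHE"]
    # Otherwise: first GOPATH entry + pkg/mod; GOPATH defaults to $HOME/go.
    gopath = env.get("GOPATH") or os.path.join(home, "go")
    first = gopath.split(os.pathsep)[0]
    return os.path.join(first, "pkg", "mod")


def cache_lands_in_prefix(env: dict[str, str], home: str, prefix: str) -> bool:
    """True if the module cache would end up inside the install prefix."""
    cache = os.path.realpath(go_module_cache_dir(env, home))
    root = os.path.realpath(prefix)
    return os.path.commonpath([cache, root]) == root
```

Feeding it the environment captured from a spack-stack build and from a mainline spack build would show whether the two differ in this respect.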

@AlexanderRichert-NOAA
Collaborator

No idea, sorry 🙃

@mathomp4
Collaborator Author

> No idea, sorry 🙃

I also looked at go, but apparently it needs 13,000+ files everywhere (my laptop, Orion).

But wow this gh one is odd. It's not like we have some "don't run gh cleanup step" or something!
