Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(vcs): VCS configuration to support common and custom attributes #9271

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wkl3nk
Copy link
Contributor

@wkl3nk wkl3nk commented Oct 10, 2024

While VCS implementations are already plugins, they are not yet configurable. VCS implementations require common configurations (e.g., revision, recursive) and should support also VCS-specific configurations if they are consumed via their API.

This allows to add functionality to individual VCS implementations without the need to implement them for all of them.

This refactoring keeps the common configuration attributes as they are, while VCS-specific configurations are stored generically in an options attribute.

The change also adds VCS-specific configuration options for Git: The strategy for checking out repositories with submodules now allows to only check out the first layer of submodules.

Example configuration in config.yml:

ort:
  analyzer:
    versionControlSystems:
      Git:
        options:
          submoduleHistoryDepth: 2
          updateNestedSubmodules: false

The change is split up into 2 commits:

  • First one deals with refactoring to use a new plugin interface for version control systems, that also allows to supply specific configurations.
  • Second one is for 2 configuration options for Github VCS, dealing with submodules.

For more details, please have a look at the respective commits.

Fixes #8556.

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.95%. Comparing base (f8a0c39) to head (6c28dc0).
Report is 98 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9271      +/-   ##
============================================
- Coverage     67.67%   63.95%   -3.72%     
- Complexity     1223     1245      +22     
============================================
  Files           244      254      +10     
  Lines          8626     9298     +672     
  Branches        911     1035     +124     
============================================
+ Hits           5838     5947     +109     
- Misses         2413     2938     +525     
- Partials        375      413      +38     
Flag Coverage Δ
test 35.94% <100.00%> (-1.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch 5 times, most recently from 60b8d10 to ae31656 Compare October 11, 2024 14:06
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch 5 times, most recently from 1cf9421 to ce5d30c Compare October 21, 2024 15:27
@wkl3nk wkl3nk marked this pull request as ready for review October 21, 2024 15:30
@wkl3nk wkl3nk requested a review from a team as a code owner October 21, 2024 15:30
@fviernau
Copy link
Member

I believe the options for configuring submodule recursion could be injected via package curations. Have you considered this?

@wkl3nk
Copy link
Contributor Author

wkl3nk commented Oct 22, 2024

@fviernau Hello, can you explain a little bit more please?

@fviernau
Copy link
Member

fviernau commented Oct 22, 2024

@fviernau Hello, can you explain a little bit more please?

I believe it makes sense to have the ability to configure this separate per dependency.
I expect the following trade-off / approach to be reasonable for some users:

  1. Enable submodule recursion by default
  2. Disable submodule recursion per package curation (after reviewing that this is ok, scan won't miss anything relevant)

@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch 2 times, most recently from d9ef04f to caa86ed Compare October 23, 2024 10:09
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from caa86ed to 8217cc4 Compare October 23, 2024 10:16
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from 8217cc4 to a7c9533 Compare October 23, 2024 10:23
@fviernau
Copy link
Member

I believe it makes sense to have the ability to configure this separate per dependency. I expect the following trade-off / approach to be reasonable for some users:

1. Enable submodule recursion by default

2. Disable submodule recursion per package curation (after reviewing that this is ok, scan won't miss anything relevant)

To add to this, I believe in order to implement the above it would be necessary to pass
the config flags as parameter to the functions of the VCS implementations which make the cloning.

We could right away unify this instead of introducing two ways to tell the behavior. In particular,
even though Git, GitRepo and so forth are plugins, I believe it would be fine to assume that these are built in.
So, we can add the configuration to OrtConfiguration or any model under it, instead of to a plugin configuration.
Then, the caller could just pass whatever is configured to the VCS instance as param.
Any thoughts on that?

@wkl3nk
Copy link
Contributor Author

wkl3nk commented Oct 23, 2024

@fviernau I think I understand your concerns. However, my key issue is that there are very large repositories that act as kind of umbrella around other repositories, having already more than 50 git submodules in the first layer. These submodules then also have submodules and so on. One can't even be sure that there are no cycles in this nested submodules. ORT currently cannot even download these repositories, we tried it out and stopped it after several hours.
On the other hand, the project developers say that they usually only check out the submodules of the first layer, nothing more. This change request is about just that: Enable ORT to checkout out just the first layer of submodules and not the complete nested, possibly never ending tree of nested submodules.

@fviernau
Copy link
Member

fviernau commented Oct 23, 2024

@fviernau I think I understand your concerns. However, my key issue is that there are very large repositories that act as kind of umbrella around other repositories, having already more than 50 git submodules in the first layer. These submodules then also have submodules and so on. One can't even be sure that there are no cycles in this nested submodules. ORT currently cannot even download these repositories, we tried it out and stopped it after several hours.
On the other hand, the project developers say that they usually only check out the submodules of the first layer, nothing more. This change request is about just that: Enable ORT to checkout out just the first layer of submodules and not the complete nested, possibly never ending tree of nested submodules.

My concerns were not about the what but about the how. So, I generally agree that it makes sense that ORT provides a way to configure this. It also wasn't meant as a strict concern, but I wanted to make sure this is considered before committing to some solution. Maybe @sschuberth @mnonnenmacher any thoughts?

Another related thing: A while ago we've invented "scan by repository", which changed the scanner to no more clone recursively IIRC. The only place where recursive cloning is happening is in the provenance resolution stage IIRC. There had been an idea of mine, I believe I brought this up already longer time ago, to rewrite that provenance resolved so that it works without cloning recursively. The resolved would itself in Kotlin the recurse, but not use the VCSes recursion capabilities. If we would go that road, then configuring the recursion depth would be a configuration of that resolved, not of the VCS plugin. This is another indication (to me) that configuring this at the VCS plugin level may not be the right place.

@wkl3nk can you confirm that the issue happens during provenance resolution?

All in all, I more and more think this should be considered configuration of ORT, not of VCS plugin.
The config could be agnostic of the concrete VCS, e.g. some clone rec depth which applies to any VCS / provenance resolution?

@wkl3nk
Copy link
Contributor Author

wkl3nk commented Oct 24, 2024

@fviernau wrote

can you confirm that the issue happens during provenance resolution?

Would "provenance resolution" mean that the Analyzer has scanned for dependency management files? If yes, then the answer is: No - The inflexibility lies in the downloader, it is the default to do a "git submodule --init --recursive" to all submodules. So the Analyzer never ever kicks in, because it just is not possible in a reasonable time and space to download all these > 50 nested submodules and their respective submodules and so on. That's the reason for going in the direction to be able to configure the ORT Downloader in config.yml.

@wkl3nk
Copy link
Contributor Author

wkl3nk commented Nov 4, 2024

Hello @sschuberth , the UI tells me 1 change requested from your side, but unfortunately I cannot find it. Maybe it was overwritten somehow when I did a force push? Could you please support me in locating this change request.
Besides that, Martin is back from vacation, if there is need to discuss this.

Copy link
Member

@sschuberth sschuberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR needs some refactoring to be in line with the plugin factory concept of other plugins, and to not interfere with the new plugin API created by @mnonnenmacher.

downloader/src/main/kotlin/Downloader.kt Outdated Show resolved Hide resolved
downloader/src/main/kotlin/Downloader.kt Outdated Show resolved Hide resolved
downloader/src/main/kotlin/VersionControlSystem.kt Outdated Show resolved Hide resolved
@wkl3nk wkl3nk marked this pull request as draft November 7, 2024 15:55
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from fb56259 to 41c051f Compare November 7, 2024 16:06
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from 41c051f to 0509c0a Compare November 7, 2024 16:11
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from 0509c0a to 5221536 Compare November 8, 2024 15:48
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch 4 times, most recently from c66ccf6 to 307fa1a Compare November 11, 2024 09:14
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from 307fa1a to 04c00b5 Compare November 11, 2024 09:20
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from 04c00b5 to f8e6bb6 Compare November 13, 2024 08:57
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch 2 times, most recently from df4a6fa to baca8ab Compare November 13, 2024 12:47
While VCS implementations are already plugins, they are not yet
configurable. VCS implementations require common configurations
(e.g., `revision`, `recursive`) and should support also
VCS-specific configurations if they are consumed via their API.
This allows to add functionality to individual VCS implementations
without the need to implement them for all of them.

Fixes oss-review-toolkit#8556.

Signed-off-by: Wolfgang Klenk <[email protected]>
For large repositories with many layers of nested Git submodules, the
download process can be very time-consuming and often results in
duplicate projects in the tree of nested submodules.
This feature introduces configuration options to limit the recursive
checkout of nested Git submodules to the first layer, optimizing
performance and reducing redundancy. Additionally, it also allows to
limit the depth of commit history to fetch when downloading
the projects.

Signed-off-by: Wolfgang Klenk <[email protected]>
@wkl3nk wkl3nk force-pushed the wkl3nk/make-vcs-plugins-configurable branch from baca8ab to 6c28dc0 Compare November 13, 2024 13:50
@wkl3nk
Copy link
Contributor Author

wkl3nk commented Nov 13, 2024

Now using the new Plugin concept that is based on TypedConfigurablePluginFactory and is already was implemented for the scanners before. VCS classes now have a factory inside, and can have a specific configuration class, that is injected in the contructor of the VCS implementation by the factory. As an example, I added such a specific configuration class for Git.

@wkl3nk wkl3nk marked this pull request as ready for review November 13, 2024 14:03
@sschuberth
Copy link
Member

Now using the new Plugin concept that is based on TypedConfigurablePluginFactory

Before I start review, I'd like @mnonnenmacher to comment whether this "intermediate step" is meaningful to do (now that it's already there...) instead of straight migrating all VCS to the KSP-based plugin mechanism in the context of #9403.

@sschuberth sschuberth requested a review from a team November 14, 2024 08:16
@mnonnenmacher
Copy link
Member

mnonnenmacher commented Nov 14, 2024

Now using the new Plugin concept that is based on TypedConfigurablePluginFactory

Before I start review, I'd like @mnonnenmacher to comment whether this "intermediate step" is meaningful to do (now that it's already there...) instead of straight migrating all VCS to the KSP-based plugin mechanism in the context of #9403.

It would have made sense to use the KSP plugin API right away, however, now that it is implemented, the effort to do it in this PR or another one would be the same, so both would be fine with me.

@wkl3nk However, I have a conceptual question: In a usual ORT pipeline the downloader is used in two places:

  • Before running the analyzer to download the project repository.
  • By the scanner to download the sources of the dependencies.

The new options you introduce are to my understanding only relevant for the first step to limit the downloaded sources of the project itself. But they should not be applied within the scanner, because you cannot know beforehand if the dependencies use any submodules and if they need to be scanned or not.
Did you consider this during the implementation?

@wkl3nk
Copy link
Contributor Author

wkl3nk commented Nov 14, 2024

Where can this functionality currently be used?

@mnonnenmacher The functionality can out of the box now be used for the Downloader stage, meaning in the typical workflow where the ORT CLI is first used to download the repository. It supports adding VCS specific configurations for e.g. Git. For instance, you can configure in .ort/config/config.yml if only the first layer of submodules shall be downloaded, in case you need to deal with repositories with many many nested submodule layers. Once the repository is downloaded, it can be analyzed and scanned with other options of the ORT CLI.

In ORT Server, where we don't have this Downloader stage, and instead the Downloader is instantiated and used in the code, just before the Analyzer, this change also allows to configure the VCSs in a flexible way. How this is done would be a question within the context of the ORT Server, so this PR is more or less only the groundwork to make it possible to configure VCSs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make VCS plugins configurable
4 participants