Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(scanner): Extract VCSPath filtering functions #9428

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wkl3nk
Copy link
Contributor

@wkl3nk wkl3nk commented Nov 14, 2024

Scanners generally do not consider the VCSPath when performing an ORT scan on a repository, as they scan the entire repository and its dependencies. The resulting scan data is cached for reuse across multiple scans.

When reusing cached scan results for ORT scans with a specified VCSPath, where only a subset of the repository is scanned, it is necessary to filter out extraneous scan results. Helper functions already exist to handle this filtering.

This refactoring extracts these helper functions to enhance reusability by ORT Server.
See discussion here: eclipse-apoapsis/ort-server#1358

No changes in functionality or logic, just pure refactoring in separate (static) functions.

@wkl3nk wkl3nk requested a review from a team as a code owner November 14, 2024 16:33
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.02%. Comparing base (15dcd7b) to head (8bc6ac9).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #9428   +/-   ##
=========================================
  Coverage     68.02%   68.02%           
  Complexity     1291     1291           
=========================================
  Files           250      250           
  Lines          8808     8808           
  Branches        916      916           
=========================================
  Hits           5992     5992           
  Misses         2431     2431           
  Partials        385      385           
Flag Coverage Δ
test 35.89% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -56,6 +56,7 @@ import org.ossreviewtoolkit.model.utils.FileArchiver
import org.ossreviewtoolkit.model.utils.ProvenanceFileStorage
import org.ossreviewtoolkit.model.utils.getKnownProvenancesWithoutVcsPath
import org.ossreviewtoolkit.model.utils.vcsPath
import org.ossreviewtoolkit.scanner.FileList as ScannerFileList
Copy link
Member

@sschuberth sschuberth Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an unrelated side note, I wasn't even aware (or forgot about it) that we do have both

org.ossreviewtoolkit.model.FileList
org.ossreviewtoolkit.scanner.FileList

which seem to be awfully similar. @fviernau do you recall why we have both of them?

Copy link
Member

@fviernau fviernau Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in order to decouple OrtResult vs. ProvenanceFileStorage and other usages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in order to reduce the confusion about the usage a bit, what do you think about actually renaming org.ossreviewtoolkit.scanner.FileList to ScannerFileList to avoid the need of an alias-import as done here?

Copy link
Member

@fviernau fviernau Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it is not confusion that different layers may use similar terms for a different thing.
I believe it's fine to have the same name in different modules. Prefixing all file names in the scanner module with Scanner seems redundant to me.

@wkl3nk wkl3nk force-pushed the wkl3nk/scanner-extract-vcs-path-filtering-functions branch from 8bc6ac9 to aac89b7 Compare November 15, 2024 07:39
@sschuberth
Copy link
Member

@wkl3nk, just FYI, you will need to rebase this again once #9433 is merged to make all tests pass.

@@ -83,6 +84,58 @@ class Scanner(
private val archiver: FileArchiver? = scannerConfig.archive.createFileArchiver(),
fileListStorage: ProvenanceFileStorage = scannerConfig.fileListStorage.createStorage()
) {
companion object {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up for discussion: Should we make these top-level functions instead to reduce nesting? I'm not sure what the value for is for them to be scoped under Scanner.

/**
* Return a map of VCS paths for each [KnownProvenance] contained in [provenances].
*/
fun getVcsPathsForProvenances(provenances: Set<ProvenanceResolutionResult>) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain for what purpose these are going to be re-used?

(These are the internal of mapping OrtResultsinternal representation toOrtResult`s external representation.
This is a bit complex bit, and I believe we should avoid making this any more complex. If it's just about re-use, maybe this is ok, but even that would make it harder to change things. If it's about extending functionality in order to re-use, then it will be imo more problematic.)

Copy link
Member

@sschuberth sschuberth Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See eclipse-apoapsis/ort-server#1358 (comment) (it would probably be good to link that discussion in the commit message).

Copy link
Member

@fviernau fviernau Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please sahre the exact use case in ORT server? What exactly needs to be accomplished there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the linked PR and commit message rationale does not provide enough context, I'll leave it to @wkl3nk to explain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ORT scanner creates a perfect result, filtering out any non-relevant scan results.
But actually ORT Server, when calling the ORT scanners, does not use this (perfect) results returned. I guess because they are already stored in the database cache of the scanner, and one did not want to store the results twice. But this creates the problem in ORT server, that every time the scan results need to be fetched from the database cache, they need to be filtered (for VCS path). And the logic to filter them should be exactly the same as in the ORT scanner. This is the reason for having this code reusable, instead to copy-and-paste it and risk that the logic drifts away over time.

@wkl3nk wkl3nk force-pushed the wkl3nk/scanner-extract-vcs-path-filtering-functions branch 3 times, most recently from 64965bf to 04a49c6 Compare November 15, 2024 12:57
Scanners generally do not consider the VCSPath when performing an
ORT scan on a repository, as they scan the entire repository and
its dependencies. The resulting scan data is cached for reuse
across multiple scans.

When reusing cached scan results for ORT scans with a specified
VCSPath, where only a subset of the repository is scanned, it
is necessary to filter out extraneous scan results. Helper
functions already exist to handle this filtering.

This refactoring extracts these helper functions and allows
ORT Server to use them.
See: eclipse-apoapsis/ort-server#1358

Signed-off-by: Wolfgang Klenk <[email protected]>
@wkl3nk wkl3nk force-pushed the wkl3nk/scanner-extract-vcs-path-filtering-functions branch from 04a49c6 to 88ddaad Compare November 15, 2024 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants