-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(scanner): Extract VCSPath filtering functions #9428
base: main
Are you sure you want to change the base?
refactor(scanner): Extract VCSPath filtering functions #9428
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9428 +/- ##
=========================================
Coverage 68.02% 68.02%
Complexity 1291 1291
=========================================
Files 250 250
Lines 8808 8808
Branches 916 916
=========================================
Hits 5992 5992
Misses 2431 2431
Partials 385 385
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
scanner/src/main/kotlin/Scanner.kt
Outdated
@@ -56,6 +56,7 @@ import org.ossreviewtoolkit.model.utils.FileArchiver | |||
import org.ossreviewtoolkit.model.utils.ProvenanceFileStorage | |||
import org.ossreviewtoolkit.model.utils.getKnownProvenancesWithoutVcsPath | |||
import org.ossreviewtoolkit.model.utils.vcsPath | |||
import org.ossreviewtoolkit.scanner.FileList as ScannerFileList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an unrelated side note, I wasn't even aware (or forgot about it) that we do have both
org.ossreviewtoolkit.model.FileList
org.ossreviewtoolkit.scanner.FileList
which seem to be awfully similar. @fviernau do you recall why we have both of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in order to decouple OrtResult
vs. ProvenanceFileStorage
and other usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in order to reduce the confusion about the usage a bit, what do you think about actually renaming org.ossreviewtoolkit.scanner.FileList
to ScannerFileList
to avoid the need of an alias-import as done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it is not confusion that different layers may use similar terms for a different thing.
I believe it's fine to have the same name in different modules. Prefixing all file names in the scanner
module with Scanner
seems redundant to me.
8bc6ac9
to
aac89b7
Compare
@@ -83,6 +84,58 @@ class Scanner( | |||
private val archiver: FileArchiver? = scannerConfig.archive.createFileArchiver(), | |||
fileListStorage: ProvenanceFileStorage = scannerConfig.fileListStorage.createStorage() | |||
) { | |||
companion object { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up for discussion: Should we make these top-level functions instead to reduce nesting? I'm not sure what the value for is for them to be scoped under Scanner
.
/** | ||
* Return a map of VCS paths for each [KnownProvenance] contained in [provenances]. | ||
*/ | ||
fun getVcsPathsForProvenances(provenances: Set<ProvenanceResolutionResult>) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain for what purpose these are going to be re-used?
(These are the internal of mapping OrtResult
sinternal representation to
OrtResult`s external representation.
This is a bit complex bit, and I believe we should avoid making this any more complex. If it's just about re-use, maybe this is ok, but even that would make it harder to change things. If it's about extending functionality in order to re-use, then it will be imo more problematic.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See eclipse-apoapsis/ort-server#1358 (comment) (it would probably be good to link that discussion in the commit message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please sahre the exact use case in ORT server? What exactly needs to be accomplished there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the linked PR and commit message rationale does not provide enough context, I'll leave it to @wkl3nk to explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ORT scanner creates a perfect result, filtering out any non-relevant scan results.
But actually ORT Server, when calling the ORT scanners, does not use this (perfect) results returned. I guess because they are already stored in the database cache of the scanner, and one did not want to store the results twice. But this creates the problem in ORT server, that every time the scan results need to be fetched from the database cache, they need to be filtered (for VCS path). And the logic to filter them should be exactly the same as in the ORT scanner. This is the reason for having this code reusable, instead to copy-and-paste it and risk that the logic drifts away over time.
64965bf
to
04a49c6
Compare
Scanners generally do not consider the VCSPath when performing an ORT scan on a repository, as they scan the entire repository and its dependencies. The resulting scan data is cached for reuse across multiple scans. When reusing cached scan results for ORT scans with a specified VCSPath, where only a subset of the repository is scanned, it is necessary to filter out extraneous scan results. Helper functions already exist to handle this filtering. This refactoring extracts these helper functions and allows ORT Server to use them. See: eclipse-apoapsis/ort-server#1358 Signed-off-by: Wolfgang Klenk <[email protected]>
04a49c6
to
88ddaad
Compare
Scanners generally do not consider the VCSPath when performing an ORT scan on a repository, as they scan the entire repository and its dependencies. The resulting scan data is cached for reuse across multiple scans.
When reusing cached scan results for ORT scans with a specified VCSPath, where only a subset of the repository is scanned, it is necessary to filter out extraneous scan results. Helper functions already exist to handle this filtering.
This refactoring extracts these helper functions to enhance reusability by ORT Server.
See discussion here: eclipse-apoapsis/ort-server#1358
No changes in functionality or logic, just pure refactoring in separate (static) functions.