Create build pipelines for Java dependencies #940
This is related to (or a duplicate of) stackabletech/issues#674
I thought about this a bit and can think of two possible solutions, an indirect and a direct relation between the build processes. As an example: I want to build Druid and make it use a Hadoop version with custom patches.

1. Indirect relation / remote Maven repo: The Hadoop build process pushes JARs to a remote Maven repo, and the Druid build pulls them from there. Before pushing the JARs, we have to make sure that our fixed version of Hadoop has its own version identifier. Otherwise, if I fix something in Hadoop, push it, and someone else works on Druid in parallel, my change might interfere with their Druid build. So we have to make the Druid build use a specific version of Hadoop (3.4.1-stackable1.0.0), or at least some pinned checksum.

2. Direct relation / local Maven repo: Directly copy the JARs over from another image, something like this in the Druid image:
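(A rough sketch only; the `hadoop-builder` stage/image name and the paths below are placeholders, not our actual image layout:)

```dockerfile
# Placeholder names and paths: the hadoop-builder stage/image is assumed to
# have already installed its patched JARs into a plain directory that is laid
# out like a Maven repository.
COPY --from=hadoop-builder /build/local-maven-repo /stackable/local-maven-repo
```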
And then make Maven use them, for example by adding this to the build profile in pom.xml:
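(Again just a sketch; the profile id and the repository path are placeholders and would have to match the directory the JARs were copied into:)

```xml
<!-- Sketch only: profile id and repository path are placeholders. -->
<profile>
  <id>stackable-local-repo</id>
  <repositories>
    <repository>
      <id>local-patched-libs</id>
      <!-- Directory the patched JARs were copied into in the Dockerfile -->
      <url>file:///stackable/local-maven-repo</url>
    </repository>
  </repositories>
</profile>
```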
Pros:
Cons:
I think 2 would generally be preferable (I really don't want the build process to have to bounce through Maven or depend on the current state of it...), but it does have the downside of slowing down the build a decent amount. I think I'd rather treat the Maven repo as a secondary output of the build step, instead of flatly saying "just copy the jars from in here". That would also be more applicable to the "turns out we need to rebuild Jackson too" use case.
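One way to read "Maven repo as a secondary output" could be to let the dependency's build stage install into a dedicated repository directory and hand that whole directory to the product build. A rough sketch, with all image names, paths and build details assumed:

```dockerfile
# Sketch of the "Maven repo as a secondary build output" idea.
# Base images, stage names and paths are assumptions, not our real setup.
FROM maven:3.9-eclipse-temurin-17 AS hadoop-builder
WORKDIR /build/hadoop
# Assumes the (patched) Hadoop sources are available in the build context.
COPY hadoop-src/ .
# Install into a dedicated local repository directory so the repository
# layout itself becomes an output of this stage (it will also contain the
# third-party artifacts Maven downloads during the build).
RUN mvn -DskipTests -Dmaven.repo.local=/build/local-maven-repo install

# The product build then pulls in that repository as a secondary output.
FROM eclipse-temurin:17 AS druid-builder
COPY --from=hadoop-builder /build/local-maven-repo /stackable/local-maven-repo
```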
Regarding the custom versions of artifacts, this Trivy page suggests that we might be able to use the version qualifier to label JARs while Trivy still looks up vulns for the base version.
So Maven would pull the custom version, while scanners would still match vulnerabilities against the base version. The alternative would be to add the original JAR as a dummy component inside the image; then vulnerability scanners and Syft would find it as well. But then vulnerable dependencies of that JAR might still be present in the container image (at least if it's a fat JAR), even if we use newer versions of them in our forked Java dependency. The code can't really be executed, but the vulnerabilities in those dependencies would still be reported by scanners. So we wouldn't reduce the vulnerability count of the image; we could just issue VEX statements saying that we're not affected since the vulnerable components are not used. Which is better than nothing, but not really what we want.
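To make the qualifier idea concrete: a downstream build could pin the patched artifact roughly like this (the exact version scheme is still open, so this is purely illustrative):

```xml
<!-- Illustrative only: the version qualifier scheme is not decided yet. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <!-- Scanners that honour the qualifier would still match CVEs against 3.4.1 -->
  <version>3.4.1-stackable1.0.0</version>
</dependency>
```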
Discussion update

We had a meeting about this last week and agreed on the following points for now; they are still in "draft mode", however:
Current state of our builds

I also looked at how we currently build dependencies. In general, we have the choice of building the dependency directly inside the Dockerfile when building the product, or building the dependency in a separate step, publishing the built artifact somewhere and downloading it in the Dockerfile. Currently, we use both ways. In variant 1, the dependency is built directly inside the product Dockerfile. In variant 2, the build instructions for the dependency live outside of the product Dockerfile, and the built artifact is downloaded during the image build. The sources for both variants are either mirrored in one of our GitHub repositories or in Nexus.

Based on our discussion and the current state of how our builds work, I tried to create a draft for a possible way forward.

Possible way forward
Advantages:
This solution only covers the next steps, not everything that would need to be in place for this issue to be fully resolved.
Problem:
Let's say we apply a patch to e.g. Hadoop 3.4.1 to fix a vulnerability. We bump a dependency to the latest version and the vulnerability is gone. But all our products that depend on Hadoop Java artifacts will still pull the original Hadoop 3.4.1 components from the default public Maven repository, which does not contain our patched version.
We could instead contribute the patch upstream, which is nice, since we also get additional validation of the patch by the maintainers, and other people can easily benefit from it as well. But to actually use the patch in all our products, we'd have to wait for the next release of Hadoop.
Idea:
Build a patched version of Hadoop and publish it to our own Maven repo. Patch downstream products like Hive, Trino etc. to use that version of Hadoop. There might be multiple steps involved, for example: A vulnerability originating in Hadoop is present in a Trino image. It's in the Trino Phoenix plugin, so we'd have to build (and patch) that plugin ourselves. For that, we have to build and patch Phoenix ourselves first.
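As an illustration of what "patch downstream products" could mean in practice, a product's pom.xml might be pointed at our repo and the patched version roughly like this (the repository URL and the `hadoop.version` property are assumptions; the real property names differ per product):

```xml
<!-- Sketch only: repository URL and version property are placeholders. -->
<repositories>
  <repository>
    <id>stackable-maven</id>
    <url>https://maven.example.com/stackable</url>
  </repository>
</repositories>

<properties>
  <!-- Override the Hadoop version the product normally builds against -->
  <hadoop.version>3.4.1-stackable1.0.0</hadoop.version>
</properties>
```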
I think we should still try to contribute patches upstream in the long term, because that way we can give something back, we get additional validation from the maintainers, and we have to maintain fewer custom patches.
To do:
- How do we get the built artifacts into the product images (COPY --from...)?