
Container dependency does not order builds correctly #208

Open
mohammedzee1000 opened this issue Mar 22, 2017 · 2 comments


mohammedzee1000 commented Mar 22, 2017

Before the DC move, I had received an email from Josh Berkus asking that the Kubernetes containers be rebuilt, as the RPMs had been updated. Based on the entries in the index, I triggered the builds for kubernetes-master and kubernetes-node, since all the other Kubernetes containers depend on one or the other of these.

Once the builds were completed, however, Josh pinged me again to say that while all the containers were rebuilt, some did not get updated. So I triggered rebuilds on those specific ones, and they immediately got updated.

Based on my observation here, I started wondering whether we actually have the build-ordering problem figured out, and I am not so sure. To understand what I noticed, here is what should be known about the current process.

Let us first consider a container dependency chain X -> Y -> Z, i.e., X, Y, and Z are containers where Y depends on X and Z depends on Y:

  1. The index job on Jenkins watches the container index and, based on changes, generates JJB definitions for the entries and uses them to create the corresponding jobs on Jenkins.
  2. Each time a project-specific job detects a change, it runs a script that takes the OpenShift template, processes it for the specific project, and uploads it to OpenShift, thereby creating the project, image streams, and builds. Once that is done, it pushes the job information into the build tube, and the Jenkins job is complete.
  3. The dependencies between the containers are handled as Jenkins job dependencies across Jenkins projects. This means that once a job runs, if it is a dependency of another job, it triggers that job, and so on.
  4. What this means is that once the job for X is complete, it automatically triggers the job for Y, and the completion of Y triggers the job for Z.
  5. There is a build worker that watches the build tube, picks up a job if any are queued, does a build on OpenShift, and once done, pushes the details to the test tube and then goes back to watching the build tube (see the sketch after this list).
    Once the test phase does its job, it pushes the details to the delivery tube and goes back to watching the test tube.
    It is in the delivery phase that the container gets its actual tag and is pushed to the registry.
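
As a rough sketch of step 5, assuming the tubes are beanstalkd tubes accessed through the beanstalkc client; the tube names, payload layout, and `build_on_openshift()` helper are illustrative, not the pipeline's actual code:

```python
# Minimal sketch of the build-worker loop described in step 5.
import json
import beanstalkc

def build_on_openshift(details):
    # Hypothetical stand-in for starting the OpenShift build and
    # waiting for the build step itself to finish.
    print("building", details.get("project"))

conn = beanstalkc.Connection(host="localhost", port=11300)
conn.watch("build_tube")   # pick up queued build jobs
conn.use("test_tube")      # finished builds are handed to the test phase

while True:
    job = conn.reserve()             # blocks until a job is queued
    details = json.loads(job.body)
    build_on_openshift(details)
    conn.put(json.dumps(details))    # push the details into the test tube
    job.delete()                     # done; go back to watching the tube
```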

Keeping this in mind, let us now imagine a build being triggered on X. If we follow the process above, you will notice that X, Y, and Z all land in the build tube within minutes of one another.

Once the build worker picks up X and finishes building it, it pushes it onto the test tube and is then free to watch the build tube again.

Now it notices that Y is in the tube and initiates a build for it. Notice that there is a very high chance that the delivery of X has not yet completed, which basically means Y will either fail on a first-time trigger due to the unavailability of X, or will end up picking up the older X for its build.

Note: Even though this might not be too much of a problem right now, it will especially become a problem when we scale up the builds.
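
To make the interleaving concrete, here is a toy timeline of the scenario above; all timings are invented and only the ordering matters:

```python
# Toy timeline of the race: the worker is free to start Y as soon as
# X's *build* finishes, long before X's *delivery* phase pushes the
# new image to the registry. Timings are invented for illustration.
events = [
    ("00:00", "X queued in build tube"),
    ("00:01", "Y queued in build tube (triggered by X's Jenkins job)"),
    ("00:02", "Z queued in build tube (triggered by Y's Jenkins job)"),
    ("00:05", "worker: X built, details pushed to test tube"),
    ("00:06", "worker: Y build starts -> pulls the OLD X from the registry"),
    ("00:15", "delivery: new X finally tagged and pushed to the registry"),
]
for timestamp, event in events:
    print(timestamp, event)
```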

--
To add: I just verified this is actually an issue in a local setup.


mohammedzee1000 commented Mar 22, 2017

One way to possibly solve this issue is to give the Jenkins jobs a feedback loop with the rest of the system, ensuring a job is not considered finished until we know whether the build has completely succeeded or failed, including any and all retries on the OpenShift side.
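
As a rough sketch of what such a feedback loop could look like (`delivery_status()` is a hypothetical helper; the real pipeline would have to expose this state somehow):

```python
# Sketch of the proposed feedback loop: the Jenkins job for X runs
# this and exits only once X is fully delivered or has failed for
# good, so downstream jobs like Y cannot start any earlier.
import time

def delivery_status(project):
    # Hypothetical: ask the pipeline for the final state of `project`,
    # after all OpenShift-side retries have been exhausted.
    return "delivered"

def wait_for_delivery(project, poll_seconds=30):
    while True:
        status = delivery_status(project)
        if status in ("delivered", "failed"):
            return status          # only now does the Jenkins job finish
        time.sleep(poll_seconds)
```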

One thing we should consider before applying this is how well Jenkins handles situations where X and Y get triggered at about the same time, each for its own reasons.
@bamachrn, will Jenkins be able to handle this possibility? Even if it is a 0.00001% chance, that is still a chance, and I would rather cover as many corner cases as I can.

@mohammedzee1000 commented:

Of course, a more recommended solution is to use a proper dependency-graph approach.
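
A minimal sketch of that approach, matching the X -> Y -> Z example above (names are illustrative): topologically sort the dependency graph and enqueue each container only after everything it depends on has been delivered.

```python
# Kahn's-algorithm topological sort over the container dependency
# graph; a container becomes ready only once its dependency count
# drops to zero, i.e. once everything it builds on is delivered.
from collections import deque

deps = {"X": [], "Y": ["X"], "Z": ["Y"]}   # container -> its dependencies

def build_order(deps):
    indegree = {c: len(d) for c, d in deps.items()}
    dependents = {c: [] for c in deps}
    for container, ds in deps.items():
        for d in ds:
            dependents[d].append(container)
    ready = deque(c for c, n in indegree.items() if n == 0)
    order = []
    while ready:
        container = ready.popleft()
        order.append(container)
        for nxt in dependents[container]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

print(build_order(deps))   # ['X', 'Y', 'Z']
```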
