[Spike] [MVP] Package maintenance predictive model #444

Open
4 tasks
mayaCostantini opened this issue Aug 4, 2022 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/...` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance.

Comments

@mayaCostantini
Contributor

Problem statement

While most approaches focus on guaranteeing the provenance of software components, this is only one side of sustainable software development. Another side is the focus on software components that are critical to the success of the whole software system, its development, and its delivery and operation.

cc @goern

As a Python developer, I would like to be able to predict whether some of my dependencies will become unmaintained over time.

The idea would be to develop a learning model able to predict when a given package will fall below an acceptable level of maintenance. This level could be defined by the user or set directly in the model, in an arbitrary way.
A PoC for this model could use project maintenance data as provided by the OpenSSF Security Scorecards, provided that the upstream project implements Scorecard checks per package version instead of updating Scorecard checks based on the last commit SHA of the project repository.
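As a rough illustration of the threshold idea, the sketch below scans a per-version score history and reports the first version that falls below a user-defined level. The helper name `first_version_below`, the score values, and the threshold are all hypothetical; the only fact assumed about Scorecard is that scores lie on a 0 to 10 scale. This is the labelling step a predictive model would later be trained to anticipate, not the model itself.

```python
from typing import Optional, Sequence, Tuple


def first_version_below(
    history: Sequence[Tuple[str, float]], threshold: float
) -> Optional[str]:
    """Return the first version whose score is below `threshold`.

    `history` is expected to be ordered from oldest to newest release.
    Returns None when no version falls below the threshold.
    """
    for version, score in history:
        if score < threshold:
            return version
    return None


# Made-up per-version scores on a 0-10 scale (illustration only).
history = [
    ("1.0.0", 7.8),
    ("1.1.0", 7.1),
    ("1.2.0", 6.4),
    ("2.0.0", 4.9),
    ("2.1.0", 4.2),
]

# The threshold is an arbitrary, user-chosen "acceptable level of maintenance".
print(first_version_below(history, threshold=5.0))
```

Keeping the threshold as a plain argument leaves it up to the user, which matches the option of defining the acceptable level outside the model.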

Proposal description

  1. Provide a PoC of a model trained on the Scorecards dataset (with Scorecard checks per package version) capable of predicting from which version a package is likely to fall below a predefined level of maintenance. A good candidate for this task could be Multiple Linear Regression (MLR), provided that the MLR assumptions (linear relationship between predictor and response variables, predictors not too strongly correlated with each other, etc.) hold. Other supervised learning models could also be considered.
  • Select features for prediction according to the model chosen
  • Aggregate and process data for training
  • Train and validate the model, and examine the coherence of the results
  • Experiment with different models and document a benchmark
  2. Find relevant integrations for the model

Think about ways to provide this model as a service, and where in a Python project lifecycle it would be most relevant for developers to predict the maintenance duration of their dependencies.
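To make the first proposal item more concrete, here is a minimal, dependency-free sketch of two of its steps: screening features for strong pairwise correlation (one of the MLR assumptions mentioned above) and fitting a multiple linear regression by ordinary least squares. Everything in it is an assumption made for illustration: the function names, the 0.9 correlation limit, and the data are made up, and the feature names are merely borrowed from Scorecard check names. A real PoC would more likely rely on an established statistics or machine learning library.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(
    columns: Dict[str, Sequence[float]], limit: float = 0.9
) -> List[Tuple[str, str]]:
    """List feature pairs whose absolute correlation exceeds `limit`."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > limit
    ]


def fit_ols(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... by solving the normal equations.

    Returns [b0, b1, ...]. Uses Gauss-Jordan elimination with partial pivoting.
    """
    X = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one feature row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Made-up per-version feature values and maintenance scores (illustration only).
features = {
    "Maintained": [9, 8, 7, 5, 4, 2],
    "Code-Review": [8, 8, 5, 6, 3, 4],
}
scores = [7.9, 7.4, 6.0, 5.3, 3.6, 3.1]

print("strongly correlated pairs:", correlated_pairs(features))
coeffs = fit_ols(list(zip(*features.values())), scores)
print("coefficients:", coeffs)
print("predicted score:", predict(coeffs, (3, 3)))
```

The correlation screen only catches pairwise dependence; checking the remaining MLR assumptions (linearity, residual behaviour) and validating on held-out versions would still be needed before trusting any prediction.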

Acceptance Criteria

To be defined.

@mayaCostantini mayaCostantini added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 4, 2022
@sesheta sesheta added the needs-triage Indicates an issue or PR lacks a `triage/...` label and requires one. label Aug 4, 2022
@sesheta
Member

sesheta commented Aug 4, 2022

@mayaCostantini: This issue is currently awaiting triage.
If a refinement session determines this is a relevant issue, it will accept the issue by applying the
triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mayaCostantini
Contributor Author

/priority important-longterm
/sig stack-guidance

@sesheta sesheta added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance. and removed needs-sig labels Aug 4, 2022
@codificat codificat moved this to 🆕 New in Planning Board Sep 26, 2022