Research Artifact - Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions

This is the research artifact for "Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions". The repository contains lists of the studied GitHub PRs, both with and without the use of Copilot for PRs. It also provides the features of PRs that were or were not generated by Copilot for PRs (for RQ2), the coding results for RQ3, and the scripts. The artifact is intended to let researchers replicate the results of the paper and reuse our Copilot for PRs dataset in further research.

The following three research questions guided the study.

RQ1: To what extent do developers use Copilot for PRs in the code review process?
RQ2: How are the code reviews affected by the use of Copilot for PRs?
- RQ2.1: Is there a relationship between the use of Copilot for PRs and review time?
- RQ2.2: Is there a relationship between the use of Copilot for PRs and the likelihood of a PR being merged?
RQ3: How do developers adopt the content suggested by Copilot?
- RQ3.1: What kinds of supplementary information complement the content suggested by Copilot?
- RQ3.2: What kinds of content suggested by Copilot undergo subsequent editing by developers?

This artifact provides datasets, scripts, and other relevant material, structured as follows:

data - a directory of the dataset
- LLMPRs.csv - a list of PRs powered by Copilot for PRs
- control_prs_df.csv - a list of PRs not powered by Copilot for PRs
- LLMPRsComments.csv - raw data of PR comments from PRs powered by Copilot for PRs
- control_comments_df.csv - raw data of PR comments from PRs not powered by Copilot for PRs
- cleanedLLMPRsComments.csv - a list of cleaned PR comments (bot removal) from PRs powered by Copilot for PRs
- cleaned_control_comments_df.csv - a list of cleaned PR comments (bot removal) from PRs not powered by Copilot for PRs
- edit_contents.csv - raw data of editorial histories of PRs powered by Copilot for PRs
- edit_contents_developers.csv - editorial histories of PRs powered by Copilot for PRs after filtering
- edit_contents_developers_with_diff.csv - PRs powered by Copilot for PRs with Post-Copilot Edits
- control_metrics.csv - the metrics used for R scripts from the control group
- treatment_metrics.csv - the metrics used for R scripts from the treatment group
- groundtruthbots.csv - a list of bots from Golzadeh et al.
- coded_sample.csv - the coded editorial revisions in RQ3
scripts - a directory of the scripts
- env - a directory of environmental variables
  - tokens.txt - a list of GitHub access tokens
- CollectCopilot4prs.ipynb - Notebook file that collects raw data for RQ1-3
- ParseHistory.ipynb - Notebook file that parses editorial revisions and prepares for manual inspection in RQ3
- BuildingResults.ipynb - Notebook file that builds results in RQ1-3
- PMW_review.R - R script that uses propensity score weighting to estimate review time in RQ2.1
- PMW_merge.R - R script that uses propensity score weighting to estimate the likelihood of a PR being merged in RQ2.2
LICENSE.md - MIT License
README.md - this file
requirements.txt - required libraries for Notebook files
requirements_for_R_scripts.txt - required packages for R scripts
STATUS.txt - the ACM badges this artifact targets
INSTALL.txt - installation instructions for this artifact
FSE_Copilots_For_PRs.pdf - a copy of the accepted paper in PDF format
.gitattributes
.gitignore
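
The "bot removal" step that separates the raw comment files from the cleaned ones can be sketched roughly as below. This is a minimal illustration, not the code from the notebooks: the column names (`account`, `author`, `body`, `pr_id`) and the sample rows are assumptions made for the example, and the actual headers of groundtruthbots.csv and the comment files may differ.

```python
import csv
import io

def load_bot_logins(bots_csv_text, login_column="account"):
    """Return the set of lower-cased bot logins listed in a CSV.

    `login_column` is a hypothetical column name.
    """
    reader = csv.DictReader(io.StringIO(bots_csv_text))
    return {row[login_column].strip().lower() for row in reader}

def remove_bot_comments(comments_csv_text, bot_logins, author_column="author"):
    """Return the comment rows whose author is not a known bot.

    `author_column` is a hypothetical column name.
    """
    reader = csv.DictReader(io.StringIO(comments_csv_text))
    return [
        row for row in reader
        if row[author_column].strip().lower() not in bot_logins
    ]

# Tiny in-memory stand-ins for a bot list and a raw comment file (made-up rows).
BOTS_CSV = "account\ndependabot[bot]\ncodecov-commenter\n"
COMMENTS_CSV = (
    "pr_id,author,body\n"
    "1,alice,Looks good to me\n"
    "1,dependabot[bot],Bumps requests from 2.30 to 2.31\n"
    "2,Codecov-Commenter,Coverage report\n"
    "2,bob,Please add a test\n"
)

bots = load_bot_logins(BOTS_CSV)
cleaned = remove_bot_comments(COMMENTS_CSV, bots)
print([row["author"] for row in cleaned])
```

Logins are compared case-insensitively here because GitHub logins are not case-sensitive; whether the original cleaning does the same is not stated in this README.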

Provenance

The replication package comprises scripts and a dataset, accessible at

Important Notice

As of December 15, 2023, GitHub has discontinued the Copilot for PRs feature, converting the copilot4prs bot to a ghost account. To replicate the dataset, replace 'copilot4prs' with 'ghost' in the CollectCopilot4prs.ipynb notebook.
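
As a rough illustration of the substitution described above, the sketch below keeps the account name in one configurable place and filters edit records by it. The record layout (`pr`, `editor`) and the function name are hypothetical; CollectCopilot4prs.ipynb may organize this differently.

```python
# Account names taken from the notice above.
ORIGINAL_LOGIN = "copilot4prs"
GHOST_LOGIN = "ghost"

def edits_by_bot(edits, bot_login=GHOST_LOGIN):
    """Return edit records whose editor matches bot_login (case-insensitive).

    `edits` is a list of dicts with a hypothetical "editor" key; records
    with a missing editor are skipped.
    """
    target = bot_login.lower()
    return [e for e in edits if (e.get("editor") or "").lower() == target]

# Made-up edit history records, for illustration only.
sample = [
    {"pr": 1, "editor": "alice"},
    {"pr": 1, "editor": "ghost"},
    {"pr": 2, "editor": None},
    {"pr": 3, "editor": "copilot4prs"},
]

print([e["pr"] for e in edits_by_bot(sample)])
print([e["pr"] for e in edits_by_bot(sample, ORIGINAL_LOGIN)])
```

Note that GitHub uses the ghost account as a placeholder for all deleted users, so matching on 'ghost' alone may also pick up edits made by other deleted accounts; an additional check on the edited content is probably advisable.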

Environments

The specific installation process is described in INSTALL.txt. The following are needed:

A functional Python environment, compatible with the versions used in the notebooks, with all necessary libraries installed as specified in requirements.txt.
An R installation, preferably of the same version used for script development, with all required packages installed as indicated in requirements_for_R_scripts.txt.
Access to Jupyter Notebooks, either through an Anaconda installation or a direct Python setup.
A computer with sufficient processing power and memory to handle the computational demands of the scripts.
A stable internet connection, especially necessary if scripts involve fetching data from online sources.
An operating system (Windows, MacOS, Linux) compatible with the software and tools used.
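
One way to check the Python side of these requirements before opening the notebooks is sketched below. It assumes requirements.txt uses pip-style `name==version` lines, which is not confirmed here, and the package pins in the example string are made up.

```python
from importlib import metadata

def parse_requirements(text):
    """Parse pip-style lines into {name: pinned_version or None}.

    Only plain 'name' and 'name==version' lines are handled; blank lines
    and '#' comments are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins

def check_installed(pins):
    """Map each package name to (wanted, installed); installed is None if absent."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report

# Made-up example content; read the real requirements.txt instead.
example = "pandas==2.0.3\n# plotting\nmatplotlib\n"
pins = parse_requirements(example)
for name, (wanted, installed) in check_installed(pins).items():
    print(f"{name}: wanted={wanted} installed={installed}")
```

To use it on the artifact, replace `example` with the contents of requirements.txt, for instance via `open("requirements.txt").read()`.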

Installation and Replication

Please follow the instructions in INSTALL.txt step by step to replicate this study. All necessary data is included in this artifact, allowing you to reproduce all results by running BuildingResults.ipynb without the need to prepare the dataset separately.
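
Before running BuildingResults.ipynb, it may help to confirm that the data files listed in the structure above are present. The sketch below is only a convenience check under the assumption that all files sit directly in the data directory; the function name is hypothetical and it is not part of the artifact.

```python
import tempfile
from pathlib import Path

# File names as listed under "data" in this README.
EXPECTED_DATA_FILES = [
    "LLMPRs.csv",
    "control_prs_df.csv",
    "LLMPRsComments.csv",
    "control_comments_df.csv",
    "cleanedLLMPRsComments.csv",
    "cleaned_control_comments_df.csv",
    "edit_contents.csv",
    "edit_contents_developers.csv",
    "edit_contents_developers_with_diff.csv",
    "control_metrics.csv",
    "treatment_metrics.csv",
    "groundtruthbots.csv",
    "coded_sample.csv",
]

def missing_data_files(data_dir):
    """Return the expected file names that are not regular files in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DATA_FILES if not (root / name).is_file()]

# Demonstration on a temporary directory that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LLMPRs.csv").touch()
    missing = missing_data_files(tmp)

print(f"{len(missing)} of {len(EXPECTED_DATA_FILES)} files missing")
```

On a real checkout, `missing_data_files("data")` run from the repository root should return an empty list if the dataset is complete.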

Skills

Proficiency in Python programming, including familiarity with data analysis libraries like Pandas, NumPy, and Matplotlib.
Competence in R programming, particularly for statistical analysis, and familiarity with relevant R libraries.
Experience with Jupyter Notebooks, including running and modifying notebook cells and interpreting outputs.
Skills in data analysis and interpretation.
Basic knowledge of version control systems, particularly Git, for accessing code repositories like GitHub.
Ability to troubleshoot common software installation, library dependencies, and environment configuration issues.

Citation BibTeX

@inproceedings{copilotforpr,
  title={Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions},
  author={Xiao, Tao and Hata, Hideaki and Treude, Christoph and Matsumoto, Kenichi},
  booktitle={Proceedings of the ACM on Software Engineering (PACMSE)},
  number={FSE 2024},
  year={2024}
}

NAIST-SE/CopilotForPRsEarlyAdoption
