
A repository for the investigation of cybersecurity static-analysis tool evolution


MSUSEL/tool-evolution


Introduction

This data pipeline contains all the code needed to recreate the analyses and plots contained in our manuscript entitled "New Version, New Answer: Investigating Cybersecurity Static-Analysis Tool Findings". Each of the folders here is intended to be run in sequence.

With the exception of the data acquisition portion of this project (under directory '01_acquisition'), all code is written for R v4.2.2; we used Python v3.6 for data acquisition. Regenerating the data acquisition portion has the following system requirements: cve-bin-tool runs require pip to be installed, and cwe-checker runs require Docker to be installed. The data analysis portion of the pipeline can be regenerated in full once R and the packages used in the pipeline are installed.
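The system requirements above can be checked before starting a run. The following is a minimal sketch, not part of the repository: the `REQUIRED_TOOLS` mapping and the `missing_tools` helper are hypothetical names, and looking for an `Rscript` executable is only an assumed stand-in for "R is installed". It does not check the R version or the R packages.

```python
import shutil

# Assumed mapping from pipeline part to the executables it needs on PATH.
# "Rscript" is an assumption standing in for "R is installed".
REQUIRED_TOOLS = {
    "cve-bin-tool runs": ["pip"],
    "cwe-checker runs": ["docker"],
    "wrangling and analysis": ["Rscript"],
}


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return {pipeline part: [executables not found on PATH]}."""
    missing = {}
    for part, tools in required.items():
        absent = [tool for tool in tools if which(tool) is None]
        if absent:
            missing[part] = absent
    return missing


if __name__ == "__main__":
    # Print one line per pipeline part that lacks a required executable.
    for part, tools in missing_tools().items():
        print("{}: missing {}".format(part, ", ".join(tools)))
```

An empty result means every listed executable was found on the current PATH.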

Each folder in the main directory (01_acquisition, 02_wrangling_detailed, 03_wrangling_aggregated, and 04_analysis) contains four folders with the same names: 01_input, 02_protocol, 03_incremental, and 04_product. With the exception of the 01_acquisition folder, the 01_input folder has all of the data needed to execute the protocol in the 02_protocol folder. The 03_incremental folders hold information that was informative, necessary, or both, but not essential for generating a data product; many of these folders are empty. Each 04_product folder contains the data product for that step in the pipeline. The data in the 04_product folder of one directory are copied into the 01_input folder of the subsequent directory, and so forth.
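The folder layout described above can be sketched in a few lines. This is an illustrative sketch only: `create_layout` and `copy_products` are hypothetical helpers that are not part of the repository, and the sketch assumes that a step's products are handed on by copying the files in its 04_product folder into the next step's 01_input folder.

```python
from pathlib import Path
import shutil

STAGES = ["01_acquisition", "02_wrangling_detailed",
          "03_wrangling_aggregated", "04_analysis"]
SUBFOLDERS = ["01_input", "02_protocol", "03_incremental", "04_product"]


def create_layout(root):
    """Create every stage/subfolder pair under root (existing folders are kept)."""
    root = Path(root)
    for stage in STAGES:
        for sub in SUBFOLDERS:
            (root / stage / sub).mkdir(parents=True, exist_ok=True)


def copy_products(root, src_stage, dst_stage):
    """Copy the files in src_stage/04_product into dst_stage/01_input.

    Returns the sorted list of copied file names.
    """
    root = Path(root)
    src = root / src_stage / "04_product"
    dst = root / dst_stage / "01_input"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(str(item), str(dst / item.name))
            copied.append(item.name)
    return copied
```

For example, `copy_products("tool-evolution", "02_wrangling_detailed", "03_wrangling_aggregated")` would hand the detailed products to the aggregation step. Because 04_analysis takes products from both wrangling folders, it would need one call per source folder.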

Assuming that an end user has created the proper directory structure, they can re-run the entire wrangling and analysis portion of this work by executing the following sequence of commands from within R:

    source("./tool-evolution/02_wrangling_detailed/02_protocol/protocol.R")
    source("./tool-evolution/03_wrangling_aggregated/02_protocol/protocol.R")
    source("./tool-evolution/04_analysis/02_protocol/protocol_aggregated.R")
    source("./tool-evolution/04_analysis/02_protocol/protocol_detailed.R")
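The same sequence could also be driven from outside an interactive R session. The sketch below is hypothetical and not part of the repository: `build_commands` and `run_pipeline` are illustrative names. It assumes that `Rscript` is on the PATH, that it is run from the directory containing `tool-evolution`, and that each protocol reads its inputs from disk rather than relying on objects left in the R session by an earlier script. If that last assumption does not hold, run the four `source()` calls in a single R session instead.

```python
import subprocess
from pathlib import Path

# Protocol scripts in the order given in this README.
PROTOCOLS = [
    "./tool-evolution/02_wrangling_detailed/02_protocol/protocol.R",
    "./tool-evolution/03_wrangling_aggregated/02_protocol/protocol.R",
    "./tool-evolution/04_analysis/02_protocol/protocol_aggregated.R",
    "./tool-evolution/04_analysis/02_protocol/protocol_detailed.R",
]


def build_commands(protocols=PROTOCOLS):
    """Return one Rscript command per protocol, each calling source() on it."""
    return [["Rscript", "-e", 'source("{}")'.format(p)] for p in protocols]


def run_pipeline(base_dir=".", protocols=PROTOCOLS):
    """Run the protocols in order from base_dir; stop at the first failure."""
    base = Path(base_dir)
    # Fail early, before running anything, if a script is not where expected.
    missing = [p for p in protocols if not (base / p).is_file()]
    if missing:
        raise FileNotFoundError("Missing protocol scripts: " + ", ".join(missing))
    for command in build_commands(protocols):
        # check=True raises CalledProcessError if R exits with an error.
        subprocess.run(command, cwd=str(base), check=True)
```

Checking for all four scripts up front keeps a long wrangling step from running only to fail later on a misplaced analysis script.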

Pipeline folders

01_acquisition

This folder contains the scripts needed to acquire the data we analyzed. To re-run this portion of the pipeline, one will need to download the static-analysis tools cwe-checker and cve-bin-tool. We ran cwe-checker within Docker containers; describing the setup for those is beyond the scope of this pipeline. However, the script executing and writing the output for cve-bin-tool can be run from the root directory directly.

02_wrangling_detailed

This folder contains the input data, scripts, and data products needed to wrangle, clean, and create some basic summary statistics about the outputs generated by cwe-checker and cve-bin-tool. This folder has "detailed" in the name because these data are aggregated at the level of the individual binaries and the CWEs and CVEs, respectively. The output from this step is stored within the 04_product subdirectory as an .RData file.

03_wrangling_aggregated

This folder contains the input data (the products from the detailed wrangling folder) and the protocol that aggregates the detailed findings into a summative score across all binaries analyzed. The output from this step is stored within the 04_product subdirectory as an .RData file. Some basic summary statistics are also calculated in the protocol.

04_analysis

This folder receives the product data from 02_wrangling_detailed and 03_wrangling_aggregated as its input data. One additional file is included in these inputs: a CSV of attribute data compiled for the majority of the binaries analyzed herein. It was created by Andrew Johnson, a former MS student in the lab; his thesis contains the source for these data and is cited in our paper. The protocols analyze the input data and generate plots. All plots included in our manuscript can be found in the product directory.

Funding Agency:
