This is a real use case of automating quality control for microeconomic data, to ensure that the data processing pipeline has not altered the original (raw) data in any unfavorable manner.

Mohammed-ElDesouky/Automated-Data-Quality-Control-STATA

Automated-Data-Quality-Control-STATA

This is a real use case of automating quality control for any microeconomic data, to ensure that the data processing pipeline has not altered the raw data in any unfavorable manner. This helps ensure data integrity and reliability. The script can examine and test multiple individual datasets in a single run; it performs the checks and outputs a standard-format QA report as an .xlsx file.

[Note] This procedure is currently in use by the World Bank’s Global Education Unit (the Global Education Policy Dashboard team).

What exactly does the procedure include?

The procedure runs checks to report on the following:

  • Changes in the number of unique observations in each dataset between raw and processed data
  • Changes in the number of duplicates between raw and processed data
  • Changes in the number of observations with a missing unique ID between raw and processed data
  • Changes in the number of features/variables between raw and processed data
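
The four checks above can be sketched with frames as follows. This is a minimal illustration under stated assumptions, not the repository's script: the frame names `raw` and `processed`, the ID variable `school_id`, and the toy data are all hypothetical. It also assumes that "duplicates" means surplus copies of a non-missing ID and that unique observations are counted by non-missing ID.

```stata
* Minimal sketch of the four checks; all names and data are hypothetical.
version 16
clear all

* Toy raw data: one duplicated ID (2) and one missing ID
frame create raw
frame change raw
input school_id score
1 50
2 60
2 60
. 70
4 80
end

* Toy processed data: duplicate and missing ID removed, one variable added
frame create processed
frame change processed
input school_id score extra
1 50 0
2 60 1
4 80 0
end

foreach f in raw processed {
    frame `f' {
        * Check 4: number of variables (taken before helper variables are added)
        local k_`f' = c(k)

        * Helpers: first occurrence and surplus copies within each ID
        bysort school_id: generate byte first   = (_n == 1)
        by school_id:     generate byte surplus = (_n > 1)

        * Check 1: unique observations, counted by non-missing ID
        quietly count if first & !missing(school_id)
        local uniq_`f' = r(N)

        * Check 2: surplus duplicate observations among non-missing IDs
        quietly count if surplus & !missing(school_id)
        local dup_`f' = r(N)

        * Check 3: observations with a missing ID
        quietly count if missing(school_id)
        local miss_`f' = r(N)

        drop first surplus
    }
}

frame change default

* Report processed minus raw for each check
display "Change in unique IDs:       " (`uniq_processed' - `uniq_raw')
display "Change in duplicates:       " (`dup_processed' - `dup_raw')
display "Change in missing IDs:      " (`miss_processed' - `miss_raw')
display "Change in variable count:   " (`k_processed' - `k_raw')
```

In this toy data the raw frame contains one surplus duplicate and one missing ID that the processed frame lacks, and the processed frame carries one extra variable, so three of the four differences are non-zero. Holding each dataset in its own frame lets both be inspected without repeatedly loading and clearing data.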

Pre-requisites for error-free implementation

To ensure that the script runs without errors, please check the following:

  • Redefine the file paths and data directories (at the top of the script) to match your machine and workflow.
  • Redefine the macros (globals/locals) (at the top of the script) to match your data file names and ID variables.
  • This script uses the "frames" functionality, which was introduced in Stata 16, so Stata 16 or higher is recommended.
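
A configuration header along the lines described above might look like the sketch below. Every path, dataset name, and ID variable here is a placeholder assumption and does not come from the repository's script; substitute your own values.

```stata
* Hypothetical configuration header; all paths and names are placeholders.

* Stop early on older versions, because frames need Stata 16 or higher
if c(stata_version) < 16 {
    display as error "This script needs Stata 16 or higher (frames)."
    exit 198
}

* Data directories (globals), adjusted to the local machine
global raw_dir  "C:/project/data/raw"
global proc_dir "C:/project/data/processed"

* Dataset file names (without .dta) and the ID variable of each one
local datasets   "school teacher"
local id_school  "school_id"
local id_teacher "teacher_id"

foreach d of local datasets {
    * Nested expansion: `id_`d'' resolves to the ID variable for dataset `d'
    local id `id_`d''
    display "`d': ID variable = `id'"

    * Report any data file that cannot be found at the configured paths
    foreach dir in "${raw_dir}" "${proc_dir}" {
        capture confirm file "`dir'/`d'.dta"
        if _rc != 0 {
            display as error "File not found: `dir'/`d'.dta"
        }
    }
}
```

Checking the paths up front means a wrong directory is reported once, before any quality checks run, rather than surfacing as a failure partway through the report.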
