Home
This wiki contains technical documentation for the Census Bureau's Disclosure Avoidance System (DAS), which protects confidential information provided by respondents to the 2020 Decennial Census of Population and Housing. This release consists of the code and parameters used to generate the privacy-protected Microdata Detail File, which was in turn used to produce the Public Law 94-171 Redistricting Data Summary Files. Its primary confidentiality protection relies on zero-Concentrated Differential Privacy, achieved via an exact sampler for the Discrete Gaussian Mechanism. The resulting tabulations are used, notably, for redistricting and official population counts.
This documentation applies to the particular version of the DAS source code and parameters used to protect the tabulations released according to the P.L. 94-171 publication of 2020 Census results. Additional supporting documentation will be added as it becomes available.
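To give a concrete sense of what an "exact sampler for the Discrete Gaussian Mechanism" involves, the following is a minimal sketch in the style of the rejection sampler published by Canonne, Kamath, and Steinke. It is not taken from the DAS source code. All function names are hypothetical, and the use of Python's `random.Random` is an assumption made for reproducibility; a production system would need a vetted source of randomness. The sketch uses only integer and rational arithmetic, so no floating-point rounding enters the sampling.

```python
import math
import random
from fractions import Fraction

def bernoulli(p, rng):
    # Exact Bernoulli(p) for a Fraction p in [0, 1].
    return rng.randrange(p.denominator) < p.numerator

def bernoulli_exp_unit(x, rng):
    # Exact Bernoulli(exp(-x)) for a Fraction x in [0, 1].
    k = 1
    while bernoulli(x / k, rng):
        k += 1
    return k % 2 == 1

def bernoulli_exp(x, rng):
    # Exact Bernoulli(exp(-x)) for any Fraction x >= 0.
    while x > 1:
        if not bernoulli_exp_unit(Fraction(1), rng):
            return False
        x -= 1
    return bernoulli_exp_unit(x, rng)

def sample_discrete_laplace(t, rng):
    # Integer y with P(y) proportional to exp(-|y| / t), t a positive int.
    while True:
        magnitude = 0
        while bernoulli_exp(Fraction(1, t), rng):
            magnitude += 1
        negative = rng.randrange(2) == 1
        if negative and magnitude == 0:
            continue  # reject "negative zero" so 0 is not counted twice
        return -magnitude if negative else magnitude

def sample_discrete_gaussian(sigma2, rng):
    # Integer y with P(y) proportional to exp(-y^2 / (2 * sigma2)).
    sigma2 = Fraction(sigma2)
    t = math.isqrt(sigma2.numerator // sigma2.denominator) + 1
    while True:
        y = sample_discrete_laplace(t, rng)
        bias = (abs(y) - sigma2 / t) ** 2 / (2 * sigma2)
        if bernoulli_exp(bias, rng):
            return y

# Hypothetical usage: add noise with variance parameter 4 to a count of 100.
rng = random.Random(2020)
noisy_count = 100 + sample_discrete_gaussian(Fraction(4), rng)
```

For a counting query with sensitivity 1, adding discrete Gaussian noise with parameter σ² satisfies ρ-zCDP with ρ = 1/(2σ²); the parameter values actually used for the 2020 Census are part of the released DAS parameters and are not reproduced here.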
- Background describes the purpose of the 2020 Census and motivates the need for disclosure avoidance in published tabulations. It also summarizes the history of disclosure avoidance methods in past censuses and the underlying design decisions for the 2020 DAS.
- DAS Implementation describes the implementation of the 2020 DAS, in particular the TopDown Algorithm (TDA) used to produce the privacy-protected block-level microdata from which the redistricting tabulations were generated.
  - Implementation Details gives a more in-depth description of the individual phases of the TDA.
- DAS Components provides a guide to the source code and modules constituting the DAS. The published source code collapses different submodules into a single directory tree.
  - The Query Object describes a key component of the DAS software architecture.
- DAS Infrastructure describes the computational infrastructure used for processing the 331 million individuals and 156 million living quarters (housing units and occupied group quarters) enumerated by the 2020 Census. The DAS ran in a secure cloud environment and took advantage of a high-performance computing platform called "Elastic Map Reduce" (EMR).
  - EMR Configuration describes the configuration of individual nodes within the EMR clusters used for the DAS.
- Authors lists some key contributors to the creation and application of the Decennial DAS, whose contributions are gratefully acknowledged.
The following scientific papers cover the semantics, mathematics, and design of the DAS.
- Census TopDown Algorithm: Differentially Private Data, Incremental Schemas, and Consistency with Public Knowledge (under review at Harvard Data Science Review).
- Geographic Spines in the 2020 Census Disclosure Avoidance System TopDown Algorithm.
- An Uncertainty Principle is a Price of Privacy-Preserving Microdata (forthcoming at the NeurIPS 2021 Conference, in final preparation).
- Privacy Semantics of the Disclosure Avoidance System for the 2020 Census (in peer review).
As papers become available and new relevant papers are added, this page will be updated and electronic copies will be placed in the docs subdirectory of this repository. To receive regular updates on new papers, releases, and results, subscribe to our newsletter.