
Background

haase003 edited this page Sep 9, 2021 · 14 revisions

In 2020, the United States Census Bureau conducted the Decennial Census of Population and Housing, commonly known as the 2020 Census. The census aims to enumerate every person residing in the United States, covering all 50 states, the District of Columbia, and Puerto Rico. Every person alive on April 1, 2020 and residing in these places, according to residency criteria finalized in 2018, must be counted.

Following completion of the census, the Census Bureau must submit state population totals to the President of the United States. The United States Constitution mandates that this decennial enumeration be used to determine each state’s Congressional representation.

Public Law 94-171 directs the Census Bureau to provide data to the governors and legislative leadership in each of the 50 states for redistricting purposes. This product is the first file released that includes demographic and housing characteristics about detailed geographic areas.
By statute, the Census Bureau must ensure that its decennial census data products meet the legal requirements of Title 13, Sections 8(b) and 9 of the U.S. Code. This means that the published results of the census must not identify data from specific individuals, nor should data from specific individuals be reasonably inferable.

In previous decennial censuses, a variety of techniques were used to protect the confidentiality of responses, including the use of synthetic data and household swapping. For the 2020 Census, the Census Bureau applied the latest science in developing the 2020 Census Disclosure Avoidance System (DAS). Following the instructions of the Data Stewardship Executive Policy Committee (DSEP), the Census Bureau implemented differential privacy (DP) as the Disclosure Avoidance System’s underlying privacy-loss accounting framework for quantifying and mitigating disclosure risk in data products published from the 2020 Census.

This public release of the 2020 Census P.L. 94-171 Redistricting Data Summary File DAS source code provides increased transparency of the Census Bureau’s modernization of its disclosure avoidance methods to protect the confidentiality of individuals’ responses to the 2020 Census.

Overview

Article I, Section 2 of the U.S. Constitution directs the U.S. Government to conduct an “actual enumeration” of the population every ten years.
In 2020, the Census Bureau conducted the 24th Decennial Census of Population and Housing with reference date April 1, 2020 and has begun producing public-use data products that conform to the requirements of Title 13 of the U.S. Code. The goal of the census is to count everyone once, only once, and in the right place. All residents must be counted. After the data have been collected by the Census Bureau, but before the data are tabulated to produce data products for dissemination, the confidential data must undergo statistical disclosure limitation so that the impact of statistical data releases on the confidentiality of individual census responses can be quantified and controlled.

In the 2010 Census, the trade-off between accuracy and privacy protection was viewed as a technical matter to be determined by disclosure avoidance statisticians. Disclosure avoidance was performed primarily using household-level record swapping and was supported by maintaining the secrecy of key disclosure avoidance parameters.

However, there is a growing recognition in the scientific community that record-level household swapping fails to provide provable confidentiality guarantees. There is also growing concern that a so-called database reconstruction attack, as outlined by Dinur and Nissim (2003), could reconstruct a significant portion of the confidential data that underlie the census data releases, and that such reconstructed microdata could be used to re-identify the respondents who provided a significant proportion of those confidential data. Indeed, in 2019 the Census Bureau announced that it had performed a database reconstruction attack using only the publicly available 2010 decennial census publications and had been able to reconstruct microdata that were overwhelmingly consistent with the 2010 confidential microdata.

To fulfill its requirements to produce an accurate count and to protect personally identifiable information, the Disclosure Avoidance System (DAS) for the 2020 Census implements mathematically rigorous disclosure avoidance controls based on the set of mathematical techniques known as differential privacy. The DAS reads the Census Edited File (CEF) and applies formally private algorithms to produce a Microdata Detail File (MDF). The DAS can be thought of as a filter that allows some aspects of the data to pass while controlling the leakage of confidential data to the level permitted by the differential privacy parameters.
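To give a feel for how a differential privacy parameter controls this filter, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only and not the DAS's actual algorithm; the function names, the block population, and the epsilon values are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count protected by the Laplace mechanism.

    Adding or removing one person changes a count by at most `sensitivity`,
    so noise with scale sensitivity / epsilon gives epsilon-differential
    privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2020)   # fixed seed so the demo is repeatable
block_population = 37       # hypothetical confidential count
for epsilon in (0.1, 1.0, 10.0):
    draws = [round(noisy_count(block_population, epsilon, rng), 1) for _ in range(5)]
    print(f"epsilon={epsilon}: {draws}")
```

A smaller epsilon means more noise and stronger protection; a larger epsilon means less noise and more accurate published counts. This is the accuracy and privacy trade-off that the privacy-loss budget quantifies.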

By policy, all data that are publicly released by the U.S. Census Bureau based on the 2020 Census must go through some form of mathematically defensible formal privacy mechanism.

DAS Design Decisions

Many of the principal features, requirements, and parameters of the Census Bureau’s implementation of the DAS for production of the 2020 Census P.L. 94-171 Redistricting Data Summary File were policy decisions made by the Census Bureau’s Data Stewardship Executive Policy Committee (DSEP). The policy decisions affecting DAS design include: the list of invariants (those data elements to which no noise is added); the overall privacy-loss budget; and the allocation of the privacy-loss budget across geographic levels and across queries. While DSEP is responsible for significant decisions, actions, and accomplishments of the 2020 Census Program, the Associate Director for Decennial Programs publicly documents these policies in the 2020 Census Decision Memorandum Series for the purpose of informing stakeholders, coordinating interdivisional efforts, and documenting important historical changes. These memoranda are published as the 2020 Census Memorandum Series.
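The allocation of a privacy-loss budget across geographic levels and queries can be sketched as a simple proportional split. Everything in the example below is an assumption made for illustration: the total budget, the level names and shares, the query names, and the use of plain additive composition. None of these values are the ones DSEP adopted.

```python
from fractions import Fraction


def allocate_budget(total, shares):
    """Split `total` according to `shares`, which must sum to exactly 1."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    return {name: total * share for name, share in shares.items()}


# Hypothetical overall privacy-loss budget and hypothetical shares.
total_budget = Fraction(10)
level_shares = {
    "nation": Fraction(1, 10),
    "state": Fraction(2, 10),
    "county": Fraction(2, 10),
    "tract": Fraction(2, 10),
    "block": Fraction(3, 10),
}
query_shares = {
    "total_population": Fraction(1, 2),
    "race_by_voting_age": Fraction(1, 2),
}

# First split the budget across geographic levels, then across queries
# within each level.
per_level = allocate_budget(total_budget, level_shares)
per_level_query = {
    level: allocate_budget(budget, query_shares)
    for level, budget in per_level.items()
}

for level, queries in per_level_query.items():
    for query, budget in queries.items():
        print(f"{level:>7} {query:<20} {float(budget):.2f}")
```

Using `Fraction` keeps the shares exact, so the per-query budgets add back up to the overall budget without floating-point drift. Invariants sit outside this accounting because no noise is added to them.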