DAS Components

Kenneth Haase edited this page Sep 30, 2021 · 9 revisions

The internal Census Bureau DAS implementation is divided among many different Git repositories, arranged as submodules within the root das_decennial repo. The source code included in this redistricting production release consists of all the source code modules used in executing the DAS for that production run (on the 2020 and 2010 Census Edited Files). We flattened references to code in different repositories into a single directory tree. This page summarizes the individual components found in this flattened tree, marking with "(repo)" those that constitute separate Git repositories in the current internal DAS repository.

  • configs contains default configuration files used by the DAS itself, as INI files read by Python
  • das_framework/ctools (repo) contains common tools for working with Census data
  • das_framework (repo) contains the general (read/protect/write) framework of the DAS engine
  • programs/nodes implements the classes for the geographic nodes, instances of which represent a specific geographic location at a specific level of the geographic hierarchy, with attributes representing, for example, its privacy-protected measurements and, if available, privacy-protected microdata
  • programs/geographic_spines implements the representation of the geographic hierarchy used by the Top-Down Algorithm (TDA), including optimization of the spine to reduce error in selected off-spine geographies
  • programs/queries implements the DAS Query classes (especially DPQuery), which are the basic units of disclosure avoidance in the DAS
  • programs/optimization implements code for generating microdata (represented as numpy ndarray data structures, interpreted as histograms) with minimum distance to the noisy measurements (using the Gurobi solver)
  • programs/invariants implements the representation of invariants used in the optimization process
  • programs/constraints implements the various constraints applied during the optimization process for different Census products
  • programs/engine implements the Top-Down Algorithm (TDA) used for Decennial disclosure avoidance, coordinating the interplay between the other major DAS "protect"-step processes (e.g., taking of noisy measurements, nodes, optimization, queries)
  • programs/reader reads input files (the CEF files and the geographic files) and transforms them into the geographic tree-of-histogram representation expected by the processing steps implemented in programs/engine
  • programs/writers converts the Block-level histogram objects generated by engine into microdata-formatted MDF files suitable for downstream consumption, e.g., by Census tabulation
  • programs/python_dvs contains code for the Data Vintaging System which tracks the provenance of source and generated data files
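
As a minimal sketch of how INI files such as those in configs can be read from Python, the example below uses the standard-library configparser module. The section names, option names, and values shown here are hypothetical illustrations and are not taken from the actual DAS configuration files.

```python
import configparser

# Hypothetical INI text. The section and option names below are illustrative
# assumptions only; they are not the DAS's real configuration keys.
EXAMPLE_INI = """
[DEFAULT]
run_name = example

[reader]
input_path = /tmp/input.csv

[engine]
epsilon = 1.5
levels = State, County, Block
"""


def load_example_config(text: str) -> configparser.ConfigParser:
    """Parse INI-formatted text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_example_config(EXAMPLE_INI)

# Typed accessors convert the stored strings on request.
epsilon = config.getfloat("engine", "epsilon")

# Comma-separated options have to be split by the caller.
levels = [name.strip() for name in config.get("engine", "levels").split(",")]

# Options in [DEFAULT] are visible from every other section.
run_name = config.get("reader", "run_name")
```

Keeping defaults in a [DEFAULT] section lets individual sections override only the options they need.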

Broadly speaking, the DAS code flows, as organized by das_framework, from reader to engine to writers, with the other subfolders playing supporting roles and defining objects used in one or several of these major steps.
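
The reader, engine, writers flow can be illustrated with a toy read/protect/write pipeline. This is only a sketch under stated assumptions: the class names (ExampleReader, ExampleEngine, ExampleWriter), the run_pipeline function, the geocodes, and the record format are all hypothetical and do not reflect the actual das_framework API. The "protect" step here is a placeholder that perturbs counts with seeded uniform integer noise and clamps at zero. It is not the Top-Down Algorithm and provides no differential privacy guarantee.

```python
import random
from typing import Dict, List

# Counts per category for one geography; a plain-list stand-in for the
# numpy ndarray histograms mentioned above.
Histogram = List[int]


class ExampleReader:
    """Hypothetical reader: returns a histogram for each geographic unit."""

    def read(self) -> Dict[str, Histogram]:
        return {"block_A": [3, 0, 5], "block_B": [1, 2, 0]}


class ExampleEngine:
    """Hypothetical engine: a placeholder 'protect' step, NOT the TDA."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def protect(self, nodes: Dict[str, Histogram]) -> Dict[str, Histogram]:
        protected = {}
        for geocode, hist in nodes.items():
            # Perturb each count by -1, 0, or +1, then clamp to be nonnegative.
            noisy = [count + self._rng.randint(-1, 1) for count in hist]
            protected[geocode] = [max(0, value) for value in noisy]
        return protected


class ExampleWriter:
    """Hypothetical writer: expands histograms into one record per person."""

    def write(self, nodes: Dict[str, Histogram]) -> List[str]:
        rows: List[str] = []
        for geocode in sorted(nodes):
            for category, count in enumerate(nodes[geocode]):
                rows.extend(f"{geocode},{category}" for _ in range(count))
        return rows


def run_pipeline(reader, engine, writer) -> List[str]:
    """Chain the three steps in the order read, protect, write."""
    return writer.write(engine.protect(reader.read()))


records = run_pipeline(ExampleReader(), ExampleEngine(seed=0), ExampleWriter())
```

Because each step only depends on the output of the previous one, a different reader, engine, or writer can be swapped in without changing the other two.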