User_Guide
- 1. Getting Started with CEDS.
- 2. System Overview
- 3. Input Data
- 4. CEDS System Code
- 5. How to Include Supplemental Combustion Energy Activity in CEDS
- 6. Code Structure and Guide
- 6.1. Module A (Activity Data)
- 6.2. Module B (Combustion Emission Factors)
- 6.3. Module C (Non-combustion Emissions)
- 6.4. Module D (Initialize Emission Database)
- 6.5. Module E (Pre-process Emission Inventory Data)
- 6.6. Module F (Inventory Scaling)
- 6.7. Module G (Gridding)
- 6.8. Module H (Historical Extension)
- 6.9. Module S (Summary Data Processing)
- 6.10. The Makefile
- 6.11. The Parameters Folder
- 6.12. Diagnostic Folders
- 7. Diagnostic Results
- 8. Troubleshooting Odd Results
CEDS is an open source framework that aims to be run on any major operating system. The following section offers instructions for installing CEDS on your local machine.
R packages change continually, which can break existing R code either directly or indirectly through package dependencies. The CEDS system now uses the renv package to better ensure that the system remains functional as R packages evolve.
renv allows users to create isolated, reproducible, project-specific R libraries. It is cross-platform and allows for the installation of older package versions.
Upon cloning the repository and navigating to the root CEDS directory, users should activate their renv library and install the CEDS R dependencies.
Although the renv setup files are in the CEDS repository, it is still necessary to initialize the project. This is done with the renv::init() function.
To initialize a renv library for CEDS, open an R session in your CEDS root directory. Then:
-
Install the renv package:
install.packages("renv")
-
Initialize the project library:
renv::init(bare=TRUE)
-
From an R session in your CEDS project root directory, run the following command:
renv::restore()
By default, the init function will scan the project’s source code for R dependency packages to download, but this can take a while to run and won’t necessarily install the package versions CEDS needs to run. Using the bare = TRUE argument tells renv to create an empty R library for CEDS that we can populate with the packages defined in the lockfile.
`renv::init()` calls `renv::activate()`, which, among other things, writes the infrastructure needed to ensure that R will load the CEDS R library on launch.
By default, `renv::restore()` will retrieve the library package metadata from `renv.lock` and install, if necessary, the specified package versions to the project’s private library located in CEDS/renv/library/…/….
renv is extremely lightweight and only adds four files to the repo, with a total size of 32 KB. It also automatically adds its installed package directories to the project’s .gitignore file, so users don’t have to worry about accidentally committing hundreds of MB of R packages to the repo. An example of the renv file structure as it would appear in the CEDS repo is below:
CEDS/
|- renv/
| |- .gitignore
| |- activate.R
| |- settings.dcf
|
|- renv.lock
A library/ sub-directory would be added to renv/, which is where R packages would be installed.
renv utilizes lockfiles to record the state of a project’s library at some point in time. They contain package metadata, such as package names, versions, and sources, as well as the R version that was used to initialize the project. While normally written by the snapshot() function and read by the restore() function, lockfiles are stored as JSON, which allows them to be edited by hand. The CEDS lockfile, renv.lock, is located in the root CEDS project directory.
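Because the lockfile is plain JSON, its contents can be inspected from R before restoring. The following is a minimal sketch, assuming the jsonlite package is installed and that the lockfile follows the usual renv layout with top-level "R" and "Packages" entries. The sample content written to a temporary file is hypothetical and only stands in for the real renv.lock.

```r
library(jsonlite)  # assumption: jsonlite is installed

# Hypothetical, minimal lockfile content for illustration only.
lock_text <- '{
  "R": { "Version": "3.5.1" },
  "Packages": {
    "stringi": { "Package": "stringi", "Version": "1.2.2", "Source": "Repository" },
    "ncdf4":   { "Package": "ncdf4",   "Version": "1.16",  "Source": "Repository" }
  }
}'
lock_path <- tempfile(fileext = ".lock")
writeLines(lock_text, lock_path)

# Parse the JSON into nested lists and pull out the recorded versions.
lock <- fromJSON(lock_path, simplifyVector = FALSE)
r_version <- lock$R$Version
pkg_versions <- vapply(lock$Packages, function(p) p$Version, character(1))

print(r_version)
print(pkg_versions)
```

To inspect the actual CEDS lockfile, the same calls could be pointed at renv.lock in the CEDS root directory instead of the temporary file.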
A defining feature of renv is the use of a global package cache, which is shared across all projects using renv on a machine. The cache saves time and disk space by allowing various projects to access the same packages, rather than installing the same packages and versions into separate projects.
When using the global package cache, the project library is formed as a directory of symlinks rather than a directory of installed R packages. Each renv project is isolated from other projects on a machine, but they can still re-use the same installed packages as needed.
The global package cache is enabled by default; however, it can be disabled by setting renv::settings$use.cache(FALSE). This will ensure that packages are then installed to project libraries directly, without attempting to link to the renv cache.
The installation of the farver package may fail when attempting to compile, resulting in an error message that looks something like this:
* installing *source* package 'farver' ...
** package 'farver' successfully unpacked and MD5 sums checked
** libs
g++ -std=gnu++0x -I"/share/apps/R/3.5.1/lib64/R/include" -DNDEBUG -I/usr/local/include -fpic -I/share/apps/R/3.5.1/include -c ColorSpace.cpp -o ColorSpace.o
In file included from ColorSpace.cpp:1:
ColorSpace.h:19: error: ISO C++ forbids initialization of member 'valid'
ColorSpace.h:19: error: making 'valid' static
ColorSpace.h:19: error: ISO C++ forbids in-class initialization of non-const static member 'valid'
make: *** [ColorSpace.o] Error 1
ERROR: compilation failed for package 'farver'
In this case, this is due to the package’s C++ backend using features not supported by the older default gcc compiler (gcc 4.4.7) on PNNL’s internal HPC system, pic.
Load a newer version of the gcc compiler (6.1.0 works as of 13 May 2020) via module load gcc/6.1.0.
CEDS uses the ncdf4 package within the gridding module to produce gridded emissions files. The package is not required to produce CEDS emissions CSV files.
ncdf4 depends on the nc-config script that ships with the Unidata NetCDF library. The Unidata NetCDF library is a documented system requirement for the R ncdf4 package. The NetCDF C library is installed on pic, but it is not loaded as a module at the beginning of a remote session. Attempting to install the R ncdf4 package without the netcdf module loaded into your session can result in the following error:
Installing ncdf4 [1.16] ...
FAILED
Error installing package 'ncdf4':
=================================
* installing *source* package 'ncdf4' ...
** package 'ncdf4' successfully unpacked and MD5 sums checked
configure.ac: starting
checking for nc-config... no
-----------------------------------------------------------------------------------
Error, nc-config not found or not executable. This is a script that comes with the
netcdf library, version 4.1-beta2 or later, and must be present for configuration
to succeed.
Load the netcdf library into your session via the command module load netcdf.
An alternative solution can be to install a more recent binary version of the ncdf4 R package.
ICU is a cross-platform, Unicode-based globalization library. It includes support for locale-sensitive string comparison, date/time/number/currency/message formatting, text boundary detection, character set conversion, and so on.
When attempting to install some R packages, such as stringi v1.2.2, the ICU4C library cannot be located and the installation fails:
checking for pkg-config... /usr/bin/pkg-config
checking with pkg-config for the system ICU4C... no
*** pkg-config did not detect ICU4C-devel libraries installed
*** Trying with "standard" fallback flags
checking whether we may build an ICU4C-based project... no
*** The available ICU4C cannot be used
checking whether we may compile src/icu61/common/putil.cpp... no
checking whether we may compile src/icu61/common/putil.cpp with -D_XPG6... no
*** The ICU4C bundle could not be build. Upgrade your compiler flags.
ERROR: configuration failed for package 'stringi'
* removing '/pic/projects/GCAM/mnichol/ceds/CEDS-dev/renv/staging/1/stringi'
Error: install of package 'stringi' failed
Load a newer version of the gcc compiler via module load gcc/7.3.0 (gcc/7.3.0 works as of 15 May 2020).
Use the install.packages function to modify the compiler flags used in the installation process:
install.packages(c("stringi"),configure.args=c("--disable-cxx11"), lib=lib)
Use the lib argument to install the package into your project’s renv library (which can be found using .libPaths()).
NOTE: This solution is fine when installing only stringi; however, it may not completely resolve the problem when stringi is being installed as a dependency of another R package through renv.
renv can link packages from a user’s global R library to their project-specific renv library, saving the time and space that re-downloading the same package and version would require. However, once this cache link is established, removing the package from the global R library will break the link, causing errors in the renv library.
The error message below resulted from the cache link between the CEDS renv library and the global R library being broken for the stringi package when attempting to load the stringr package, which depends on stringi:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/qfs/people/nich980/.local/share/renv/cache/v5/R-3.3/x86_64-pc-linux-gnu/stringi/1.2.2/e99d8d656980d2dd416a962ae55aec90/stringi/libs/stringi.so':
/usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /qfs/people/nich980/.local/share/renv/cache/v5/R-3.3/x86_64-pc-linux-gnu/stringi/1.2.2/e99d8d656980d2dd416a962ae55aec90/stringi/libs/stringi.so)
Couldn't load 'stringr'. Please Install.
Manually install the package dependency in question into the project renv library, using the library and rebuild arguments:
> .libPaths()
[1] "/pic/projects/GCAM/mnichol/ceds/CEDS-dev/renv/library/R-3.3/x86_64-pc-linux-gnu"
[2] "/tmp/RtmpofmJy0/renv-system-library"
> lib <- .libPaths()[1] # CEDS renv library
> renv::install("stringi@1.2.2", library=lib, rebuild=TRUE)
This tells renv to install the package in the local library, rather than attempting to create another cache link.
Note that the Rscript command needs to be in the command path of your system for the CEDS Makefile scripts to run. In some installations you may have to add the location of Rscript to your environment’s PATH variable.
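One way to confirm that the required executables are visible is to look them up from an R session. This is a minimal sketch using base R only. It checks Rscript and also make, which the Makefile workflow described later relies on.

```r
# Look up the executables the Makefile relies on; Sys.which() returns ""
# for any command that cannot be found on the PATH.
tools <- Sys.which(c("Rscript", "make"))
missing_tools <- names(tools)[tools == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Rscript and make are both available.")
}
```

If a tool is reported as missing, adding its install location to the PATH variable and restarting the session should make it visible.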
CEDS also requires several R packages to be installed. Once the renv system is set up and initialized as described in the previous section, the packages needed for CEDS should be available.
The current list of necessary packages can be found in the ./code/parameters/global_settings.R file, along with a set of package version numbers that have been tested to work.
As noted in the previous section, we have observed issues when installing the ncdf4 package from source. The ncdf4 package is only necessary if you are producing gridded data. If you are having trouble with installation and are not producing gridded data, you can remove that package from ./code/parameters/global_settings.R and also from the renv lockfile.
To run the CEDS system as a whole, it is necessary to acquire a copy of the IEA energy statistics data files. In the current CEDS version these are in one file called OECD_and_NonOECD_E_Stat.csv that should be placed in the emissions-data-system/input/energy directory. The file is required by the script A1.2.IEA_downscale_ctry.R, but the data are proprietary and may not be distributed as part of a public-domain system such as CEDS. More details about using IEA data can be found in the Energy Data section below.
While individual components of CEDS can be run individually with R, the system as a whole should be executed using a Makefile system. Commands of the form:
make [em]-emissions
(where [em] is replaced by the desired emission species) will produce emissions by country, sector, and fuel.
Once the aggregate country-level emissions are produced, they can be mapped to spatial grids using:
make [em]-gridded
Note that before gridding, spatial proxy data must be installed as described in the module-G section below.
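To make the target pattern concrete, the sketch below builds the make commands for a few species named elsewhere in this guide. It only prints the command strings and does not run them. The exact spelling and capitalization of each species in the real Makefile targets is an assumption here and should be checked against the Makefile.

```r
# Species named elsewhere in this guide; adjust to the species you need.
species <- c("SO2", "NOx", "CO", "NMVOC", "BC", "OC")

# Substitute each species for [em] in the two target patterns.
emission_cmds <- sprintf("make %s-emissions", species)
gridded_cmds  <- sprintf("make %s-gridded", species)

# Print the commands in the order they would be run: emissions, then grids.
cat(c(emission_cmds, gridded_cmds), sep = "\n")
```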
To run CEDS with the Makefile, you will need to install Make.
To test whether Make is already installed, simply type make in the command line. If you do not have it, you will see the error:
bash: make: command not found.
Make can easily be installed with Xcode, which can be downloaded for free from the Apple App Store.
Within Xcode, you can install the Command Line Tools by selecting Xcode→Preferences→Downloads, then clicking Components and Install on the command line tools line.
Tip: Make can also be installed with the Homebrew command brew install make.
Once Make is installed, Makefiles can be run by opening the command prompt, navigating to the location of the Makefile, and entering make (or nmake).
The commands Rscript or R CMD BATCH are necessary, as they are used in the Makefile. These commands can also be used independently on the command line to run specific individual scripts, if desired.
There are a number of options for running a Makefile in Windows. The make functionality is not native to the Windows operating system, so it must be downloaded. Some options for installing Make are:
During the installation of Cygwin, make sure to specify that you would like Make as well. On the Select Packages screen, under All→Devel, ensure that the Bin box is checked for the file labeled make: the GNU version of the ‘make’ utility. Src is not necessary.
Tip: Installing Cygwin also gives the option to install other command line tools, such as the R commands Rscript and R CMD BATCH, gcc functionality, or command line text editors.
You will need to tell Cygwin where make and R are located (in addition to them being specified in the system environment variables) as follows:
-
cygstart .bash_profile
-
Open the .bash_profile file in Notepad
-
Add the paths to the make and R commands to the .bash_profile file:
export PATH=$PATH:/cygdrive/c/cygwin64/bin
export PATH=$PATH:/cygdrive/c/Program\ Files/R/R-3.5.3/bin
This option will provide the make command to the Windows command prompt.
This option will provide the nmake command to the Windows command prompt.
Use git to download CEDS to a local repository on your system. If git is not already available on your system, you will need to install either a command-line version or a GUI client such as Sourcetree. After making sure all prerequisites are properly installed, run the entire system by navigating to the CEDS folder and executing the make all command. The Makefile system will detect any changes made and rebuild the outputs as necessary. If the system is up to date, it will do nothing.
All modules are included in a Makefile in the emissions data system. Running the modules through the Makefile is advantageous because make will automatically run only what needs to be run to keep everything properly updated.
Caution: Make sure you are in the root CEDS directory containing the Makefile. If you are not, you will see the error: No targets specified and no makefile found. Stop.
To rerun the entire system, use the make clean-all command, then the make command. This removes all intermediate outputs and log files, forcing the Makefile system to rebuild the system from the first output file and rerun all integrated scripts. If you have made changes to the data processing or input data, the clean-all command is important to ensure accurate processing. For more information on using the Makefile, see the Makefile section of the User Guide.
CEDS is set up to run in parallel by species. Example shell scripts that can be modified as needed for your system are in the exe/PIC-job-scripts directory. Note that module A (activity) must be run first; the subsequent species-specific modules can then run in parallel. The Makefile also contains commands for running all species in three parts (part1, part2, part3), which can be done manually on any system with multiple processors.
It is also possible to run individual scripts in CEDS without running the whole system. Simply use the Rscript command from the command line, or open the file you wish to run in an R GUI and run it from there. Make sure you are running the script from either the root directory of the system (the CEDS directory, by default) or the input directory. However, a make clean-all (or a make clean for the relevant species or module) is highly recommended after a code change has been made.
Note: make clean-all followed by make is the only supported method to ensure the system will produce accurate results.
The CEDS system estimates emissions by sector, country and fuel in a few major steps (see Hoesly et al. Figure 1). First a set of default emissions are estimated for the modern era (either 1960 or 1971 to the last estimation year). These default emissions are then scaled to country inventories. Finally these are extended back to 1750. A general outline of the calculation process is given in this section.
CEDS has two categories of emissions, which reflect the way they are calculated: combustion emissions and process emissions.
This assignment is done at the sector level, so each CEDS sector is designated as either a combustion or a process sector in the Master_Fuel_Sector_List.
In the CEDS system, combustion emission sectors have emission estimates for each CEDS fuel (hard_coal, diesel, biomass, etc.). In most CEDS intermediate files, therefore, combustion emission sectors have a row for each CEDS fuel (per country/iso). Process emission sectors are assigned a fuel of "process" to differentiate them from combustion emissions. There is only one row in intermediate files for each process emissions sector.
For combustion sectors, default emissions are always calculated using fuel consumption and emission factors. Default emissions are determined as:
Default_Emissions = Fuel_Consumed • Emission_Factor • (1 - Control_Fraction)
For SO2 there are additional parameters:
Default_Emissions = Fuel_Consumed • Sulfur_Content • 2 • (1 - Ash_retention) • (1 - Control_Fraction)
The default emission factors used above are either from GAINS or regional and country/sector specific values from a variety of sources. The fuel consumed and other activity data is generated in Module A (Activity Data) and the default combustion emission factors in Module B (Combustion Emission Factors).
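As a worked illustration of the two formulas above, the sketch below evaluates them for made-up inputs. The function names, argument names, and numbers are hypothetical and are not taken from the CEDS code. The factor of 2 in the SO2 formula reflects that SO2 has roughly twice the mass of the sulfur it contains.

```r
# Generic combustion species: activity * emission factor * (1 - control).
default_emissions <- function(fuel_consumed, emission_factor, control_fraction = 0) {
  fuel_consumed * emission_factor * (1 - control_fraction)
}

# SO2: sulfur mass fraction * 2 converts sulfur mass to SO2 mass;
# ash retention removes the share of sulfur that stays in the ash.
default_so2_emissions <- function(fuel_consumed, sulfur_content,
                                  ash_retention = 0, control_fraction = 0) {
  fuel_consumed * sulfur_content * 2 * (1 - ash_retention) * (1 - control_fraction)
}

# Hypothetical inputs: 100 kt of fuel in both cases.
generic <- default_emissions(100, emission_factor = 0.02, control_fraction = 0.25)
so2 <- default_so2_emissions(100, sulfur_content = 0.01,
                             ash_retention = 0.05, control_fraction = 0.5)

print(c(generic = generic, so2 = so2))
```

With these inputs the generic case gives 100 × 0.02 × 0.75 = 1.5 kt, and the SO2 case gives 100 × 0.01 × 2 × 0.95 × 0.5 = 0.95 kt.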
Default emissions for CEDS process emission sectors are taken from EDGAR, country inventories, or other data sources for some specific sectors in Module C (Non-combustion Emissions).
Note that the term "process emissions" refers to the way in which the default emissions are calculated, and that some sectors classified as "process" may include emissions that result from fuel combustion. For example, flaring from oil and gas operations is a combustion process, but default emissions are specified by taking emissions from some default source and not by multiplying an emission factor times driver data.
The second major step is scaling default emissions to country level inventories in Module F (Inventory Scaling).
The last step in producing emissions is extension of emissions back to 1750 in Module H (Historical Extension). Emissions for any specific sector/emission species can be extended back using a variety of user-specified methods, including exogenous trends in some proxy or emissions data, per-capita trends, or trends in an emission factor (including emission factors trending to zero at a specified year).
The user can elect to produce spatial grids of the emissions generated in the previous steps in Module G (Gridding).
To extend the system to run to a later year, the key input data file to change is the BP energy statistics. Overwrite the current file with a more recent version. The file is located in /input/energy/. Note that this file needs to be in .xlsx format.
Then update the parameters BP_years and end_year in the file: /code/parameters/common_data.r
Clean and re-run make. The emissions data should now extend to the latest year specified. The data are simply extrapolated; updating the emission inventory data (and detailed IEA energy data) will produce a more accurate estimate for recent years.
You may need to update BP mapping files if country names in the BP data have changed.
Note that the BP data must extend to the latest year specified here. The BP energy statistics only provide consumption by total fuel for the larger countries. More accurate results will be obtained by also updating to the latest version of the IEA energy statistics as described below.
Note that just updating the year will simply run the system to a later year using default emission factor pathways. In order to obtain a more accurate result, inventory data should also be updated. Please contact us as noted in the GitHub README if you are interested in collaborating on updating the CEDS system.
-
There are numerous input files that provide data that extend to the last data year. Some of these will need to be updated. Look for user data files in the input/default-emissions-data, input/extension, input/energy/energy-data-adjustment, and input/energy/user-defined-energy directories.
-
"default" trend extension instructions in CEDS_historical_extension_methods_EF.csv should generally be set to the last CEDS year.
-
Check if any minimum EF pathways in input/extension/EF-pathway/ need to be updated to lower values for later years.
CEDS has two types of sectors (set in the file Master_Fuel_Sector_List.xlsx):
-
combustion sectors: Emissions from these sectors use energy data by fuel and sector as driver data. Default emissions are calculated by multiplying an emission factor times fuel consumption (minus an optional control fraction).
-
non-combustion sectors: Emissions from these sectors use some other data (default is population) as driver data. (Also referred to in CEDS documents as process emissions.) Default emissions are read-in from an external inventory source, user data, or a sector-specific script. Note that, physically, emissions from a CEDS non-combustion sector may be from fuel combustion. This designation refers only to how emissions are calculated within the CEDS system.
Adding a process (non-combustion) emission sector
In addition to indicating your data’s sector in your data source (the U.* file you used to import the data), you will need to edit 2 files in CEDS. They are:
-
CEDS/input/mappings/Master_Sector_Level_map.csv
-
Add a new row to the spreadsheet where appropriate. The row will contain five columns of data:
-
The detailed sector name: a unique sector ID (one word)
-
working sectors v1 and working sectors v2: these can be either your detailed name or a first-level aggregation; they may not be used in the model itself but serve as process documentation
-
The aggregate sector: if appropriate, the aggregate sector name will be identical to an existing aggregate sector
-
Figure_sector: this should be identical to an existing Figure_sector; this is the category in which your data will be displayed in CEDS graphical outputs
-
CEDS/input/mappings/Master_Fuel_Sector_List.xlsx
-
Add a new row to the spreadsheet at the appropriate location in the “Sectors” sheet only. This row will contain 4 columns of data:
-
The working_sectors_v1 sector name
-
The activity type
-
Units of analysis
-
Type: comb (combustion) or NC (non-combustion)
There are two input files that are used by CEDS but must be re-generated by manually running specific scripts (due to dependencies on multiple species). It should not be necessary to do this often, but these should be revisited periodically, and always before a final data release. These are:
File Name | Script | …
input/default-emissions-data/CD.OC_to_PM25_defaultratio.csv, input/default-emissions-data/CD.BC_to_PM25_defaultratio.csv | D1.2.BC_OC_to_PM2.5_default_ratios.R | …
input/extension/extension-data/H.N2O_7BC_extension-NH3_and_NOx_sectors_1_2.csv | H1.1a.Aggregate_NH3_NOx_for_N2O_7BC_ext.R | …
The core data needed to run the data system is the IEA OECD and non-OECD energy statistics.
The IEA energy statistics database needs to be purchased from the IEA and the data exported into csv format in order to run the CEDS system. The instructions below refer to the cd-rom distribution: the entire IEA energy database needs to be exported for use in the data system.
Steps to import the IEA energy data
-
Export the statistics for OECD and non-OECD countries into two .csv files
-
The first column is full name (spelled out).
-
The second column is flow (as IEA abbreviation, because names are not unique otherwise. To change to abbreviation, click on the flow icon, then go to Dimensions → Change label).
-
The third column is fuel (spelled out).
-
The necessary format is shown in the files OECD_E_Stat_Template.csv and NonOECD_E_Stat_template.csv.
(To export from the IEA beyond 20/20 data browser, drag the icon for country to the left to form a column and icon for time to the right to create a row with years. Then drag the icon for flows between the column for countries and data for the first year; it will add a column for flows. Then drag the icon for fuel between column for flows and data for the first year. This will result in a large table that contains all the data that can then be exported as a csv.)
-
In a text editor:
-
Replace .., c, and x, in the data values with zeros (note these can occur at end of lines)
-
Remove special characters and apostrophes:
-
Côte d’Ivoire → Cote dIvoire
-
Dem. People’s Rep. of Korea → Dem. Peoples Rep. of Korea
-
People’s Republic of China → Peoples Republic of China
-
Curaçao → Curacao.
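The text-editor replacements listed above could also be scripted. The following is a minimal sketch in base R, not the procedure used by the CEDS developers. It handles only the specific characters and placeholders named in the list, and the sample names and data cells are hypothetical stand-ins for lines of an exported file.

```r
# Hypothetical sample of country names as they might appear in an IEA export.
countries <- c("C\u00f4te d\u2019Ivoire",
               "Dem. People\u2019s Rep. of Korea",
               "Cura\u00e7ao")

clean_names <- function(x) {
  x <- gsub("\u2019", "", x, fixed = TRUE)   # typographic apostrophe
  x <- gsub("'", "", x, fixed = TRUE)        # plain apostrophe
  x <- gsub("\u00f4", "o", x, fixed = TRUE)  # o with circumflex
  x <- gsub("\u00e7", "c", x, fixed = TRUE)  # c with cedilla
  x
}

# Hypothetical data cells, kept as text so the placeholders survive import.
cells <- c("12.5", "..", "c", "x", "3")
cells[cells %in% c("..", "c", "x")] <- "0"

cleaned <- clean_names(countries)
values <- as.numeric(cells)
print(cleaned)
print(values)
```

Matching whole cells with %in%, rather than replacing characters inside the text, avoids touching a "c" or "x" that is part of a country or fuel name.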
If the data is the same release used in the version of the CEDS system that you have (you can check this in the metadata file that is released with the system) then there are no further steps.
However, if you are using a newer (or older) version of the IEA/OECD statistics, then the following additional steps are needed.
-
Update the year ranges in code/parameters/common_data.R. To update the IEA data from the 2012 edition to the 2015 edition, change the parameter IEA_years <- 1960:2010 to IEA_years <- 1960:2013.
The BP energy statistics are used to extend energy consumption and production data to the latest CEDS year. If you use the IEA data from the 2015 edition, change the parameter BP_years <- 2011:2014 to BP_years <- 2014.
-
If there are new countries or new country names, the master country list (input/mappings/Master_Country_List.csv) will need to be updated.
-
If there are any new fuels, these might need to be added to the master fuel list (input/mappings/energy/IEA_product_fuel.csv).
-
If fuels have changed names, this might require changes in other files. Please contact us for assistance. (We will be working to generalize this process.)
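After editing the year parameters, a quick consistency check can catch gaps between the two ranges. This sketch uses the 2015-edition values quoted in the first step. The assumption that BP_years should begin in the year immediately after the last IEA year is inferred from those example values and is not stated explicitly in this guide.

```r
# Values quoted in the first step above for the 2015 IEA edition.
IEA_years <- 1960:2013
BP_years  <- 2014

# Assumed relationship: the BP extension years start right after the last
# IEA year and the combined sequence runs without gaps.
all_years <- c(IEA_years, BP_years)
stopifnot(min(BP_years) == max(IEA_years) + 1)
stopifnot(all(diff(all_years) == 1))

print(range(all_years))
```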
Tip: When updating the IEA energy data, check that the data in input/energy/energy-data-adjustment is still valid and update if necessary.
Similarly, the system also uses net energy content values from the IEA. These also must be exported, with the format provided in the files NonOECD_Conversion_Factors_Full_template.csv, NonOECD_Conversion_Factors_template.csv, OECD_Conversion_Factors_Full_template.csv, and OECD_Conversion_Factors_template.csv.
In order to more accurately extend process emissions time series, driver data for the appropriate emissions time series is needed.
In the first phase of this project, where we are focusing on recent decades, complete, consistent time series estimates exist for most emissions (e.g. EDGAR, FAO, etc.). For this reason, process emissions driver data are not critical to this first phase and most of this data has not been incorporated.
The User can add process (non combustion) emissions to CEDS by adding inventory files or instructions for using processed inventory files (from module E) in the intermediate_output folder.
CSV files with process emissions data may be added to the input/default-emissions-data/non-combustion-emissions folder. Files should be named with "U.<em>_" followed by a description or identifier (for example, a name beginning with "U.SO2_"). The system will not import files whose names lack the <em> species identifier. Clean commands (executed by the Makefile) will delete files in the folder that begin with "C.", so users should only add files prefixed with "U.".
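The naming rule can be checked before running the system. The sketch below classifies a hypothetical set of file names for one species. It illustrates the rule as described here and is not the matching logic used inside CEDS.

```r
em <- "SO2"

# Hypothetical contents of the non-combustion-emissions folder.
files <- c("U.SO2_smelter_update.csv",  # user file for SO2: would be picked up
           "U.NOx_cement.csv",          # user file for another species
           "SO2_smelter_update.csv",    # missing the "U.<em>_" prefix
           "C.SO2_NC_emissions.csv")    # "C." prefix: removed by clean commands

is_user_file <- startsWith(files, paste0("U.", em, "_"))
removed_by_clean <- startsWith(files, "C.")

print(data.frame(file = files, is_user_file, removed_by_clean))
```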
Files should be in standard CEDS format, with column headings iso-sector-fuel-units-Xyears, similar to the output emissions and EF files produced by the system. Year columns must be in the format “Xyear”, such as X1980 or X2005. Files may contain any number of emission years in any order. The script will automatically order the years and linearly interpolate between them. It does not extend emissions to years outside the given data.
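The ordering and interpolation behavior described above can be illustrated with base R's approx(). This is a sketch of the idea under hypothetical data, not the CEDS script itself: two years are supplied out of order, the years in between are filled linearly, and years outside the supplied range are left as NA rather than extrapolated.

```r
# Hypothetical user data: two years given out of order.
years  <- c(2005, 1980)
values <- c(30, 10)

# approx() sorts the input years and interpolates linearly; with the default
# rule = 1 it returns NA outside the supplied range instead of extrapolating.
target_years <- 1975:2010
filled <- approx(x = years, y = values, xout = target_years, rule = 1)$y
names(filled) <- paste0("X", target_years)

print(filled[c("X1975", "X1980", "X1990", "X2005", "X2010")])
```

Here X1990 evaluates to 10 + (30 - 10) × 10/25 = 18, while X1975 and X2010 remain NA.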
Files must contain iso-sector-fuel-units. Entries that are not exact matches for those 4 id columns to entries in CEDS NC_database will not be added. The script automatically filters out entries which are not mapped to non combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx) or have “process” as fuel.
Data lines from processed inventory files (from module E) in the intermediate_output folder can be added to the default dataset by adding lines to input/default-emissions-data/non-combustion-emissions/add_inventory_instructions.csv. This is particularly useful (and recommended) when the default process emissions data is too different from the inventory data, resulting in large scaling factors. (This can be diagnosed by examining scaling script diagnostic files.)
Each instruction line must specify:
-
inv: the name of the inventory file in the intermediate-output folder, such as E.SO2_EMEP_NFR09_inventory
-
em: the emission species
-
iso: the country code
-
inv_sector: an exact match of the name of the inventory sector specified in the inventory file (inv)
-
ceds_sector: the CEDS sector the emissions should be matched to
Data must be mapped to non-combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx).
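As an illustration of the five fields, the sketch below builds one instruction row and writes it to a temporary CSV. Only the inv value is taken from the text above. The country, the two sector names, and the assumption that the file contains exactly these five columns in this order are hypothetical and should be checked against the existing add_inventory_instructions.csv.

```r
# One hypothetical instruction row; only the inv value comes from the text above.
instruction <- data.frame(
  inv         = "E.SO2_EMEP_NFR09_inventory",
  em          = "SO2",
  iso         = "deu",                          # hypothetical country
  inv_sector  = "example_inventory_sector",     # hypothetical; must match the inventory file exactly
  ceds_sector = "example_ceds_process_sector",  # hypothetical; must be a non-combustion sector
  stringsAsFactors = FALSE
)

required_cols <- c("inv", "em", "iso", "inv_sector", "ceds_sector")
stopifnot(identical(names(instruction), required_cols))

# Write to a temporary location rather than the real instructions file.
out_path <- file.path(tempdir(), "add_inventory_instructions_example.csv")
write.csv(instruction, out_path, row.names = FALSE)
print(readLines(out_path))
```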
-
Note when using grep to select input files that one cannot grep for "OC", for example, as this will also capture "NMVOC". You must use an appropriate wildcard match that distinguishes between "OC" and "NMVOC", and between "CO" and "CO2".
-
If you encounter an error where a package is reported to be unavailable even though you have already installed it, try installing without specifying a lib argument (e.g., install.packages( 'package-name' )) so that the package is installed in the default location. (Note that GUIs such as RStudio might sometimes install a package in the wrong place.)
-
When continually running code from individual R scripts, using the function logStart() (called in the initialize function at the beginning of every script) without logStop() (called at the end of every script) will keep the log files open. An R session can only handle so many open log files before the following error occurs:
Error in sink(paste(logpath, fn, ".log", sep = ""), split = T) : sink stack is full
To resolve, clear the global environment, either manually or by restarting the R session.
-
Similar to the error above, having too many files open can create the following error:
Error in textConnection("rval", "w", local = TRUE) : all connections are in use
To resolve, enter the command closeAllConnections() into the console.
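For the first item above (selecting files by species), one possible pattern is to require that the species name is not embedded in a longer alphanumeric token. This is a sketch with hypothetical file names. The helper species_pattern is not part of CEDS.

```r
# Build a pattern that matches a species name only when it is not embedded
# in a longer alphanumeric token (so "OC" does not match inside "NMVOC").
species_pattern <- function(em) {
  paste0("(^|[^A-Za-z0-9])", em, "([^A-Za-z0-9]|$)")
}

# Hypothetical file names.
files <- c("U.OC_example.csv", "U.NMVOC_example.csv",
           "U.CO_example.csv", "U.CO2_example.csv")

oc_naive <- grep("OC", files, value = TRUE)  # also returns the NMVOC file
oc_exact <- grep(species_pattern("OC"), files, value = TRUE)
co_exact <- grep(species_pattern("CO"), files, value = TRUE)

print(oc_naive)
print(oc_exact)
print(co_exact)
```

Species names containing regular-expression metacharacters (a period, for instance) would need to be escaped before being passed to this helper.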
CEDS has the capacity to dynamically include user-defined activity in a number of ways. This section outlines how to include supplemental combustion activity data in a run of CEDS.
Every supplemental dataset is required to be in a .csv format and must be accompanied by a corresponding instructions file. Additionally, a mapping (.xlsx) file is required for any dataset that is not already in the standard CEDS format.
These files are tied together by their root filename, with the non-data files specified by an extension of -instructions.csv or -mapping.xlsx. All files must be saved to the folder input/extension/user-defined-energy in order to be included. For example, your extension directory might look like this:
input/
├── extension/
│ ├── user-defined-energy/
│ │ ├── mydata.csv
│ │ ├── mydata-instructions.csv
│ │ ├── mydata-mapping.xlsx
│ │ ├── USA_historical_coal.csv
│ │ └── USA_historical_coal-instructions.csv
... ...
If the files are formatted correctly, they need only be placed in this folder, and CEDS will automatically identify and process the data.
Below is a detailed guide to creating and formatting these files.
The data file is expected in wide form. There must be exactly one column giving
information on the country, and at least one column giving the fuel type (agg_fuel, CEDS_fuel, or both). Additionally, one or two columns are allowed for specifying sector depending on
the level of specificity. The activity data itself should have year or
Xyear headers (e.g. 1950
, 1951
or X1950
, X1951
).
A dataframe in CEDS format with all allowed columns might look like this:
| iso | agg_fuel | CEDS_fuel | agg_sector | CEDS_sector | X1970 | … |
| --- | --- | --- | --- | --- | --- | --- |
| deu | coal | coal_coke | 1A1_Energy-transformation | 1A1a_Electricity-public | 1150.79 | … |
Since CEDS operates under the principle of preserving raw input data when possible, the input dataset does not need to be neatly named to CEDS sectors and fuels. The purpose of the mapping file is so the system can identify how input data corresponds to CEDS data.
There should be one sheet in this Excel file for each ID column in the input data, and the sheet names must be the name of the resulting CEDS column. If a data ID column is already in CEDS form, no mapping sheet is needed. There are five possible sheet names:
-
CEDS_sector
-
CEDS_fuel
-
agg_sector
-
agg_fuel
-
iso
Any mapping file may include any or all of these, as needed. Other sheets will not be identified.
Each sheet should contain two columns, one headed by the name of the column (same as the sheet name) and the other bearing the header corresponding to the header in the data frame. The data in the columns are the equivalent IDs.
The following is an example of what a mapping sheet titled "CEDS_sector" might look like:
| my_sector_name | CEDS_sector |
| --- | --- |
| public_electric | 1A1a_Electricity-public |
| auto_electric | 1A1a_Electricity-autoproducer |
| heat_production | 1A1a_Heat-production |
The raw data corresponding to this example could look something like this:
| iso | my_sector_name | agg_fuel | X1970 | … |
| --- | --- | --- | --- | --- |
| usa | public_electric | oil | 16.21 | … |
| usa | auto_electric | oil | 105.5 | … |
| usa | heat_production | oil | 124.8 | … |
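To make the role of the mapping sheet concrete, the sketch below applies the example "CEDS_sector" sheet to the example raw data using base R. It only illustrates the lookup idea; it is not the CEDS mapping code, and the object names sector_map, raw, and mapped are hypothetical.

```r
# Mapping sheet "CEDS_sector": user sector names and their CEDS equivalents
sector_map <- data.frame(
  my_sector_name = c( "public_electric", "auto_electric", "heat_production" ),
  CEDS_sector    = c( "1A1a_Electricity-public",
                      "1A1a_Electricity-autoproducer",
                      "1A1a_Heat-production" ),
  stringsAsFactors = FALSE
)

# Raw user data in wide form (only the X1970 column is shown)
raw <- data.frame(
  iso            = c( "usa", "usa", "usa" ),
  my_sector_name = c( "public_electric", "auto_electric", "heat_production" ),
  agg_fuel       = c( "oil", "oil", "oil" ),
  X1970          = c( 16.21, 105.5, 124.8 ),
  stringsAsFactors = FALSE
)

# Look up the CEDS sector for each row, preserving the original row order
idx <- match( raw$my_sector_name, sector_map$my_sector_name )
mapped <- data.frame(
  iso         = raw$iso,
  CEDS_sector = sector_map$CEDS_sector[ idx ],
  agg_fuel    = raw$agg_fuel,
  X1970       = raw$X1970,
  stringsAsFactors = FALSE
)
```

Any row whose sector name is missing from the sheet would receive NA in CEDS_sector, which is a convenient signal that the mapping sheet is incomplete.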
In the case that your data cannot be easily mapped, you can make use of the
parameter preprocessing_script
described in section 3.2 below. If no mapping
file is included, it is assumed the data is already correctly mapped. Alternatively, you
can also utilize an additional sector mapping file if you wish to retain aggregate sectors
that do not correspond to CEDS default aggregate sectors (see "Master_Sector_Level_map.csv"
in the input/mappings directory). If so, please see the below subsection "Alternative Mapping File".
If you wish to utilize aggregate sectoral data which does not correspond to CEDS default aggregate sectors (see "Master_Sector_Level_map.csv" in the "input/mappings directory"), then you must provide an additional sector mapping file. This file name must be listed within the corresponding instructions file as an entry under the column heading "sector_map" for each row within the instructions file which will use this map. This mapping file needs to be placed in the "CEDS\input\extension\user-defined-energy" directory and must have a column named "CEDS_sector" with the corresponding CEDS_sectors you wish to map to your aggregate sectors (often listed under a column headed by the label "sector"). Note that your aggregate sector column in the mapping file must have the same name as the column header for these sectors in your user data, while in the instructions file the column header will be "agg_sector". Note that your mapping file must be a .csv named with the following convention: [filename]_sector_map.csv.
The instructions file is the place to define any parameters for how specifically to process the input dataset. This file is used to determine both which data to bring into the system from your dataset, and how it should be integrated into the default data.
The instructions file must have exactly one column giving information on the country, and at least one column giving the fuel type (agg_fuel, CEDS_fuel, or both). The instructions file should have a row for each combination of data in the corresponding data file:
| iso | CEDS_fuel | CEDS_sector | start_year | end_year | options… |
| --- | --- | --- | --- | --- | --- |
| deu | coal_coke | 1A1a_Electricity-public | 1931 | 1934 | … |
| deu | hard_coal | 1A1a_Electricity-public | 1932 | 1936 | … |
| deu | brown_coal | 1A1a_Electricity-public | 1931 | 1936 | … |
| deu | coal_coke | 1A1a_Electricity-autoproducer | 1931 | 1936 | … |
This example shows all of the necessary columns for reading in data with CEDS_fuel
and CEDS_sector specificity. To include all isos, provide all
within the column of the instructions file. To include all sectors, simply leave that column out of the instructions file, or alternatively provide the sector name all
.
CEDS provides several options (listed in Section 3.2 below) for specifying how to integrate the supplemental data into the default data.
Tip: These instructions must be in CEDS ID form because they specify how the system will use the data once mapped; they correspond directly to components of the CEDS activity data.
There are several use instructions that can be specified by the user. If a given option is not included, it will be set to the default. These options can be set for each row of the instructions file for a dataset by including a column with the option as the header (case-sensitive).
-
priority
is a tool for manually specifying the order in which datasets are included in the system (see Default Order in Notes). Priority is given as integers; data with priority 1 will be dominant over priority 2, which will be dominant over data with no priority specified. Defaults to NA
. -
keep_total_cols
is an argument that MUST be specified in a user’s instruction file. The value for this argument needs to be one of the 6 options listed below (an error message will be provided if keep_total_cols is not defined as one of these 6 options). Note that if the user provides a fuel and sector level in their user data which matches the option provided in their corresponding instructions file for keep_total_cols, the data will not be normalized (for example, if the user provides data at the agg_fuel level and has specified keep_total_cols as "agg_fuel"). If the user provides a fuel and sector level in their user data which is less detailed than the option provided for keep_total_cols in the corresponding instructions file, the data will also not be normalized (for example, if the user provides data at the agg_fuel level and has specified keep_total_cols as "agg_fuel, CEDS_fuel").-
blank or NA - no normalization will occur
-
agg_fuel
-
[agg_fuel], CEDS_fuel
-
agg_fuel, agg_sector
-
[agg_fuel], CEDS_fuel, agg_sector
-
agg_fuel, [agg_sector], CEDS_sector
-
-
use_as_trend
takes a boolean argument. If TRUE
, the data will be used as a trend rather than as raw data; values will be scaled to CEDS values for a given match_year
. Defaults to FALSE
. -
match_year
takes an integer year argument. Required if use_as_trend
is TRUE
, otherwise defaults to NA
. -
start_continuity
is used to specify whether data should be made continuous at its beginning. Takes a boolean; defaults to TRUE
. -
end_continuity
(see start_continuity
) -
interpolation_method
defines how to treat missing values in the data. Must be one of the following:-
linear
(default) -
match_to_default
fills in missing values based on the trend of the default activity data -
match_to_trend
fills in missing values based on a trend provided by the user; if specified, the parameter matching_file_name
must be present
-
-
matching_file_name
is the name of a file containing values to be used as a trend for interpolating missing values from the data. Columns outside of the years specified by start_year
and end_year
will be ignored. Defaults to NA
. -
preprocessing_script
is the name of an R script to be run before attempting to map or load the data associated with this instruction. Expects a file path relative to the user-defined-energy
directory.
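As an illustration of how these options sit alongside the ID columns, the sketch below builds a small instructions table in base R and writes it to a CSV file. The dataset name mydata, the chosen option values, and the temporary output location are assumptions for demonstration only; in practice the file belongs in input/extension/user-defined-energy.

```r
# Hypothetical instructions for a dataset named "mydata.csv".
# Option columns use the case-sensitive names listed above.
instructions <- data.frame(
  iso             = c( "deu", "deu" ),
  CEDS_fuel       = c( "coal_coke", "hard_coal" ),
  CEDS_sector     = c( "1A1a_Electricity-public", "1A1a_Electricity-public" ),
  start_year      = c( 1931, 1932 ),
  end_year        = c( 1934, 1936 ),
  priority        = c( 1, 2 ),
  keep_total_cols = c( "agg_fuel", "agg_fuel" ),   # required option
  use_as_trend    = c( FALSE, TRUE ),
  match_year      = c( NA, 1936 ),                 # assumed value for the trend row
  stringsAsFactors = FALSE
)

# match_year is required wherever use_as_trend is TRUE
stopifnot( !any( instructions$use_as_trend & is.na( instructions$match_year ) ) )

# Written to a temporary folder here so the sketch has no side effects
out_file <- file.path( tempdir(), "mydata-instructions.csv" )
write.csv( instructions, out_file, row.names = FALSE )
```

Options that are left out of the table fall back to the defaults described in the list above.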
This section details some of the major functions of the user data processing system.
Mapping occurs during pre-processing of data, but after running any user pre-processing script. This step uses user-specified *-mapping.xlsx files to bring data into CEDS form. Any data at the detail level of CEDS_fuel or CEDS_sector will automatically have the aggregate fuel or sector mapped on.
Interpolation occurs during pre-processing of data. The process fills holes in data that has gaps or that has less-than-annual (e.g. every 5 years) data. Interpolation can occur linearly (the default) or on a trend specified in the Interp_instructions sheet of [filename]-instructions.csv.
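The default linear case can be pictured with base R's approx() function. The snippet below is a minimal sketch with made-up values at five-year intervals; it is not the CEDS interpolation routine, and the variable names are hypothetical.

```r
# Hypothetical activity values reported every five years
years  <- c( 1970, 1975, 1980 )
values <- c( 10, 20, 15 )

# Fill every year between the first and last reported year linearly
filled <- approx( x = years, y = values, xout = 1970:1980, method = "linear" )

# Label the result with Xyear headers, as used in CEDS wide-form data
annual <- setNames( filled$y, paste0( "X", filled$x ) )
```

With these inputs, the value for X1972 is 14 and the value for X1977 is 18, each lying on the straight line between its neighboring reported years.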
Normalization is the process by which data is included in the greater activity database without losing aggregate totals. CEDS activity defaults are generated by using percentage breakdowns to disaggregate high-level (aggregate fuel per country) data. When user-specified data is added, the system will include it by offsetting the user-defined changes in other areas of the aggregate group.
By adding specific fuel by sector activity in one place, CEDS adjusts the breakdown of fuel activity, not the total fuel activity.
Normalization Exceptions:
-
Whole-group overwrite: if all elements of an aggregate group are specified, the aggregate sum is overwritten (see Batching).
-
If a user-specified subset exceeds an aggregate group total, that total will be overwritten.
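The idea of preserving an aggregate total can be sketched in a few lines of base R. This example assumes the offset is spread proportionally over the unspecified members of the group, which is an assumption made for illustration and may differ from the actual CEDS calculation; the function name normalize_group is hypothetical, and the two exceptions above are deliberately not handled.

```r
# Replace some members of an aggregate group with user values while keeping
# the group total fixed. Assumption: the remaining members are rescaled in
# proportion to their default shares.
normalize_group <- function( defaults, user_values ) {
  total     <- sum( defaults )
  fixed     <- names( user_values )
  others    <- setdiff( names( defaults ), fixed )
  remaining <- total - sum( user_values )

  # The whole-group and exceeds-total exceptions are outside this sketch
  stopifnot( length( others ) > 0, remaining >= 0 )

  result           <- defaults
  result[ fixed ]  <- user_values
  result[ others ] <- defaults[ others ] * remaining / sum( defaults[ others ] )
  result
}

# Default breakdown of aggregate coal for one country, sector, and year
defaults <- c( coal_coke = 50, hard_coal = 30, brown_coal = 20 )

# The user supplies a new coal_coke value; the total of 100 is preserved
adjusted <- normalize_group( defaults, c( coal_coke = 60 ) )
```

Here hard_coal and brown_coal shrink to 24 and 16, so the breakdown changes while the aggregate coal total stays at 100.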
If several instructions correspond to the same aggregate group, these instructions will need to be processed together all at once. Groups of user data in the same batch are handled as a single input, in that they are normalized in one step. In the case that a user specifies rows of data for an entire aggregate group for a given time period, they will be batched together and will overwrite the normalization process. If they have different but overlapping year ranges, each dataset will be subsetted to year ranges allowing for the processing of overlapping sections separate from non-overlapping sections.
By default, user-specified data is made continuous with the CEDS defaults at its beginning and end. The data are linearly adjusted over a specified year range (7 years by default, fewer if necessary) so that the value of the first year represents 1/7 new data and 6/7 CEDS data and the value of the 7th year is 6/7 of the new data plus 1/7 of CEDS data.
Instructions are ordered by:
-
Priority
-
Aggregation specificity
-
Start year
This ordering means that all data with higher priority will supersede data with lower priority; within equal priority, more specific data will supersede less specific (more aggregate) data; and, all else being equal, older data will supersede newer data. This order only matters if more than one dataset will impact the same activity cell.
The Community Emissions Data System (CEDS) is at its core a selection of R scripts and data files linked together by a Makefile. CEDS is flexible to user input. Throughout the system are built-in mechanics for automatically identifying and processing user-added data and scripts.
CEDS code execution is divided into modules, groups of code executed together for a common purpose. The nine CEDS modules are as follows:
| Name | Purpose |
| --- | --- |
| Module A | Activity and driver data processing |
| Module B | Combustion emissions factors |
| Module C | Non-combustion emissions and emissions factors |
| Module D | Default emissions calculations |
| Module E | Emissions inventory processing |
| Module F | Scaling to inventories |
| Module G | Gridding |
| Module H | Historical extension |
| Module S | Summary and final data processing |
This documentation provides information module by module. To find instructions for a desired change or input, identify the module purpose which best fits the aspect of CEDS you will change.
Module A runs initial processing on driver data, and creates the total activity driver database.
Module A is not designed to be as flexible as the other modules. Preserving Module A defaults is recommended, except where overwriting a particular input. In general, additional supplemental data is best added later in the system.
Module A is unique in CEDS in that it contains no emissions-specific processing. It handles activity and driver data, and not emissions or emissions factors. Because of this, Module A only needs to be executed once even during a recursive make.
-
Population data is created from UN and HYDE population inputs. Adjustments to population data must be made in these inputs or in A1.1.UN_pop_WB_HYDE_extension.R.
-
A.1* contains other driver scripts dependent on only population (biomass dataset, pre-processing of IEA energy data, coal heat content). Pre-processing emissions-nonspecific scripts can be added to this section.
-
Module A.2 handles specific adjustments to IEA data, including converting to CEDS sectors and fuels.
-
Modules A.3 and A.4 handle expanding the activity database to include complete CEDS specificity and fuel/sector combinations. The results of this section are the activity databases A.comb_activity.csv and A.NC_activity_energy.csv, which store activity data defaults used throughout CEDS.
-
Combustion Energy data is primarily from IEA and BP data (processed in Module A2-A4), while non-combustion driver data is from various sources (Module A5).
-
Combustion or non-combustion sectors are specified in the Master_Fuel_Sector_List.xlsx. IEA process sectors are identified in IEA_process_sector.csv.
-
The important distinction between combustion and non-combustion activity is driver; combustion sectors have fuel drivers, while non-combustion sectors have proxy process driver data (population, pulp paper production, etc.).
Module B is responsible for processing combustion emissions factors.
Module B executes in 3 steps:
-
B1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (
B1.1.base_…
) -
B1.2 reformats specific datasets and uses header functions to add the results to their databases. (
B1.2.add_…
) There can be any number of “add” scripts per section. -
B1.3 “processes” activity (
B1.3.proc_…
)
Module B uses a parental structure to call scripts. “B1.1_base_comb_EF.R” and “B1.2.add_comb_EF.R” are the only two scripts executed by the Makefile. Each script identifies and executes a series of other scripts based on the emissions species, for example
if ( em == "BC" || em == "OC" ){
    scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R' )
}
…
invisible( lapply( scripts, source_child ) )
Any script added to the list “scripts” as a string will be executed by the parent script.
There are two types of B1.2 files. Some files generate processed data as intermediate output files, creating data on control percents, ash content, etc. (most notably for emissions species SO2). Other scripts read in all data files of a certain type, which may have been produced earlier in B1.2, or may have been included as defaults.
Adding a processing script to Module B requires:
-
A script in the module-B folder, named according to conventions described in the CEDS style guide.
-
A change to whichever parent file is appropriate for sourcing the new script.
-
Any input data will need to be included in the input folder.
An example of a change in a parent script: if I want to add a new BCOC processing file, 'B1.2.add_BCOC_additional.R', the above would become:
if ( em == "BC" || em == "OC" ){
    scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R', 'B1.2.add_BCOC_additional.R' )
}
This modular structure means that no changes to the Makefile are needed to add scripts in Module B.
Raw emissions factors can be directly incorporated into the CEDS emissions factor database.
Save the data in a .csv file with columns for iso, fuel, sector, unit (usually "fraction"),
and data years in Xyears, in the folder input/default-emissions-data/EF_parameters/
.
Name the file U.[em]_*[suffix].csv
where *
represents any descriptive,
meaningful title for the data and [suffix] is any of the following:
| Pattern | Use |
| --- | --- |
| "_EF" | Adds the data as raw emissions factors |
| "_control_percent" | Adds the data as control percents (SO2 only) |
| "_s_ash_ret" | Adds data as sulfur ash retention data (SO2 only) |
| "s_content" | Adds data as sulfur content data (SO2 only) |
Files without any of these suffixes, or without the emissions species in the file name, are ignored.
Files in this directory are processed by species, in alphabetical order.
Data in files read in later will overwrite data in files read in earlier. This is
why user files should begin with U.
so that data in these files will be given
priority.
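The selection and ordering rules above can be sketched with base R string functions. The file names below are hypothetical, and the matching logic is an illustration of the described behavior rather than the code CEDS uses.

```r
em       <- "SO2"
suffixes <- c( "_EF", "_control_percent", "_s_ash_ret", "s_content" )

# Hypothetical directory listing
files <- c( "U.SO2_mydata_EF.csv", "A.SO2_default_EF.csv",
            "U.NOx_mydata_EF.csv", "notes_SO2.csv" )

# Keep files that end in a recognized suffix and name the emissions species
suffix_regex <- paste0( "(", paste( suffixes, collapse = "|" ), ")\\.csv$" )
has_suffix   <- grepl( suffix_regex, files )
has_species  <- grepl( em, files, fixed = TRUE )

# Alphabetical read order: later files overwrite earlier ones
read_order <- sort( files[ has_suffix & has_species ] )
```

In this listing the file beginning with U. sorts after the one beginning with A., so it is read last and its values take priority.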
Module C is responsible for processing non-combustion emissions and emissions factors.
Module C follows the same three-part structure as Module B:
-
C1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (
C1.1.base_…
) -
C1.2 reformats specific datasets and uses header functions to add the results to their databases. (
C1.2.add_…
) There can be any number of “add” scripts per section. C1.2 uses a parent script model. -
C1.3 “processes” activity (
C1.3.proc_…
) C1.3 (the “process” group) does not use a parent script model, so adding a process script requires editing the Makefile.
Non-combustion emissions can be added to Module C without the inclusion of a new script. There are two ways to do this.
Save a dataset as a .csv
file in the folder input/non-combustion-emissions
with
headers indicating iso, fuel, sector, and years of emissions (the data will be in
wide form, with a column for each year). Emissions will be linearly interpolated if
there are missing years. Emissions are extended forward and backward from the years
supplied with a constant emissions factor. This means that for
most process emission sectors emissions will be scaled with population. If this
is not appropriate, it is best
to supply emissions over the entire modern time period (1960 forward).
The second method to add to the default emissions database is to add lines
to the input/non-combustion-emissions/add_inventory_instructions.csv
instruction
file. This will take data from processed emission outputs and add these, as
emission factors, to the default emissions database. As above, these will be
added as constant emission factors before and after the specified years. This
is a good way to correct for large scaling factors in instances where default
process emissions data were not a good match for a scaling inventory.
Module D contains a single script for initializing emissions databases based on driver and activity data calculated in modules A through C. It is relatively inflexible and is meant to bridge emissions factors + drivers and emissions. It calculates emissions, creating a default that will be scaled and extended by modules F and H.
Module E processes emissions inventories. Each script is tailored to its particular inventory. Each script outputs a processed form of the raw inventory made compatible for CEDS analysis.
Module E is typically executed immediately after Module A.
Typically, Module E scripts have three sections.
-
The first defines inventory-specific parameters: file paths, year ranges, etc.
-
The second reads in and processes the data, shaping the inventory to a standard format (wide-form, iso tags) but does not map to CEDS sectors or fuels.
-
The third writes the data to
intermediate-output
.
Module E scripts diverge from this format when further data processing is required to make scripts in standard form.
-
Add raw input files to
input/emissions-inventories/
-
Add a processing script to the
module-E/
folder -
Add a section of code to the Makefile in the area handling emissions inventories. The rule should look like the following (for example script “E.myinventory_emissions.R”):
# process emissions from 'myinventory'
$(MED_OUT)/E.$(EM)_myinventory.csv : \
	$(MOD_E)/E.myinventory_emissions.R
	Rscript $< $(EM) --nosave --no-restore
This code indicates that “module-E/E.myinventory_emissions.R” needs to be executed as an Rscript, and that it will produce the output file E.[em]_myinventory.csv.
The purpose of Module F is to scale subsets of CEDS emissions data to the emissions data reported in other inventories. In doing so, CEDS reinforces its accuracy at an aggregate level while retaining the specificity of CEDS fuels, sectors and isos that distinguish the model from the scaling inventories.
Module F consists of:
-
A header file,
emissions_scaling_functions.R
-
The header file contains generalized functions that are called in each scaling script. These functions are used to read and write data, apply mapping files, and perform scaling calculations.
-
-
A parent script,
F1.inventory_scaling.R
-
The parent script calls inventory-specific scaling scripts depending on the emissions species.
-
-
A series of scaling scripts corresponding to each inventory, (e.g.
F1.1.UNFCCC_scaling.R
)-
Each scaling script reads in an inventory dataset and updates the default data in the CEDS data sets.
-
-
Mapping files for each inventory dataset used
Module F is executed by running the parent script. Depending on the emissions species provided, the parent script calls a series of scaling scripts, which execute scaling and then write to an intermediate output file to be scaled by the next script. Scaling the same region more than once will overwrite the earlier scaled values. This means that the order of the scaling scripts is important, and inventories with greater accuracy should be included later to avoid being overwritten by a less accurate inventory.
Each scaling script has a similar structure:
-
Section 0: Universal section, the same for all scripts
-
Section 1: Defines inventory-specific variables such as file names, countries, years the inventory includes, and scaling method
-
Section 1.5: Import inventory-specific data and put in standard inventory format (iso-sector-fuel-years or iso-sector/fuel-years)
-
Section 2: Read in all other scaling data and define variables using scaling functions
-
Section 3: Aggregate CEDS and inventory data to scaling sectors/fuels using scaling functions
-
Section 4: Calculate scaling factors and apply scaling factors to default emissions and emission factors using scaling functions
-
Section 5: Write scaled data to intermediate output file
Sections 1 and 1.5 are unique to each inventory used for scaling. Sections 0 and 2-5 can be identical for all scaling scripts, unless the user would like to define different default options in Section 4 to create scaling factors with the function “F.scaling”.
-
Inventory files can be Excel sheets that are imported and processed to standard format within the scaling routine (ex. Canada), or imported and processed within Module E (ex. UNFCCC). By section 2, inventory data must be in standard form with iso, CEDS sector/fuel (or both) columns and years in Xyear format.
-
Instruction files define how to relate scaling inventory and CEDS default data through scaling sectors or scaling fuels, as well as iso-sector-fuel specific options for scaling routines. Instruction files must be in .csv format and located in the
CEDS/input/mappings/scaling
folder. A mapping instruction file must be provided; “method” and “year” instructions can optionally be provided if needed. The name of the “mapping” instruction file is specified as follows: <inventory_name>_scaling_mapping<_extra_information>.csv
(for example “EMEP_NFR09_scaling_mapping_SO2.csv”). For “method” and “year” files, _extra_information
is _mapping
or _year
respectively.-
The “mapping” instructions relate the inventory data to the CEDS data by scaling method: either fuel, sector or both. They relate the inventory sector/fuel to the scaling sector/fuel and the scaling sector/fuel to CEDS sector/fuel. For example, using the sector scaling method, the
inv_sector
column maps to the scaling_sector
column, and the ceds_sector
column maps to the scaling_sector
column, but the inv_sector
column does not map to the ceds_sector
column. Entries on the same row in the inv_sector
and ceds_sector
columns have no meaning. Inventory sectors/fuels or CEDS sectors/fuels should only be mapped to one scaling sector (although multiple inventory or CEDS sectors/fuels can be mapped to one scaling sector). If an inventory or CEDS sector/fuel is mapped to more than one scaling sector/fuel, the system will match to the first pair in the data frame. The selected scaling sectors/fuels are applied to all countries in the inventory. An example section from a mapping file is shown below:
| inv_sector | scaling_sector | ceds_sector | Notes |
| --- | --- | --- | --- |
| Electricity and gas supply | energy | 1A1a_Electricity-public | |
| Industry_Electricity | energy | 1A1a_Heat-production | |
| Industry_Oil refinery | other-transformation | 1A1bc_Other-transformation | |
| … | … | … | … |
-
The optional “method” file defines interpolation and extrapolation methods for handling data if they differ from the default. The
F.scaling
function is used to execute the instructions in this file. Method file columns include:-
iso: can be specified "all" (meaning all CEDS isos) or specific isos
-
scaling_sector: cannot be "all". This must be specified for each sector that departs from the default method.
-
other: space for an additional parameter if needed by specified method (see linear_1 example below)
-
pre_ext_method: how the data will be extended backward in time from its beginning
-
interp_method: how internal holes (missing years) in inventory data will be filled. The default is that emission factors are linearly interpolated between inventory years.
-
post_ext_method: how the data will be extended forward in time from its end
-
An example selection from a method file is shown below:
| iso | scaling_sector | other | pre_ext_method | interp_method | post_ext_method |
| --- | --- | --- | --- | --- | --- |
| twn | SLV | 2000 | linear_1 | linear | constant |
| twn | waste_water | 2000 | linear_1 | linear | constant |
| twn | waste-incineration | 2000 | linear_1 | linear | constant |
| twn | AGR | 2000 | linear_1 | linear | constant |
| twn | rail | 1999 | linear_1 | linear | constant |
-
Extension methods include:
| method | description | valid columns |
| --- | --- | --- |
| constant | use the edge scaling factor constantly across all extension years | all |
| linear | extend the scaling factor trend linearly | all |
| linear_1 | linearly extend the scaling factor to reach a value of 1 in either the final extension year (post_ext_method) or the year specified in the other column (pre_ext_method) | post_ext_method, pre_ext_method |
-
-
The optional "year" file defines the year extent of the scaling process. It allows the user to extend scaling factor to different years for individual iso-sector/fuels. It follows a similar structure to the "method" file with these columns:
-
iso: can be "all" or specific isos
-
scaling_sector: cannot be "all". Must be specified for each sector.
-
pre_ext_year: The year in which the scaling data will begin (after extension, if necessary)
-
post_ext_year: The year in which the scaling data will end (after extension, if necessary)
-
-
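The constant and linear_1 extension methods described above can be sketched for the forward (post) direction as follows. This is a simplified illustration in base R, not the F.scaling implementation; the function name extend_post and the example years and values are hypothetical.

```r
# Extend a scaling factor forward from the last inventory year (edge_year)
# to end_year, using either a constant value or a linear approach to 1.
extend_post <- function( edge_year, edge_sf, end_year, method = "constant" ) {
  ext_years <- edge_year:end_year
  sf <- switch( method,
    constant = rep( edge_sf, length( ext_years ) ),
    linear_1 = approx( x = c( edge_year, end_year ), y = c( edge_sf, 1 ),
                       xout = ext_years )$y,
    stop( "method not covered in this sketch: ", method )
  )
  setNames( sf, paste0( "X", ext_years ) )
}

# Last inventory year 2010 with scaling factor 1.5, extended to 2014
const_sf <- extend_post( 2010, 1.5, 2014, "constant" )
lin1_sf  <- extend_post( 2010, 1.5, 2014, "linear_1" )
```

With linear_1 the factor falls in equal steps from 1.5 in 2010 to exactly 1 in 2014, while the constant method holds 1.5 throughout.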
The following variables must be defined in Section 1 of any scaling script in order to use the modular Sections 2-5.
-
inventory_data_file
- the name of the inventory file, without the extension -
inv_data_folder
- name of the path to the folder the inventory file is in, from domainmapping.csv (usually "EM_INV" for the CEDS/input/emissions-inventories/
directory) -
sector_fuel_mapping
- the name of the inventory mapping file, without the extension -
mapping_method
- mapping method. Must be "sector", "fuel", or "both" -
inv_name
- name of the inventory (for labeling diagnostic/intermediate output, not for reading input files) -
region
- iso countries included in the inventory -
inv_years
- years covered by the inventory
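A Section 1 block for a new scaling script might therefore look something like the sketch below. Every value is a placeholder for a hypothetical inventory called "myinventory"; only the variable names come from the list above.

```r
# Section 1 sketch: inventory-specific variables (all values are hypothetical)
inventory_data_file <- 'E.SO2_myinventory'            # file name, no extension
inv_data_folder     <- 'EM_INV'                       # domain from domainmapping.csv
sector_fuel_mapping <- 'myinventory_scaling_mapping'  # mapping file, no extension
mapping_method      <- 'sector'                       # "sector", "fuel", or "both"
inv_name            <- 'myinventory'                  # label for output files
region              <- c( 'usa', 'deu' )              # isos covered by the inventory
inv_years           <- 1990:2010                      # years covered by the inventory

# Simple sanity check on the mapping method before continuing
stopifnot( mapping_method %in% c( "sector", "fuel", "both" ) )
```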
The following functions are used throughout Module F. They are defined in
code/parameters/emissions_scaling_functions
-
F.readScalingData( inventory=inventory_data_file, inv_data_folder, mapping=sector_fuel_mapping, method=mapping_method, region, inv_name, inv_years )
Reads in all scaling data, defines variables for scaling and assigns them to the global environment.
-
F.invAggregate( std_form_inv, region, mapping_method, zeroed_terms=c(NA, 'NA', 'NA ', '-'))
Aggregates inventory data to scaling sectors/fuels. There are no user-defined options in this function.
-
F.cedsAggregate( input_em, region, method=mapping_method )
Aggregates CEDS data to scaling sectors/fuels. There are no user-defined options in this function.
-
F.scaling( ceds_data, inv_data, region, ext_start_year=start_year, ext_end_year=end_year, ext=TRUE, interp_default='linear', pre_ext_default='constant', post_ext_default='constant', replacement_method='none', max_scaling_factor=100, replacement_scaling_factor=max_scaling_factor )
Calculates scaling factors where both inventory and CEDS data are available. Interpolates and extends scaling factors forward and backward if ‘ext’ = TRUE. Also checks and replaces scaling factors if too small or too large.
Parameters:
-
ext_start_year
- Year to extend scaling factors back to. Defaults to global environment variable ‘start_year’ (currently 1960) -
ext_end_year
- Year to extend scaling factors forward to. Defaults to global environment variable ‘end_year’ (defined in code/parameters/common_data.R
) -
interp_default
- Default interpolation method for scaling factors within the inventory years. Either ‘interpolation’ or ‘constant’. Defaults to linear interpolation. -
pre_ext_default
- Default extrapolation method for pre inventory years. Either ‘interpolation’ or ‘constant’. Defaults to ‘constant’. -
post_ext_default
- Default extrapolation method for post inventory years. Either ‘interpolation’ or ‘constant’. Defaults to ‘constant’. -
replacement_method
- Either 'none' or ‘replace’. If ‘replace’ then function checks scaling factors and replaces values above and below the threshold defined by max_scaling_factor
. -
max_scaling_factor
- If replacement_method is ‘replace’, scaling factors greater than max_scaling_factor or less than 1/max_scaling_factor
are replaced by replacement_scaling_factor
or 1/replacement_scaling_factor
, respectively. -
replacement_scaling_factor
- value to replace too large scaling factors with. Defaults to max_scaling_factor. Small values are replaced by 1/replacement_scaling_factor
.
-
-
F.applyScale (scaling_factors)
Applies scaling factors to CEDS default data. Creates scaled EF and scaled emissions.
-
F.write( scaled_ef=scaled_ef, scaled_em=scaled_em, domain="MED_OUT")
Writes scaled emissions factors to intermediate output folder.
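The replacement logic described for F.scaling can be illustrated with a small standalone function. This sketch assumes a scaling factor is the ratio of inventory emissions to CEDS default emissions, which is an assumption made here for illustration; scaling_factor_sketch is a hypothetical name and does not reproduce the interpolation or extension steps of the real function.

```r
# Compute scaling factors and optionally replace extreme values.
# Assumption: scaling factor = inventory emissions / CEDS default emissions.
scaling_factor_sketch <- function( ceds, inv, replacement_method = "none",
                                   max_scaling_factor = 100,
                                   replacement_scaling_factor = max_scaling_factor ) {
  sf <- inv / ceds
  if ( replacement_method == "replace" ) {
    too_large <- !is.na( sf ) & sf > max_scaling_factor
    too_small <- !is.na( sf ) & sf < 1 / max_scaling_factor
    sf[ too_large ] <- replacement_scaling_factor
    sf[ too_small ] <- 1 / replacement_scaling_factor
  }
  sf
}

# Made-up emissions for one iso and scaling sector
ceds <- c( X2000 = 10, X2001 = 2,   X2002 = 1000 )
inv  <- c( X2000 = 20, X2001 = 400, X2002 = 1 )

sf        <- scaling_factor_sketch( ceds, inv, replacement_method = "replace" )
scaled_em <- ceds * sf   # scaled emissions after applying the factors
```

The raw factors here are 2, 200, and 0.001; with the default threshold of 100 the last two are replaced by 100 and 0.01.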
Module F tracks scaling by collecting scaling value metadata.
The script global_settings.R
contains a boolean switch, Write_value_metadata
;
if TRUE
, CEDS will generate value metadata reports across every combination of fuel, sector, iso, and year
indicating which scaling factors were applied and whether the cell was scaled directly
to an inventory or to an extension of an inventory.
The output file of this process is F.[em]_scaled_EF-value_metadata.csv
.
Two diagnostic pieces of code, code/diagnostic/Create_Val_Metadata_Heatmap.R
and code/diagnostic/Create_Master_Val_Meta_Heatmap.R
, provide functions for
analyzing and graphically displaying trends in the value metadata.
Adding a new inventory can be done in the following steps:
-
Add a Module E script to process the inventory data into CEDS format. Note that in most cases it is advised to leave the inventory data in the inventory's native sectors. Conversion to standard metric units, however, should be done here.
-
Update the Makefile to reflect the new Module E script and associated dependencies
-
Add a Module F scaling script. This can be done with minor changes to an existing Module F scaling script.
-
Add a Module F sector mapping file for that inventory (referenced from within the new scaling script)
-
Add the new scaling script to the master Module F script
F1.inventory_scaling.R
for the relevant emission species.
Module G handles gridding, the process by which spatial distributions of CEDS final emissions are calculated.
Module G is composed of three main sections. Each section executes 4 scripts. Scripts are executed sequentially; no parent script is used. Twelve grids are created for each year (monthly emissions, incorporating seasonality) from 1750-2014, for each emissions species and sector.
The three main sections are:
-
G1 creates yearly spatial grids. Each netCDF file contains 12 months and the sectors appropriate for that section. (These temporary files are stored at `intermediate-output/gridded-emissions`.)
-
G2 chunks these grids in 50-year groups. These aggregated emission files are the final gridded output and can be found at `final-emissions/gridded-emissions`.
-
G3 creates grids and chunks for methane. Methane is treated separately since the CEDS release data only produced methane emissions for recent decades. A separate, approximate extension is provided as supplemental data.
The scripts in each section each handle a different type of input data:
-
G*.1 handles bulk emissions. The input data for this grid is CEDS final emissions by country and sector (no fuel information) for all sectors except aircraft. These scripts handle each emissions species and each sector.
-
G*.2 handles speciated VOC ('subVOC') emissions. For NMVOC emissions, individual grids are generated for each VOC sub-species.
-
G*.3 handles aircraft emissions. In addition to 12 monthly grids for each year, aircraft emissions have 25 levels of gridding corresponding to different altitudes.
-
G*.4 produces gridded emissions from solid biofuels. (Note that these are already included in the aggregate emission files, but are broken out at the request of users.)
-
G*.5, like G*.4, produces gridded emissions for user-specified fuels. (Again, note that these fuels are already included in the aggregate, but will be broken out into additional files.) This works in conjunction with `custom_fuels_to_grid.csv` in the `gridding_mappings` directory. In that file, enter the desired output file name extension in the rows of the fuels you would like to grid.
Spatial distributions are generated by applying CEDS final emissions to normalized country-level spatial proxy data. Spatial proxies are chosen for each gridding sector, emissions species, and year in `input/gridding/gridding-mappings/proxy_mapping.csv`. The CEDS gridding routines are described in Feng et al. (2019).
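The idea of applying country emissions to a normalized spatial proxy can be illustrated with a toy example. This is only a sketch of the general technique under simplifying assumptions (a single country, sector, and year, and a made-up 2 x 3 proxy grid); it does not reproduce the CEDS gridding functions.

```r
# Toy example: distribute one country's emissions over a 2 x 3 grid.
# All values are made up for illustration.
proxy <- matrix(c(0, 2, 1,
                  1, 0, 4), nrow = 2, byrow = TRUE)  # un-normalized proxy values
country_total <- 80  # country emissions for one sector and year (arbitrary units)

proxy_normalized <- proxy / sum(proxy)       # cells now sum to 1 within the country
gridded <- country_total * proxy_normalized  # emissions assigned to each grid cell

# The gridded values should sum back to the country total
stopifnot(isTRUE(all.equal(sum(gridded), country_total)))
print(gridded)
```

Because the proxy is normalized before use, the sum over grid cells reproduces the country total, which is the property the gridding checksums verify.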
Emissions by country, sector, and fuel must first be generated by running the CEDS system.
In order to produce gridded emissions, the gridded proxy data must be obtained from Zenodo through this link.
The package there contains four folders:
mask, proxy, proxy-backup, seasonality
Copy these folders into the `input/gridding` folder in your CEDS directory. Assuming you do not have any previously modified CEDS proxy data files in your system, you can replace the folders that are already there from the GitHub distribution (you will note that those folders are otherwise empty in the CEDS GitHub distribution).
Gridded emissions can then be produced using the makefile system, for example `make BC-gridded`. Note that if the final emissions file is not up to date, the `make BC-gridded` command will re-run the entire system as needed.
Note: users should edit the netCDF metadata as instructed in the file:
`code/parameters/nc_generation_functions.R`
to reflect their project and contact information.
For some purposes it may be useful to run the gridding commands individually, such as:
`Rscript code/module-G/G1.1.grid_bulk_emissions.R BC --nosave --no-restore`
to produce annual emission grids over all years, and
`Rscript code/module-G/G2.1.chunk_bulk_emissions.R BC --nosave --no-restore`
to aggregate emissions into 50-year chunks.
Specific commands for other emissions can be found in the Makefile
.
Note that the gridding routines, when run directly as indicated above, can be used to map user generated data to spatial grids, even using input data generated outside of CEDS. To do this first provide the user-generated data as indicated in the following template:
`intermediate-output/XX_total_CEDS_emissions_template.csv`
Most of the gridding routines use summary data, which will need to be generated by running:
`Rscript code/module-S/S1.1.write_summary_data.R BC --nosave --no-restore`
If providing user-edited information do not use the makefile system, as this may overwrite the user-supplied emissions information, and instead use the gridding commands directly as described above.
The years used for producing the annual grid files and the chunked grid files are set separately.
In the G1.* scripts, the year range for gridding is set by the `start_year` and `end_year` arguments in the `gridding_initialize` function call, which is near the beginning of each gridding script.
The range of years used for chunking, and the length of each chunk, are set in the `chunk_emissions` function (in `nc_generation_functions.R`). The settings here will be used for chunking operations for all emissions.
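How a year range is split into fixed-length chunks can be sketched as follows. The year range (1750-2014) and chunk length (50 years) are taken from the Module G description; the variable names and the way the boundaries are computed are assumptions for illustration and may not match what `chunk_emissions` does internally.

```r
# Illustrative sketch: group a year range into fixed-length chunks.
years <- 1750:2014
chunk_length <- 50

# Integer division assigns each year to a chunk index (0, 1, 2, ...)
chunk_id <- (years - min(years)) %/% chunk_length
chunks <- split(years, chunk_id)

# Summarize each chunk as "first-last"
chunk_ranges <- unname(vapply(chunks,
                              function(y) paste(min(y), max(y), sep = "-"),
                              character(1)))
print(chunk_ranges)
```

With these settings the last chunk is shorter than the others, since the year range is not a multiple of 50.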
An option is available to remove a single country from the gridded data. This is done by specifying the iso in `common_data.R` with the variable `grid_remove_iso`. For a list of isos, refer to the file `input/mappings/Master_Country_List.csv`.
The gridded data files and checksums that are generated will have the suffix `no_<iso>`.
The diagnostic data generated by the gridding routines includes:
-
The metadata within each gridded emissions netCDF file contains a global attribute `global_total_emission` that contains a value equal to total global emissions for one or more years.
-
A .csv file with global checksum values equal to global emissions for each final gridding sector and year is also generated. The chunking routines generate a consolidated .csv file with checksum values for all years in each chunked file.
-
For each species, the file `diagnostic-output/G.XX_bulk_emissions_checksum_comparison_diff.csv` shows the absolute difference between the sectoral checksum value as summed from the gridded emissions and as summed from the final emissions files. The same data is written as `…_per` in terms of percentage differences. Some differences are expected, but if these numbers are large then there is a more fundamental problem with the gridding system that should be fixed.
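The checksum comparison described above amounts to an absolute and a percentage difference per sector. A minimal sketch follows; the sector labels, column names, and numbers are invented for the example and are not taken from a CEDS output file.

```r
# Illustrative checksum comparison with made-up values.
checksums <- data.frame(
  sector          = c("sector_A", "sector_B", "sector_C"),  # hypothetical labels
  final_emissions = c(100, 250, 0),     # summed from the final emissions files
  gridded_sum     = c(100, 249.5, 0),   # summed from the gridded emissions
  stringsAsFactors = FALSE
)

# Absolute difference between the two sums
checksums$diff <- abs(checksums$gridded_sum - checksums$final_emissions)

# Percentage difference; undefined (NA) if the reference is zero but the grid is not
checksums$per <- ifelse(checksums$final_emissions != 0,
                        100 * checksums$diff / checksums$final_emissions,
                        ifelse(checksums$diff == 0, 0, NA))
print(checksums)
```

Rows with a large `per` value would be the ones to investigate first.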
NetCDF files with total emissions (summed across all sectors other than aircraft) are provided for convenience (both monthly and summed across months) here:
`diagnostic-output/total-emissions-grids`
Module H is responsible for the extension of CEDS data from 1960 or 1971, depending on the country, back to 1750. Many CEDS activities, particularly fuel combustion, are extended using CDIAC trends, which hold information per aggregate fuel and country.
The user can specify extension methods and associated data by sector and emission species. These are specified in the files `CEDS_historical_extension_drivers_activity.csv` and `CEDS_historical_extension_methods_EF.csv` in the folder `input/extension/`.
Extension methods should be specified for every sector, emission species, and over the entire extension time period (back to 1750). Species-specific methods can also be specified. Examples of the available methods can be found in `CEDS_historical_extension_methods_EF.csv`.
Module H also can perform some simple adjustments to emission factor bounds (e.g., min/max emission factors).
Mass balance adjustments for SO2 and CO2 emissions are also performed in Module H.
Module S conducts final processing and summary procedures. This is the last module in the CEDS system. Its input is `intermediate-output/[em]_total_CEDS_emissions.csv` and its output is a series of final emissions breakdowns and summaries.
The main body of Module S is contained in a single script, `S1.1.write_summary_data.R`.
Module S begins by reading in the final emissions disaggregated data. The script aggregates the data to all levels required.
The script then checks if an older run of this emissions species is present in the final output folder (which is not wiped clean by the Makefile during an execution of `make clean-all`).
If no older data is present in the `final-emissions/current-versions` folder, the script writes its summary files.
If an older dataset is present for this emissions species, the script compares the two datasets. If the new data is different, the script overwrites the old data and also produces a series of diagnostic files exploring differences between the outputs of the two runs in the `diagnostics/` folder, as described in the sub-section below.
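The compare-then-overwrite behavior described above can be sketched with base R. This is an illustration of the pattern only: `write_if_changed` is a hypothetical helper, the file name is made up, and the comparison CEDS actually performs in `S1.1.write_summary_data.R` may differ (for example, in how numerical tolerance is handled).

```r
# Hypothetical helper: write a summary file only if it differs from the
# version already on disk. Returns TRUE if the file was (re)written.
write_if_changed <- function(new_data, path) {
  if (file.exists(path)) {
    old_data <- read.csv(path, stringsAsFactors = FALSE)
    if (isTRUE(all.equal(old_data, new_data))) {
      return(FALSE)  # unchanged: keep the old file and its modification date
    }
  }
  write.csv(new_data, path, row.names = FALSE)
  TRUE
}

out_file <- file.path(tempdir(), "example_emissions_by_country.csv")
unlink(out_file)  # start from a clean state

run1 <- data.frame(iso = c("usa", "can"), X2014 = c(10.5, 2.25),
                   stringsAsFactors = FALSE)
first  <- write_if_changed(run1, out_file)  # no older file exists
second <- write_if_changed(run1, out_file)  # same data as before

run2 <- run1
run2$X2014[1] <- 11.5
third  <- write_if_changed(run2, out_file)  # data changed
```

Skipping the write when nothing changed is what leaves the older modification date on files in the current-versions folder.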
The script then sources three files:
-
`Figures.R` creates and outputs a series of figures to the `summary-plots/` folder, including global emissions graphs and further aggregations.
-
`Compare_to_RCP.R` is called except when the emissions species is 'CO2'.
-
This script creates global, regional, and sectoral comparisons between the CEDS output and the RCP inventory emissions as `*.csv` files in the `ceds-comparisons/` subfolder of `diagnostic-output/`.
-
It also produces graphical comparisons of the same data.
-
-
`Compare_to_GAINS.R` is called except when the emissions species is 'CO2' or 'NH3'.
-
This script creates global comparisons, including specific comparisons for residential and non-residential emissions.
-
It also produces graphical comparisons of the same data.
-
Module S produces the following summary files in the `final-emissions/current-versions` sub-folder:
-
All bunker (international aviation and shipping) emissions,
S.[em]_bunker_emissions.csv
-
CEDS final emissions aggregated to different levels:
-
Aggregated to each country and aggregate sector,
CEDS_[em]emissions_by_country_sector[ver].csv
-
Aggregated to country totals,
CEDS_[em]emissions_by_country[ver].csv
-
Global emissions per specific fuel,
CEDS_[em]global_emissions_by_fuel[ver].csv
-
Emissions aggregated to CEDS sectors and countries,
CEDS_[em]emissions_by_country_CEDS_sector[ver].csv
-
Global emissions per CEDS sector,
CEDS_[em]global_emissions_by_CEDS_sector[ver].csv
-
Each file is also suffixed by the date of the execution of the run, or the user-specified version number (also in the form of a date).
Note that if there are no changes between the current and previous run, the files will not be updated, and the files in the `current-versions` folder will retain the modification date from the previous run.
If the results of the current run are different from those of the previous run of the CEDS system for this emissions species, the following comparison diagnostics are produced, if there is relevant data.
-
Files that show changes in rows and columns (new or deleted rows/columns between the old/new versions of the data):
-
./diagnostics/CEDS_[em]emissions_by_country_sector[ver]_dropped-rows
-
./diagnostics/CEDS_[em]emissions_by_country_sector[ver]_added-rows
-
./diagnostics/CEDS_[em]emissions_by_country_sector[ver]_dropped-cols
-
./diagnostics/CEDS_[em]emissions_by_country_sector[ver]_added-cols
-
The following diagnostic files show, for aggregated sectors, differences between the old and new versions of the output data. Only rows with differences are shown to keep file size reasonable.
-
Percentage, absolute, and consolidated comparison files identifying changes between the two outputs:
-
./diagnostics/CEDS_[em]emissions_by_country_sector_comparison[ver]_diff-percent.csv
-
./diagnostics/CEDS_[em]emissions_by_country_sector_comparison[ver]_diff.csv
(absolute differences) -
./diagnostics/CEDS_[em]emissions_by_country_sector_comparison[ver]_comparison.csv
-
The last `_comparison.csv` file listed above is a consolidated comparison, in long format, that shows old values, new values, and their absolute and percentage differences.
For the two difference diagnostic files (percentage and absolute) only changes above a threshold are shown. In these files, a "0" indicates that there was some change in the data, but that the change was below the threshold value. If nothing is shown (blank), then the data were identical to all digits. This presentation suppresses spurious differences caused by, for example, different package versions.
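The blank / "0" / value convention used in the difference files can be sketched as below. The threshold value and variable names are assumptions for illustration; the actual threshold is set in the CEDS code.

```r
# Illustrative sketch of the display convention for difference files.
old <- c(100, 100, 100)
new <- c(100, 100.0000001, 110)
threshold <- 1e-3  # assumed value, for illustration only

delta <- new - old
shown <- ifelse(delta == 0, NA_real_,                    # identical: left blank
                ifelse(abs(delta) < threshold, 0, delta))  # tiny change: shown as 0
print(shown)

# Writing with na = "" turns the NA entries into blank cells in the .csv
diff_file <- file.path(tempdir(), "example_diff.csv")
write.csv(data.frame(diff = shown), diff_file, row.names = FALSE, na = "")
```

The first entry is blank because the values are identical, the second is reported as 0 because the change is below the threshold, and only the third appears as an actual difference.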
CEDS is executed using a makefile system. A single file, called the Makefile and saved in the main CEDS folder, contains instructions for the execution of the entire CEDS system.
The Makefile is executed from the command line of your choice using the command `make *`, where * is a valid command line argument.
Some Makefile execution commands:
|
Executes a run of CEDS for each valid emissions species except CH4 |
|
Executes CEDS for emissions species CO2, or any other specified emissions species (generic |
|
Deletes all intermediate, diagnostic, and final output files |
|
Deletes all files output by Module B (valid for all modules) |
|
Deletes all intermediate files relating to CO2 |
The Makefile is made up of "Code blocks". Each code block is headed by the output file that will be created, and is followed by all of the input files and scripts required to create that file. Most code blocks will include an indicator that one or more Rscripts should be executed.
If an input file is missing, or if an Rscript fails to create an intermediate file needed by another script, the Makefile will throw an error, saying that there is no rule to build the missing file.
The CEDS code contains a "parameters" folder. This folder stores header files. These files are sourced at the beginning of some scripts to load functions and global data.
The files in this folder are as follows:
| File | Contains |
| --- | --- |
| analysis_functions.R | Functions that map to CEDS, check if all sectors/countries/fuels are present. |
| common_data.R | Global variables, e.g. years, default conversion factors. |
| data_functions.R | Various data processing functions, e.g. %!in%, replacing data, build CEDS template, remove NAs or blanks, etc. |
| diagnostic_functions.R | A function that compares two identically formatted dataframes for equality. |
| emissions_scaling_functions.R | All functions specific to Module F, e.g. value-metadata functions, scaling functions, functions that add to scaled databases, etc. |
| global_settings.R | To be called at the beginning of every script. Initializes CEDS version number and global options. |
| gridding_functions.R | All functions specific to Module G. |
| header.R | Contains functions required for initializing the log and for sourcing other parameters scripts. Contains the |
| interpolation_extension_functions.R | Contains functions to interpolate or extend time series data (NOT interpolate_NAs, extend on trend). |
| IO_functions.R | Contains readData, writeData, and printLog, along with other functions for reading in or outputting information. |
| ModH_extension_functions.R | All functions specific to Module H, including data processing, merging, and disaggregating functions. |
| nc_generation_functions.R | Some supplemental non-combustion gridding functions |
| process_db_functions.R | Contains functions for generically adding data to databases (e.g. addToEmissionsDb). Also contains a cleanData function. |
| timeframe_functions.R | Contains a series of helper functions for dealing with data time range, from identification to truncation. |
The CEDS system produces a number of diagnostic graphs and files that can be used to visualize results, evaluate changes from previous versions, and look for aberrant behavior. These include:
The `CEDS/diagnostic-output` folder contains a large number of intermediate output files and graphs from various CEDS scripts. Two main sets of graphs are produced that are useful in examining results. Excel files that correspond to the data in the graphs are also produced.
-
Comparison with GAINS: Graphs comparing CEDS emissions by region with GAINS point estimates (every five years) are provided. Graphs are generated for total emissions, residential, combustion, and non-residential.
-
Another set of graphs shows long-term CEDS trends (1850 forward) by "RCP region" compared with the Lamarque et al. 2010 ("RCP") emissions.
Diagnostic files for emission scaling are also produced. These provide the inventory data, aggregated to scaling sectors, and the scaling factors over time. These are useful to identify sectors where CEDS default emissions were very different from inventory values. This may indicate a sector definition mismatch or a place where CEDS default assumptions may need to be updated. This is particularly important if there is a mismatch in the earliest scaling year.
Whenever the CEDS system is run via the "make" system, the final emissions by country and sector are compared with data from the previous run. Differences by sector and country are written out to a set of diagnostic files in the `final-emissions/diagnostics` folder as described in the CEDS Final Output Diagnostics section.
CEDS should produce the same results when running on different machines or versions of R. We have found, however, that the order of the results can differ between versions of R. Different versions of R, for example, differ in the order in which `base::merge` combines two data frames, which will produce final results in a different order, but with equal values.
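When comparing outputs from two machines or R versions, sorting both data frames on their key columns first removes row-order effects. The sketch below uses made-up data, and `sort_rows` is a hypothetical helper, not a CEDS function.

```r
# Two data frames with the same rows in a different order (made-up values)
a <- data.frame(iso    = c("usa", "can", "usa"),
                sector = c("1A1", "1A1", "1A2"),
                X2014  = c(10, 2, 5),
                stringsAsFactors = FALSE)
b <- a[c(3, 1, 2), ]

# Hypothetical helper: order rows by the key columns and reset row names
sort_rows <- function(df, keys) {
  idx <- do.call(order, unname(as.list(df[keys])))
  df <- df[idx, , drop = FALSE]
  rownames(df) <- NULL
  df
}

keys <- c("iso", "sector")
same_values <- isTRUE(all.equal(sort_rows(a, keys), sort_rows(b, keys)))
print(same_values)
```

A direct comparison of `a` and `b` would report differences even though the underlying values are equal.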
When adding new scripts to CEDS, make sure to correctly scope functions from any packages you are using. For example, `plyr` and `dplyr` contain many functions with the same name but different functionality. You cannot count on CEDS to load packages in the order that you want, and you will get errors or wrong results if you make that assumption. The best practice is to scope functions when used to ensure the desired version will be used (e.g., `dplyr::mutate(…)`).
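The effect of name masking, and how explicit scoping avoids it, can be demonstrated with base R alone. To keep the example self-contained, a locally defined `filter` stands in for a package function that masks `stats::filter`; the same reasoning applies to the `plyr` and `dplyr` case.

```r
x <- c(1, 2, 3, 4, 5)

# Stand-in for a package that defines its own filter(), masking stats::filter
filter <- function(x, ...) x[x > 2]

# An unscoped call picks up whichever definition is found first
masked <- filter(x, rep(1 / 3, 3))

# A scoped call always uses the moving-average filter from the stats package
explicit <- stats::filter(x, rep(1 / 3, 3))

print(masked)
print(as.numeric(explicit))
```

The two calls look almost identical but do entirely different things, which is why scoping with `package::function` is the safer habit in shared code.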
While it is convenient to create .csv files using Microsoft Excel, it is also easy to create files that have extra columns or extraneous characters. If you encounter errors reading in a user-created .csv file, it can be useful to open the file in a text editor to check formatting.
Note that if there are some blank columns in one row but not another you will get an error such as this:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Cleaning up the .csv file will correct this error.
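One way to locate the offending rows is to count the fields on each line with `utils::count.fields` and compare against the header. The file contents below are made up to reproduce the problem; this is a diagnostic sketch, not part of the CEDS code.

```r
# Create a small .csv with stray blank columns in one row (made-up content)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("iso,sector,fuel",
             "usa,1A1,coal,,",   # two extra blank columns
             "can,1A1,gas"),
           csv_path)

# Number of comma-separated fields on each line of the file
n_fields <- utils::count.fields(csv_path, sep = ",", quote = "\"")

# Lines whose field count differs from the header line
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

The reported line numbers point to the rows that need cleaning in a text editor.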
Most machines will read UTF-encoded text files without problems, but we have found that some systems will error if text files are encoded with a Byte Order Mark (BOM). In these cases you might get a machine-specific error such as the following:
Error in `[.data.frame`(exten_df, , c("iso", "sector", "fuel")) :
undefined columns selected
If you were to look at the dataframe `exten_df` in a debugger (such as by using the R `browser()` command) you will see that the first column name shows up as `…iso`, which indicates that a hidden character is part of this column name.
The solution for the BOM issue is to edit the file to have plain `UTF-8` encoding without a BOM. You can try Microsoft Excel; we've found that it will sometimes display the hidden character in a way that can be edited out. BBEdit for macOS also allows you to change the text encoding type.
This type of problem can show up in any column if there is a hidden character in the text. These types of errors are difficult to debug since the hidden characters will not display in many editors. But the issue will always be apparent when looking at the data frame in an R debugger, since the problematic name will show up as containing `…` in addition to the visible text.
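A BOM can also be detected from within R by inspecting the first three bytes of the file, and base R can strip it on read via the `"UTF-8-BOM"` file encoding. The sketch below builds a small test file with made-up contents; it is offered as a diagnostic aid, and it does not change how the CEDS `readData` function reads files.

```r
# Build a small .csv that starts with a UTF-8 byte order mark (made-up content)
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_path <- tempfile(fileext = ".csv")
writeBin(c(bom, charToRaw("iso,sector,fuel\nusa,1A1,coal\n")), csv_path)

# Detect the BOM by checking the first three bytes of the file
has_bom <- identical(readBin(csv_path, what = "raw", n = 3L), bom)
print(has_bom)

# Reading with fileEncoding = "UTF-8-BOM" drops the BOM, so the first
# column name is read without the hidden character
exten_df <- read.csv(csv_path, fileEncoding = "UTF-8-BOM",
                     stringsAsFactors = FALSE)
print(names(exten_df))
```

If `has_bom` is `TRUE` for an input file, re-saving that file as plain UTF-8 without a BOM remains the fix described above.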