Changes to handle AEO 2018 residential data files
In ‘mseg.py’:
-Modified number of footer lines to skip in ‘rsmlgt.txt’.
-Added new columns for utility rebates and technology-specific choice weights in ‘rsmlgt.txt’.
-Added “latin1” encoding argument to numpy genfromtxt command on ‘rsmlgt.txt’ import.

In ‘mseg_techdata.py’:
-Added new columns for equipment rebates in ‘rsmeqp.txt’.
-Removed major fuel flag column in ‘rsclass.txt’.
-Added new columns for utility rebates and technology-specific choice weights in ‘rsmlgt.txt’.
-Modified number of footer lines to skip in ‘rsmlgt.txt’ and header lines to skip in ‘rsclass.txt’.
-Added “latin1” encoding argument to numpy genfromtxt command on ‘rsmlgt.txt’ and ‘rsclass.txt’ imports.
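The commit does not state why the “latin1” argument was needed; a plausible reading (an assumption here) is that the AEO 2018 text files contain bytes outside the ASCII range. The standard-library sketch below shows only that decoding issue, not the genfromtxt call itself. The sample bytes are made up for illustration.

```python
# Hypothetical sample line containing one non-ASCII byte (0xE9),
# standing in for a row of an AEO text file
raw = 'caf\xe9 lamp,2.5\n'.encode('latin1')

# Decoding as UTF-8 fails because 0xE9 is not followed by a valid
# continuation byte
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin1 maps every byte to exactly one character, so decoding
# cannot fail
text = raw.decode('latin1')
print(utf8_ok, repr(text))
```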

Break out all AEO residential MELs, minor naming changes

The following changes are now reflected across the entire suite of Scout files (from AEO data updating modules through to analysis engine modules and documentation):
-Break out the former ‘other MELs’ categories such that AEO stock and energy data are available for the individual technologies ‘coffee maker,’ ‘dehumidifier,’ ‘microwave,’ ‘pool heaters and pumps,’ ‘security system,’ ‘portable electric spas,’ ‘wine coolers,’ and ‘electric other’.
-Change ‘other (grid electric)’ end use to ‘other’ and extend across all fuel types.
-Add the ‘other appliances’ technology to the ‘other’ end use category for non-electric fuels.
-Change the ‘non-specific’ secondary heating technology for electric and natural gas fuels to ‘secondary heater’; also change ‘secondary heating’ technology for all other fuels to ‘secondary heater’ (e.g., ‘secondary heating (wood)’ -> ‘secondary heater (wood)’).
-Change all ampersands in technology names to ‘and’ for consistency (e.g., ‘fans & pumps’ -> ‘fans and pumps’).
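The two renaming rules that have explicit before/after examples in the list above can be sketched as a small helper. The function name `rename_technology` is hypothetical and is not part of the Scout code base; the sketch is meant for technology names only, since ‘secondary heating’ also exists as an end use name.

```python
import re


def rename_technology(name):
    """Apply the ampersand and secondary heater renaming rules.

    Hypothetical helper for illustration; not the Scout implementation.
    """
    # Replace any ampersand (with optional surrounding spaces) by 'and'
    name = re.sub(r'\s*&\s*', ' and ', name)
    # 'secondary heating (fuel)' -> 'secondary heater (fuel)'
    name = re.sub(r'^secondary heating\b', 'secondary heater', name)
    return name


renamed = [rename_technology(n) for n in
           ['fans & pumps', 'secondary heating (wood)', 'microwave']]
print(renamed)
```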

Additionally, stale data fields were cleared from the results of the ‘mseg_techdata.py’ routine, and all missing technology choice data now yields a zero value for the technology in question, instead of a dict full of ‘NA’ values.

Close #202

Revise string cleaning and data handling for EIA input data files for commercial buildings to correctly read technology descriptions in the service demand data and match those strings to comparable descriptions in the technology characteristics (cost, performance, and lifetime) data.

Support AEO 2018 commercial data

Update commercial input data handling to support new miscellaneous electric load (MEL) types and update microsegments.json with these new MEL types. Update documentation to reflect the technology types available from the AEO 2018 data for commercial buildings.

Fix lighting type string handling

Fix specific problems present in the commercial lighting data from AEO and in the handling of those data. Combine 'SodiumVapor' and 'Sodium Vapor' lighting types together. Collapse linear fluorescent lighting types to simplified strings of the form 'TX FXX', e.g., 'T8 F28.' Eliminate the empty string ('') lighting technology associated with a handful of rows in the service demand data that have no 'Description' (and no service demand).
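The three lighting fixes described above can be sketched together. This is a simplified illustration that reuses the linear fluorescent regex from the diff; the helper name and the sample descriptions are assumptions, and the actual code applies these steps in different functions after year text has been stripped.

```python
import re


def normalize_lighting_name(description):
    """Return a simplified lighting technology name (illustrative only)."""
    # Merge the two spellings of sodium vapor lighting
    name = description.replace('SodiumVapor', 'Sodium Vapor').strip()
    # Collapse linear fluorescent names to the 'TX FXX' form
    match = re.search(r'^(T[0-9] F[0-9]{2})', name)
    return match.group(0) if match else name


# Hypothetical descriptions, including an empty string to be dropped
raw = ['T8 F32 Commodity', 'T8 F28', 'SodiumVapor', 'Sodium Vapor',
       '', 'T5 F28 High Output']
cleaned = sorted({normalize_lighting_name(d) for d in raw if d.strip()})
print(cleaned)
```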

Changes to technology names in ecm_prep

The changes are needed to accommodate an expanded set of commercial MELs technology names, a more compact set of commercial lighting technology names, and revised technology names for commercial cooking.

Another small modification was made to the routine that determines common date ranges (e.g., 2013-2050) across all raw EIA input files.
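The commit message does not show how the common date range is computed. One straightforward approach, sketched here under the assumption that each file reports a (first year, last year) pair, is to take the latest start and the earliest end. The file labels and year values are made up.

```python
def common_year_range(ranges):
    """Return the (first, last) years shared by all (start, end) pairs."""
    starts, ends = zip(*ranges)
    first, last = max(starts), min(ends)
    if first > last:
        raise ValueError('input files share no common years')
    return first, last


# Hypothetical year coverage for three raw input files
file_ranges = {'file_a': (2010, 2050),
               'file_b': (2013, 2050),
               'file_c': (2013, 2050)}
common = common_year_range(file_ranges.values())
years = list(range(common[0], common[1] + 1))
print(common, len(years))
```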

Finally, the handling of missing residential consumer choice data in ‘ecm_prep.py’ was revised to reflect changes in the structure of the underlying choice dataset.

Added complete AEO 2018 baseline files, default ECM modifications, and results

Complete stock/energy and technology characteristics data are included, as are new heating/cooling totals, site-source conversions, and Consumer Price Index data.

One ECM definition had to be modified slightly to reflect the updated ‘other’ end use name.

All ECM definitions and results in the /web folder were updated to reflect these AEO 2018 data updates.

Modified LED troffers example for AEO 2018 technologies
jtlangevin committed Jun 27, 2018
1 parent 2cca7f6 commit 2c64c84
Showing 26 changed files with 1,791,446 additions and 1,899,031 deletions.
117 changes: 102 additions & 15 deletions com_mseg.py
@@ -4,6 +4,7 @@
import re
import csv
import json
import io


class EIAData(object):
@@ -55,7 +56,17 @@ class CommercialTranslationDicts(object):
cdivdict (dict): Translation for census divisions.
bldgtypedict (dict): Translation for commercial building types.
endusedict (dict): Translation for commercial building end uses.
mels_techdict (dict): Translation for miscellaneous electric loads.
mels_techdict (dict): Translation for miscellaneous electric
loads (MELs). The numeric translation should be updated
each year based on the interpretation given in the AEO
commercial buildings microdata file. If there are
conspicuously missing MEL codes in the microdata, EIA
should be contacted to verify the translation between
numeric codes and descriptive names. Additionally, the
numeric codes in the end use column in KDBOUT.txt in the
rows labeled 'MiscElConsump' should be compared against
the codes in the microdata to see if any of the codes are
missing from KDBOUT.txt.
fueldict (dict): Translation for fuel types.
demand_typedict (dict): Translation for components of thermal load.
"""
@@ -83,7 +94,7 @@ def __init__(self):
'mercantile/service': 9,
'warehouse': 10,
'other': 11,
'FIGURE THIS ONE OUT': 12
'non-building': 12 # Applies to specific MELs
}

self.endusedict = {'heating': 1,
@@ -108,10 +119,16 @@ def __init__(self):
'laundry': 8,
'lab fridges and freezers': 9,
'fume hoods': 10,
'medical imaging': 11,
'video displays': 15,
'large video displays': 16,
'municipal water services': 17
'medical imaging': 12,
'large video boards': 13,
'IT equipment': 14,
'office UPS': 15,
'data center UPS': 16,
'shredders': 17,
'private branch exchanges': 18,
'voice-over-IP telecom': 19,
'water services': 20, # non-building
'telecom systems': 21 # non-building
}

self.fueldict = {'electricity': 1,
@@ -291,6 +308,10 @@ def sd_mseg_percent(sd_array, sel, yrs):
# summarized and returned from this function
elif re.search('placeholder', row['Description']):
rows_to_remove.append(idx)
# Else check to see if the description is an empty string,
# and if so, add it to the list of rows to remove
elif re.search('^(?![\s\S])', row['Description']):
rows_to_remove.append(idx)
# Else check for a special case where the year in the
# technology name sought by the tech_name regex didn't match
# because the year in the name is partially truncated at
@@ -303,6 +324,17 @@ def sd_mseg_percent(sd_array, sel, yrs):
# Delete the placeholder rows from the filtered array
filtered = np.delete(filtered, rows_to_remove, 0)

# Special filtering for lighting to drop special modifier text
# in the descriptions of linear fluorescent bulb types (e.g.,
# replace 'T8 F32 Commodity' with 'T8 F32') now that year
# details have been removed
if sel[2] == CommercialTranslationDicts().endusedict['lighting']:
for idx, row in enumerate(filtered):
# Identify linear fluorescent types
tech_name = re.search('^(T[0-9] F[0-9]{2})', row['Description'])
if tech_name:
filtered['Description'][idx] = tech_name.group(0)

# Because different technologies are sometimes coded with the same
# technology type number (especially in lighting, where lighting
# types are often differentiated by vintage and technology type
@@ -319,7 +351,7 @@ def sd_mseg_percent(sd_array, sel, yrs):
tval = np.zeros((len(trunc_technames), len(yrs)))

# Combine the data recorded for each unique technology
for idx, name in enumerate(trunc_technames):
for idx, name in enumerate(technames):

# Extract entries for a given technology type number
entries = filtered[filtered['Description'] == name]
@@ -711,6 +743,15 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):

# Open the target CSV formatted data file
with open(data_file_path) as thefile:
# For some cooking equipment descriptions in the service demand
# data, 11 inches is encoded as 11", which by default leaves
# the closing double-quote character in the description strings
# while removing the " that denoted inches; by inserting an
# escape character before the " denoting inches, the text will
# be handled correctly by csv.reader
if re.match('.*KSDOUT', re.escape(data_file_path)):
cont = thefile.read().replace('11"', '11\\"')
thefile = io.StringIO(cont)

# This use of csv.reader assumes that the default setting of
# quotechar '"' is appropriate; the skipinitialspace option
@@ -722,10 +763,12 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):
# if they are encountered
if '\0' in open(data_file_path).read(): # NULL bytes detected
filecont = csv.reader((x.replace('\0', '') for x in thefile),
delimiter=delim_char, skipinitialspace=True)
delimiter=delim_char, skipinitialspace=True,
escapechar='\\')
else: # No NULL bytes, proceed normally
filecont = csv.reader(thefile,
delimiter=delim_char, skipinitialspace=True)
delimiter=delim_char, skipinitialspace=True,
escapechar='\\')

# Create list to be populated with tuples of each row of data
# from the data file
@@ -776,7 +819,7 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):
return final_struct


def str_cleaner(data_array, column_name):
def str_cleaner(data_array, column_name, return_str_len=False):
"""Clean up formatting of technology description strings in imported data.
In the imported EIA data, the strings that describe the technology
@@ -789,9 +832,17 @@ def str_cleaner(data_array, column_name):
Args:
data_array (numpy.ndarray): A numpy structured array of imported data.
column_name (str): The name of the column in data_array to edit.
return_str_len (bool): If true, this function returns an
additional integer used for string truncation.
Returns:
The input array with the strings in column_name revised.
If return_str_len is true, then the function also returns an
integer for the string length to use to truncate the cooking
technology strings from ktek (the technology cost, performance,
and lifetime data file) to match the length of the modified
technology strings in KSDOUT (the service demand data) when
combining those data.
"""

def special_character_handler(text_string):
@@ -801,9 +852,13 @@ def special_character_handler(text_string):
text_string (str): A string describing a particular technology.
Returns:
The edited text string.
The edited text string and the string truncation length,
explained in the parent function docstring.
"""

# Replace 'SodiumVapor' with 'Sodium Vapor'
text_string = re.sub('SodiumVapor', 'Sodium Vapor', text_string)

# Check to see if an HTML character reference ampersand or
# double-quote, or standard double-quote character is in
# the string
@@ -816,12 +871,20 @@ def special_character_handler(text_string):
# use of the standalone double-quote character
if html_ampersand_present:
text_string = re.sub('&amp;', '&', text_string)
str_trunc_len = 50 # Not used in com_mseg_tech
elif html_double_quote_present:
text_string = re.sub('&quot;', '-inch', text_string)
str_trunc_len = 43
elif double_quote_present:
text_string = re.sub('\"', '-inch', text_string)
str_trunc_len = 48
else:
str_trunc_len = 50

return text_string
return text_string, str_trunc_len

# Store the indicated string truncation lengths in a list
str_trunc_list = []

# Check for double quotes in the first entry in the specified column
# and, assuming all entries in the column are the same, revise all
@@ -838,7 +901,10 @@ def special_character_handler(text_string):

# Clean up strings with special characters to ensure that
# these characters appear consistently across all imported data
entry = special_character_handler(entry)
entry, str_trunc_len = special_character_handler(entry)

# Record string truncation length
str_trunc_list.append(str_trunc_len)

# Delete any newly "apparent" (no longer enclosed by the double
# quotes) trailing or (unlikely) leading spaces and replace the
@@ -851,12 +917,33 @@ def special_character_handler(text_string):

# Clean up strings with special characters to ensure that
# these characters appear consistently across all imported data
entry = special_character_handler(entry)
entry, str_trunc_len = special_character_handler(entry)

# Record string truncation length
str_trunc_list.append(str_trunc_len)

# Delete any leading and trailing spaces
data_array[column_name][row_idx] = entry.strip()

return data_array
# Clean up indicated string truncation lengths, discarding 50
str_trunc_list = list(set(str_trunc_list))
str_trunc_list = [x for x in str_trunc_list if x != 50]
if len(str_trunc_list) > 1:
# If this condition has been satisfied, both '&quot;' and
# '"' were present in the technology description strings
# in the imported text, which suggests a single truncation
# length might not work to match the strings in these data
text = ('Warning: undesired behavior might occur when '
'attempting to match technology characteristics '
'data (ktek) with service demand data (ksdout).')
print(text)

# Return the appropriate objects based on the return_str_len option
if return_str_len:
str_trunc_len_final = str_trunc_list[0] # Obtain standalone integer
return data_array, str_trunc_len_final
else:
return data_array


def main():
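The `11"` handling added to `data_import` can be reproduced in isolation with the standard library. This is a minimal sketch, assuming a made-up KSDOUT-style row; the row contents and variable names are illustrative and are not taken from the EIA file.

```python
import csv
import io

# Hypothetical row: the quoted description contains 11" (11 inches)
raw = '1,"Range, Electric, 11" griddle",5.0\n'

# Default parsing: the inch mark is read as the closing quote, so it
# is dropped and the real closing quote is left in the field text
naive = next(csv.reader(io.StringIO(raw), skipinitialspace=True))

# Insert a backslash before the inch mark and tell csv.reader that
# the backslash is the escape character
escaped = raw.replace('11"', '11\\"')
fixed = next(csv.reader(io.StringIO(escaped), skipinitialspace=True,
                        escapechar='\\'))

print(naive[1])
print(fixed[1])
```

With the escape in place the description keeps its inch mark, which `str_cleaner` can then replace with '-inch'.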
71 changes: 54 additions & 17 deletions com_mseg_tech.py
@@ -172,8 +172,9 @@ def sd_data_selector(sd_data, sel, years):
# Identify each technology and performance level using the text
# in the description field since the technology type and vintage
# numeric codes are not well-matched to individual technology and
# performance levels
# performance levels; remove empty strings from the list
technames = list(np.unique(filtered['Description']))
technames = [x for x in technames if x != '']

# Set up numpy array to store restructured data, in which each row
# will correspond to a single technology
@@ -232,11 +233,22 @@ def single_tech_selector(tech_array, specific_name):
# 2 and three other numbers (i.e., 2009 or 2035)
tech_name = re.search('.+?(?=\s2[0-9]{3})', row['technology name'])

# If the regex returned a match, and the first group of the
# match (i.e., the part before the numeric year) is not
# the same as the name passed to the function, remove the row
# If the technology name regex returned a match, check if there
# is a match for a linear fluorescent lighting technology; in
# either case (either the linear fluorescent or the more
# generic technology name regex), if the match is not the same
# as the name passed to the function, remove the row
if tech_name:
if tech_name.group(0) != specific_name:
# Test whether the technology name corresponds to a linear
# fluorescent lighting technology in the format 'T# F##',
# e.g., 'T8 F96', and if it does, extract just that string
# without any additional text (e.g., 'T8 F96 High Output')
lfl_tech_name = re.search('^(T[0-9] F[0-9]{2})',
tech_name.group(0))
if lfl_tech_name:
if lfl_tech_name.group(0) != specific_name:
rows_to_remove.append(idx)
elif tech_name.group(0) != specific_name:
rows_to_remove.append(idx)
# If there's no match, the technology might not have a year
# included as part of its name, but it nonetheless should be
@@ -351,15 +363,23 @@ def cost_perf_extractor(single_tech_array, sd_array, sd_names, years, flag):
# 44 characters since all of the string descriptions in the
# service demand data are limited to 44 characters; there
# is an exception for strings that have '-inch' in them,
# which should be matched to the first 43 characters since
# the substitution of '-inch' for '&quot;' shortens the
# string by one character; finally remove any trailing
# spaces that might create text matching problems
# which should be matched to the first n characters, where
# n is either 43 or 48 characters depending on whether
# '-inch' was substituted for '&quot;' or '"'; finally
# remove any trailing spaces that might create text
# matching problems
if re.search('-inch', name_from_ktek[:43]):
length = 43
length = UsefulVars().trunc_len
else:
length = 44
name_from_ktek = name_from_ktek[:length].strip()
# The number of characters to use for text matching is
# determined when the service demand data description
# strings are cleaned up; the substitution of '-inch' for
# '"' will lengthen the string by four characters, thus the
# matching should be done with 48 characters; replacing
# '&quot;' will reduce the length of the string by 1, thus
# the matching should be performed using 43 characters

# Find the matching row in service demand data by comparing
# the row technology name to sd_names and use that index to
@@ -534,11 +554,21 @@ def tech_names_extractor(tech_array):
# 2 and three other numbers (e.g., 2009 or 2035)
tech_name = re.search('.+?(?=\s2[0-9]{3})', row['technology name'])

# If the regex matched, add the matching text, which describes
# the technology without scenario-specific text like '2003
# installed base', to the technames list
# If the regex matched, check the matching text to see if it
# corresponds to a linear fluorescent lighting technology
# represented in the format 'T# F##', e.g., 'T8 F96'; if it does,
# extract from the match just the 'T# F##' string without any
# additional modifier text (e.g., 'T8 F96 High Output'); if not,
# add the text that matched originally, which describes the
# technology without scenario-specific text like '2003 installed
# base' to the technames list
if tech_name:
technames.append(tech_name.group(0))
lfl_tech_name = re.search('^(T[0-9] F[0-9]{2})',
tech_name.group(0))
if lfl_tech_name:
technames.append(lfl_tech_name.group(0))
else:
technames.append(tech_name.group(0))
# Else, if the technology name is not from a placeholder row,
# add the entire name text to the technames list
else:
@@ -1082,7 +1112,7 @@ def main():
# Import EIA AEO 'KSDOUT' service demand data
serv_dtypes = cm.dtype_array(cm.EIAData().serv_dmd)
serv_data = cm.data_import(cm.EIAData().serv_dmd, serv_dtypes)
serv_data = cm.str_cleaner(serv_data, 'Description')
serv_data, tval = cm.str_cleaner(serv_data, 'Description', True)

# Import EIA AEO 'KDBOUT' additional data file
catg_dtypes = cm.dtype_array(cm.EIAData().catg_dmd)
@@ -1097,6 +1127,10 @@ def main():
with open(handyvars.aeo_metadata, 'r') as metadata:
metajson = json.load(metadata)

# Assign available string truncation length value to UsefulVars
# class so that it is available for all class uses
UsefulVars.trunc_len = tval

# Define years vector using year data from metadata
years = list(range(metajson['min year'], metajson['max year'] + 1))

@@ -1114,13 +1148,16 @@ def main():
# (i.e., non-repeating) list of technologies that didn't have
# a match between the two data sets and thus were not added
# to the aggregated cost or performance data in the output JSON
# The technologies that appear in this list might vary from
# year to year.
if nmtn:
text = ('Warning: some technologies reported in the '
'technology characteristics data were not found to '
'have corresponding service demand data and were '
'thus excluded from the reported technology cost '
'and performance. Four performance levels for '
'solar water heaters are expected in this list.')
'and performance. These technologies are generally '
'absent from or have all zeros for their service '
'demand data.')
print(text)
for item in sorted(list(set(nmtn))):
print(' ' + item)
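The 43 and 48 character truncation lengths in `cost_perf_extractor` follow from simple arithmetic on the 44 character description width stated in the code comments. The sketch below works through that arithmetic and the truncated comparison; the helper names and the sample technology name are hypothetical.

```python
BASE_LEN = 44  # description width in the service demand data, per the comments


def trunc_len(original_token, replacement='-inch', base_len=BASE_LEN):
    """Description length after original_token is swapped for replacement."""
    return base_len + len(replacement) - len(original_token)


def names_match(name_from_ktek, name_from_sd, length):
    """Compare a ktek name with a service demand name over length characters."""
    return name_from_ktek[:length].strip() == name_from_sd.strip()


len_plain_quote = trunc_len('"')      # one character replaced by five
len_html_quote = trunc_len('&quot;')  # six characters replaced by five

# Hypothetical names: the ktek entry is padded and carries extra text
sd_name = 'Range, Gas, 11-inch griddle'
ktek_name = sd_name.ljust(60) + 'tail'
print(len_plain_quote, len_html_quote,
      names_match(ktek_name, sd_name, len_html_quote))
```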
3 changes: 3 additions & 0 deletions com_mseg_tech_test.py
@@ -3947,6 +3947,7 @@ class CostAndPerformanceDataExtractionTest(CommonUnitTest):

# Test equality of the dicts of cost data generated for each technology
def test_cost_selection_and_conversion(self):
cmt.UsefulVars.trunc_len = 43
for idx, input_array in enumerate(self.reduced_tech_data):
cost_data, non_matched_names = cmt.cost_perf_extractor(
input_array,
@@ -3959,6 +3960,7 @@ def test_cost_selection_and_conversion(self):
# Test equality of the dicts of performance (i.e., energy efficiency)
# data generated for each technology
def test_performance_selection_and_conversion(self):
cmt.UsefulVars.trunc_len = 43
for idx, input_array in enumerate(self.reduced_tech_data):
perf_data, non_matched_names = cmt.cost_perf_extractor(
input_array,
@@ -4004,6 +4006,7 @@ class TechnologyDataHandlerTest(CommonUnitTest):
# specified in the third argument of the mseg_technology_handler
# function
def test_conversion_of_tech_and_sd_data_to_restructured_dict(self):
cmt.UsefulVars.trunc_len = 43
# Identify the unique microsegments in the data_to_select
# list of lists
unique_data_to_select = []
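The tests set `cmt.UsefulVars.trunc_len` on the class rather than on an instance, which matters because `cost_perf_extractor` reads the value from a freshly created `UsefulVars()` object. The sketch below illustrates that pattern with a stand-in class; `UsefulVarsSketch` and `truncate` are hypothetical, and the real class may define its attributes differently.

```python
class UsefulVarsSketch:
    """Hypothetical stand-in for the module's UsefulVars class."""
    trunc_len = None  # class-level value, assigned once before use


def truncate(name):
    # A new instance has no trunc_len of its own, so attribute lookup
    # falls back to the value stored on the class
    return name[:UsefulVarsSketch().trunc_len].strip()


# Equivalent of the assignment made in main() and in each test
UsefulVarsSketch.trunc_len = 43
result = truncate('x' * 40 + '   tail')
print(len(result))
```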