Changes to handle AEO 2018 residential data files

In ‘mseg.py’:
-Modified number of footer lines to skip in ‘rsmlgt.txt’.
-Added new columns for utility rebates and technology-specific choice weights in ‘rsmlgt.txt’.
-Added “latin1” encoding argument to numpy genfromtxt command on ‘rsmlgt.txt’ import.

In ‘mseg_techdata.py’:
-Added new columns for equipment rebates in ‘rsmeqp.txt’.
-Removed major fuel flag column in ‘rsclass.txt’.
-Added new columns for utility rebates and technology-specific choice weights in ‘rsmlgt.txt’.
-Modified number of footer lines to skip in ‘rsmlgt.txt’ and header lines to skip in ‘rsclass.txt’.
-Added “latin1” encoding argument to numpy genfromtxt command on ‘rsmlgt.txt’ and ‘rsclass.txt’ imports.

Break out all AEO residential MELs, minor naming changes

The following changes are now reflected across the entire suite of Scout files (from AEO data updating modules through to analysis engine modules and documentation):
-Break out the former ‘other MELs’ categories such that AEO stock and energy data are available for the individual technologies ‘coffee maker,’ ‘dehumidifier,’ ‘microwave,’ ‘pool heaters and pumps,’ ‘security system,’ ‘portable electric spas,’ ‘wine coolers,’ and ‘electric other’.
-Change ‘other (grid electric)’ end use to ‘other’ and extend across all fuel types.
-Add the ‘other appliances’ technology to the ‘other’ end use category for non-electric fuels.
-Change the ‘non-specific’ secondary heating technology for electric and natural gas fuels to ‘secondary heater’; also change ‘secondary heating’ technology for all other fuels to ‘secondary heater’ (e.g., ‘secondary heating (wood)’ -> ‘secondary heater (wood)’).
-Change all ampersands in technology names to ‘and’ for consistency (e.g., ‘fans & pumps’ -> ‘fans and pumps’).

Additionally, stale data fields were cleared from the results of the ‘mseg_techdata.py’ routine, and all missing technology choice data now yields a zero value for the technology in question, instead of a dict full of ‘NA’ values.
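
The two string-level renaming rules listed above (ampersands and fuel-specific secondary heating names) can be sketched as follows. This is only an illustration: `normalize_tech_name` is a hypothetical helper, not a function in the Scout code, and the ‘non-specific’ rename is left out because it depends on the end use and fuel type rather than on the string alone.

```python
import re


def normalize_tech_name(name):
    """Hypothetical helper applying two of the renaming rules listed above."""
    # Replace ampersands with 'and', e.g., 'fans & pumps' -> 'fans and pumps'
    name = re.sub(r'\s*&\s*', ' and ', name)
    # Rename fuel-specific secondary heating technologies, e.g.,
    # 'secondary heating (wood)' -> 'secondary heater (wood)'
    name = re.sub(r'^secondary heating\b', 'secondary heater', name)
    return name


for old in ['fans & pumps', 'secondary heating (wood)', 'dehumidifier']:
    print(old, '->', normalize_tech_name(old))
```

Names that match neither rule, such as ‘dehumidifier’, pass through unchanged.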
Close #202

Revise string cleaning and data handling for EIA input data files for commercial buildings to correctly read technology descriptions in the service demand data and match those strings to comparable descriptions in the technology characteristics (cost, performance, and lifetime) data.

Support AEO 2018 commercial data

Update commercial input data handling to support new miscellaneous electric load (MEL) types and update microsegments.json with these new MEL types. Update documentation to reflect the technology types available from the AEO 2018 data for commercial buildings.

Fix lighting type string handling

Fix specific problems present in the commercial lighting data from AEO and in the handling of those data. Combine 'SodiumVapor' and 'Sodium Vapor' lighting types together. Collapse linear fluorescent lighting types to simplified strings of the form 'TX FXX', e.g., 'T8 F28'. Eliminate the empty string ('') lighting technology associated with a handful of rows in the service demand data that have no 'Description' (and no service demand).

Changes to technology names in ecm_prep

The changes are needed to accommodate an expanded set of commercial MELs technology names, a more compact set of commercial lighting technology names, and revised technology names for commercial cooking. Another small modification was made to the routine that determines common date ranges (e.g., 2013-2050) across all raw EIA input files. Finally, the handling of missing residential consumer choice data in ‘ecm_prep.py’ was revised to reflect changes in the structure of the underlying choice dataset.

Added complete AEO 2018 baseline files, default ECM modifications, and results

Complete stock/energy and technology characteristics data are included, as are new heating/cooling totals, site-source conversions, and Consumer Price Index data. One ECM definition had to be modified slightly to reflect the updated ‘other’ end use name. All ECM definitions and results in the /web folder were updated to reflect these AEO 2018 data updates.

Modified LED troffers example for AEO 2018 technologies
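
The lighting string fixes described in this message (merging 'SodiumVapor' into 'Sodium Vapor', collapsing linear fluorescent names, and dropping empty descriptions) can be illustrated with a small standalone sketch. The regular expression is the one used in the diff; `simplify_lighting_name` and the sample descriptions are hypothetical and are not taken from the AEO data files.

```python
import re

# Pattern from the diff: linear fluorescent names of the form 'T# F##'
LFL_PATTERN = re.compile(r'^(T[0-9] F[0-9]{2})')


def simplify_lighting_name(description):
    """Hypothetical helper combining the lighting string fixes."""
    # Treat 'SodiumVapor' and 'Sodium Vapor' as the same technology
    description = description.replace('SodiumVapor', 'Sodium Vapor')
    # Keep only the 'T# F##' prefix of linear fluorescent names
    match = LFL_PATTERN.search(description)
    return match.group(0) if match else description


# Illustrative descriptions only
names = ['T8 F32 Commodity', 'T8 F96 High Output',
         'SodiumVapor', 'Sodium Vapor', '']

# Drop empty descriptions, then collapse to unique simplified names
cleaned = sorted({simplify_lighting_name(n) for n in names if n != ''})
print(cleaned)
```

Collapsing names in this way means that rows sharing a simplified name are later combined into a single technology.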
trynthink · Jun 27, 2018 · 2c64c84
1 parent 2cca7f6
commit 2c64c84
Showing 26 changed files with 1,791,446 additions and 1,899,031 deletions.
diff --git a/com_mseg.py b/com_mseg.py
@@ -4,6 +4,7 @@
 import re
 import csv
 import json
+import io
 
 
 class EIAData(object):
@@ -55,7 +56,17 @@ class CommercialTranslationDicts(object):
         cdivdict (dict): Translation for census divisions.
         bldgtypedict (dict): Translation for commercial building types.
         endusedict (dict): Translation for commercial building end uses.
-        mels_techdict (dict): Translation for miscellaneous electric loads.
+        mels_techdict (dict): Translation for miscellaneous electric
+            loads (MELs). The numeric translation should be updated
+            each year based on the interpretation given in the AEO
+            commercial buildings microdata file. If there are
+            conspicuously missing MEL codes in the microdata, EIA
+            should be contacted to verify the translation between
+            numeric codes and descriptive names. Additionally, the
+            numeric codes in the end use column in KDBOUT.txt in the
+            rows labeled 'MiscElConsump' should be compared against
+            the codes in the microdata to see if any of the codes are
+            missing from KDBOUT.txt.
         fueldict (dict): Translation for fuel types.
         demand_typedict (dict): Translation for components of thermal load.
     """
@@ -83,7 +94,7 @@ def __init__(self):
                              'mercantile/service': 9,
                              'warehouse': 10,
                              'other': 11,
-                             'FIGURE THIS ONE OUT': 12
+                             'non-building': 12  # Applies to specific MELs
                              }
 
         self.endusedict = {'heating': 1,
@@ -108,10 +119,16 @@ def __init__(self):
                               'laundry': 8,
                               'lab fridges and freezers': 9,
                               'fume hoods': 10,
-                              'medical imaging': 11,
-                              'video displays': 15,
-                              'large video displays': 16,
-                              'municipal water services': 17
+                              'medical imaging': 12,
+                              'large video boards': 13,
+                              'IT equipment': 14,
+                              'office UPS': 15,
+                              'data center UPS': 16,
+                              'shredders': 17,
+                              'private branch exchanges': 18,
+                              'voice-over-IP telecom': 19,
+                              'water services': 20,  # non-building
+                              'telecom systems': 21  # non-building
                               }
 
         self.fueldict = {'electricity': 1,
@@ -291,6 +308,10 @@ def sd_mseg_percent(sd_array, sel, yrs):
         # summarized and returned from this function
         elif re.search('placeholder', row['Description']):
             rows_to_remove.append(idx)
+        # Else check to see if the description is an empty string,
+        # and if so, add it to the list of rows to remove
+        elif re.search('^(?![\s\S])', row['Description']):
+            rows_to_remove.append(idx)
         # Else check for a special case where the year in the
         # technology name sought by the tech_name regex didn't match
         # because the year in the name is partially truncated at
@@ -303,6 +324,17 @@ def sd_mseg_percent(sd_array, sel, yrs):
     # Delete the placeholder rows from the filtered array
     filtered = np.delete(filtered, rows_to_remove, 0)
 
+    # Special filtering for lighting to drop special modifier text
+    # in the descriptions of linear fluorescent bulb types (e.g.,
+    # replace 'T8 F32 Commodity' with 'T8 F32') now that year
+    # details have been removed
+    if sel[2] == CommercialTranslationDicts().endusedict['lighting']:
+        for idx, row in enumerate(filtered):
+            # Identify linear fluorescent types
+            tech_name = re.search('^(T[0-9] F[0-9]{2})', row['Description'])
+            if tech_name:
+                filtered['Description'][idx] = tech_name.group(0)
+
     # Because different technologies are sometimes coded with the same
     # technology type number (especially in lighting, where lighting
     # types are often differentiated by vintage and technology type
@@ -319,7 +351,7 @@ def sd_mseg_percent(sd_array, sel, yrs):
     tval = np.zeros((len(trunc_technames), len(yrs)))
 
     # Combine the data recorded for each unique technology
-    for idx, name in enumerate(trunc_technames):
+    for idx, name in enumerate(technames):
 
         # Extract entries for a given technology type number
         entries = filtered[filtered['Description'] == name]
@@ -711,6 +743,15 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):
 
     # Open the target CSV formatted data file
     with open(data_file_path) as thefile:
+        # For some cooking equipment descriptions in the service demand
+        # data, 11 inches is encoded as 11", which by default leaves
+        # the closing double-quote character in the description strings
+        # while removing the " that denoted inches; by inserting an
+        # escape character before the " denoting inches, the text will
+        # be handled correctly by csv.reader
+        if re.match('.*KSDOUT', re.escape(data_file_path)):
+            cont = thefile.read().replace('11"', '11\\"')
+            thefile = io.StringIO(cont)
 
         # This use of csv.reader assumes that the default setting of
         # quotechar '"' is appropriate; the skipinitialspace option
@@ -722,10 +763,12 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):
         # if they are encountered
         if '\0' in open(data_file_path).read():  # NULL bytes detected
             filecont = csv.reader((x.replace('\0', '') for x in thefile),
-                                  delimiter=delim_char, skipinitialspace=True)
+                                  delimiter=delim_char, skipinitialspace=True,
+                                  escapechar='\\')
         else:  # No NULL bytes, proceed normally
             filecont = csv.reader(thefile,
-                                  delimiter=delim_char, skipinitialspace=True)
+                                  delimiter=delim_char, skipinitialspace=True,
+                                  escapechar='\\')
 
         # Create list to be populated with tuples of each row of data
         # from the data file
@@ -776,7 +819,7 @@ def data_import(data_file_path, dtype_list, delim_char=',', hl=None, cols=[]):
         return final_struct
 
 
-def str_cleaner(data_array, column_name):
+def str_cleaner(data_array, column_name, return_str_len=False):
     """Clean up formatting of technology description strings in imported data.
 
     In the imported EIA data, the strings that describe the technology
@@ -789,9 +832,17 @@ def str_cleaner(data_array, column_name):
     Args:
         data_array (numpy.ndarray): A numpy structured array of imported data.
         column_name (str): The name of the column in data_array to edit.
+        return_str_len (bool): If true, this function returns an
+            additional integer used for string truncation.
 
     Returns:
         The input array with the strings in column_name revised.
+        If return_str_len is true, then the function also returns an
+        integer for the string length to use to truncate the cooking
+        technology strings from ktek (the technology cost, performance,
+        and lifetime data file) to match the length of the modified
+        technology strings in KSDOUT (the service demand data) when
+        combining those data.
     """
 
     def special_character_handler(text_string):
@@ -801,9 +852,13 @@ def special_character_handler(text_string):
             text_string (str): A string describing a particular technology.
 
         Returns:
-            The edited text string.
+            The edited text string and the string truncation length,
+            explained in the parent function docstring.
         """
 
+        # Replace 'SodiumVapor' with 'Sodium Vapor'
+        text_string = re.sub('SodiumVapor', 'Sodium Vapor', text_string)
+
         # Check to see if an HTML character reference ampersand or
         # double-quote, or standard double-quote character is in
         # the string
@@ -816,12 +871,20 @@ def special_character_handler(text_string):
         # use of the standalone double-quote character
         if html_ampersand_present:
             text_string = re.sub('&amp;', '&', text_string)
+            str_trunc_len = 50  # Not used in com_mseg_tech
         elif html_double_quote_present:
             text_string = re.sub('&quot;', '-inch', text_string)
+            str_trunc_len = 43
         elif double_quote_present:
             text_string = re.sub('\"', '-inch', text_string)
+            str_trunc_len = 48
+        else:
+            str_trunc_len = 50
 
-        return text_string
+        return text_string, str_trunc_len
+
+    # Store the indicated string truncation lengths in a list
+    str_trunc_list = []
 
     # Check for double quotes in the first entry in the specified column
     # and, assuming all entries in the column are the same, revise all
@@ -838,7 +901,10 @@ def special_character_handler(text_string):
 
             # Clean up strings with special characters to ensure that
             # these characters appear consistently across all imported data
-            entry = special_character_handler(entry)
+            entry, str_trunc_len = special_character_handler(entry)
+
+            # Record string truncation length
+            str_trunc_list.append(str_trunc_len)
 
             # Delete any newly "apparent" (no longer enclosed by the double
             # quotes) trailing or (unlikely) leading spaces and replace the
@@ -851,12 +917,33 @@ def special_character_handler(text_string):
 
             # Clean up strings with special characters to ensure that
             # these characters appear consistently across all imported data
-            entry = special_character_handler(entry)
+            entry, str_trunc_len = special_character_handler(entry)
+
+            # Record string truncation length
+            str_trunc_list.append(str_trunc_len)
 
             # Delete any leading and trailing spaces
             data_array[column_name][row_idx] = entry.strip()
 
-    return data_array
+    # Clean up indicated string truncation lengths, discarding 50
+    str_trunc_list = list(set(str_trunc_list))
+    str_trunc_list = [x for x in str_trunc_list if x != 50]
+    if len(str_trunc_list) > 1:
+        # If this condition has been satisfied, both '&quot;' and
+        # '"' were present in the technology description strings
+        # in the imported text, which suggests a single truncation
+        # length might not work to match the strings in these data
+        text = ('Warning: undesired behavior might occur when '
+                'attempting to match technology characteristics '
+                'data (ktek) with service demand data (ksdout).')
+        print(text)
+
+    # Return the appropriate objects based on the return_str_len option
+    if return_str_len:
+        str_trunc_len_final = str_trunc_list[0]  # Obtain standalone integer
+        return data_array, str_trunc_len_final
+    else:
+        return data_array
 
 
 def main():
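
Note on the `11"` handling in `data_import` above: the effect of the inserted escape character can be reproduced with the standard library alone. The sketch below uses a made-up row (not an actual KSDOUT record) to compare default `csv.reader` parsing with the escape-character approach taken in the diff.

```python
import csv
import io

# Made-up row in which 11" (11 inches) appears inside a quoted field
raw = '1, 2, "Range 11" griddle 2013 installed", 5.0\n'

# Default parsing: the inch mark is dropped and the closing quote
# is left in the description string
default_row = next(csv.reader(io.StringIO(raw), skipinitialspace=True))

# Insert a backslash before the inch mark and tell csv.reader to
# treat the backslash as an escape character
escaped = raw.replace('11"', '11\\"')
fixed_row = next(csv.reader(io.StringIO(escaped),
                            skipinitialspace=True, escapechar='\\'))

print(default_row[2])  # Range 11 griddle 2013 installed"
print(fixed_row[2])    # Range 11" griddle 2013 installed
```

With the inch mark preserved, the later `str_cleaner` step can replace it with '-inch' consistently.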

diff --git a/com_mseg_tech.py b/com_mseg_tech.py
@@ -172,8 +172,9 @@ def sd_data_selector(sd_data, sel, years):
     # Identify each technology and performance level using the text
     # in the description field since the technology type and vintage
     # numeric codes are not well-matched to individual technology and
-    # performance levels
+    # performance levels; remove empty strings from the list
     technames = list(np.unique(filtered['Description']))
+    technames = [x for x in technames if x != '']
 
     # Set up numpy array to store restructured data, in which each row
     # will correspond to a single technology
@@ -232,11 +233,22 @@ def single_tech_selector(tech_array, specific_name):
         # 2 and three other numbers (i.e., 2009 or 2035)
         tech_name = re.search('.+?(?=\s2[0-9]{3})', row['technology name'])
 
-        # If the regex returned a match, and the first group of the
-        # match (i.e., the part before the numeric year) is not the
-        # the same as the name passed to the function, remove the row
+        # If the technology name regex returned a match, check if there
+        # is a match for a linear fluorescent lighting technology; in
+        # either case (either the linear fluorescent or the more
+        # generic technology name regex), if the match is not the same
+        # as the name passed to the function, remove the row
         if tech_name:
-            if tech_name.group(0) != specific_name:
+            # Test whether the technology name corresponds to a linear
+            # fluorescent lighting technology in the format 'T# F##',
+            # e.g., 'T8 F96', and if it does, extract just that string
+            # without any additional text (e.g., 'T8 F96 High Output')
+            lfl_tech_name = re.search('^(T[0-9] F[0-9]{2})',
+                                      tech_name.group(0))
+            if lfl_tech_name:
+                if lfl_tech_name.group(0) != specific_name:
+                    rows_to_remove.append(idx)
+            elif tech_name.group(0) != specific_name:
                 rows_to_remove.append(idx)
         # If there's no match, the technology might not have a year
         # included as part of its name, but it nonetheless should be
@@ -351,15 +363,23 @@ def cost_perf_extractor(single_tech_array, sd_array, sd_names, years, flag):
             # 44 characters since all of the string descriptions in the
             # service demand data are limited to 44 characters; there
             # is an exception for strings that have '-inch' in them,
-            # which should be matched to the first 43 characters since
-            # the substitution of '-inch' for '&quot;' shortens the
-            # string by one character; finally remove any trailing
-            # spaces that might create text matching problems
+            # which should be matched to the first n characters, where
+            # n is 43 or 48 characters depending on whether '-inch'
+            # was substituted for '&quot;' or '"', respectively; finally
+            # remove any trailing spaces that might create text
+            # matching problems
             if re.search('-inch', name_from_ktek[:43]):
-                length = 43
+                length = UsefulVars().trunc_len
             else:
                 length = 44
             name_from_ktek = name_from_ktek[:length].strip()
+            # The number of characters to use for text matching
+            # is determined when the service demand data description
+            # strings are cleaned up; the substitution of '-inch' for
+            # '"' will lengthen the string by four characters, thus the
+            # matching should be done with 48 characters; replacing
+            # '&quot;' will reduce the length of the string by 1, thus
+            # the matching should be performed using 43 characters
 
             # Find the matching row in service demand data by comparing
             # the row technology name to sd_names and use that index to
@@ -534,11 +554,21 @@ def tech_names_extractor(tech_array):
         # 2 and three other numbers (e.g., 2009 or 2035)
         tech_name = re.search('.+?(?=\s2[0-9]{3})', row['technology name'])
 
-        # If the regex matched, add the matching text, which describes
-        # the technology without scenario-specific text like '2003
-        # installed base', to the technames list
+        # If the regex matched, check the matching text to see if it
+        # corresponds to a linear fluorescent lighting technology
+        # represented in the format 'T# F##', e.g., 'T8 F96'; if it does,
+        # extract from the match just the 'T# F##' string without any
+        # additional modifier text (e.g., 'T8 F96 High Output'); if not,
+        # add the text that matched originally, which describes the
+        # technology without scenario-specific text like '2003 installed
+        # base' to the technames list
         if tech_name:
-            technames.append(tech_name.group(0))
+            lfl_tech_name = re.search('^(T[0-9] F[0-9]{2})',
+                                      tech_name.group(0))
+            if lfl_tech_name:
+                technames.append(lfl_tech_name.group(0))
+            else:
+                technames.append(tech_name.group(0))
         # Else, if the technology name is not from a placeholder row,
         # add the entire name text to the technames list
         else:
@@ -1082,7 +1112,7 @@ def main():
     # Import EIA AEO 'KSDOUT' service demand data
     serv_dtypes = cm.dtype_array(cm.EIAData().serv_dmd)
     serv_data = cm.data_import(cm.EIAData().serv_dmd, serv_dtypes)
-    serv_data = cm.str_cleaner(serv_data, 'Description')
+    serv_data, tval = cm.str_cleaner(serv_data, 'Description', True)
 
     # Import EIA AEO 'KDBOUT' additional data file
     catg_dtypes = cm.dtype_array(cm.EIAData().catg_dmd)
@@ -1097,6 +1127,10 @@ def main():
     with open(handyvars.aeo_metadata, 'r') as metadata:
         metajson = json.load(metadata)
 
+    # Assign available string truncation length value to UsefulVars
+    # class so that it is available for all class uses
+    UsefulVars.trunc_len = tval
+
     # Define years vector using year data from metadata
     years = list(range(metajson['min year'], metajson['max year'] + 1))
 
@@ -1114,13 +1148,16 @@ def main():
             # (i.e., non-repeating) list of technologies that didn't have
             # a match between the two data sets and thus were not added
             # to the aggregated cost or performance data in the output JSON
+            # The technologies that appear in this list might vary from
+            # year to year.
             if nmtn:
                 text = ('Warning: some technologies reported in the '
                         'technology characteristics data were not found to '
                         'have corresponding service demand data and were '
                         'thus excluded from the reported technology cost '
-                        'and performance. Four performance levels for '
-                        'solar water heaters are expected in this list.')
+                        'and performance. These technologies are generally '
+                        'absent from or have all zeros for their service '
+                        'demand data.')
                 print(text)
                 for item in sorted(list(set(nmtn))):
                     print('   ' + item)
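
Note on the truncation lengths used in `cost_perf_extractor` above: the 44/43/48-character rule can be checked with a short standalone sketch. `truncate_for_matching` is a hypothetical function that takes the '-inch' truncation length as an argument instead of reading `UsefulVars().trunc_len`, and the cooking description is made up for illustration.

```python
def truncate_for_matching(name_from_ktek, inch_trunc_len, default_len=44):
    """Hypothetical stand-in for the truncation step in the diff."""
    # Names containing '-inch' use the length reported by str_cleaner
    # (43 or 48); all other names use the 44-character default
    if '-inch' in name_from_ktek[:43]:
        length = inch_trunc_len
    else:
        length = default_len
    return name_from_ktek[:length].strip()


# Made-up description with an inch mark within the first 44 characters
full_raw = 'Range, Electric, 4 burners, oven, 11" griddle, high efficiency'

# Service demand descriptions are limited to 44 characters (per the
# comments in the diff) before '"' is replaced by '-inch'
sd_name = full_raw[:44].replace('"', '-inch').strip()

# The technology characteristics name is not truncated beforehand
ktek_name = full_raw.replace('"', '-inch')

# Replacing one character with five adds four, so 48 characters match
print(truncate_for_matching(ktek_name, 48) == sd_name)
```

The same reasoning gives 43 characters when '&quot;' (six characters) is replaced by '-inch' (five characters).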

diff --git a/com_mseg_tech_test.py b/com_mseg_tech_test.py
@@ -3947,6 +3947,7 @@ class CostAndPerformanceDataExtractionTest(CommonUnitTest):
 
     # Test equality of the dicts of cost data generated for each technology
     def test_cost_selection_and_conversion(self):
+        cmt.UsefulVars.trunc_len = 43
         for idx, input_array in enumerate(self.reduced_tech_data):
             cost_data, non_matched_names = cmt.cost_perf_extractor(
                 input_array,
@@ -3959,6 +3960,7 @@ def test_cost_selection_and_conversion(self):
     # Test equality of the dicts of performance (i.e., energy efficiency)
     # data generated for each technology
     def test_performance_selection_and_conversion(self):
+        cmt.UsefulVars.trunc_len = 43
         for idx, input_array in enumerate(self.reduced_tech_data):
             perf_data, non_matched_names = cmt.cost_perf_extractor(
                 input_array,
@@ -4004,6 +4006,7 @@ class TechnologyDataHandlerTest(CommonUnitTest):
     # specified in the third argument of the mseg_technology_handler
     # function
     def test_conversion_of_tech_and_sd_data_to_restructured_dict(self):
+        cmt.UsefulVars.trunc_len = 43
         # Identify the unique microsegments in the data_to_select
         # list of lists
         unique_data_to_select = []