diff --git a/docs/lst_analysis_workflow.rst b/docs/lst_analysis_workflow.rst index 13fd3484e9..52667c4fd0 100644 --- a/docs/lst_analysis_workflow.rst +++ b/docs/lst_analysis_workflow.rst @@ -51,6 +51,17 @@ Files and configuration The DL1 files to use obviously depend on the source you want to analyse. Unless you have a good reason, the latest version of the DL1 files should be used. +Selection of the real-data DL1 files +------------------------------------ + +The selection of the DL1 files (run-wise, i.e. those produced by lstosa by merging the subrun-wise DL1 files) for a +specific analysis (i.e., a given source and time period) can be performed using the notebook +``cta_lstchain/notebooks/data_quality.ipynb``. The selection also takes into account the quality of the data, mostly in +terms of atmospheric conditions - evaluated using the rate of cosmic-ray showers as a proxy. Data taken under poor +conditions will not be included in the list of selected runs. Instructions and further details can be found inside the +notebook. + + RF models --------- @@ -62,26 +73,58 @@ The RF models are stored in the following directory: ``/fefs/aswg/data/models/...`` -Tuning of DL1 files and RF models ---------------------------------- +Tuning of MC DL1 files and RF models +------------------------------------ -In case of high NSB in the data, it is possible to tune the DL1 files and the RF models to improve the performance of the analysis. -This is done by changing the `config` file of the RF models and producing new DL1 files and training new RF models. -To produce a config tuned to the data you want to analyse, you may run ``lstchain_tune_nsb`` function that will produce a ``tuned_nsb_config.json`` file. +The default MC production is generated with a level of noise in the images which corresponds to the level of diffuse +night-sky background ("NSB") in a "dark" field of view (i.e. 
for observations with the Moon below the horizon, at not-too-low
+galactic latitudes and not affected by other sources of noise, like the zodiacal light). In general, observations of
+**extragalactic** sources in dark conditions can be properly analyzed with the default MC (i.e. with the standard RF models).
 
-To request a new production of RF models, you can open a pull-request on the lstmcpipe repository, producing a complete MC config, using:
+The median of the standard deviation of the pixel charges recorded in interleaved pedestal events (in which a camera
+image is recorded in the absence of a physics trigger) is a good measure of the NSB level in a given data run. This is computed
+by the data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above). For data with an NSB level
+significantly higher than the "dark field" one, it is possible to tune (increase) the noise in the MC files, and produce
+from them RF models (and "test MC" for computing instrument response functions) which improve the performance of the
+analysis (relative to using the default, low-NSB MC).
+
+This is done by changing the configuration file for the MC processing, producing new DL1(b) files, and training new RF models.
+To produce a config tuned to the data you want to analyse, you first have to obtain the standard analysis configuration
+file (for MC) for the desired version of lstchain (i.e. the version with which the real DL1 files you will use were produced).
+This can be done with the script :py:obj:`~lstchain.scripts.lstchain_dump_config`:
 
 .. code-block::
 
-    lstchain-dump-config --mc --update-with tuned_nsb_config.json --output-file PATH_TO_OUTPUT_FILE
+    lstchain_dump_config --mc --output-file standard_lstchain_config.json
+
+Now you have to update the file with the parameters needed to increase the NSB level. 
For this you need a simtel.gz MC
+file from the desired production (any will do, it can be either a gamma or a proton file), and a "typical" subrun DL1 file
+from your **real data** sample. "Typical" means one in which the NSB level is close to the median value for the sample
+to be analyzed. The data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above) provides a list of
+a few such subruns for your selected sample - you can use any of them. To update the config file you have to use the
+script :py:obj:`~lstchain.scripts.lstchain_tune_nsb`, e.g.:
+
+.. code-block::
+
+    lstchain_tune_nsb --config standard_lstchain_config.json \
+    --input-mc .../simtel_corsika_theta_6.000_az_180.000_run10.simtel.gz \
+    --input-data .../dl1_LST-1.Run10032.0069.h5 \
+    -o tuned_nsb_lstchain_config.json
+
+To request a **new production of RF models**, you can open a pull-request on the lstmcpipe repository, providing
+the .json configuration file produced following the steps above.
+
+
+Keeping track of lstchain configurations
+----------------------------------------
+The lstchain configuration file used to process the simulations and produce the RF models of a given MC production is
+provided in the lstmcpipe repository, as well as in the models directory.
-lstchain config
----------------
+It is important that the software version and the configuration used for processing real data and MC are the same. For a
+given lstchain version, this should be guaranteed by following the procedure above, which makes use of
+:py:obj:`~lstchain.scripts.lstchain_dump_config`.
-The lstchain config used to produce the RF models of a production is provided in the lstmcpipe repository, as well as in the models directory.
-It is a good idea to use the same config for the data analysis.
-You can also produce a config using `lstchain-dump-config`. 
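The update performed by ``lstchain_tune_nsb`` boils down to merging an ``image_modifier`` section into the standard
config, creating the section when the base config lacks it. A minimal sketch of that merge logic follows; the
``events_filters`` stand-in, the numeric noise value, and the restriction to two tuning keys are illustrative
assumptions, not the full set of parameters the script writes:

```python
# Minimal sketch (not lstchain code) of how the NSB tuning parameters are
# merged into an analysis configuration: the "image_modifier" section is
# updated if it already exists, and created otherwise.
cfg = {"events_filters": {"intensity": [0, float("inf")]}}  # stand-in config
dict_nsb = {"increase_nsb": True,
            "extra_noise_in_dim_pixels": 1.234}  # illustrative value

if "image_modifier" in cfg:
    cfg["image_modifier"].update(dict_nsb)
else:
    cfg["image_modifier"] = dict_nsb

print(cfg["image_modifier"])  # {'increase_nsb': True, 'extra_noise_in_dim_pixels': 1.234}
```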
DL3/IRF config files
--------------------

diff --git a/docs/lstchain_api/datachecks/index.rst b/docs/lstchain_api/datachecks/index.rst
index 068b8150b6..31e17d9ba6 100644
--- a/docs/lstchain_api/datachecks/index.rst
+++ b/docs/lstchain_api/datachecks/index.rst
@@ -9,7 +9,93 @@ Datachecks (`datachecks`)
 Introduction
 ============
 
-Module containing functions producing the LST datachecks. Currently reaching DL1 level.
+Module containing functions for checking the quality of the LST data.
+
+DL1 data checks
+===============
+
+Currently the checks are done at the DL1 level.
+The DL1 datacheck files are produced by running the following scripts sequentially:
+
+* :py:obj:`lstchain.scripts.lstchain_check_dl1`
+
+  This takes as input a DL1 file (including DL1a information, i.e. camera images & times) from a data subrun, e.g.:
+
+  .. code-block:: bash
+
+    lstchain_check_dl1 --input-file dl1_LST-1.1.Run14619.0000.h5 --output-dir OUTPUT_DIR --omit-pdf
+
+
+  The script produces a datacheck file for the subrun, ``datacheck_dl1_LST-1.Run14619.0000.h5``, which contains many
+  quantities that can be used to judge the quality of the data (see class :py:obj:`~lstchain.datachecks.containers.DL1DataCheckContainer`).
+
+|
+
+* :py:obj:`lstchain.scripts.lstchain_check_dl1`
+
+  The same script is run again, but now providing as input the **subrun-wise datacheck files** produced earlier (all subruns
+  of a given run must be provided). It also needs to know where the subrun-wise ``muons_LST-1.*.fits`` files (produced in
+  the R0 to DL1 analysis step), which contain the muon ring information, are stored ("MUONS_DIR"):
+
+  .. code-block:: bash
+
+    lstchain_check_dl1 --input-file "datacheck_dl1_LST-1.Run14619.*.h5" --output-dir OUTPUT_DIR --muons-dir MUONS_DIR
+
+
+  The output is now a datacheck file for the whole run, ``datacheck_dl1_LST-1.Run14619.h5``, which contains all the
+  information from the subrun-wise files. 
The script also produces a .pdf file ``datacheck_dl1_LST-1.Run14619.pdf`` with
+  various plots of the quantities stored in the DL1DataCheckContainer objects, plus others obtained from the muon ring
+  analysis. Note that the muon ring information is not propagated to the run-wise datacheck files; it is just used for
+  the plotting.
+
+|
+
+* :py:obj:`lstchain.scripts.lstchain_longterm_dl1_check`
+
+  This script merges the run-wise datacheck files of (typically) one night, stored in INPUT_DIR, and produces **a
+  single .h5 file for the whole night** as output (e.g. ``DL1_datacheck_20230920.h5``). The "longterm" in the script
+  name is a bit of an overstatement - in principle it can be run over data from many nights, but the output .html (see
+  below) becomes too heavy and some of the interactive features work poorly.
+
+  .. code-block:: bash
+
+    lstchain_longterm_dl1_check --input-dir INPUT_DIR --muons-dir MUONS_DIR --output-file DL1_datacheck_20230920.h5 --batch
+
+  The output .h5 file contains a (run-wise) summarized version of the information in the input files, including the muon
+  ring .fits files. It also creates an .html file (e.g. ``DL1_datacheck_20230920.html``) which can be opened with any
+  web browser and which contains various interactive plots allowing a quick check of the data of a night. See
+  an example of the .html file `here `_
+  (password protected).
+
+|
+
+.. Heading below to be replaced by this once the script is merged! :py:obj:`lstchain.scripts.lstchain_cherenkov_transparency`
+
+* ``lstchain_cherenkov_transparency``
+
+
+  This script analyzes the image intensity histograms (one per subrun) stored in the **run-wise** datacheck files (which
+  must exist in INPUT_DIR):
+
+  .. 
code-block:: bash
+
+    lstchain_cherenkov_transparency --update-datacheck-file DL1_datacheck_20230920.h5 --input-dir INPUT_DIR
+
+  The script updates the **night-wise** datacheck .h5 file ``DL1_datacheck_20230920.h5`` with a new table (with one entry
+  per subrun) containing parameters related to the image intensity spectra for cosmic ray events (i.e., a
+  Cherenkov-transparency-like approach, see e.g. https://arxiv.org/abs/1310.1639).
+
+
+
+
+Using the datacheck files for selecting good-quality data
+=========================================================
+
+The night-wise datacheck .h5 files, ``DL1_datacheck_YYYYMMDD.h5``, can be used to select a subsample of good-quality data
+from a large sample of observations. The files are relatively light, 6 MB per night on average. A large sample of them
+can be processed with the notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (instructions can be found inside the
+notebook).
 
 Reference/API
 =============

diff --git a/lstchain/image/modifier.py b/lstchain/image/modifier.py
index 59b0622933..84f64b8078 100644
--- a/lstchain/image/modifier.py
+++ b/lstchain/image/modifier.py
@@ -194,6 +194,8 @@ def calculate_noise_parameters(simtel_filename, data_dl1_filename,
                           '/dl1/event/telescope/monitoring/calibration')
     data_dl1_pedestal = read_table(data_dl1_filename,
                                    '/dl1/event/telescope/monitoring/pedestal')
+    data_dl1_flatfield = read_table(data_dl1_filename,
+                                    '/dl1/event/telescope/monitoring/flatfield')
     data_dl1_parameters = read_table(data_dl1_filename,
                                      '/dl1/event/telescope/parameters/LST_LSTCam')
     data_dl1_image = read_table(data_dl1_filename,
@@ -203,8 +205,9 @@
     # Locate pixels with HG declared unusable either in original calibration or
     # in interleaved events:
     bad_pixels = unusable[0][0]  # original calibration
-    for tf in unusable[1:][0]:  # calibrations with interleaveds
-        bad_pixels = np.logical_or(bad_pixels, tf)
+    if unusable.shape[0] > 1:
+        for tf 
in unusable[1:][0]:  # calibrations with interleaveds
+            bad_pixels = np.logical_or(bad_pixels, tf)
     good_pixels = ~bad_pixels
 
     # First index: 1,2,... = values from interleaveds (0 is for original
@@ -214,6 +217,29 @@
     # HG adc to pe conversion factors from interleaved calibrations:
     data_HG_dc_to_pe = data_dl1_calibration['dc_to_pe'][:, 0, :]
 
+
+    if data_dl1_flatfield['charge_mean'].shape[0] < 2:
+        log.error('\nCould not find interleaved FF calibrations in '
+                  'monitoring table!')
+        return None, None, None
+
+    if data_dl1_pedestal['charge_std'].shape[0] < 2:
+        log.error('\nCould not find interleaved pedestal info in '
+                  'monitoring table!')
+        return None, None, None
+
+    # Mean HG charge in interleaved FF events, to spot possible issues:
+    data_HG_FF_mean = data_dl1_flatfield['charge_mean'][1:, 0, :]
+    dummy = []
+    # indices which connect each FF calculation to a given calibration:
+    calibration_id = data_dl1_flatfield['calibration_id'][1:]
+
+    for i, x in enumerate(data_HG_FF_mean[:, ]):
+        dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
+    dummy = np.array(dummy)
+    # Average for all interleaved calibrations (in case there are more than one)
+    data_HG_FF_mean_pe = np.mean(dummy, axis=0)  # one value per pixel
+
     # Pixel-wise pedestal standard deviation (for an unbiased extractor),
     # in adc counts:
     data_HG_ped_std = data_dl1_pedestal['charge_std'][1:, 0, :]
@@ -224,20 +250,24 @@
     for i, x in enumerate(data_HG_ped_std[:, ]):
         dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
     dummy = np.array(dummy)
-    # Average for all interleaved calibrations (in case there are more than one)
     data_HG_ped_std_pe = np.mean(dummy, axis=0)  # one value per pixel
 
     # Identify noisy pixels, likely containing stars - we want to adjust MC to
     # the average diffuse NSB across the camera
-    data_median_std_ped_pe = np.nanmedian(data_HG_ped_std_pe)
-    
data_std_std_ped_pe = np.nanstd(data_HG_ped_std_pe) - log.info(f'Real data: median across camera of good pixels\' pedestal std ' + data_median_std_ped_pe = np.nanmedian(data_HG_ped_std_pe[good_pixels]) + data_std_std_ped_pe = np.nanstd(data_HG_ped_std_pe[good_pixels]) + log.info('\nReal data:') + log.info(f' Number of bad pixels (from calibration): {bad_pixels.sum()}') + log.info(f' Median of FF pixel charge: ' + f'{np.nanmedian(data_HG_FF_mean_pe[good_pixels]):.3f} p.e.') + log.info(f' Median across camera of good pixels\' pedestal std ' f'{data_median_std_ped_pe:.3f} p.e.') brightness_limit = data_median_std_ped_pe + 3 * data_std_std_ped_pe too_bright_pixels = (data_HG_ped_std_pe > brightness_limit) - log.info(f'Number of pixels beyond 3 std dev of median: ' - f'{too_bright_pixels.sum()}, (above {brightness_limit:.2f} p.e.)') + log.info(f' Number of pixels beyond 3 std dev of median: ' + f' {too_bright_pixels.sum()}, (above {brightness_limit:.2f} ' + f'p.e.)') ped_mask = data_dl1_parameters['event_type'] == 2 # The charges in the images below are obtained with the extractor for diff --git a/lstchain/scripts/lstchain_tune_nsb.py b/lstchain/scripts/lstchain_tune_nsb.py index 8ab069d870..5f512cdd40 100644 --- a/lstchain/scripts/lstchain_tune_nsb.py +++ b/lstchain/scripts/lstchain_tune_nsb.py @@ -79,6 +79,9 @@ def main(): a, b, c = calculate_noise_parameters(args.input_mc, args.input_data, args.config) + if a is None: + logging.error('Could not compute NSB tuning parameters. Exiting!') + sys.exit(1) dict_nsb = {"increase_nsb": True, "extra_noise_in_dim_pixels": round(a, 3), @@ -93,7 +96,11 @@ def main(): if args.output_file: cfg = read_configuration_file(args.config) - cfg['image_modifier'].update(dict_nsb) + if 'image_modifier' in cfg: + cfg['image_modifier'].update(dict_nsb) + else: + cfg['image_modifier'] = dict_nsb + dump_config(cfg, args.output_file, overwrite=args.overwrite)
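The noisy-pixel selection used in ``calculate_noise_parameters`` above (flagging pixels whose pedestal charge
standard deviation exceeds the camera median of good pixels by more than 3 standard deviations) can be illustrated
with a small self-contained numpy sketch. The pixel count matches the LST camera, but the noise figures and the
number of simulated star-lit pixels are made up for illustration:

```python
import numpy as np

# Sketch of the "too bright pixel" selection: flag pixels whose pedestal
# std (in p.e.) is more than 3 std devs above the camera median.
rng = np.random.default_rng(0)
n_pixels = 1855                                   # LST camera pixels
ped_std_pe = rng.normal(1.5, 0.1, n_pixels)       # made-up typical noise level
ped_std_pe[:10] += 2.0                            # 10 simulated star-lit pixels
good_pixels = np.ones(n_pixels, dtype=bool)       # no bad pixels in this toy

median_ped = np.nanmedian(ped_std_pe[good_pixels])
std_ped = np.nanstd(ped_std_pe[good_pixels])
brightness_limit = median_ped + 3 * std_ped
too_bright_pixels = ped_std_pe > brightness_limit
print(too_bright_pixels.sum())  # 10
```

Restricting the median and standard deviation to ``good_pixels``, as the patch above does, prevents pixels already
declared unusable by the calibration from biasing the estimate of the diffuse NSB across the camera.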