Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Valdrada, the trigger simulation city #795

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

Aretno
Copy link
Collaborator

@Aretno Aretno commented May 31, 2021

Adds valdrada, the trigger which simulates the trigger in the FPGA.

This city takes rwf and produces a dataframe with the simulated trigger information. It takes a configurable set of parameters that imitate the detector acquisition ones and their naming convention. It provides all possible trigger candidates above a given width and flags them as valid or not based on the configuration.

Details of the output information are in the city main file but basically consist on the values used for triggering in the FPGA. Aside from the trigger implementation itself, a modified BLR algorithm that imitates the behavior of the one in the FPGA has been added.

This notebook ilustrates the flow of Valdrada and shows how the less immediate parameters impact the trigger results.
https://github.com/Aretno/Notebooks/blob/master/TriggerSim/Valdrada.ipynb

The implementation has been based on extensive discussion with Raúl and has been throughly tested prior to the PR. This was done jointly with Raúl, evaluating the FPGA output he obtained with the output of this implementation. Full agreement was found between the fpga simulation and the python implementation. A discrepance of 0.16% in the total number of valid triggers was found between already acquired data and data passing the trigger simulation (done for several runs). This discrepance also appeared in the FPGA simulation which is a copy-paste of the existing code. Such discrepance was traced to the high pass filter, which starts with a different value in the online acquisition (depends on previous time bins, which we do not stored) and yielded a different starting value for the coefficients. This could cause an ADC deviation which modify the overall result in edge cases.

Moreover, the tool has already been used for extensive trigger analysis with no visible deviations from the expected behavior.

Aretno added 9 commits May 28, 2021 11:15
A BLR function is added that computes the baseline as a moving
average. It applies the high-pass filter after substracting such
baseline (and thus it's done within the loop) and applies
deconvolution. As happens in the FPGA, the returned signal is equal to
the raw signal whenever the BLR is not active.
These functions extract the trigger parameters and check for
coincidences between trigger candidates.
Add class that contains the simualted trigger info table.
Adds the writer for the simulated trigger dataframe.
Adds valdrada, this city takes rwf and produces a dataframe with the
simulated trigger information. It takes a configurable set of
parameters that imitate the detector acquisition ones and their naming
convention. It provides all possible trigger candidates above a given
width and flags them as valid or not based on the configuration.
Copy link
Collaborator

@gonzaponte gonzaponte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the code in depth, but I've cached a few technical details already.

This PR is algorithmically complex and requires knowing many non-trivial details. Who else besides you is familiar with this? It would be great if this person could review the algorithmic side of things at least.

Comment on lines 707 to 708
<<<<<<< HEAD
=======
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved conflict! Fixed in the last commit, but shouldn't be here anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how to remove this from the history.

Comment on lines 21 to 29
elements = []
for _ in range(6):
elements.append(draw(lists(integers(min_value=0, max_value=np.iinfo(np.uint32).max),
min_size=size, max_size=size)))
for _ in range(4):
elements.append(draw(lists(booleans(), min_size=size, max_size=size)))
for _ in range(5):
elements.append(draw(lists(integers(min_value=np.iinfo(np.int32).min, max_value=np.iinfo(np.int32).max),
min_size=size, max_size=size)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these numbers have specific meanings? Maybe it's better to give them names lke
a_meaningful_name = lists(integers(min_value=0, max_value=np.iinfo(np.uint32).max) and then draw(a_meaninful_name)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that later these are assigned to each column in col_names. I think in terms of self-documentation it's better for the composite to draw each value individually, but this might be too slow (haven't checked), so at least you should leave a comment annotating the columns they correspond to.

I think I would also transpose the data drawing: instead of "for each column generate all rows" -> "for each row generate all columns". This would be more natural if we had a composite that generates a single row and then another one that generates a list of rows of a given size. If this doesn't affect performance, I would go for it.


assert np.all([tname == cname for tname, cname in zip(trigger_dst.columns.values, col_names)])

# Check values stored, the if is needed because the valid_all condition is done in the writer, not on the input trigger
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The writer shouldn't manipulate the data. Why not add that column to the input instead?

Comment on lines 33 to 49
@fixture(scope="session")
def double_square_wf():
# Fake signal with two-nearby square pulses and a smaller additional one at the beginning.
n_baseline = 64000
base_window = 1024
thr = 5
wf = np.zeros(n_baseline)
start1 = np.random.randint(2*base_window, n_baseline // 2)
length = np.random.randint(n_baseline // 50, n_baseline // 4)
stop1 = start1 + length
start2 = stop1 + 10
stop2 = start2 + length

wf[start1:stop1] = np.full(length, np.random.randint(thr+1, 50))
wf[start2:stop2] = np.full(length, np.random.randint(thr+1, 50))
wf[1500 : 1520] = np.full(20 , np.random.randint(thr+1, 50)) # Extra pulse to check discard_width
return wf, thr, length, start1, start2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never remember how to handle this, but using random numbers is something we should avoid. At the very least we should fix the seed for the tests or fixtures that use random numbers to produce data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a composite now, I guess it should be ok.

Comment on lines 101 to 102
for i, element in enumerate(elements):
assert element == triggers[0][i]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not elements == triggers[0]?

Comment on lines 82 to 88
multi_candidate = ((t1[4] >= multi_w ) &
(t1[3] > multi_q ) &
(t1[2] <= t0[2] + t0[4] + multi_t+2)) # Start of trigger + peak width + time window + FPGA delay (2)
if not multi_candidate:
continue
# In case the later peak ends after the multipeak protection window ends
if (t1[2]+t1[4]) > (t0[2] + t0[4] + multi_t+2): # Event partially contained in the multipeak window
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these magic numbers have meaning, so again, a namedtuple would be better.

Comment on lines 154 to 171
# Order by time, index so we can use this to order both times and ids
order_idx = np.lexsort((ids, times))
order_time = times[order_idx]
pmts = np.full(ids.shape, -1)
# Calculate all time differences between each trigger and all later ones
time_diffs = np.array([order_time[i+1:] - t for i, t in enumerate(order_time)])
# Since pmts are ordered by time, closest pmt will be the next one in the array.
pmts[:-1] = ids[order_idx][1:]
# Shortest time is the first value of time_diff for each trigger
shortest = np.array([t[0] if len(t)>0 else -1 for t in time_diffs])
# Count how many triggers are within the coincidince window of each trigger
n_coinc = np.array([len(t[t<coinc_window])+1 for t in time_diffs])
# Undo the original ordering
old_order = np.argsort(order_idx)
# Assign the coincidence results to the trigger array
triggers[valid_sel, -3] = n_coinc [old_order]
triggers[valid_sel, -2] = shortest[old_order]
triggers[valid_sel, -1] = pmts [old_order]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty lines in between blocks to make it a bit more readable.

order_time = times[order_idx]
pmts = np.full(ids.shape, -1)
# Calculate all time differences between each trigger and all later ones
time_diffs = np.array([order_time[i+1:] - t for i, t in enumerate(order_time)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting it to an array is misleading. This would be a 1D array of type object each object being a 1D array of different lengths. Better leave it as a list.

row["valid_w" ] = valid_w
row["valid_h" ] = valid_h
row["valid_peak" ] = valid_peak
row["valid_all" ] = (valid_q + valid_w + valid_h + valid_peak) == 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the writer shouldn't need to know anything about what it is writing. you may also want to consider creating a dataframe and saving it using the dataframe writer?

also, since these are booleans, it's more appropriate to use all.

invisible_cities/conftest.py Show resolved Hide resolved
@Aretno
Copy link
Collaborator Author

Aretno commented Jun 22, 2021

Some things were answered in the comments of the reviewer, anothers talked peer to peer, but I think all the issues pointed out so far in the review have been taken into account.

Copy link
Collaborator

@gonzaponte gonzaponte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ignoring cosmetics for now.

- q_min, q_max : Range for the integrated charge of the peak (q_min < q < q_max).
In ADC counts.
- time_min, time_max : Range for the peak width (time_min <= width < time_max).
In time mus.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In time mus.
In mus.

valdrada
-----------------------------------------------------------------------

This city finds simulates the trigger procedure at the FPGA over PMT waveforms.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This city finds simulates the trigger procedure at the FPGA over PMT waveforms.
This city simulates the trigger procedure at the FPGA over PMT waveforms.


A number of general configuration parameters are needed (input as a dict, trigger_config):
- coincidence_window : Time window (in time bins) in which valid trigger coincidences are counted
- discard_width : Any trigger with width less than this parameter is discarded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

units?

Comment on lines +105 to +110
if trigger_config['multipeak'] is not None:
trigger_config['multipeak']['time_min' ] /= 25*units.ns
trigger_config['multipeak']['time_after'] /= 25*units.ns
for k in channel_config.keys():
channel_config[k]['time_min'] /= 25*units.ns
channel_config[k]['time_max'] /= 25*units.ns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't these be forced to be ints?

assert_tables_equality(got, expected)


def test_valdrada_exact_result(ICDATADIR, trigger_config):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

@mark.parametrize("multipeak", (True, False))
def test_valdrada_exact_result(ICDATADIR, trigger_config, multipeak)
    ...
    if not multipeak:
        conf['trigger_config']['multipeak'] = None

to combine both tests? You would need to add true_out to the parametrize as well, but it's worth it.

else:
new_val = False
if not new_val: break
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=all([triggers[i].valid_all, new_val]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=all([triggers[i].valid_all, new_val]))
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=triggers[i].valid_all and new_val)

Comment on lines +115 to +117
def retrieve_trigger_information(channel_config : Dict[int, int]
,trigger_config : Dict
) -> Callable:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a city component, but maybe it just looks like one. I leave to you the decision of moving it to components.

Comment on lines +20 to +23
channel_dict = {'q_min' : 5000, 'q_max' : 50000,
'time_min' : 2000, 'time_max': 10000,
'baseline_dev' : 5, 'amp_max' : 1000,
'pulse_valid_ext': 15}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

units

def trigger_time_and_validity(draw):
# Set of trigger times an validity flags, to test trigger coincidence
size = draw (integers(min_value=1, max_value=100))
flag = lists(integers(min_value=0, max_value=1 ), min_size=4, max_size=4)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be booleans?

top[k] = ped/base_window

return np.asarray(signal_r, dtype='int16'), np.asarray(top, dtype='int16')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good to clip the result before casting it to an int. Otherwise, 33000 would become -32536.
The limits for int16 are -32768 and +32767

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants