`self.sample_keys` and `hdf5_sample_keys` both use `-1` to index the output of the split method, so `self.sample_keys` ends up being a list of extensions (e.g. ".tsv") or a list of empty strings, depending on whether `self.metadata[self.sample_id_column]` holds the file names with or without their extensions, respectively. `hdf5_sample_keys` would also end up being a list of extensions.

Then, `unfound_samples` would not catch the error when both lists hold extensions, and `self.hdf5_inds` would contain the same index `0` for every sample, because `.index()` returns the position of the first match.

Consequently, the data loader would end up loading the same repertoire every time, because `sample_sequences_start_end` would contain identical start:end pairs, except that it would load them with different targets, because the targets are sampled separately. This was at least my experience running, e.g., the `example_single_task_cnn.py` file with the example dataset :)
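To make the failure mode concrete, here is a minimal sketch of the behaviour described above. It assumes the split in question behaves like `os.path.splitext`, and the file names and the `buggy_*` / `fixed_*` variables are hypothetical stand-ins, not the actual code from the repository.

```python
import os

# Hypothetical file names standing in for the metadata column and the HDF5 keys.
# The HDF5 order is deliberately different from the metadata order.
metadata_names = ["rep_a.tsv", "rep_b.tsv", "rep_c.tsv"]
hdf5_names = ["rep_b.tsv", "rep_c.tsv", "rep_a.tsv"]

# Assumed buggy behaviour: [-1] selects the extension, not the sample name.
buggy_sample_keys = [os.path.splitext(n)[-1] for n in metadata_names]
buggy_hdf5_keys = [os.path.splitext(n)[-1] for n in hdf5_names]

# Every key is ".tsv", so the "unfound samples" check sees nothing missing.
buggy_unfound = [k for k in buggy_sample_keys if k not in buggy_hdf5_keys]

# list.index() returns the first match, so every sample maps to index 0.
buggy_inds = [buggy_hdf5_keys.index(k) for k in buggy_sample_keys]

# Possible fix under the same assumption: [0] keeps the name without extension.
fixed_sample_keys = [os.path.splitext(n)[0] for n in metadata_names]
fixed_hdf5_keys = [os.path.splitext(n)[0] for n in hdf5_names]
fixed_inds = [fixed_hdf5_keys.index(k) for k in fixed_sample_keys]

print(buggy_sample_keys, buggy_unfound, buggy_inds)
print(fixed_sample_keys, fixed_inds)
```

With the extension used as the key, every sample resolves to the first HDF5 entry, which matches the symptom of the same repertoire being loaded repeatedly with different targets.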