Change the index to 0 instead of -1 #11

ghadialhajj · 2023-09-12T11:36:33Z

Both self.sample_keys and hdf5_sample_keys index the output of the split method with -1. As a result, self.sample_keys ends up as a list of extensions (e.g. ".tsv") if self.metadata[self.sample_id_column] holds the file names with their extensions, or a list of empty strings if it holds them without. hdf5_sample_keys likewise ends up as a list of extensions.

When both lists contain only extensions, the unfound_samples check does not catch the error, and every entry of self.hdf5_inds ends up being the same index, 0, because .index() returns the position of the first match.
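
To make the failure mode concrete, here is a minimal sketch that reproduces it outside the data loader. It assumes the split behaves like `os.path.splitext`, which is consistent with the ".tsv" and empty-string results described above. The file names and the helpers `make_keys` and `lookup` are hypothetical stand-ins, not the repository's actual code.

```python
import os

# Hypothetical sample IDs; in the real code these come from the metadata
# file and the HDF5 container respectively.
metadata_ids = ["sample_a.tsv", "sample_b.tsv", "sample_c.tsv"]
hdf5_ids = ["sample_a.tsv", "sample_b.tsv", "sample_c.tsv"]


def make_keys(names, idx):
    # Assumption: the split in question acts like os.path.splitext,
    # which returns a (root, extension) pair.
    return [os.path.splitext(name)[idx] for name in names]


def lookup(sample_keys, hdf5_keys):
    # Mimics the described checks: collect keys that are missing, then
    # map each sample key to its position in the HDF5 key list.
    unfound = [k for k in sample_keys if k not in hdf5_keys]
    inds = [hdf5_keys.index(k) for k in sample_keys if k in hdf5_keys]
    return unfound, inds


# Index -1 keeps only the extension, so every key is identical.
buggy_unfound, buggy_inds = lookup(make_keys(metadata_ids, -1),
                                   make_keys(hdf5_ids, -1))

# Index 0 keeps the file name without its extension.
fixed_unfound, fixed_inds = lookup(make_keys(metadata_ids, 0),
                                   make_keys(hdf5_ids, 0))

print(buggy_unfound, buggy_inds)
print(fixed_unfound, fixed_inds)
```

With index -1 nothing is reported as unfound, yet every sample maps to position 0. With index 0 each sample maps to its own position.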

Consequently, the data loader ends up loading the same repertoire for every sample, because sample_sequences_start_end contains identical start:end pairs. The targets still differ, however, because they are sampled separately.

At least, this was my experience when running example_single_task_cnn.py with the example dataset :)

Change inds to 0 instead of -1

8156cec
