
IndexError: Unlabeled multi-dimensional array cannot be used for indexing: samples #720

Closed
alimanfoo opened this issue Jan 31, 2025 · 2 comments

@alimanfoo
Member

This code, run on Colab:

sample_sets = ["AG1000G-MW"]
sample_query = "taxon = 'arabiensis'"
cyp6aap_region = "2R:28,480,000-28,510,000"
df_cyp6aap_cnv = ag3.gene_cnv_frequencies(
    region=cyp6aap_region,
    cohorts="admin2_year",
    sample_sets=sample_sets,
    sample_query=sample_query,
    sample_query_options=dict(engine="python"),
)
df_cyp6aap_cnv

...generates an exception:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-13-c966e3aea42c> in <cell line: 0>()
      2 sample_query = "taxon = 'arabiensis'"
      3 cyp6aap_region = "2R:28,480,000-28,510,000"
----> 4 df_cyp6aap_cnv = ag3.gene_cnv_frequencies(
      5     region=cyp6aap_region,
      6     cohorts="admin2_year",

12 frames
/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in gene_cnv_frequencies(***failed resolving arguments***)
    217         debug("access and concatenate data from regions")
    218         df = pd.concat(
--> 219             [
    220                 self._gene_cnv_frequencies(
    221                     region=r,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in <listcomp>(.0)
    218         df = pd.concat(
    219             [
--> 220                 self._gene_cnv_frequencies(
    221                     region=r,
    222                     cohorts=cohorts,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in _gene_cnv_frequencies(self, region, cohorts, sample_query, sample_query_options, min_cohort_size, sample_sets, drop_invariant, max_coverage_variance, include_counts, chunks, inline_array)
    263 
    264         debug("get gene copy number data")
--> 265         ds_cnv = self.gene_cnv(
    266             region=region,
    267             sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in gene_cnv(***failed resolving arguments***)
     61 
     62         ds = simple_xarray_concat(
---> 63             [
     64                 self._gene_cnv(
     65                     region=r,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in <listcomp>(.0)
     62         ds = simple_xarray_concat(
     63             [
---> 64                 self._gene_cnv(
     65                     region=r,
     66                     sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in _gene_cnv(self, region, sample_sets, sample_query, sample_query_options, max_coverage_variance, chunks, inline_array)
    105 
    106         # Access HMM data.
--> 107         ds_hmm = self.cnv_hmm(
    108             region=cnv_region,
    109             sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_data.py in cnv_hmm(***failed resolving arguments***)
    262                     raise ValueError(f"No samples found for query {sample_query!r}")
    263 
--> 264                 ds = ds.isel(samples=loc_query_samples)
    265 
    266             debug("handle coverage variance filter")

/usr/local/lib/python3.11/dist-packages/xarray/core/dataset.py in isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   3087         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
   3088         if any(is_fancy_indexer(idx) for idx in indexers.values()):
-> 3089             return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
   3090 
   3091         # Much faster algorithm for when all indexers are ints, slices, one-dimensional

/usr/local/lib/python3.11/dist-packages/xarray/core/dataset.py in _isel_fancy(self, indexers, drop, missing_dims)
   3130         missing_dims: ErrorOptionsWithWarn = "raise",
   3131     ) -> Self:
-> 3132         valid_indexers = dict(self._validate_indexers(indexers, missing_dims))
   3133 
   3134         variables: dict[Hashable, Variable] = {}

/usr/local/lib/python3.11/dist-packages/xarray/core/dataset.py in _validate_indexers(self, indexers, missing_dims)
   2910 
   2911                 if v.ndim > 1:
-> 2912                     raise IndexError(
   2913                         "Unlabeled multi-dimensional array cannot be "
   2914                         f"used for indexing: {k}"

IndexError: Unlabeled multi-dimensional array cannot be used for indexing: samples

Versions:

Image

@ahernank
Collaborator

I think this one might just be the single = in the query, which should be == (?)

@alimanfoo
Member Author

Thanks @ahernank, yep, this all goes away if we use == in the query!
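
For anyone landing here with the same error, here is a minimal pandas sketch of what seems to be going on. It assumes the sample query is evaluated with `DataFrame.eval` and that the resulting `.values` array is what gets passed to `ds.isel(samples=...)`; that is an inference from the traceback, not something confirmed in this thread. The `df_samples` frame below is hypothetical stand-in metadata.

```python
import pandas as pd

# Hypothetical stand-in for the sample metadata table (not real data).
df_samples = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3"],
        "taxon": ["arabiensis", "gambiae", "arabiensis"],
    }
)

# With "==" the expression is a comparison, so eval returns a boolean
# Series and .values is a 1-D mask with one entry per sample.
loc_ok = df_samples.eval("taxon == 'arabiensis'", engine="python").values

# With "=" pandas treats the expression as a column assignment and returns
# a copy of the whole DataFrame, so .values is a 2-D array.
loc_bad = df_samples.eval("taxon = 'arabiensis'", engine="python").values

print(loc_ok.ndim, loc_ok.shape)
print(loc_bad.ndim, loc_bad.shape)
```

A 2-D array like `loc_bad` is the kind of indexer xarray rejects with "Unlabeled multi-dimensional array cannot be used for indexing". Writing the query in the original snippet as `sample_query = "taxon == 'arabiensis'"` avoids it.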
