CellXGene census 2024-07-01 NA values? #273

rpeys · 2024-11-24T20:57:01Z

Hi! I downloaded the scGPT embedding from the 2024-07-01 release of the CxG census, and I am finding NaN values in the embedding. Can you please help me understand what's going wrong? Here is my code:

import cellxgene_census
from cellxgene_census.experimental import get_embedding, get_embedding_metadata_by_name
import numpy as np
import pandas as pd

# Set the census version, experiment name, and embedding name
census_version = "2024-07-01"
experiment_name = "homo_sapiens"
embedding_name = 'scgpt'

# Download the census obs metadata and the matching embedding
with cellxgene_census.open_soma(census_version=census_version) as census:

    print('Loading Cell x Gene Census Metadata')
    obs_df = cellxgene_census.get_obs(
        census,
        experiment_name,
        value_filter="is_primary_data == True",
    )

    print(f'Loading Cell x Gene Census {embedding_name} Embeddings')
    metadata = get_embedding_metadata_by_name(embedding_name, experiment_name, census_version=census_version)
    embedding_uri = f"s3://cellxgene-contrib-public/contrib/cell-census/soma/{metadata['census_version']}/{metadata['id']}"
    embedding = get_embedding(metadata["census_version"], embedding_uri, obs_df.soma_joinid.to_numpy())

This code produces `embedding`, a numpy array of shape (44265932, 512). The number of rows matches the number of cells in obs_df, as expected.

However, 287,765,504 of those values are NaN. More specifically, 562,042 cells contain NaN for every feature, and since 562,042 × 512 = 287,765,504, those cells account for all of the missing values. Why is this happening? Thanks in advance for your insight!

print(np.isnan(embedding).sum())  # total number of NaN values; output: 287765504
print(np.isnan(embedding).all(axis=1).sum())  # cells that are NaN for every feature; output: 562042
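
One way to narrow this down might be to check whether the all-NaN cells cluster in particular datasets. The sketch below is only an illustration on small synthetic data: the toy `embedding` and `obs_df` stand in for the real objects above, and it assumes obs_df has a `dataset_id` column and that the rows of `embedding` are in the same order as obs_df. It does not explain why the values are missing.

import numpy as np
import pandas as pd

# Synthetic stand-ins (assumption): 6 cells, 4 features, 3 datasets
rng = np.random.default_rng(0)
embedding = rng.normal(size=(6, 4)).astype(np.float32)
embedding[[1, 4]] = np.nan  # two cells with no embedding values at all

obs_df = pd.DataFrame({
    "soma_joinid": np.arange(6),
    "dataset_id": ["a", "a", "b", "b", "c", "c"],
})

# Separate cells that are entirely NaN from cells that are only partly NaN
nan_mask = np.isnan(embedding)
all_nan_rows = nan_mask.all(axis=1)
partial_nan_rows = nan_mask.any(axis=1) & ~all_nan_rows

# Count the fully missing cells per dataset (rows assumed aligned with obs_df)
missing_per_dataset = obs_df.loc[all_nan_rows].groupby("dataset_id").size()

print(int(all_nan_rows.sum()), "cells are NaN for every feature")
print(int(partial_nan_rows.sum()), "cells are NaN for only some features")
print(missing_per_dataset)

If the counts concentrate in a handful of `dataset_id` values, that would suggest the embedding simply has no entries for those datasets rather than a problem in the download itself.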
