Hi! I tried to download the scGPT embedding from the CxG Census release 2024-07-01, and I am finding NaN values in the embeddings. Can you please help me understand what's going wrong? Here is my code:
```python
import cellxgene_census
from cellxgene_census.experimental import get_embedding, get_embedding_metadata_by_name
import numpy as np
import pandas as pd

# set the census version, experiment name, and embedding name
census_version = "2024-07-01"
experiment_name = "homo_sapiens"
embedding_name = 'scgpt'

# download census data
with cellxgene_census.open_soma(census_version=census_version) as census:
    print('Loading Cell x Gene Census Metadata')
    obs_df = cellxgene_census.get_obs(
        census,
        experiment_name,
        value_filter="is_primary_data == True",
    )

    print(f'Loading Cell x Gene Census {embedding_name} Embeddings')
    metadata = get_embedding_metadata_by_name(embedding_name, experiment_name, census_version=census_version)
    embedding_uri = f"s3://cellxgene-contrib-public/contrib/cell-census/soma/{metadata['census_version']}/{metadata['id']}"
    embedding = get_embedding(metadata["census_version"], embedding_uri, obs_df.soma_joinid.to_numpy())
```
This code produces `embedding`, a NumPy array of shape (44265932, 512). The number of rows matches the number of cells in `obs_df`, as expected.

However, 287,765,504 of those values are NaN. More specifically, 562,042 cells are NaN for every feature, and since 562,042 × 512 = 287,765,504, every NaN comes from one of these fully missing rows. Why is this happening? Thanks in advance for your insight!
```python
print(np.isnan(embedding).sum())  # how many NaNs in total? output: 287765504
print(np.isnan(embedding).all(axis=1).sum())  # output: 562042 cells contain NaN for all features
```
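To narrow down where the missing rows come from, here is a minimal sketch that (a) checks whether the NaNs are confined to whole rows and (b) counts the all-NaN cells per `dataset_id`. It assumes row i of `embedding` lines up with `obs_df.iloc[i]`, i.e. that the embedding was fetched with `obs_df.soma_joinid.to_numpy()` in that order, as in the code above. `nan_row_summary` and `missing_by_dataset` are hypothetical helpers, not part of `cellxgene_census`, and the example runs on made-up data rather than census output.

```python
import numpy as np
import pandas as pd


def nan_row_summary(embedding: np.ndarray, chunk_size: int = 1_000_000):
    """Classify rows of a 2-D float array by how many NaNs they contain.

    Works in row chunks so the full boolean NaN mask (one byte per value)
    never has to be held in memory at once.
    Returns (summary dict, boolean mask of rows that are entirely NaN).
    """
    n_rows, n_features = embedding.shape
    all_nan_rows = np.zeros(n_rows, dtype=bool)
    total_nan = partial_rows = 0
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # number of NaN entries in each row of this chunk
        nan_per_row = np.isnan(embedding[start:stop]).sum(axis=1)
        total_nan += int(nan_per_row.sum())
        all_nan_rows[start:stop] = nan_per_row == n_features
        partial_rows += int(((nan_per_row > 0) & (nan_per_row < n_features)).sum())
    summary = {
        "total_nan": total_nan,
        "all_nan_rows": int(all_nan_rows.sum()),
        "partial_nan_rows": partial_rows,
    }
    return summary, all_nan_rows


def missing_by_dataset(obs_df: pd.DataFrame, all_nan_rows: np.ndarray) -> pd.Series:
    """Count all-NaN embedding rows per dataset_id.

    Assumes row i of the embedding belongs to obs_df.iloc[i].
    """
    return obs_df.loc[all_nan_rows, "dataset_id"].value_counts()


# Tiny synthetic example (made-up data, not census values).
emb = np.ones((4, 3), dtype=np.float32)
emb[1, :] = np.nan  # fully missing row
emb[3, :] = np.nan  # fully missing row
obs = pd.DataFrame({"soma_joinid": [10, 11, 12, 13],
                    "dataset_id": ["a", "b", "a", "b"]})
summary, mask = nan_row_summary(emb, chunk_size=2)
print(summary)
print(missing_by_dataset(obs, mask))
```

On the real data this would be `summary, mask = nan_row_summary(embedding)` followed by `missing_by_dataset(obs_df, mask)`. Chunking is a deliberate choice: a full boolean mask for a 44,265,932 × 512 array is about 22.7 GB, whereas each chunk here needs only `chunk_size` × 512 bytes. If `partial_nan_rows` is 0 and the counts concentrate in a few datasets, that would suggest whole cells without an embedding rather than scattered numerical failures.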