How does PHATE handle new data? #147

Open
Mr-Byun opened this issue Nov 9, 2024 · 0 comments

Mr-Byun commented Nov 9, 2024

I am trying to run k-means on the potential distance of a reference dataset and then assign the cells of a query dataset to those clusters.
To get the potential distance for the query, I simply called the extend_to_data function on the query dataset.
However, I don't think the function returns the potential distance.

    def extend_to_data(self, data, **kwargs):
        """Build transition matrix from new data to the graph

        Creates a transition matrix such that `Y` can be approximated by
        a linear combination of landmarks. Any
        transformation of the landmarks can be trivially applied to `Y` by
        performing

        `transform_Y = transitions.dot(transform)`

        Parameters
        ----------

        Y: array-like, [n_samples_y, n_features]
            new data for which an affinity matrix is calculated
            to the existing data. `n_features` must match
            either the ambient or PCA dimensions

        Returns
        -------

        transitions : array-like, [n_samples_y, self.data.shape[0]]
            Transition matrix from `Y` to `self.data`
        """
        kernel = self.build_kernel_to_data(data, **kwargs)
        if sparse.issparse(kernel):
            pnm = sparse.hstack(
                [
                    sparse.csr_matrix(kernel[:, self.clusters == i].sum(axis=1))
                    for i in np.unique(self.clusters)
                ]
            )
        else:
            pnm = np.array(
                [
                    np.sum(kernel[:, self.clusters == i], axis=1).T
                    for i in np.unique(self.clusters)
                ]
            ).transpose()
        pnm = normalize(pnm, norm="l1", axis=1)
        return pnm

Instead, it returns a transition matrix, which I think is the diffusion probability matrix (after optimal_t diffusion steps).
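To make the docstring concrete, here is a minimal NumPy sketch of the dense branch quoted above. The kernel values, the `clusters` assignment, and `landmark_embedding` are made-up toy data, not output from PHATE or graphtools, and the row normalization is done by hand rather than with `normalize`.

```python
import numpy as np

# Toy kernel from 3 new points to 5 reference points (made-up values).
kernel = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.7, 0.6, 0.1],
    [0.0, 0.0, 0.1, 0.3, 0.9],
])
# Hypothetical landmark assignment of the 5 reference points.
clusters = np.array([0, 0, 1, 1, 2])

# Sum kernel weights per landmark, then L1-normalize each row,
# mirroring the dense branch of extend_to_data.
pnm = np.array(
    [kernel[:, clusters == i].sum(axis=1) for i in np.unique(clusters)]
).T
transitions = pnm / pnm.sum(axis=1, keepdims=True)

# Any per-landmark quantity (here a toy 2-D embedding) is carried over
# to the new points as a weighted average of the landmark values.
landmark_embedding = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
new_embedding = transitions.dot(landmark_embedding)
```

In this sketch each row of `transitions` sums to 1 and has one column per landmark, so the result has one row per new point.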

So, to convert the transition matrix into the potential (informational) distance, I copied the transformation from the _calculate_potential function:

        c = (1 - self.gamma) / 2
        self._diff_potential = ((diff_op_t) ** c) / c
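As a sanity check on these two lines, here is a small standalone sketch of the same power transform. The function name `power_potential` and the toy matrix are my own, not part of PHATE. With gamma = 0 it reduces to 2 * sqrt(p), and at gamma = 1 the formula divides by zero, so that case would need separate handling.

```python
import numpy as np

def power_potential(diff_op_t, gamma):
    """Apply p ** c / c elementwise, with c = (1 - gamma) / 2."""
    c = (1 - gamma) / 2
    if c == 0:
        # gamma = 1 gives c = 0, where this formula is undefined.
        raise ValueError("gamma = 1 gives c = 0; the formula is undefined there")
    return np.power(diff_op_t, c) / c

# Toy row-stochastic matrix (made-up values).
p = np.array([[0.25, 0.75], [0.64, 0.36]])
pot = power_potential(p, gamma=0)  # c = 0.5, so this equals 2 * sqrt(p)
```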

Here is my attempt at mapping the query data onto the reference dataset:

phate.data <- Embeddings(reference.seurat, 'symphony')

phate.ref <- phate(
    phate.data,
    gamma = 0, knn = 10,
    ndim = 3, mds.solver = 'smacof', npca = NULL,
    knn.dist.method = 'euclidean', mds.dist.method = 'euclidean', seed = 333
)

reference.seurat[['phate']] <- CreateDimReducObject(embeddings=phate.ref$embedding, key='phate_', assay='RNA')
km <- kmeans(phate.ref$operator$diff_potential, centers = 7)
reference.seurat$phate.k <- as.character(km$cluster)

query_phate <- phate.ref$operator$transform(Embeddings(query.seurat, 'symphony'))
query.seurat[['phate']] <- CreateDimReducObject(embeddings=query_phate, key='phate_', assay='RNA')

query_diff_transform <- phate.ref$operator$graph$extend_to_data(Embeddings(query.seurat, 'symphony'))
query_diff_potential <- query_diff_transform^(0.5) / 0.5 # gamma = 0, so c = (1 - 0) / 2 = 0.5
query.seurat$phate.k <- clue::cl_predict(km, newdata=as.matrix(query_diff_potential), type='class_ids')
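For clarity, this is what I assume the prediction step does: assign each query row to the nearest k-means centroid in Euclidean distance. The sketch below is plain NumPy with toy values, and `predict_nearest_centroid` is a hypothetical helper, not a function from clue or PHATE. It also checks that the query matrix has the same number of columns as the matrix the centroids were fitted on, which the assignment depends on.

```python
import numpy as np

def predict_nearest_centroid(centers, new_rows):
    """Return the index of the closest centroid for each row of new_rows."""
    centers = np.asarray(centers, dtype=float)
    new_rows = np.asarray(new_rows, dtype=float)
    if centers.shape[1] != new_rows.shape[1]:
        raise ValueError(
            "new data has %d columns but centroids have %d"
            % (new_rows.shape[1], centers.shape[1])
        )
    # Squared Euclidean distance from every new row to every centroid.
    d2 = ((new_rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy centroids and query rows (made-up values).
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
new_rows = np.array([[0.1, -0.2], [1.9, 2.3], [0.9, 0.8]])
labels = predict_nearest_centroid(centers, new_rows)
```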

After merging reference.seurat and query.seurat, I visualized the PHATE dimensions and the phate.k clusters.
The query.seurat points overlapped the reference.seurat points; however, the phate.k cluster assignments of the query were slightly off.

Reference:
image

Query:
image

  1. Did I make a mistake?
  2. Is there a direct way to obtain the potential distance matrix of new (query) data?
  3. Or is reference-based mapping with PHATE just not feasible?

Thank you.
