
\section{\textcolor{red}{Introduction}}
\label{sec:introduction}
\change{
Studying the functional and anatomical connectivity of the human brain has given us a deeper understanding of its organization, from microscale connectivity between single neurons to macroscale connectivity between regions of interest (ROIs) in whole-brain images. One routinely employed approach for characterizing macroscale connectivity uses measures of statistical association (such as Pearson's correlation) computed between resting-state functional magnetic resonance imaging (rs-fMRI) time series corresponding to specific ROIs in the human brain. The connectivity matrix thus obtained is an algebraic representation of the weighted brain network, which encodes the relationship between all pairs of nodes. }
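As an illustration, with notation introduced here only for exposition: if $x_i(t)$ denotes the rs-fMRI time series of ROI $i$ at time points $t = 1, \dots, T$, and $\bar{x}_i$ denotes its temporal mean, then a Pearson-correlation connectivity matrix $C$ has entries
\[
C_{ij} = \frac{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)\bigl(x_j(t) - \bar{x}_j\bigr)}{\sqrt{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)^2}\,\sqrt{\sum_{t=1}^{T} \bigl(x_j(t) - \bar{x}_j\bigr)^2}},
\]
so that $C$ is symmetric, $C_{ii} = 1$, and $C_{ij} \in [-1, 1]$.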

\change{Functional connectivity obtained from rs-fMRI has been shown to be extremely sensitive to mental health disorders~\cite{fornito2010can} as well as predictive of behavior in healthy individuals~\cite{miller2016multimodal}. This has kindled interest in using functional connectivity networks (FCNs) as potential biomarkers of mental disorders. The sensitivity and specificity of these biomarkers, and their generalizability to the broader population, seem to increase with sample size. However, acquiring data from larger samples at any given site can be economically and logistically prohibitive. Therefore, there has been a recent impetus toward post-hoc aggregation of data acquired at different sites to form larger datasets. Examples include the Autism Brain Imaging Data Exchange (ABIDE~\cite{di2014autism}) and ADHD-200~\cite{bellec2017neuro}. In such large publicly available datasets, it has been found that biomarkers do not generalize well to data acquired at different sites. This has been attributed to differences in MRI scanners and data acquisition protocols across sites, which introduce non-neural variability into the data and make it difficult to discover consistent inter-group FCN differences that are at least partly neural in origin.
}

Functional brain imaging studies are increasingly utilizing topological data analysis (TDA), a mathematical approach grounded in algebraic topology~\cite{das2022topological}. 


Traditionally, graph-theoretic tools, which can be construed as special cases of the more generic concepts of TDA, have been extensively used to study and quantify FCNs. More recently, advanced tools from TDA such as persistent homology (PH)~\cite{rubinov2010complex} have been used to study complex networks. Persistent homology investigates connections between different parts of a network using algorithms designed to encode and measure the significance of relationships across multiple scales (thresholds). In the context of networks, topological features refer to the $0$-, $1$-, and $2$-dimensional homology groups of a metric space, which describe its \emph{connected components}, \emph{tunnels} (loops), and \emph{voids}, respectively. Most graph-theoretic techniques can quantify these features of a weighted brain network only at a fixed threshold. PH provides a principled approach to quantifying these features across all thresholds; more precisely, it tracks when features (such as connected components, loops, and voids) are created and destroyed as the scale (threshold) varies. The technique quantifies the individual topological events (birth and death of features) in the graph according to their significance (or persistence). This persistence is represented in the form of barcodes, which encode the thresholds at which features appear and disappear. The barcodes that encode these sets of features can be seen as a fingerprint for a graph. What makes this fingerprint useful is the availability of metrics such as the Wasserstein distance (WD)~\cite{vallender1974calculation, edelsbrunner2013persistent} that can be used to robustly quantify the difference between two barcodes. The WD is robust to small perturbations in the data and hence can be used to compare and establish similarities between persistence diagrams.
We use this ability in our pipeline to establish similarities between barcodes obtained from brain networks derived from data acquired with different acquisition parameters (such as sampling period).
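As a sketch of the standard definition, with symbols and the order $q \geq 1$ chosen here for illustration only (they are assumptions, not settings taken from our pipeline): a feature born at threshold $b$ and destroyed at threshold $d$ has persistence $d - b$ and appears as the point $(b, d)$ in a persistence diagram. The $q$-Wasserstein distance between two diagrams $D_1$ and $D_2$ can then be written as
\[
W_q(D_1, D_2) = \left( \inf_{\eta \colon D_1 \to D_2} \sum_{x \in D_1} \| x - \eta(x) \|_\infty^{q} \right)^{1/q},
\]
where $\eta$ ranges over bijections between the two diagrams, each augmented with the points of the diagonal $b = d$ so that an unmatched feature can be paired with the diagonal. Features of low persistence lie close to the diagonal and therefore contribute little to the distance, which is the intuition behind its robustness to small perturbations.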


The input to our TDA-based pipeline is subject-specific FCNs from three data cohorts, corresponding to data acquired from the same individuals at three different sampling periods (TR): 645ms, 1400ms and 2500ms. Each cohort consists of rs-fMRI data from the same 316 subjects (totaling $3 \times 316$ scans). In our pipeline, %first, each FCN instance is embedded into a metric space that sets it up for TDA. Following this, 
the topological features of each network instance are extracted using persistent homology and encoded as a barcode. The barcodes are then compared against each other using the Wasserstein distance (WD). We perform two sets of experiments to demonstrate that the FCNs capture the same structure irrespective of the temporal sampling period: (a) a direct pairwise comparison of the same subjects across different sampling periods, and (b) pairwise comparisons within a cohort (same sampling period) to extract the overall pattern of subjects, followed by a comparison of that pattern across sampling periods. This statistical analysis is possible because the barcodes (our topological descriptors) can be compared against one another using the WD~\cite{vallender1974calculation, edelsbrunner2013persistent}.
In our first set of experiments, we compute the WD between the barcodes of the same subject at different sampling periods (across the data cohorts), which yields $3$ groups of measurements: WD between ($645$ms and $1400$ms), ($1400$ms and $2500$ms), and ($2500$ms and $645$ms). We then assess the similarity of these distributions using ANOVA and t-tests, which supports the claim that FCNs capture the same information irrespective of the data acquisition parameters.
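In notation introduced here only for exposition, let $B_s^{\tau}$ denote the barcode of subject $s \in \{1, \dots, 316\}$ at sampling period $\tau$ (in ms). The three groups of within-subject measurements can then be written as
\[
G_{\tau, \tau'} = \bigl\{ \mathrm{WD}\bigl(B_s^{\tau}, B_s^{\tau'}\bigr) : s = 1, \dots, 316 \bigr\}, \qquad (\tau, \tau') \in \{(645, 1400),\ (1400, 2500),\ (2500, 645)\},
\]
so that each group contains one distance per subject.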
In our second set of experiments, we calculate the pairwise WD between the persistence diagrams of all $316$ subjects within each cohort. This yields three matrices of pairwise distances (one for every sampling period); we apply multidimensional scaling (MDS)~\cite{carroll1998multidimensional, cox2008multidimensional} to embed each of them in a 2D space and then apply clustering techniques to segregate the subjects into clusters. Finally, we compare the number of clusters across the three cohorts. Our paper makes the following three contributions to the literature:
\begin{enumerate}
\item Demonstrate that the barcode obtained from the PH of resting-state fMRI data is a compact representation of the topological information of an FCN.
\item Present an end-to-end pipeline that uses PH-based techniques to demonstrate that FCNs are invariant to data acquisition parameters such as the temporal sampling period (TR), potentially removing a source of noise in multi-site case-control studies and thereby improving the effect sizes in group comparisons.
\item Present an open-source and reproducible codebase for all components of our work. Code including scripts, documentation and data can be found here: \url{https://github.com/harp-lab/brainPH}
\end{enumerate}

The rest of the paper is organized as follows. In Section~\ref{sec:rw} we present relevant related work covering both graph-based and topology-based FCN analysis frameworks. In Section~\ref{sec:methods} we present our end-to-end TDA pipeline comprising four key steps. In Section~\ref{sec:eval} we present the results of applying our methods to real data. We then conclude with a discussion in Section~\ref{sec:discussion} and a conclusion in Section~\ref{sec:conclusion}.