ARCH: Knowledge Graph Creation from Clinical Features

Overview

ARCH is a package designed for constructing a large-scale knowledge graph (KG) based on clinical feature co-occurrence matrices. The steps include data preprocessing, calculating co-occurrence matrices, variance estimation, and knowledge graph construction. The package applies methods introduced in the paper:

Gan, Ziming, et al. "Arch: Large-scale knowledge graph via aggregated narrative codified health records analysis." medRxiv (2023)..

Prerequisites

Python 3.x
R (version X.X.X or later)
Required Python libraries:
- numpy
- scipy
- pandas
Required R libraries:
- Matrix
- RSpectra

Workflow

The process of constructing the knowledge graph involves several steps:

Data Preprocessing
Use the pre.py script to preprocess the raw dataset. This step will extract essential information such as:
- Total number of patients
- Average number of features recorded per patient per day
Command to run:
```
python pre.py
```
Co-occurrence Matrix Calculation
After preprocessing, use the pre.R script to calculate the co-occurrence matrix. This includes:
- Calculating the SVD-SPPMI (Singular Value Decomposition of Shifted Positive Pointwise Mutual Information) matrix
- Estimating parameters needed to calculate the variance of the SVD-SPPMI matrix
Command to run in R:
```
Rscript pre.R
```
Variance Estimation
To estimate the variance of the calculated SVD-SPPMI matrix, use the cosine_test.py script:
```
python cosine_test.py
```
Knowledge Graph Construction
Once the variance has been estimated, use the construct_KG.R script to construct the knowledge graph based on the processed data:
```
Rscript construct_KG.R
```

Usage

Below is an example of how to use the ARCH package to create a knowledge graph (KG) from electronic health record (EHR) data. This example assumes that you have already preprocessed your data and calculated the necessary co-occurrence matrices.

Example: Constructing the Knowledge Graph

Step 1: Load Required Libraries and Data

Ensure you have all necessary libraries loaded and data files ready before running the script.

# Load required packages
library(Matrix)
library(RSpectra)

# Load your preprocessed data (replace 'data_path' with the actual path to your data)
cooccurrence_matrix <- readRDS('data_path/cooccurrence_matrix.rds')
svd_sppmi_matrix <- readRDS('data_path/svd_sppmi_matrix.rds')

Step 2: Perform Matrix Factorization

Use Singular Value Decomposition (SVD) to decompose the co-occurrence matrix. This will help reduce the dimensionality of the data and prepare it for knowledge graph construction.

# Perform SVD on the co-occurrence matrix
svd_result <- svds(cooccurrence_matrix, k = 50)  # k is the number of dimensions

# Extract the U, D, and V matrices from the SVD result
U <- svd_result$u
D <- diag(svd_result$d)
V <- svd_result$v

Step 3: Construct the Knowledge Graph

Now, you can construct the knowledge graph by using the SVD results. The graph can be represented as a set of relationships between clinical features based on their similarity in the lower-dimensional space.

# Construct a similarity matrix based on the SVD results
similarity_matrix <- U %*% D %*% t(V)

# Set a threshold for determining strong relationships (adjust this based on your analysis)
threshold <- 0.8
knowledge_graph <- similarity_matrix > threshold

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
DownstreamTasks		DownstreamTasks
KG		KG
img		img
.DS_Store		.DS_Store
.gitattributes		.gitattributes
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ARCH: Knowledge Graph Creation from Clinical Features

Overview

Prerequisites

Workflow

Usage

Example: Constructing the Knowledge Graph

Step 1: Load Required Libraries and Data

Step 2: Perform Matrix Factorization

Step 3: Construct the Knowledge Graph

About

Releases

Packages

Languages

junwei-lu/ARCH

Folders and files

Latest commit

History

Repository files navigation

ARCH: Knowledge Graph Creation from Clinical Features

Overview

Prerequisites

Workflow

Usage

Example: Constructing the Knowledge Graph

Step 1: Load Required Libraries and Data

Step 2: Perform Matrix Factorization

Step 3: Construct the Knowledge Graph

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages