Leveraging FHIR R5 GraphDefinition for Data Traversals and Local Analysis
This project leverages FHIR R5 GraphDefinition objects to define and execute graph-based traversals across multiple interconnected FHIR resource graphs. The data retrieved is written to a local SQLite database for persistence and later transformed into analyst-friendly dataframes for analysis using tools like Python’s pandas library.
FHIR Search provides a robust querying framework but comes with significant limitations:
- Deep Chaining Limits: Chained searches (e.g., Patient -> Observation -> Encounter -> Procedure) often hit server depth limits.
- Inefficient Query Execution: Searching deeply related resources requires multiple chained requests, leading to performance issues and unnecessary round trips.
- Lack of Explicit Traversals: Relationships in FHIR are implicit in references (e.g., Observation.subject pointing to Patient). This implicit structure requires manual composition of queries, which is prone to errors.
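To illustrate what "implicit in references" means in practice, the sketch below walks a single resource, collects every literal reference it contains, and turns each one into a read URL. The base URL, the sample Observation, and the helper name iter_references are assumptions made for illustration; none of them come from this project.

```python
def iter_references(node):
    """Yield every literal reference string found anywhere in a FHIR resource."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


# Hypothetical sample resource; only the reference fields matter here.
observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "status": "final",
    "code": {"text": "Heart rate"},
    "subject": {"reference": "Patient/p-1"},
    "encounter": {"reference": "Encounter/e-1"},
}

base_url = "https://example.org/fhir"  # placeholder server
urls = [f"{base_url}/{ref}" for ref in iter_references(observation)]
print(urls)
```

Every hop in a traversal needs this kind of hand-written glue, which is the manual composition the last bullet refers to.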
By using FHIR R5 GraphDefinition, we declaratively define resource relationships and efficiently retrieve data. Once retrieved, the data is stored locally and can be transformed into dataframes for advanced analysis.
- GraphDefinition-Driven Traversals: Use R5 GraphDefinition objects to define explicit relationships between resources and automate traversal logic.
- Local SQLite Storage: Persist the retrieved FHIR data in a local SQLite database for querying and offline analysis.
- Analyst-Friendly Dataframes: Convert stored FHIR resources into pandas dataframes for ease of use in analytical workflows.
- Reusable Graph Definitions: Maintain a library of GraphDefinition YAML files that can be reused across different workflows and projects.
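As a rough picture of what a declarative definition looks like, here is a minimal sketch of a GraphDefinition written as a Python dict. The id, name, and node names are made up, the field names follow the R5 node/link layout as I understand it and should be checked against the specification, and the {path} placeholder is borrowed from the example output later in this document.

```python
import json

# Hypothetical GraphDefinition, written as a Python dict for brevity.
# Field names follow the R5 node/link layout; verify them against the spec.
graph_definition = {
    "resourceType": "GraphDefinition",
    "id": "patient-observations",  # made-up id
    "name": "PatientObservations",
    "status": "draft",
    "start": "patient",  # nodeId where the traversal begins
    "node": [
        {"nodeId": "patient", "type": "Patient"},
        {"nodeId": "observation", "type": "Observation"},
    ],
    "link": [
        {
            "sourceId": "patient",
            "targetId": "observation",
            "params": "subject={path}",  # placeholder filled in per source
        },
    ],
}


def describe_links(graph):
    """Return one 'Source -> Target (params)' string per link."""
    types = {node["nodeId"]: node["type"] for node in graph["node"]}
    return [
        f'{types[link["sourceId"]]} -> {types[link["targetId"]]} ({link["params"]})'
        for link in graph["link"]
    ]


# Round-trip through JSON, the way a definition file would be loaded.
loaded = json.loads(json.dumps(graph_definition))
print(describe_links(loaded))
```

Because the relationships live in data rather than in code, the same definition can be stored in a library and reused across workflows.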
The system consists of four components:
- GraphDefinition Library
  - A collection of reusable GraphDefinition objects in JSON/YAML format. A GraphDefinition defines a traversal path between resources.
  - See Example GraphDefinition, FHIR Devdays 2021
- Traversal Engine
  - Reads a GraphDefinition and iteratively queries the FHIR server using RESTful _include and _revinclude operations for efficiency.
  - Stores the retrieved resources in a SQLite database in JSON format for flexibility.
- SQLite Data Storage
  - Table Schema: see fhir_query.ResourceDB
- Analyst-Friendly DataFrames (TODO)
  - Transforms FHIR data from SQLite into pandas dataframes for easier analysis.
  - Data can be filtered, aggregated, or visualized to meet analytical use cases.
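The traversal idea can be sketched as a breadth-first walk over the links of a GraphDefinition. This is not the project's actual engine: the function names, the in-memory fake server, and the way {path} is substituted are all assumptions, and the sketch issues one query per source resource, whereas the example output later in this document suggests the real tool processes many sources per link.

```python
from collections import deque


def build_query(link, node_types, source_ref):
    """Turn a link into a search string, filling {path} with the source reference."""
    target_type = node_types[link["targetId"]]
    return f'{target_type}?{link["params"].replace("{path}", source_ref)}'


def traverse(graph, start_refs, fetch):
    """Follow links breadth-first; fetch(query) must return a list of resources."""
    node_types = {n["nodeId"]: n["type"] for n in graph["node"]}
    links_by_source = {}
    for link in graph["link"]:
        links_by_source.setdefault(link["sourceId"], []).append(link)

    seen = {}  # "Type/id" -> resource
    queue = deque((graph["start"], ref) for ref in start_refs)
    while queue:
        node_id, ref = queue.popleft()
        for link in links_by_source.get(node_id, []):
            for resource in fetch(build_query(link, node_types, ref)):
                key = f'{resource["resourceType"]}/{resource["id"]}'
                if key in seen:
                    continue
                seen[key] = resource
                # Included resources of other types are kept but not followed.
                if resource["resourceType"] == node_types[link["targetId"]]:
                    queue.append((link["targetId"], key))
    return seen


# Hypothetical graph and canned responses standing in for a FHIR server.
graph = {
    "start": "study",
    "node": [
        {"nodeId": "study", "type": "ResearchStudy"},
        {"nodeId": "patient", "type": "Patient"},
        {"nodeId": "specimen", "type": "Specimen"},
    ],
    "link": [
        {"sourceId": "study", "targetId": "patient",
         "params": "_has:ResearchSubject:subject:study={path}"},
        {"sourceId": "patient", "targetId": "specimen",
         "params": "subject={path}"},
    ],
}
FAKE_SERVER = {
    "Patient?_has:ResearchSubject:subject:study=ResearchStudy/s1": [
        {"resourceType": "Patient", "id": "p1"},
    ],
    "Specimen?subject=Patient/p1": [
        {"resourceType": "Specimen", "id": "sp1"},
    ],
}

found = traverse(graph, ["ResearchStudy/s1"], lambda q: FAKE_SERVER.get(q, []))
print(sorted(found))
```

Passing fetch in as a parameter keeps the walk independent of any HTTP client, so the same logic can be exercised against canned data or a live server.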
A typical workflow has four steps:
- Load a GraphDefinition
  - Define a GraphDefinition object (e.g., study-to-documents) to specify the traversal path.
- Execute Traversal
  - Use the Traversal Engine to query the FHIR server based on the GraphDefinition.
  - Follow each link and include related resources efficiently using _include or _revinclude.
- Store Data Locally
  - Write the retrieved resources to the SQLite database with their resource types and full JSON representation.
- Transform to DataFrames (TODO)
  - Retrieve specific resource types or relationships from the SQLite database.
  - Convert the JSON data into structured pandas dataframes for analysis.
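The store and transform steps might look roughly like the sketch below, which uses only the standard library. The table layout, column names, and helper functions are hypothetical and are not the schema defined in fhir_query.ResourceDB; the flattened columns are just one possible choice.

```python
import json
import sqlite3


def save_resources(conn, resources):
    """Store each resource with its type, id, and full JSON body."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources ("
        "resource_type TEXT NOT NULL, id TEXT NOT NULL, body TEXT NOT NULL, "
        "PRIMARY KEY (resource_type, id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO resources VALUES (?, ?, ?)",
        [(r["resourceType"], r["id"], json.dumps(r)) for r in resources],
    )
    conn.commit()


def load_rows(conn, resource_type):
    """Read one resource type back and flatten a few fields into plain dicts."""
    cursor = conn.execute(
        "SELECT body FROM resources WHERE resource_type = ? ORDER BY id",
        (resource_type,),
    )
    rows = []
    for (body,) in cursor:
        resource = json.loads(body)
        rows.append({
            "id": resource["id"],
            "status": resource.get("status"),
            "subject": (resource.get("subject") or {}).get("reference"),
        })
    return rows


# Hypothetical sample data, stored in an in-memory database.
conn = sqlite3.connect(":memory:")
save_resources(conn, [
    {"resourceType": "Patient", "id": "p-1"},
    {"resourceType": "Observation", "id": "obs-1", "status": "final",
     "subject": {"reference": "Patient/p-1"}},
])
rows = load_rows(conn, "Observation")
print(rows)
```

A list of flat dicts like rows can be handed directly to pandas.DataFrame(rows) when pandas is available, which is one way the planned dataframe step could be approached.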
To run the fq command, provide the options shown below; options in square brackets are optional:
fq --fhir-base-url <FHIR_BASE_URL> \
--graph-definition-id <GRAPH_DEFINITION_ID> \
--path </Resource?params> \
[--graph-definition-file-path <GRAPH_DEFINITION_FILE_PATH>] \
[--db_path <DB_PATH>] \
[--debug]
# example output
✔ research-study-graph is valid FHIR R5 GraphDefinition
✔ Running research-study-graph traversal
✔ Processing link: Patient/_has:ResearchSubject:subject:study={path}&_revinclude=Group:member&_count=1000&_total=accurate with 1 ResearchStudy(s)
✔ Processing link: Specimen/subject={path}&_revinclude=DocumentReference:subject&_revinclude=Group:member&_count=1000&_total=accurate with 537 Patient(s)
✔ Processing link: Group/member={path}&_count=1000&_total=accurate with 17121 Specimen(s)
✔ Processing link: DocumentReference/subject={path}&_count=1000&_total=accurate with 8169 Group(s)
✔ Processing link: Observation/subject={path}&_count=1000&_total=accurate with 537 Patient(s)
✔ Processing link: Procedure/subject={path}&_include=Procedure:encounter&_count=1000&_total=accurate with 537 Patient(s)
Aggregated Results: {'DocumentReference': 24452, 'Encounter': 20, 'Group': 8169, 'MedicationAdministration': 1074, 'Observation': 23676, 'Patient': 537, 'Procedure': 1616, 'Specimen': 17121}
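The final summary line is a count of retrieved resources per resource type. How fq computes it is not shown here; the sketch below is one simple way to produce the same shape of result, using a hypothetical helper name and made-up sample data.

```python
from collections import Counter


def aggregate_by_type(resources):
    """Count resources per resourceType, with keys in alphabetical order."""
    counts = Counter(resource["resourceType"] for resource in resources)
    return dict(sorted(counts.items()))


# Made-up sample standing in for the resources collected by a traversal.
sample = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Specimen", "id": "sp1"},
    {"resourceType": "Specimen", "id": "sp2"},
    {"resourceType": "Group", "id": "g1"},
]
summary = aggregate_by_type(sample)
print(f"Aggregated Results: {summary}")
```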