Request for Setup Instructions #1

Open
crkarthik11 opened this issue May 22, 2024 · 11 comments

Comments

@crkarthik11
Contributor

Hello

I came across the Geniusrise framework repo and found it very interesting. While trying to set up this demo project, I noticed that some of the model files are missing. Are the files below publicly available?
```
networkx_graph="./saved/snomed.graph" \
faiss_index="./saved/faiss.index.Bio_ClinicalBERT" \
concept_id_to_concept="./saved/concept_id_to_concept.pickle" \
description_id_to_concept="./saved/description_id_to_concept.pickle" \
```

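Before running the demo, it can help to check which of these four artifacts are actually present. The following is a minimal sketch, not part of the repository: the helper name `missing_artifacts` is hypothetical, and it assumes the `./saved` paths are relative to the directory the demo is launched from.

```python
from pathlib import Path

# Paths as quoted in the comment above. Treating them as relative to the
# launch directory is an assumption.
ARTIFACTS = {
    "networkx_graph": "./saved/snomed.graph",
    "faiss_index": "./saved/faiss.index.Bio_ClinicalBERT",
    "concept_id_to_concept": "./saved/concept_id_to_concept.pickle",
    "description_id_to_concept": "./saved/description_id_to_concept.pickle",
}


def missing_artifacts(base_dir="."):
    """Return the names of artifacts whose files do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name, rel in ARTIFACTS.items() if not (base / rel).is_file()]
```

Calling `missing_artifacts()` from the project root returns a list of the parameter names that still need to be generated; an empty list means all four files are in place.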
Thanks

@ixaxaar
Member

ixaxaar commented May 30, 2024

Hey, those are very large files and cannot be uploaded to GitHub. You also need to sign a license with the NIH, so I cannot distribute them. (Healthcare data comes with various restrictions, but it is not that hard to obtain: sign up, apply, and approval takes a week at most.)

check this out for links: https://github.com/geniusrise/awesome-healthcare-datasets

I can help you generate them using the dataset once you get your hands on the raw data!

@crkarthik11
Contributor Author

Hi @ixaxaar ,

Thanks for the response. I have access to UMLS, SNOMED CT International Version, and MIMIC-III. Would I need any other datasets?

I tried following the linked docs, but I think they are still a work in progress. I would be happy to document the steps I took to set things up and raise a pull request to complete the steps in the docs.

Thanks.

@ixaxaar
Member

ixaxaar commented May 30, 2024

Awesome!

Look at tests/test_load.py.

All you have to do is execute the unit test test_load_snomed_into_networkx and it will create everything. You can also choose what to create: there are 2 more tests there, e.g. test_load_snomed_into_networkx_no_index, or you can comment out the parts you do not want to generate.
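Running a single test by name might look like the sketch below. It assumes the tests are module-level functions collected by pytest (if they live inside a test class, the node id would differ) and that the command is run from the repository root. The helper names `pytest_command` and `run_test` are hypothetical.

```python
import subprocess
import sys


def pytest_command(test_name, test_file="tests/test_load.py"):
    """Build a command that selects one test via pytest's node-id syntax."""
    # "-s" disables output capture so progress printed by the test is visible.
    return [sys.executable, "-m", "pytest", f"{test_file}::{test_name}", "-s"]


def run_test(test_name):
    """Run the selected test in a subprocess and return pytest's exit code."""
    return subprocess.run(pytest_command(test_name)).returncode
```

For example, `run_test("test_load_snomed_into_networkx")` would run the full generation test, while passing `"test_load_snomed_into_networkx_no_index"` would select the other test named above.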


Oh yep, this was a demo which I maintained for some time before deciding to pivot to building geniusrise as a platform instead.

I've been working on it again recently, btw, and have been integrating:

  1. Disease Ontology
  2. Gene Ontology
  3. LOINC
  4. MeSH
  5. RxNorm
  6. UMLS

If you'd like, give me maybe a week and I'll have all of the above in and linked together into one large graph.

@ixaxaar
Member

ixaxaar commented May 30, 2024

The doc btw is here -> https://github.com/geniusrise/docs/blob/master/docs/guides/dev_cycle.md

This project is too big for me to maintain alone, contributions are always super welcome!

@crkarthik11
Contributor Author

Sounds great, I will try to set up tests/test_load.py and get back with some updates.

@crkarthik11
Contributor Author

Hi

I see there are some new updates to the repo, but I cannot find the test files in the latest update.

@ixaxaar
Member

ixaxaar commented Jun 21, 2024

Hey, I managed to load a bunch of graphs but I'm still joining them. Each graph has its own nuances, so it's taking time.
I was using the base file to load and test each graph, like python ./geniusrise_healthcare/knowledge_graphs/base.py.

The scripts to download the data are also there in scripts/.

@crkarthik11
Contributor Author

Hey,
I am able to load SNOMED CT concepts into the graph using the scripts, and I have raised a pull request with some minor fixes. I noticed that some of the APIs have been removed, so I will wait for new commits. If you need any help with testing, please let me know.

@ixaxaar
Member

ixaxaar commented Jul 3, 2024

Hey thanks man! I've created this PR to track what I'm currently up to -> #3
I'm adding APIs to these KGs in 3 ways:

  1. graph APIs
  2. Lucene-based reverse index searches of these graphs
  3. FAISS-based semantic searches of these graphs
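To make the three access styles concrete, here is a toy, standard-library-only sketch. It is not the project's API: the concept ids, edges, and 2-d embeddings are invented, a plain dict stands in for the graph, a token-to-concept map stands in for a Lucene index, and brute-force cosine ranking stands in for the nearest-neighbour search that FAISS accelerates.

```python
import math
from collections import defaultdict

# Toy data: hypothetical concepts, not real SNOMED CT identifiers.
CONCEPTS = {
    "C1": "heart disease",
    "C2": "myocardial infarction",
    "C3": "kidney disease",
}
# Directed "is_a" edges (child -> parents).
IS_A = {"C2": ["C1"], "C1": [], "C3": []}
# Made-up embeddings; a real setup would compute them with a language model.
EMBEDDINGS = {"C1": (0.9, 0.1), "C2": (0.8, 0.3), "C3": (0.1, 0.9)}


def ancestors(concept_id):
    """1. Graph query: follow is_a edges upward and collect every ancestor."""
    seen, stack = [], list(IS_A.get(concept_id, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(IS_A.get(node, []))
    return seen


def build_inverted_index(concepts):
    """2. Keyword search: map each token to the concepts whose name contains it."""
    index = defaultdict(set)
    for cid, name in concepts.items():
        for token in name.lower().split():
            index[token].add(cid)
    return index


def keyword_search(index, query):
    """Return the concepts that contain every token of the query."""
    hits = [index.get(tok, set()) for tok in query.lower().split()]
    return sorted(set.intersection(*hits)) if hits else []


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(query_vec, k=2):
    """3. Semantic search: rank all concepts by cosine similarity to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda cid: cosine(query_vec, EMBEDDINGS[cid]),
        reverse=True,
    )
    return ranked[:k]
```

The graph query answers structural questions (what is this concept a kind of), the inverted index answers exact-term questions, and the embedding search answers "what is similar to this" even when no words overlap.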

after this I'm gonna proceed to integrating various biomedical sources, e.g. papers and other stuff (bioRxiv, medRxiv, NCBI APIs etc.)

I've also been thinking of integrating gget; that way a lot of *-omics databases can also be integrated

finally it will be time to build agents using these medical sources and OpenAI-compatible LLMs hosted wherever

Let me know if you want to take something up and I can like give you some structured specs etc

@crkarthik11
Contributor Author

Hi,

That looks like a solid set of planned features. Let me know if I can help with any small tasks or modules to begin with.

@ixaxaar
Member

ixaxaar commented Jul 13, 2024

hey, was very busy last week :/
if you're just beginning, it would be great if you could load all the other KGs and give the whole thing a run

once you're comfortable, you could then pick something which you could work on independently
