Integrating DBRetina with an LLM-Powered Chatbot for Genomic Data Exploration #250

Open
MoHelmy opened this issue Jan 25, 2025 · 6 comments

Comments

@MoHelmy

MoHelmy commented Jan 25, 2025

Introduction to DBRetina

DBRetina is a high-performance bioinformatics tool that uses an efficient linear algorithm to calculate pairwise distances among large collections of gene sets. This algorithm makes it easy to build a comprehensive pairwise molecular similarity network within and across several molecular databases. To support efficient search and visualization of this large similarity network, DBRetina can convert its final output into a format compatible with the Neo4j graph database.
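
As an illustration only, the sketch below computes pairwise similarity between a few toy gene sets and writes the result as an edge list. It is not DBRetina's algorithm or its output format: the Jaccard metric, the function names, the CSV column names, and the gene-set contents are all assumptions made for this example, and the naive all-pairs loop is quadratic rather than linear.

```python
import csv
import io
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared genes divided by total distinct genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(gene_sets: dict, min_similarity: float = 0.0):
    """Yield (source, target, similarity) for every pair above the cutoff."""
    pairs = combinations(sorted(gene_sets.items()), 2)
    for (name_a, genes_a), (name_b, genes_b) in pairs:
        score = jaccard(genes_a, genes_b)
        if score > min_similarity:
            yield name_a, name_b, round(score, 4)


def edges_to_csv(edges) -> str:
    """Serialize edges as CSV text; the column names are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "similarity"])
    writer.writerows(edges)
    return buffer.getvalue()


# Toy gene sets: names and memberships are made up for the example.
gene_sets = {
    "pathway_A": {"TP53", "BRCA1", "ATM"},
    "pathway_B": {"TP53", "BRCA1", "CHEK2"},
    "pathway_C": {"EGFR", "KRAS"},
}
csv_text = edges_to_csv(similarity_edges(gene_sets))
print(csv_text)
```

An edge list like this is one common shape for bulk-loading relationships into a graph database; the layout DBRetina actually emits should be taken from its documentation.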

Challenge:

While DBRetina bridges genomic analytics and graph databases, querying Neo4j requires expertise in the Cypher query language, which limits accessibility for non-technical researchers.

Goal and Aims

To develop an LLM-driven chatbot that translates natural-language questions into Cypher queries, enabling intuitive interaction with DBRetina-generated Neo4j graphs.
This chatbot aims to:
  • Increase accessibility: enable non-technical users to query complex genomic networks.
  • Improve efficiency: reduce query-writing time by ~70% (based on LLM benchmarks).
  • Scale: adapt to evolving graph schemas and support multi-database integration.
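
A minimal sketch of the translation step might look like the following. Everything specific here is an assumption: the graph schema (`GeneSet` nodes, `SIMILAR_TO` relationships), the prompt wording, and the `fake_llm` stand-in that replaces a real model call. The keyword filter is a crude guard against write queries, not a Cypher parser; a read-only transaction or database role would be a sturdier safeguard.

```python
import re

# Hypothetical schema; the real labels and properties depend on how
# DBRetina's output is loaded into Neo4j.
SCHEMA = "(:GeneSet {name})-[:SIMILAR_TO {similarity}]->(:GeneSet {name})"

PROMPT_TEMPLATE = (
    "You translate questions into read-only Cypher.\n"
    "Graph schema: {schema}\n"
    "Question: {question}\n"
    "Cypher:"
)

# Clauses that modify the graph; generated queries containing them are rejected.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Combine the schema description and the user's question into one prompt."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question.strip())


def is_read_only(cypher: str) -> bool:
    """Return True when no write clause keyword appears in the query text."""
    return WRITE_CLAUSES.search(cypher) is None


def question_to_cypher(question: str, generate) -> str:
    """`generate` is any callable that maps a prompt string to model text."""
    cypher = generate(build_prompt(question)).strip()
    if not is_read_only(cypher):
        raise ValueError("generated query is not read-only")
    return cypher


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a fixed query for the demo.
    return (
        "MATCH (a:GeneSet)-[r:SIMILAR_TO]->(b:GeneSet) "
        "WHERE r.similarity > 0.8 RETURN a.name, b.name"
    )


query = question_to_cypher("Which gene sets are more than 80% similar?", fake_llm)
print(query)
```

Passing the model in as a plain callable keeps the prompt construction and the safety check testable without network access, and lets a fine-tuned or hosted model be swapped in later.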

Difficulty Level: Medium/Hard

Size and Length of Project

  • medium: 175 hours
  • 12-16 weeks

Skills

Essential skills: LLM fine-tuning, experience with graph databases, HTML, CSS, JS
Nice to have skills: C++

Public Repository

DBRetina Documentation
Neo4j Cypher Manual
LLM Fine-Tuning for KBQA

Potential Mentors

Mohamed Helmy
Tamer Mansour

@harshagr70

harshagr70 commented Jan 27, 2025

Hi @MoHelmy, I was going through the projects you listed, and while I believe I have the skills to contribute to any of them, this particular project excites me the most. Having previously worked on projects involving fine-tuning LLMs and exploring graph-based tools like Cytoscape, I feel this aligns well with my interests and experience.

Additionally, I am currently brushing up on my skills through active contributions to open-source projects, particularly in the machine learning domain. I am very keen to contribute to this project and am confident in my ability to learn and deliver effectively.

Looking forward to hearing your thoughts.

@Jitmandal051004

Hi @MoHelmy, I’d like to contribute to this issue. I have experience building full-stack websites and have previously developed a RAG pipeline for a local LLM as part of my robotics project. Looking forward to your response.

@MoHelmy
Author

MoHelmy commented Jan 31, 2025

@harshagr70 Please write to us on the provided email addresses.

@MoHelmy
Author

MoHelmy commented Jan 31, 2025

@Jitmandal051004 Please write to us on the provided email addresses.

@Jitmandal051004

Hi @MoHelmy, just following up on my email from February 3rd. Could you please take a look and reply when you get a chance?

@MoHelmy
Author

MoHelmy commented Feb 11, 2025

@Jitmandal051004 Received; we will write to you soon.
