Create Chat Bot Interface Trained On Documentation Site #102

inodb · 2023-04-01T13:58:21Z

Background:

cBioPortal: cBioPortal is an open-source platform for cancer genomics data analysis and visualization. It provides a centralized resource for exploring and analyzing large-scale cancer genomic data sets, including genomic alterations, gene expression, and clinical information. The platform integrates data from multiple sources, including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), and makes it available through a web interface for researchers, clinicians, and the general public. Please refer to the cBioPortal home page for an overview.
cBioPortal has extensive documentation (https://docs.cbioportal.org/) on how to (1) install and configure cBioPortal locally, (2) use cBioPortal as a user, and (3) use the API programmatically. Searching through the documentation is not always straightforward, and we often get questions on the user group (https://groups.google.com/g/cbioportal) that we mainly answer by pointing to a link in the docs. A chat interface might be a good way to give users quicker feedback on what they are searching for.

Goal:

Build a chat bot interface for cBioPortal's Documentation

Approach:

Train the model on our documentation site (here is an example blog)
Also train the chatbot on our Google Group conversations
Evaluate different models for this purpose
Integrate a chat interface into the main website (350h project)

Needed skills:
Familiarity with the command line and the use of APIs

Possible mentors:
@inodb
@walleXD

Praashh · 2023-04-02T13:43:31Z

Hey @inodb, I think my skills match this project. Can you assign it to me?

priyanshiaroraaa · 2023-04-02T13:44:41Z

I know I am new to this, but I am currently building a personal virtual assistant in Python for my Minor Project 2 in college, and I have a good command of Java too. I am an AIML student, and that knowledge will help me train the model for your chatbot. I have a good command of Python, Java, machine learning, NLP, and AI algorithms. Since my Minor Project 2 is still in progress, I am attaching the documentation and code I have so far for reference.
MINOR.docx
synopsis presentation short.pptx
Software Requirements Specification.docx

kamranayesh · 2023-04-03T17:02:57Z

Hi!
I’m Kamran Ayesh, a final-year CSE student at the Indian Institute of Information Technology Guwahati, India. I have written a detailed proposal for the chatbot interface trained on the documentation site. I am hoping to hear your feedback or questions soon.
I am well suited to contribute to this project because I built a virtual assistant with a robust UI during my internship. As a developer, I expect this project to improve my skills and give me more exposure to open source.

Looking forward to contributing!

Thanks,
Kamran Ayesh

Nisarg908 · 2023-04-03T18:32:45Z

I'm interested in helping to build a chatbot. I am Nisarg Patel, a second-year CSE university student, and I would like to contribute to building this chatbot. I am new at this, but I am ready to learn and help, and this will help me improve.

Looking forward to your response!

Thanks,
Nisarg Patel

JamesAlaric · 2023-04-04T17:17:09Z

Hello, I'm interested in helping to develop this chatbot. How can I apply as a GSoC contributor, please?

ViditJain123 · 2023-11-18T01:53:25Z

Hey, is this done or not? I have good experience making chatbots and I am also good with the MERN stack, so I can even integrate it with your website.

NehaAr · 2024-01-20T03:28:49Z

Hi, I am working on a similar use case for my pipeline, where I am building a chatbot to search through the documents in my pipeline. I would really like to work on this issue.

NeuralFlux · 2024-02-21T16:49:30Z

Hi @inodb, I'm a CS grad at NYU with a solid grasp of ML, PyTorch, and the CLI. I've worked on LLMs for zero-shot classification on food ingredient data. I believe using LLMs for retrieval-augmented generation is highly applicable to your use case. How would you advise me to get started on this?

kartheekyakkala · 2024-02-24T06:43:33Z

Hello @inodb, I feel we can use the Retrieval-Augmented Generation (RAG) technique instead of fine-tuning or training. Since the documentation or knowledge base gets updated now and then, fine-tuning the LLM could be costly. Moreover, RAG is more reliable because it works from up-to-date knowledge. I'm a CS grad at UCM with a strong interest in LLMs and generative AI. I would like to work on this issue. Could you give me some leads?
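
To make the RAG idea concrete, here is a minimal sketch of the retrieval and prompt-building steps only. Everything in it is an assumption for illustration: the `DOCS` snippets are made-up placeholders rather than real documentation text, the similarity measure is a plain bag-of-words cosine instead of learned embeddings, and the resulting prompt would still have to be sent to whichever LLM the project ends up choosing.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical documentation snippets (placeholders, not real cBioPortal docs).
DOCS = {
    "deployment": "Install cBioPortal locally with Docker and configure the database.",
    "web-api": "Use the REST API to query studies, samples and mutations programmatically.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the names of the k snippets most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs,
        key=lambda name: cosine(q, Counter(tokenize(docs[name]))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Assemble an LLM prompt that contains only the retrieved snippets."""
    context = "\n".join(f"[{name}] {docs[name]}" for name in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("How do I query the API programmatically?", DOCS))
```

With this shape, a documentation update only means adding or replacing an entry in the index; the model itself is not retrained.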

Steveolas · 2024-02-28T16:42:05Z

Hey all! I am Ilan, a Data Science grad from the Technion. I would love to contribute to this project.

@inodb As a first step, I wanted to ask whether you have already thought about how to structure the documentation as training data. If so, I would love to see an example. If not, I think that could be a good place to begin. I would also like to know whether you can share the documentation in some easy-to-work-with format that you might have on the backend. If not, I can just scrape it straight from the webpage.
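
One possible structure, sketched under the assumption that the documentation is available as Markdown files: split each page into one record per heading section and keep the source alongside the text. The function name `split_markdown`, the record fields, and the file name in the example are all hypothetical.

```python
import re

def split_markdown(text, source):
    """Split a Markdown page into one record per heading section.

    Each record keeps the source it came from, so an answer can later be
    traced back to a page. Text before the first heading is filed under
    the placeholder title "(intro)".
    """
    records = []
    title, lines = "(intro)", []
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            body = "\n".join(lines).strip()
            if body:
                records.append({"source": source, "title": title, "text": body})
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    # Flush the final section.
    body = "\n".join(lines).strip()
    if body:
        records.append({"source": source, "title": title, "text": body})
    return records

if __name__ == "__main__":
    page = "# Install\nRun docker.\n\n## Configure\nEdit the properties file.\n"
    for record in split_markdown(page, "deploy.md"):
        print(record)
```

This simple splitter treats any line starting with `#` as a heading, so comment lines inside fenced code blocks would need extra handling in a real pipeline.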

Anyway, would love to get some suggestions on what should be the first steps to start getting familiar with the project.

Thanks
Ilan Meissonnier

Steveolas · 2024-02-28T16:55:05Z

BTW, the Medium link given as the example blog is members-only :(. The following blog seems pretty similar (hard to tell, as I couldn't read the original LOL). Hope this is helpful.

Ilan

Steveolas · 2024-03-07T21:23:53Z

Hey all!
I have been thinking about this project a bit and I have some thoughts I'd like to share.

If I were using a chatbot to help me navigate documentation, I would prefer that it provide a link to the documentation page the information came from. That way I can fact-check it and/or read further into the problem I'm having. As we know, LLMs are not always accurate and can sound quite confident even when wrong. While it may be possible to train the chatbot to return a link along with its answer (by structuring the training data accordingly), this task might be solved more simply with traditional information retrieval techniques, i.e. retrieving the page that best matches a user query from a search bar (I have noticed that the search bar on the documentation webpage is not functional at the moment). This of course gets more complicated if you want to include answers from the Google Group conversations, but the approach should definitely be considered. Another option might be to combine both approaches in some way, although we would need to decide exactly how to do that.
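
As a rough illustration of the traditional retrieval option, here is a small TF-IDF style ranking that returns page links rather than generated text. The URLs and page texts in `PAGES` are placeholders invented for this sketch, and the smoothed IDF weighting is just one common choice, not something taken from the existing site.

```python
import math
import re
from collections import Counter

# Hypothetical page index: URL -> page text (placeholders for illustration).
PAGES = {
    "https://docs.example.org/deploy": "install cbioportal locally using docker compose",
    "https://docs.example.org/api": "query studies and mutations through the web api",
    "https://docs.example.org/user-guide": "explore studies and view mutations in the browser",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, pages, k=1):
    """Rank pages by a TF-IDF score and return the top-k (url, score) pairs."""
    n = len(pages)
    counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    # Document frequency: in how many pages each term appears.
    df = Counter(term for page_counts in counts.values() for term in page_counts)
    query_terms = set(tokenize(query))

    def score(url):
        # Term frequency times a smoothed inverse document frequency.
        return sum(
            counts[url][t] * math.log((1 + n) / (1 + df[t])) for t in query_terms
        )

    ranked = sorted(pages, key=score, reverse=True)
    return [(url, score(url)) for url in ranked[:k]]

if __name__ == "__main__":
    for url, value in search("how to install with docker", PAGES):
        print(url, round(value, 3))
```

Because the result is a link, the user can always open the page and verify the answer, and the same ranking could later feed a chatbot as its retrieval step.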

Would love to hear what everyone thinks about this, or if there might be something I'm missing. Would specifically love to hear your insights on this @inodb.

Sorry for the long post,
Ilan Meissonnier

skhavindev · 2024-03-08T15:41:28Z

Hey!

I am Khavin. I am an Artificial Intelligence (AI) student currently pursuing a dual Bachelor of Science in data science at the Indian Institute of Technology Madras (IIT Madras) and Sathyabama Institute of Science and Technology. I have a strong interest in machine learning, artificial intelligence, and neuromorphic computing.

I have extensive experience working with PyTorch and have successfully built chatbots using Google AI Studio, which has taught me how to train and build chatbots. I think this experience is useful for this application, and the project would give me further experience with real-world applications of AI.

Looking forward to contributing to open source!

Regards,
Khavin S

Steveolas · 2024-03-16T15:02:49Z

Hey all, I have made a prototype for a chatbot using RAG. I think RAG could be a pretty good approach for this project. I'm sharing the prototype as a link to a Kaggle notebook in case you are interested. Please leave any feedback you may have.

https://www.kaggle.com/code/ilanmeissonnier/rag-for-cbioportal-documentation-chatbot

Ilan Meissonnier

Steveolas · 2024-03-21T19:32:32Z

I have also come across a research paper that came out a few days ago suggesting a method called Retrieval Augmented Fine Tuning (RAFT). I am not done reading through it yet, but it already seems like it could be a really good approach for this.

Steveolas · 2024-03-21T19:33:56Z

Link to the paper 😅

Mukesh-ghildiyal · 2025-02-15T04:58:11Z

@inodb I am interested in this project and I am working on it.

inodb added cBioPortal Size: Medium (175h) Size: Large (350h) Difficulty: Medium GSoC-2023 GSoC 2023 Candidate Projects labels Apr 1, 2023

inodb added GSoC-2024 GSoC 2024 Candidate Projects and removed GSoC-2023 GSoC 2023 Candidate Projects labels Feb 5, 2024

inodb added GSoC-2025 GSoC 2025 Candidate Projects and removed GSoC-2024 GSoC 2024 Candidate Projects labels Jan 24, 2025
