Simba - Your Knowledge Management System

Connect your knowledge to any RAG system

Simba is an open source, portable KMS (knowledge management system) designed to integrate seamlessly with any Retrieval-Augmented Generation (RAG) system. With a modern UI and modular architecture, Simba allows developers to focus on building advanced AI solutions without worrying about the complexities of knowledge management.

🚀 Features

  • 🧩 Modular Architecture: Plug in various vector stores, embedding models, chunkers, and parsers.
  • 🖥️ Modern UI: Intuitive user interface to visualize and modify every document chunk.
  • 🔗 Seamless Integration: Easily integrates with any RAG-based system.
  • 👨‍💻 Developer Focus: Simplifies knowledge management so you can concentrate on building core AI functionality.
  • 📦 Open Source & Extensible: Community-driven, with room for custom features and integrations.

🎥 Demo

Watch the demo

🛠️ Getting Started

📋 Prerequisites

Before you begin, ensure you have met the following requirements:

⬇️ Installation

Note: Simba uses Celery for heavy tasks such as parsing, and these tasks may run on a GPU. To avoid the related infrastructure problems, we recommend launching the app with Docker.

💻 Local Development

git clone https://github.com/GitHamza0206/simba.git
cd simba

⚙️ Backend

cd backend
  1. Redis installation: make sure Redis is installed on your OS, then start the server:

    # start the Redis server
    redis-server
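To check that the server is reachable before starting the backend, a minimal standard-library sketch can send Redis's inline `PING` command over a raw socket and look for the `+PONG` reply. The helper name `redis_ping` is hypothetical and not part of Simba; `redis-cli ping` does the same job from the shell.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG.

    Uses Redis's inline-command form ("PING\r\n"), so no client library is needed.
    Connection errors, timeouts, and other replies (e.g. -NOAUTH) all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:  # refused, unreachable, or timed out
        return False
    return reply.startswith(b"+PONG")
```

Once `redis-server` is running locally, calling `redis_ping()` with its defaults should return `True`.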
  2. Set up the environment:

cp .env.example .env

Then edit the .env file with your own values:

OPENAI_API_KEY="" 
LANGCHAIN_TRACING_V2= #(optional - for langsmith tracing) 
LANGCHAIN_API_KEY="" #(optional - for langsmith tracing) 
REDIS_HOST=redis
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/1
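The two Celery URLs point at the same Redis server but at different logical databases (the trailing `/0` and `/1`), which keeps queued task messages apart from stored task results. The sketch below splits such a URL with the standard library's `urllib.parse`; `describe_redis_url` is a hypothetical helper for illustration. The host `redis` is presumably the Docker Compose service name, so when running Redis directly on your machine you will likely need `localhost` in these values instead.

```python
from urllib.parse import urlparse


def describe_redis_url(url: str) -> dict:
    """Split a redis://host:port/db URL into host, port, and database index.

    Missing port and database fall back to Redis's defaults (6379 and 0).
    """
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):  # rediss:// is the TLS variant
        raise ValueError(f"not a Redis URL: {url!r}")
    db_path = parts.path.lstrip("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,
        "db": int(db_path) if db_path else 0,
    }


broker = describe_redis_url("redis://redis:6379/0")
backend = describe_redis_url("redis://redis:6379/1")
print(broker)   # {'host': 'redis', 'port': 6379, 'db': 0}
print(backend)  # {'host': 'redis', 'port': 6379, 'db': 1}
```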
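  3. Install dependencies:
poetry config virtualenvs.in-project true
poetry install
# activate the virtual environment with ONE of the following:
poetry shell                  # Poetry 1.x (Poetry 2.x needs the poetry-plugin-shell plugin)
source .venv/bin/activate     # macOS/Linux
.venv\Scripts\activate        # Windows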
  4. Run the backend:
python main.py
# or, with auto-reload during development:
uvicorn main:app --reload

Then open http://localhost:8000/docs to access the Swagger UI (optional).
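FastAPI also publishes the machine-readable schema behind the Swagger UI, by default at `/openapi.json`. The sketch below fetches it with the standard library and lists the registered paths, a quick way to confirm the backend is up and to see which routes sit under the configured `api_version` prefix (`/api/v1`). It assumes the app keeps FastAPI's default `openapi_url`, and `list_endpoints` is a hypothetical name.

```python
import json
from urllib.request import urlopen


def list_endpoints(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> list[str]:
    """Return the sorted route paths from a FastAPI app's OpenAPI schema."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=timeout) as resp:
        schema = json.load(resp)
    # "paths" maps each route (e.g. "/api/v1/...") to its HTTP operations
    return sorted(schema.get("paths", {}))
```

With the backend running, `print(list_endpoints())` prints the available routes.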

  5. Run the parser with a Celery worker:
celery -A tasks.parsing_tasks worker --loglevel=info
  6. Modify the config.yaml file to suit your needs:
# config.yaml

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  markdown_dir: "markdown"
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai" #or ollama (vllm coming soon)
  model_name: "gpt-4o" #or ollama model name
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface" #or openai
  model_name: "BAAI/bge-base-en-v1.5" #or any HF model name
  device: "cpu"  # mps,cuda,cpu
  additional_params: {}

vector_store:
  provider: "faiss" 
  collection_name: "migi_collection"
  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  k: 5 #number of chunks to retrieve 

features:
  enable_parsers: true  # Set to false to disable parsing

celery: 
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
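The celery entries use shell-style `${VAR:-default}` placeholders, so the broker settings can follow the `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` values from `.env` and fall back to the Redis service URLs otherwise. YAML itself does not expand such placeholders, so the backend's config loading has to resolve them; this README does not show how Simba does it. As a hypothetical sketch, the helper below (`expand_env_defaults` is an illustrative name, not part of Simba) resolves them with a regular expression and `os.environ`, following the POSIX shell rule that `:-` also falls back when the variable is set but empty.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} or ${NAME:-default}; the default may be any text without "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env_defaults(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:-default} with env[NAME], or the default if NAME is unset or empty.

    A placeholder without a default expands to "" when the variable is missing.
    """
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # set and non-empty
            return current
        return default if default is not None else ""

    return _PLACEHOLDER.sub(substitute, value)
```

A loader would apply this to each string value after parsing the YAML.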

🖥️ Frontend

Make sure you are in the root simba directory:

cd frontend
npm install
npm run dev 

Then navigate to http://localhost:5173 to access the frontend.

🐳 Launch with Docker (Recommended)

Navigate to the root simba directory:

export OPENAI_API_KEY="" #(optional) 
docker-compose up --build 

📂 Project Structure

simba/
├── backend/              # Core processing engine
│   ├── api/              # FastAPI endpoints
│   ├── services/         # Document processing logic
│   ├── tasks/            # Celery task definitions
│   └── models/           # Pydantic data models
├── frontend/             # React-based UI
│   ├── public/           # Static assets
│   └── src/              # React components
├── docker-compose.yml    # Development environment
└── docker-compose.prod.yml # Production setup

⚙️ Configuration

The config.yaml file configures the backend application. You can change:

  • LLM provider and model
  • embedding model
  • vector store
  • chunking
  • retrieval
  • features
  • parsers
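The chunking settings in config.yaml (`chunk_size: 512`, `chunk_overlap: 200`) describe a sliding window: each chunk starts 512 - 200 = 312 units after the previous one, so neighbouring chunks share 200 units of text. The sketch below illustrates that arithmetic on characters. Whether Simba's chunkers count characters, tokens, or something else is not stated here, and `split_with_overlap` is a hypothetical name rather than Simba's API.

```python
def split_with_overlap(text: str, chunk_size: int = 512, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # 312 with the config.yaml values
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With these settings, a 1,000-character document yields three chunks of 512, 512 and 376 characters.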

See backend/README.md for more information.
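The retrieval setting `k` controls how many chunks are returned per query. Conceptually this is a top-k nearest-neighbour search over chunk embeddings; the sketch below shows it with cosine similarity on small hand-written vectors. It is only an illustration: in practice the configured vector store (FAISS in the sample config.yaml) presumably performs this search, and the toy vectors and the `top_k_chunks` name are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" standing in for real embedding-model output
chunks = {
    "intro": [1.0, 0.0, 0.0],
    "setup": [0.8, 0.6, 0.0],
    "faq": [0.0, 1.0, 0.0],
    "license": [0.0, 0.0, 1.0],
}
print(top_k_chunks([1.0, 0.2, 0.0], chunks, k=2))  # ['intro', 'setup']
```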

🏁 Roadmap

  • Add more documentation
  • Make Simba work with any RAG system as an importable Python package
  • Add CI/CD pipeline
  • Add control over chunking parameters
  • Add more parsers
  • Add more vector stores
  • Add more embedding models
  • Add more retrieval models
  • Enable role-based access control

🤝 Contributing

Contributions are welcome! If you'd like to contribute to Simba, please follow these steps:

  • Fork the repository.
  • Create a new branch for your feature or bug fix.
  • Commit your changes with clear messages.
  • Open a pull request describing your changes.

💬 Support & Contact

For support or inquiries, please open an issue 📌 on GitHub or contact the repo owner, Hamza Zerouali.