
Thinking Dataset v0.0.1: A Foundation for Strategic AI-Driven Business Intelligence with STaR Case Study Generation


A Framework for Strategic Business Insights

Release v0.0.1 - Initial Release

Overview

The Thinking Dataset Project v0.0.1 introduces the foundational framework for generating strategic business insights and STaR (Situation, Task, Action, Result) case studies. This initial release sets up the basic infrastructure for analyzing complex strategic scenarios, ethical dilemmas, and decision-making processes.
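
For illustration only, a STaR case study can be thought of as a simple four-field record. The field names below mirror the Situation, Task, Action, Result breakdown and are assumptions for this sketch, not the project's actual schema.

    from dataclasses import dataclass

    # Illustrative sketch of a STaR case study record. Field names
    # follow the Situation/Task/Action/Result breakdown described
    # above; the project's actual schema may differ.
    @dataclass
    class StarCaseStudy:
        situation: str  # strategic context, e.g. a market disruption
        task: str       # the decision or objective under consideration
        action: str     # the course of action taken
        result: str     # the observed or projected outcome

    example = StarCaseStudy(
        situation="A regional retailer faces a new low-cost online competitor.",
        task="Decide whether to match prices or differentiate on service.",
        action="Invested in same-day delivery rather than matching prices.",
        result="Retained most customers while preserving margins.",
    )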

Key Features

  • Basic Pipeline Infrastructure: Initial implementation of data ingestion and preprocessing pipelines (see the sketch after this list)
  • SQLite Database Setup: Basic data storage and management system
  • CLI Tool: Essential command-line interface for basic dataset operations
  • Initial Adapters: Basic support for Hugging Face and Ollama endpoints
  • Core Documentation: Basic documentation covering installation and usage
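
To make the pipeline idea concrete, here is a minimal sketch of an ingest-and-preprocess step that stages records into SQLite with pandas (both listed dependencies). The file path, table name, and "text" column are illustrative assumptions, not the project's actual pipeline code.

    import sqlite3

    import pandas as pd

    # Minimal ingest -> preprocess -> store sketch. The CSV path,
    # table name, and "text" column are illustrative assumptions.
    def ingest(csv_path: str, db_path: str = "thinking.db") -> int:
        df = pd.read_csv(csv_path)
        df = df.dropna(subset=["text"])       # drop empty records
        df["text"] = df["text"].str.strip()   # basic normalization
        with sqlite3.connect(db_path) as conn:
            df.to_sql("raw_cases", conn, if_exists="append", index=False)
        return len(df)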

Components

Core Features

  • Data ingestion functionality
  • Preprocessing pipeline
  • Foundational case study format
  • Model evaluation framework
  • Adapter implementations (see the endpoint sketch below)
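
As a hedged sketch of what an endpoint adapter might look like, the snippet below wraps Ollama's public /api/generate HTTP route behind a single generate() method. The class and method names are assumptions for illustration, not the project's actual adapter classes.

    import json
    import urllib.request

    # Sketch of an endpoint adapter exposing a uniform generate()
    # interface. Only the /api/generate route and its payload come
    # from Ollama's public API; the class itself is hypothetical.
    class OllamaAdapter:
        def __init__(self, base_url: str = "http://localhost:11434"):
            self.base_url = base_url

        def generate(self, model: str, prompt: str) -> str:
            payload = json.dumps(
                {"model": model, "prompt": prompt, "stream": False}
            ).encode("utf-8")
            request = urllib.request.Request(
                f"{self.base_url}/api/generate",
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(request) as response:
                return json.loads(response.read())["response"]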

Technical Implementation

  • Python 3.10+ support
  • SQLite and SQLAlchemy integration (see the sketch after this list)
  • Pipeline configuration
  • Automatic environment setup
  • Robust logging
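
Here is a rough sketch of how the SQLite/SQLAlchemy and logging pieces can fit together, assuming a file-backed database and rich's log handler (both listed dependencies). The database path and table name are illustrative, and the project's actual wiring may differ.

    import logging

    from rich.logging import RichHandler
    from sqlalchemy import create_engine, text

    # Rich-backed logging plus a file-backed SQLite engine.
    # The database path and table name are illustrative assumptions.
    logging.basicConfig(level=logging.INFO, handlers=[RichHandler()])
    log = logging.getLogger("thinking-dataset")

    engine = create_engine("sqlite:///thinking.db")
    with engine.connect() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS cases "
            "(id INTEGER PRIMARY KEY, body TEXT)"
        ))
        conn.commit()
    log.info("database ready")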

Installation

Prerequisites

  • Python 3.10 or later
  • Git
  • A cloud account (e.g., OpenAI), a local GPU (RTX 3090 or greater), or both, for processing

Setup

  1. Clone the repository:

    git clone https://github.com/MultiTonic/thinking-dataset.git
    cd thinking-dataset
  2. Install uv package manager:

    First, install the package into your global environment:

    pip install uv

    Then add the uv tools directory to your PATH*:

    uv tool update-shell
  3. Set up the project:

    uv run setup

    *You may need to restart your terminal session for the changes to take effect.

This will create a virtual environment, install the project dependencies, and activate it.

  4. Set up environment variables:

    Copy the .env.sample file to .env and change the values as needed:

    cp .env.sample .env

    Update the .env file with your credentials:

    # Required settings
    HF_ORG="my_huggingface_organization"
    HF_USER="my_huggingface_username"
    HF_READ_TOKEN="my_huggingface_read_access_token"
    HF_WRITE_TOKEN="my_huggingface_write_access_token"
    
    # Required configuration
    CONFIG_PATH="config/config.yaml"
    
    # One or more providers
    OLLAMA_SERVER_URL="http://localhost:11434"
    OPENAI_API_TOKEN="your_openai_api_token"
    RUNPOD_API_TOKEN="your_runpod_api_token"
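
Since python-dotenv is among the dependencies, a quick sanity check that your .env values are being picked up might look like the following snippet (variable names as defined above). This is a convenience sketch, not part of the project's CLI.

    import os

    from dotenv import load_dotenv

    # Load .env from the current directory and report on the
    # required keys defined in the sample above.
    load_dotenv()
    for key in ("HF_ORG", "HF_USER", "HF_READ_TOKEN",
                "HF_WRITE_TOKEN", "CONFIG_PATH"):
        print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")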

Breaking Changes

None (Initial Release)

Bug Fixes

  • Initial release, no bug fixes to report

Dependencies

  • Python 3.10+
  • SQLite
  • pandas
  • scikit-learn
  • rich
  • python-dotenv
  • Hugging Face Transformers
  • Ollama
  • Runpod
  • Additional dependencies listed in thinking-dataset.toml

Security Updates

  • Initial security configurations implemented
  • Basic authentication and authorization flows established
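
On the Hugging Face side, authentication is token-based. Here is a minimal sketch, assuming the huggingface_hub client (installed alongside Transformers) and the HF_READ_TOKEN variable from the .env file above:

    import os

    from huggingface_hub import login

    # Authenticate to the Hugging Face Hub with the read token
    # from .env; how the project itself wires this up may differ.
    login(token=os.environ["HF_READ_TOKEN"])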

Documentation Updates

  • Added initial project documentation
  • Included installation guide
  • Basic usage instructions
  • Architecture overview

Known Issues

  1. Limited to text-based data processing in this release
  2. GPU support requires RTX 3090 or greater
  3. Some advanced features planned for future releases
  4. Can only think; reasoning coming in v0.0.2!

Upgrade Instructions

Initial release - no upgrade needed.

Contributors

Special thanks to our initial contributors:

  • Kara Rawson (Lead Engineer)
  • Joseph Pollack (Creator & Business Leader)
  • MultiTonic Team

Support

For support and questions, please open an issue at https://github.com/MultiTonic/thinking-dataset/issues.

Version

  • Release: v0.0.1
  • Date: 2024-01-25
  • Commit: 0716d8d (Initial commit)

What's Changed

Full Changelog: https://github.com/MultiTonic/thinking-dataset/commits/v0.0.1