# Data Preparation

## Steps

1. Download the historical S&P 500 stock data archive from Kaggle into the `data/sp500` folder:

   ```bash
   cd data/sp500
   wget https://www.kaggle.com/datasets/andrewmvd/sp-500-stocks/versions/337
   ```
2. Prepare a TiDB Serverless cluster.

   You can follow the TiDB Cloud Quick Start guide to create a TiDB Serverless cluster.

3. Run the script to create the table schema defined in `fixtures/schema.sql`:

   ```bash
   export DATABASE_URL="mysql://<username>:<password>@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/sp500insight?timezone=Z"
   pnpm run cli:create-schema
   ```
4. Run the script to ETL the data from the CSV files into the database:

   ```bash
   pnpm run cli:csv-to-db
   ```
5. Use Dumpling to dump the data from TiDB and upload it to AWS S3:

   ```bash
   tiup dumpling --host gateway01.us-west-2.prod.aws.tidbcloud.com --port 4000 --user <username> --password <password> --filetype sql --filter 'sp500insight.*' -o "s3://tidb-cloud-demos/sp500insight"
   ```
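The `cli:csv-to-db` script above is part of the repository's pnpm tooling; its internals are not shown here. As a rough illustration of what such a CSV-to-database ETL step does, below is a minimal Python sketch that reads CSV rows and turns them into a parameterized `INSERT` statement plus value tuples. The table name `stock_prices` and the columns `date`, `symbol`, `close` are hypothetical placeholders, not the project's actual schema (which lives in `fixtures/schema.sql`).

```python
import csv
import io

# Hypothetical sample data -- the real columns come from fixtures/schema.sql.
SAMPLE_CSV = """date,symbol,close
2024-01-02,AAPL,185.64
2024-01-02,MSFT,370.87
"""

def rows_to_inserts(csv_text, table="stock_prices"):
    """Turn CSV text into (sql, params) for a parameterized executemany call."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [tuple(row[c] for c in columns) for row in reader]
    return sql, params

sql, params = rows_to_inserts(SAMPLE_CSV)
print(sql)          # INSERT INTO stock_prices (date, symbol, close) VALUES (%s, %s, %s)
print(len(params))  # 2
```

In the real project, the generated statement and row tuples would be handed to a MySQL-compatible driver connected via `DATABASE_URL`, so the same parameterized query can be executed once per CSV row.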