# Data Preparation

## Steps

1. Download the historical S&P 500 stock data archive from Kaggle into the `data/sp500` folder:

   ```bash
   cd data/sp500
   wget https://www.kaggle.com/datasets/andrewmvd/sp-500-stocks/versions/337
   ```
2. Prepare a TiDB Serverless cluster.

   You can follow the TiDB Cloud Quick Start guide to create a TiDB Serverless cluster.

3. Run the script to create the table schema defined in `fixtures/schema.sql`:

   ```bash
   export DATABASE_URL="mysql://<username>:<password>@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/sp500insight?timezone=Z"
   pnpm run cli:create-schema
   ```
4. Run the script to ETL the data from the CSV files into the database:

   ```bash
   pnpm run cli:csv-to-db
   ```
5. Use Dumpling to dump the data from TiDB and upload it to AWS S3:

   ```bash
   tiup dumpling --host gateway01.us-west-2.prod.aws.tidbcloud.com --port 4000 --user <username> --password <password> --filetype sql --filter 'sp500insight.*' -o "s3://tidb-cloud-demos/sp500insight"
   ```
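The `cli:csv-to-db` script above is part of the repository's pnpm tooling; its internals are not shown here. As a rough illustration of what such a CSV-to-database ETL step does, below is a minimal Python sketch that reads CSV rows and turns them into a parameterized `INSERT` statement plus value tuples. The table name `stock_prices` and the columns `date`, `symbol`, `close` are hypothetical placeholders, not the project's actual schema (which lives in `fixtures/schema.sql`).

```python
import csv
import io

# Hypothetical sample data -- the real columns come from fixtures/schema.sql.
SAMPLE_CSV = """date,symbol,close
2024-01-02,AAPL,185.64
2024-01-02,MSFT,370.87
"""

def rows_to_inserts(csv_text, table="stock_prices"):
    """Turn CSV text into (sql, params) for a parameterized executemany call."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [tuple(row[c] for c in columns) for row in reader]
    return sql, params

sql, params = rows_to_inserts(SAMPLE_CSV)
print(sql)          # INSERT INTO stock_prices (date, symbol, close) VALUES (%s, %s, %s)
print(len(params))  # 2
```

In the real project, the generated statement and row tuples would be handed to a MySQL-compatible driver connected via `DATABASE_URL`, so the same parameterized query can be executed once per CSV row.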