For instructions on accessing the KBase JupyterHub environment, please refer to the KBase JupyterHub User Guide.
(If you require write access to MinIO and the database catalog, please contact the KBase CDM Tech team.)
Please refer to the MinIO Guide for instructions on accessing MinIO.
Retrieve the MinIO username and password, which have read/write permission, from the environment variables of the JupyterHub environment described above:
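```python
import os

# Credentials are provided as environment variables in the KBase JupyterHub environment
minio_username, minio_password = os.environ['MINIO_ACCESS_KEY'], os.environ['MINIO_SECRET_KEY']
```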
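A slightly more defensive variant of the snippet above, as a minimal sketch: it reads the same `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` environment variables but raises a descriptive error when either is missing or empty (for example, when the code runs outside the JupyterHub environment). The helper name `get_minio_credentials` is hypothetical.

```python
import os


def get_minio_credentials() -> tuple[str, str]:
    """Return (access_key, secret_key) from the JupyterHub environment.

    Hypothetical helper: reads the same variables as the snippet above,
    but fails with a clear message instead of a bare KeyError.
    """
    names = ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
    # Treat unset and empty variables the same way
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing MinIO credentials: " + ", ".join(missing)
            + ". Are you running inside the KBase JupyterHub environment?"
        )
    return os.environ["MINIO_ACCESS_KEY"], os.environ["MINIO_SECRET_KEY"]
```

Usage: `minio_username, minio_password = get_minio_credentials()`.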
Please adhere to the following naming conventions for MinIO buckets and objects:
Source files are the raw data files that are uploaded to MinIO.
- Bucket name: `namespace_name-source`
- File name: The file name should either clearly represent the table name or be formatted in a way that allows a program to easily extract the table name from it.
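The source-file convention can be captured in two small helpers, sketched under assumptions: the namespace is a plain string you supply, and file names follow a hypothetical `<table_name>.<extension>` pattern (for example, `genome_features.tsv`). The bucket check is a simplified form of the S3 bucket naming rules that MinIO generally follows (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). Note that these rules exclude underscores, so a namespace such as `my_namespace` would need adapting before it can be used in a bucket name. The helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Simplified S3-style bucket rule: 3-63 chars of lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
# (Does not cover every S3 rule, e.g. IP-address-like names.)
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def source_bucket_name(namespace: str) -> str:
    """Build the source bucket name, <namespace>-source, and validate it."""
    bucket = f"{namespace}-source"
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"Invalid bucket name: {bucket!r}")
    return bucket


def table_name_from_file(file_name: str) -> str:
    """Extract the table name from a hypothetical <table_name>.<ext> file name.

    Any leading directories are dropped, and compound extensions such as
    .tsv.gz are stripped completely.
    """
    name = PurePosixPath(file_name).name
    return name.split(".", 1)[0]
```

For example, `source_bucket_name("example")` returns `"example-source"`, and `table_name_from_file("raw/genome_features.tsv.gz")` returns `"genome_features"`.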
Delta table files are Parquet files generated by Spark during the creation of a table.
- Bucket name: `namespace_name-delta`
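For Delta tables, a similar sketch builds the bucket name and an `s3a://` location that Spark could use as a table path. The `<bucket>/<table_name>` layout and the helper names `delta_bucket_name` and `delta_table_location` are assumptions for illustration; check the existing loading notebooks for the layout actually in use.

```python
import re

# Same simplified S3-style bucket rule as for source buckets
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def delta_bucket_name(namespace: str) -> str:
    """Build the Delta table bucket name, <namespace>-delta, and validate it."""
    bucket = f"{namespace}-delta"
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"Invalid bucket name: {bucket!r}")
    return bucket


def delta_table_location(namespace: str, table_name: str) -> str:
    """Return a hypothetical s3a:// location for one Delta table.

    Assumes each table lives under its own prefix in the <namespace>-delta bucket.
    """
    if not table_name or "/" in table_name:
        raise ValueError(f"Invalid table name: {table_name!r}")
    return f"s3a://{delta_bucket_name(namespace)}/{table_name}"
```

In a notebook, such a location could be passed to Spark, for example `df.write.format("delta").save(location)`, assuming the Spark session is already configured with Delta Lake and S3A access to MinIO.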
Please create a corresponding loading notebook for each namespace in the `data-loading-notebooks` directory.
Please use the existing loading notebooks as examples.
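To find existing notebooks to use as templates, or to check whether a namespace already has one before creating a new notebook, a small standard-library sketch can scan the `data-loading-notebooks` directory. It assumes, hypothetically, that notebook file names contain the namespace name; the helper name `find_loading_notebooks` is also hypothetical, so adjust the matching rule to the repository's actual naming.

```python
from pathlib import Path


def find_loading_notebooks(namespace: str, directory: str = "data-loading-notebooks") -> list[Path]:
    """Return .ipynb files in `directory` whose file name contains `namespace`.

    Hypothetical helper: assumes notebooks are named after their namespace.
    Matching is case-insensitive. Returns an empty list if the directory
    does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.glob("*.ipynb")
        if namespace.lower() in path.stem.lower()
    )
```

An empty result suggests the namespace does not yet have a loading notebook; a non-empty one means an existing notebook should be reviewed rather than rerun.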
🚨 Please DO NOT rerun the existing loading notebooks in the development environment unless you are certain you want to reload the existing table. Instead, create a new notebook for each new namespace and manually verify the data loading process.