JupyterHub Developer Guide

Accessing JupyterHub Environment

For instructions on accessing the KBase JupyterHub environment, please refer to the KBase JupyterHub User Guide.

(If you require write access to MinIO and the database catalog, please contact the KBase CDM Tech team.)

Accessing MinIO

Please refer to the MinIO Guide for instructions on accessing MinIO.

Read/Write MinIO username and password

Retrieve the MinIO username and password with read/write permission from the environment variables of the JupyterHub environment described above.

import os
minio_username, minio_password = os.environ['MINIO_ACCESS_KEY'], os.environ['MINIO_SECRET_KEY']
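If either variable is unset (for example, when the code runs outside the JupyterHub environment), the lookup above fails with a bare `KeyError`. The following is a minimal sketch of a more descriptive check; the helper name `read_minio_credentials` is hypothetical and not part of any KBase library.

```python
import os


def read_minio_credentials(env=os.environ):
    """Return (username, password) read from the MinIO environment variables.

    `read_minio_credentials` is a hypothetical helper, shown for illustration only.
    """
    required = ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
    # Collect every variable that is unset or empty so the error names all of them.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing MinIO credentials: " + ", ".join(missing)
            + ". Run this code inside the JupyterHub environment."
        )
    return env["MINIO_ACCESS_KEY"], env["MINIO_SECRET_KEY"]
```

Inside the JupyterHub environment, `minio_username, minio_password = read_minio_credentials()` would behave like the snippet above.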

Naming conventions

Please adhere to the following naming conventions for MinIO buckets and objects:

Source Files:

Source files are the raw data files that are uploaded to MinIO.

  • Bucket name: namespace_name-source
  • File name: The file name should either clearly represent the table name or be formatted in a way that allows a program to easily extract the table name from it.

Delta Table Files:

Delta table files are Parquet files generated by Spark during the creation of a table.

  • Bucket name: namespace_name-delta
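The naming conventions above can be sketched as a few small helpers. This is only an illustration: the function names are hypothetical, and the rule used to extract a table name (everything before the first dot in the file's base name) is an assumption, since the guide only requires that the table name be easy to derive from the file name.

```python
from pathlib import PurePosixPath


def source_bucket(namespace: str) -> str:
    """Bucket that holds the raw source files for a namespace."""
    return f"{namespace}-source"


def delta_bucket(namespace: str) -> str:
    """Bucket that holds the Delta table files for a namespace."""
    return f"{namespace}-delta"


def table_name_from_file(file_name: str) -> str:
    """Derive a table name from a source file name.

    Assumption (not stated in the guide): the table name is the part of the
    base name before the first dot, so 'genes.tsv.gz' maps to 'genes'.
    """
    return PurePosixPath(file_name).name.split(".", 1)[0]


# Example with a hypothetical namespace called "demo".
print(source_bucket("demo"))                         # demo-source
print(delta_bucket("demo"))                          # demo-delta
print(table_name_from_file("uploads/genes.tsv.gz"))  # genes
```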

Loading Notebooks

For each namespace, please create a corresponding loading notebook in the data-loading-notebooks directory.

Please use the existing loading notebooks as examples.

🚨 Please DO NOT rerun the existing loading notebooks in the development environment unless you are certain you want to reload the existing table. Instead, create a new notebook for each new namespace and manually verify the data loading process.
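One possible way to make an accidental reload harder is an explicit opt-in check at the top of a loading notebook. The sketch below is an assumption, not a pattern taken from the existing notebooks: `ensure_reload_allowed` is a hypothetical helper, and how the list of existing tables is obtained is left to the notebook.

```python
def ensure_reload_allowed(table_name, existing_tables, allow_reload=False):
    """Raise unless the table is new or the caller explicitly opted in to a reload.

    Hypothetical helper: `existing_tables` is any collection of table names
    that the notebook has already looked up.
    """
    if table_name in existing_tables and not allow_reload:
        raise RuntimeError(
            f"Table '{table_name}' already exists. "
            "Pass allow_reload=True only if you are certain you want to reload it."
        )


# A new table passes silently; an existing one needs an explicit opt-in.
ensure_reload_allowed("new_table", existing_tables={"genes"})
ensure_reload_allowed("genes", existing_tables={"genes"}, allow_reload=True)
```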