Modify notebooks to retrieve data from operate first public bucket #47

mayaCostantini · 2021-11-16T13:48:57Z

Related Issues and Dependencies

Related to #31

This introduces a breaking change

Yes
No

This Pull Request implements

The notebooks used for Thoth datasets analysis now let users access the datasets from an Operate First public bucket instead of downloading them locally.

Learn how to access notebooks in JupyterHub, available through Open Data Hub on Operate First, by spawning the Experimental Elyra image, and learn about Thoth datasets.
Verify/Request the credentials of the public bucket available on Operate First.
Place datasets on that bucket under thoth/datasets/{dataset-name} using aws/s3 CLI from your local terminal.
Modify notebooks to retrieve data from S3 instead of using local datasets.
Push changes to thoth-station/datasets using the Git extension.
Create a release using Kebechet and AICoE-CI.
Add the image to the JupyterHub list of images, so that users can learn about Thoth datasets.
Demo
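
As a rough illustration of the placement step above, here is a minimal Python sketch of how local files could map onto keys under thoth/datasets/{dataset-name}. The task list names the aws/s3 CLI for the upload; boto3 is swapped in here only so the example stays in the notebooks' language. The function names, the bucket argument, and the endpoint URL argument are assumptions for illustration and are not taken from this PR.

```python
from pathlib import Path, PurePosixPath
from typing import List, Tuple

# Prefix taken from the task list above; everything else is illustrative.
DATASETS_PREFIX = "thoth/datasets"


def dataset_key(dataset_name: str, local_file: Path, root: Path) -> str:
    """Build the object key for a file, keeping its path relative to root."""
    relative = local_file.relative_to(root).as_posix()
    return str(PurePosixPath(DATASETS_PREFIX) / dataset_name / relative)


def plan_upload(dataset_name: str, root: Path) -> List[Tuple[Path, str]]:
    """Pair every file under root with the key it would be stored at."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [(p, dataset_key(dataset_name, p, root)) for p in files]


def upload_dataset(dataset_name: str, root: Path, bucket: str, endpoint_url: str) -> None:
    """Upload a local dataset directory (hypothetical helper, needs boto3)."""
    import boto3  # imported lazily so the planning helpers work without it

    # Credentials are resolved by boto3 from its usual sources (env, config).
    client = boto3.client("s3", endpoint_url=endpoint_url)
    for local_file, key in plan_upload(dataset_name, root):
        client.upload_file(str(local_file), bucket, key)
```

For instance, dataset_key("my-dataset", Path("/data/a/b.csv"), Path("/data")) gives thoth/datasets/my-dataset/a/b.csv, so a notebook can later list or fetch everything for one dataset by its prefix.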

Description

The changes mainly consist of adding environment variables to the notebooks that hold the credentials needed to access the datasets on the public bucket, and modifying the code to retrieve the datasets from the correct prefixes.
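
A minimal sketch of that pattern, assuming boto3 and an S3-compatible endpoint, might look like the following. The environment variable names (S3_ENDPOINT_URL, S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), the BucketSettings class, and both functions are hypothetical; the notebooks in this PR may use different names and a different client.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BucketSettings:
    endpoint_url: str
    bucket: str
    access_key_id: str
    secret_access_key: str


# Hypothetical variable names; the actual notebooks may use different ones.
_ENV_NAMES = {
    "endpoint_url": "S3_ENDPOINT_URL",
    "bucket": "S3_BUCKET",
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
}


def load_settings(env: Mapping[str, str] = os.environ) -> BucketSettings:
    """Read bucket settings from the environment, failing early if any is unset."""
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return BucketSettings(**{field: env[name] for field, name in _ENV_NAMES.items()})


def read_dataset_object(settings: BucketSettings, dataset_name: str, filename: str) -> bytes:
    """Fetch one object from under the dataset's prefix (needs boto3)."""
    import boto3  # imported lazily so load_settings works without it

    client = boto3.client(
        "s3",
        endpoint_url=settings.endpoint_url,
        aws_access_key_id=settings.access_key_id,
        aws_secret_access_key=settings.secret_access_key,
    )
    key = f"thoth/datasets/{dataset_name}/{filename}"
    response = client.get_object(Bucket=settings.bucket, Key=key)
    return response["Body"].read()
```

Keeping the settings in one small object means a notebook cell only has to set the variables once, and a missing credential surfaces as a clear error before any request is made.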

review-notebook-app · 2021-11-16T13:49:01Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

mayaCostantini · 2021-11-16T13:50:16Z

/assign @pacospace

pacospace

lgtm! very nice work!! 💯

sesheta · 2021-11-17T13:23:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pacospace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pacospace]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Modify notebooks to retrieve data from operate first public bucket

9065a1c

sesheta added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 16, 2021

sesheta requested review from harshad16 and KPostOffice November 16, 2021 13:49

sesheta assigned pacospace Nov 16, 2021

pacospace approved these changes Nov 17, 2021

sesheta added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2021

sesheta merged commit b838bbf into thoth-station:master Nov 17, 2021

mayaCostantini deleted the add-datasets-image-operatefirst branch November 17, 2021 13:27
