Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reboot with a simpler setup, using lessons from the PyPI work
Showing 25 changed files with 264 additions and 138 deletions.
@@ -1,3 +1 @@
 This directory contains environment-specific configurations for use in pipeline deployment.
-
-Example to follow...
@@ -1,3 +1,3 @@
-export DBT_DATASET=pypi
+export DBT_DATASET=bbc_news_example
 export DBT_LOCATION=US
 export DBT_PROJECT=pypi-408816
@@ -0,0 +1,30 @@
DBT does not directly manage datasets/schemas and their permissions.

If you want to manage your dataset ACL as part of the build,
you can provide a JSON document describing the permissions you want as dataset_acl.json
and uncomment the commented-out `bq update` command in the dataset job of the workflow file.

See https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_dataset

```json
{
  "access": [

    {
      "role": "READER",
      "specialGroup": "projectReaders"
    },
    {
      "role": "WRITER",
      "specialGroup": "projectWriters"
    },
    {
      "role": "OWNER",
      "specialGroup": "projectOwners"
    }
  ]
}
```

Terraform is the other obvious option to manage datasets, but this adds complexity and a new toolset/supply chain.
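The workflow's own `bq update` command is not shown in this diff, so the following is only a sketch of the read-modify-write cycle that the linked Google page describes. It prints the commands rather than running them; the helper name `acl_commands` and the `my-project`/`my_dataset` fallbacks are assumptions for illustration.

```shell
# Dry run: print the bq commands for a read-modify-write of a dataset ACL.
# acl_commands is a hypothetical helper, not part of this repository.
acl_commands() (
  project=$1
  dataset=$2
  file=${3:-dataset_acl.json}
  # 1. Dump the current dataset description, including its "access" list.
  printf 'bq show --format=prettyjson %s:%s > %s\n' "$project" "$dataset" "$file"
  # 2. After editing the "access" list in the file, write it back.
  printf 'bq update --source %s %s:%s\n' "$file" "$project" "$dataset"
)

acl_commands "${DBT_PROJECT:-my-project}" "${DBT_DATASET:-my_dataset}"
```

Because `bq update --source` writes the access list from the file, the document should list every entry that is meant to survive, not only the ones being added.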
This file was deleted.
@@ -0,0 +1,99 @@
Based on https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions

Setting up Workload Identity Federation for GitHub Actions.
Assumes $DBT_PROJECT is set to the project you want the pool/provider in.

# Set up WIF in-project

Unsure whether setting up a WIF pool/provider for each project is the best approach, but it seems the least risky.

## Gather some info

```console
export WIF_PROJECT_NUMBER=$(gcloud projects describe "${DBT_PROJECT}" --format="value(projectNumber)")
export WIF_POOL=dbt-pool
export WIF_PROVIDER=dbt-provider
export WIF_GITHUB_REPO=$(git remote get-url origin|cut -d: -f2|cut -d. -f1)
export WIF_SERVICE_ACCOUNT=pypi-vulnerabilities
```
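The `cut` pipeline above yields `owner/repo` for an SSH-style remote such as `git@github.com:owner/repo.git`, but not for an HTTPS remote or a repository name containing a dot. A minimal sketch of a more tolerant alternative follows; the function name `repo_slug` is an assumption, not something this repository defines.

```shell
# Derive "owner/repo" from a GitHub remote URL in SSH or HTTPS form.
# repo_slug is a hypothetical helper name.
repo_slug() (
  url=$1
  url=${url%.git}                 # drop a trailing ".git" if present
  case "$url" in
    *://*)                        # https://github.com/owner/repo
      url=${url#*://}             # -> github.com/owner/repo
      url=${url#*/}               # -> owner/repo
      ;;
    *:*)                          # git@github.com:owner/repo
      url=${url#*:}               # -> owner/repo
      ;;
  esac
  printf '%s\n' "$url"
)

# Possible usage in place of the cut pipeline:
#   export WIF_GITHUB_REPO=$(repo_slug "$(git remote get-url origin)")
```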
## Ensure IAM APIs are enabled

```console
gcloud services enable iamcredentials.googleapis.com --project "${DBT_PROJECT}"
```

## Set up Service Account

```console
gcloud iam service-accounts create "${WIF_SERVICE_ACCOUNT}" \
  --project="${DBT_PROJECT}" \
  --description="DBT service account" \
  --display-name="${WIF_SERVICE_ACCOUNT}"
```

## Set up Workload Identity Provider

```console
gcloud iam workload-identity-pools create "${WIF_POOL}" \
  --project="${DBT_PROJECT}" \
  --location="global" \
  --display-name="DBT Pool"
```

```console
gcloud iam workload-identity-pools providers create-oidc "${WIF_PROVIDER}" \
  --project="${DBT_PROJECT}" \
  --location="global" \
  --workload-identity-pool="${WIF_POOL}" \
  --display-name="DBT provider" \
  --attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.repository=assertion.repository" \
  --issuer-uri="https://token.actions.githubusercontent.com"
```
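The provider above maps attributes but sets no attribute condition, so the restriction to one repository rests entirely on the IAM binding made later. As optional hardening, `create-oidc` also accepts an `--attribute-condition` flag holding a CEL expression. The sketch below only builds such an expression; the helper name `repo_condition` is an assumption, and whether the condition is wanted is a judgment call this commit does not make.

```shell
# Build a CEL expression that only admits tokens issued for one repository.
# repo_condition is a hypothetical helper name.
repo_condition() {
  printf "assertion.repository == '%s'\n" "$1"
}

# Possible extra flag for the create-oidc command:
#   --attribute-condition="$(repo_condition "${WIF_GITHUB_REPO}")"
repo_condition "${WIF_GITHUB_REPO:-owner/repo}"
```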

## Collect the IDs of the Workload Identity Pool and Provider

```console
export WIF_POOL_PROVIDER_ID=$(gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)")
export WIF_POOL_ID=$(gcloud iam workload-identity-pools describe "${WIF_POOL}" --location=global --project "${DBT_PROJECT}" --format="value(name)")
```

## Set up IAM to allow GitHub to assume the role

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
  --project="${DBT_PROJECT}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/${WIF_POOL_ID}/attribute.repository/${WIF_GITHUB_REPO}"
```
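The `--member` value is easy to get wrong because `WIF_POOL_ID` is a full resource name of the form `projects/<number>/locations/global/workloadIdentityPools/<pool>`. The sketch below assembles the same member string from its parts so it can be inspected before the binding is applied; `repo_principal` is a hypothetical helper name and the fallback values are placeholders.

```shell
# Compose the principalSet member for every identity whose mapped
# attribute.repository equals the given repository.
# repo_principal is a hypothetical helper name.
repo_principal() (
  project_number=$1
  pool=$2
  repo=$3
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.repository/%s\n' \
    "$project_number" "$pool" "$repo"
)

repo_principal "${WIF_PROJECT_NUMBER:-123456789}" "${WIF_POOL:-dbt-pool}" "${WIF_GITHUB_REPO:-owner/repo}"
```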

```console
gcloud iam service-accounts add-iam-policy-binding "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com" \
  --project="${DBT_PROJECT}" \
  --role="roles/iam.serviceAccountTokenCreator" \
  --member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```

## Grant Service Account BigQuery admin in the project

(You may need to make this policy more specific!)

```console
gcloud projects add-iam-policy-binding "${DBT_PROJECT}" \
  --role="roles/bigquery.admin" \
  --member="serviceAccount:${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```
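One way to make the policy more specific is to replace `roles/bigquery.admin` with narrower predefined roles. The sketch below prints candidate commands as a dry run. The role pair (`roles/bigquery.dataEditor` plus `roles/bigquery.jobUser`) is an assumption and may not cover everything this project does, such as creating datasets or UDFs; `narrow_bindings` is a hypothetical helper name.

```shell
# Dry run: print one project-level binding per narrower role.
# The role list is an assumption; adjust it to what the dbt project needs.
narrow_bindings() (
  project=$1
  member=$2
  for role in roles/bigquery.dataEditor roles/bigquery.jobUser; do
    printf 'gcloud projects add-iam-policy-binding "%s" --role="%s" --member="serviceAccount:%s"\n' \
      "$project" "$role" "$member"
  done
)

narrow_bindings "${DBT_PROJECT:-my-project}" \
  "${WIF_SERVICE_ACCOUNT:-my-sa}@${DBT_PROJECT:-my-project}.iam.gserviceaccount.com"
```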

## Recover Secrets for GitHub

Populate the secrets for this build as described below.

```console
echo "GitHub Secret: GCP_WORKLOAD_IDENTITY_PROVIDER"
gcloud iam workload-identity-pools providers describe "${WIF_PROVIDER}" --location=global --project "${DBT_PROJECT}" --workload-identity-pool "${WIF_POOL}" --format="value(name)"
```

```console
echo "GitHub Secret: GCP_SERVICE_ACCOUNT"
echo "${WIF_SERVICE_ACCOUNT}@${DBT_PROJECT}.iam.gserviceaccount.com"
```
@@ -1,15 +1,19 @@
 # Python virtualenv files
-.venv/
+/.venv/

 # User's environment settings
-.env
+/.env

 # DBT logs
-logs/
+/logs/

 # DBT target dir
-target/
+/target/

 # DBT packages
-dbt_packages/
-package-lock.yml
+/dbt_packages/
+/package-lock.yml
+
+# files that we don't want committed
+/uncommitted/*
+!/uncommitted/README.md
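The leading slash anchors each pattern to the repository root, and the `!` line re-includes one file from an otherwise ignored directory. A minimal sketch for checking that behaviour with `git check-ignore` in a throwaway repository follows; the helper name `is_ignored` and the sample paths are assumptions for illustration.

```shell
# Succeeds (exit 0) when the path is ignored under the given .gitignore patterns.
# Usage: is_ignored PATH PATTERN...
# Works in a temporary repository so the real working tree is untouched.
is_ignored() (
  path=$1
  shift
  repo=$(mktemp -d) || exit 2
  cd "$repo" || exit 2
  git init -q . || exit 2
  printf '%s\n' "$@" > .gitignore          # one pattern per line
  mkdir -p "$(dirname "$path")" && : > "$path"
  git check-ignore -q "$path"
  rc=$?
  cd / && rm -rf "$repo"
  exit "$rc"
)

# Anchored pattern: matches only the top-level logs directory.
is_ignored logs/run.log '/logs/' && echo "logs/run.log is ignored"
is_ignored sub/logs/run.log '/logs/' || echo "sub/logs/run.log is not ignored"

# Negation: everything under uncommitted/ is ignored except README.md.
is_ignored uncommitted/README.md '/uncommitted/*' '!/uncommitted/README.md' \
  || echo "uncommitted/README.md is not ignored"
```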
@@ -0,0 +1,5 @@
version: 2

macros:
  - name: ensure_target_dataset_exists
    description: Creates the specified dataset if it does not exist and the executor has permission
@@ -0,0 +1,10 @@
{% macro ensure_udfs() %}
-- See https://www.equalexperts.com/blog/our-thinking/testing-and-deploying-udfs-with-dbt
CREATE OR REPLACE FUNCTION {{ target.schema }}.shout(say STRING)
RETURNS STRING
OPTIONS (description='Shouts the say string. NULL when argument is NULL')
AS (
  UPPER(say) || '!'
);

{% endmacro %}
@@ -0,0 +1,5 @@
version: 2

macros:
  - name: ensure_udfs
    description: Creates UDFs specified in the macro. Does not clean up any UDFs that are removed.
@@ -0,0 +1,5 @@
SELECT
  category,
  COUNT(1) article_count
FROM {{ source('bbc_news', 'fulltext') }}
GROUP BY category
@@ -0,0 +1,20 @@
version: 2

models:
  - name: categories
    description: News article categories and counts
    columns:
      - name: category
        description: Category name
        tests:
          - dbt_utils.at_least_one
          - unique
          - not_null
      - name: article_count
        description: Number of articles in category
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
This file was deleted.