Design

9dt take home project

Design

Data Pipeline and load tables

Assumptions

The extract-load-transform pattern is preferred over extract-transform-load.
The tables in this section are used as a source for data warehouse loads but are not exposed to end users.
All attributes in the games data is exposed to data warehouse end users.
Only required attributes about users/players is exposed to data warehouse end users.

Implementation details

The player data from the users API is saved to S3 in the original JSON format. Each page of results contains 10 JSON documents, and each document is split into a new line before saving to S3.
The Athena table "dev9dt_athena.players" maps the JSON keys to table columns
The games data was manually copied from the http location to another S3 location accessable from Athena
The Athena table "dev9dt_athena.games" exposes the csv data to table columns
The Redshift external schema "athena_schema" is implemented from the Athena database "dev9dt_athena." This gives Redshift access to the Athena table definitions.

Data Warehouse Tables and Views

The Redshift Local schema dev_9dt implements the data warehouse tables and views.
Table dev_9dt.player_d contains the player attributes needed for the analyis. It acts as a dimension table. It does not include PII or protected information.
Table dev_9dt.games_f contains the measures at the grain of game_id. It is a flattened representation of the games.csv data. Moves data is preserved in games_f.moves as an ordered list.
Table dev_9dt.player_metric_games_played contains the number of games played by each player. It is used for two analysis views, games by nationaility and players who played a single game only.

Analysis

View dev_9dt.games_metric_first_move_percentile_rank_v contains the 1st analysis results for the percentile rank of the first move in a game_id
View dev_9dt.games_metric_played_by_nationality_v contains the 2nd analysis results for the games played by nationality
View dev_9dt.player_metric_played_single_game_v contains the 3rd analysis results for the players who played only one game, with their win/lose/draw status and email address.

Usage Notes

The Redsfhit analysis views and the local base tables are exposed through the redshift connection details here:

redshift_jdbc_endpoint = "redshift-cluster-1.cpykyfqusghx.us-west-1.redshift.amazonaws.com"
redshift_jdbc_port = "5439"
redshift_jdbc_database = "dev9dt"
redshift_jdbc_username = "ninetyeightpoint6"
redshift_jdbc_password = "{shared separately}"

The redshift user has the following access:

grant all on all tables in schema athena_schema to ninetyeightpoint6;
grant all on all tables in schema dev_9dt to ninetyeightpoint6;

grant usage on schema athena_schema to ninetyeightpoint6;
grant usage on schema dev_9dt to ninetyeightpoint6;

The main.py file writes the users API data to S3. The S3 bucket can be updated in the variable "bucket_name."
The Athena and Redshift setup scripts are in the file setup.py. These are one-time scripts to create the data wareshoue objects. Incremental loads are not in scope.
Athena is only used as a pass-through to enable Redshift Spectrum access to the players and games data in S3. Athena access is not included.
The IAM users and roles were seutp manually and are not included in this package.
The AWS account used to create these objects is a personal account.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
__init__.py		__init__.py
main.py		main.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

9dt take home project

Design

Data Pipeline and load tables

Assumptions

Implementation details

Data Warehouse Tables and Views

Analysis

Usage Notes

About

Releases

Packages

Languages

jayljohnson/9dt

Folders and files

Latest commit

History

Repository files navigation

9dt take home project

Design

Data Pipeline and load tables

Assumptions

Implementation details

Data Warehouse Tables and Views

Analysis

Usage Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages