feat(python): adding example for s3, athena and glue (#988)
* feat(java/EKS): Adding EKS Fargate sample
* feat(python): adding example for s3, athena and glue
* Delete python/athena-s3-glue/source.bat
* chore: added comments to the code

---------

Co-authored-by: Paulo Pereira <[email protected]>
Co-authored-by: Michael Kaiser <[email protected]>
1 parent a889d32, commit dd77351

17 changed files with 493 additions and 0 deletions.
<!--BEGIN STABILITY BANNER-->
---

![Stability: Stable](https://img.shields.io/badge/stability-Stable-success.svg?style=for-the-badge)

> **This is a stable example. It should successfully build out of the box**
>
> This example is built on Construct Libraries marked "Stable" and does not have any infrastructure prerequisites to
> build.
---
<!--END STABILITY BANNER-->

# Auditing logs with _S3_, _Athena_ and _Glue_

This is an example of a CDK program written in Python.\
**Use Case**: a customer wants to store their user logs and audit them using standard SQL statements.

## Solution Description

To provide log storage, we will deploy an _Amazon S3_ bucket. The auditing capability will be provided by _Amazon
Athena_.

_Athena_ will query the log files directly in the _S3_ bucket and return the records relevant to the audit.

In addition, we will deploy **seven log samples** to the bucket, organized by business domain and date (for example,
`products/date=2024-01-19/`). This Hive-style partitioning lets _Athena_ read only the partitions a query filters on,
which keeps queries fast and cost-efficient. An _AWS Glue_ crawler will create the tables in the _Glue_ Data Catalog
that _Athena_ queries, and **three named queries** will be available for testing.
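The following sketch shows why the `date=YYYY-MM-DD` layout matters. It models partition pruning in plain Python: given the object keys of the seven sample files, it keeps only the keys under the partition that a `WHERE "date" = '...'` filter selects. This is a simplified illustration, not how _Athena_ is implemented internally. The helper names `partition_of` and `keys_to_scan` are hypothetical.

```python
"""Simplified model of Hive-style partition pruning (illustrative only)."""
from __future__ import annotations

# Keys of the seven sample log files, relative to the logs bucket root.
SAMPLE_KEYS = [
    "products/date=2024-01-19/2024-01-19T13:40:09.json",
    "products/date=2024-01-19/2024-01-19T13:45:09.json",
    "products/date=2024-01-19/2024-01-19T13:51:09.json",
    "products/date=2024-01-20/2024-01-20T13:51:09.json",
    "users/date=2024-01-20/2024-01-20T08:13:33.json",
    "users/date=2024-01-22/2024-01-22T08:13:33.json",
    "users/date=2024-01-22/2024-01-22T09:13:33.json",
]


def partition_of(key: str) -> dict[str, str]:
    """Split a key into its table (first folder) and its "name=value" partition values."""
    table, *segments = key.split("/")
    # Only Hive-style "name=value" folders carry partition values; the file names have no "=".
    values = dict(seg.split("=", 1) for seg in segments if "=" in seg)
    return {"table": table, **values}


def keys_to_scan(keys: list[str], table: str, date: str) -> list[str]:
    """Keep only the keys that a query on `table` filtered by "date" = `date` has to read."""
    selected = []
    for key in keys:
        parts = partition_of(key)
        if parts["table"] == table and parts.get("date") == date:
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Same filter as the saved query "product-events-by-date".
    chosen = keys_to_scan(SAMPLE_KEYS, "products", "2024-01-19")
    print(f"{len(chosen)} of {len(SAMPLE_KEYS)} files scanned")  # 3 of 7 files scanned
```

A query that filters on the partition column reads three of the seven files here. A query without such a filter would read them all.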

## CDK Toolkit

The `cdk.json` file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the `.venv`
directory. To create the virtualenv, it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point, you can synthesize the CloudFormation template for this code.

```
$ cdk synth
```
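By default, `cdk synth` writes the template under `cdk.out/` as `<StackName>.template.json`. For the stack defined in `app.py`, that is presumably `cdk.out/DemoAthenaS3GlueStack.template.json`, and this sketch assumes that path. It sanity-checks the template by counting resources per CloudFormation type, using only the standard library. The helper names are hypothetical.

```python
"""Count the resources of a synthesized CloudFormation template by type (sketch)."""
import json
from collections import Counter
from pathlib import Path

# Assumed default location of the template that `cdk synth` writes for the stack in app.py.
DEFAULT_TEMPLATE = Path("cdk.out") / "DemoAthenaS3GlueStack.template.json"


def resource_types(template_path: Path = DEFAULT_TEMPLATE) -> Counter:
    """Map each CloudFormation resource Type in the template to how often it occurs."""
    template = json.loads(template_path.read_text())
    return Counter(resource["Type"] for resource in template.get("Resources", {}).values())


def print_summary(template_path: Path = DEFAULT_TEMPLATE) -> None:
    """Print one "count  type" line per resource type, sorted by type name."""
    for resource_type, count in sorted(resource_types(template_path).items()):
        print(f"{count:3d}  {resource_type}")
```

For this stack, `resource_types()["AWS::Athena::NamedQuery"]` should be `3`, because the stack defines three named queries. The template also contains helper resources added by `BucketDeployment` and `auto_delete_objects`.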

To add additional dependencies, for example other CDK libraries, add
them to the `requirements.txt` file and rerun the `pip install -r requirements.txt`
command.

## Deploying the solution

To deploy the solution, ask the CDK Toolkit to deploy the stack:

```shell
$ cdk deploy --all
```

Once the infrastructure is created, you need to populate the Glue database. In the AWS console, go to _AWS Glue_,
_Data Catalog_, _Crawlers_.

Select `logs-crawler` and choose **Run**. When the crawler finishes, you are ready to test the solution.
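You can also start the crawler from a script instead of the console. This sketch assumes the AWS SDK for Python (boto3), which is not listed in this project's `requirements.txt`. It calls `start_crawler`, then polls `get_crawler` until the crawler is back in the `READY` state, and returns the status of the last crawl. The polling interval and timeout are arbitrary choices.

```python
"""Start the `logs-crawler` Glue crawler and wait until it finishes (sketch)."""
import time


def run_crawler(name: str = "logs-crawler", client=None,
                poll_seconds: float = 15.0, timeout_seconds: float = 900.0) -> str:
    """Start a Glue crawler, wait until it is READY again, and return the last crawl status.

    `client` defaults to a boto3 Glue client (boto3 is an assumed extra dependency).
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed in tests
        client = boto3.client("glue")

    client.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # Wait before polling so the state has time to leave READY after the start call.
        time.sleep(poll_seconds)
        crawler = client.get_crawler(Name=name)["Crawler"]
        # A run goes RUNNING -> STOPPING -> READY; LastCrawl then describes the finished run.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {name!r} did not finish within {timeout_seconds} seconds")
```

A return value of `"SUCCEEDED"` means the `products` and `users` tables should now be visible in `log-database`.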

## Testing the solution

1. Open the _AWS_ console and go to _Amazon Athena_
2. On the left panel, go to **Query editor**
3. Change the **Workgroup** selection to `log-auditing`
4. On **Data source**, choose `AwsDataCatalog`
5. On **Database**, choose `log-database`
6. Two tables, `products` and `users`, will be displayed in the **Tables** section. Expand both to display their fields
7. Write your queries in the right panel and choose **Run** to execute them against the
   database.
8. Optionally, go to **Saved queries** and select one of the three named queries to open it in the **Editor** panel as
   a starting point for your own query.
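You can also run the same queries programmatically. This sketch assumes boto3 (an extra dependency) and uses the `log-auditing` workgroup and `log-database` database that the stack creates. It submits the SQL with `start_query_execution` and polls `get_query_execution` until the query reaches a terminal state. It then reads the first page of rows with `get_query_results`.

```python
"""Run a query in the `log-auditing` Athena workgroup and return its rows (sketch)."""
from __future__ import annotations

import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(sql: str, client=None, workgroup: str = "log-auditing",
              database: str = "log-database", poll_seconds: float = 1.0,
              timeout_seconds: float = 300.0) -> list[list[str]]:
    """Execute `sql` and return the first page of results as lists of strings.

    `client` defaults to a boto3 Athena client (boto3 is an assumed extra dependency).
    Result files go to the output location configured on the workgroup.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed in tests
        client = boto3.client("athena")

    execution_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {execution_id} still {status['State']} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"query {execution_id} ended as {status['State']}: "
                           f"{status.get('StateChangeReason', 'no reason given')}")

    rows = client.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    # For SELECT statements the first row holds the column names; NULL cells have no VarCharValue.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]] for row in rows]
```

If you pass the SQL of the `user-events-by-date` saved query to `run_query`, it should return the header row followed by the matching `users` rows. Only the first page of results is returned; larger results would need `NextToken` pagination.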

> **Tip**: you can explore the `auditing-logs-<account-id>` bucket and check all the log files inside it. To add other
> logs for more complex tests, follow the same directory structure (`<domain>/date=YYYY-MM-DD/`). Whenever you add a new
> directory, run the `logs-crawler` _Glue_ crawler again so that the new partitions are registered. Note that the
> crawler only targets the `products` and `users` prefixes, so a new domain must also be added to its S3 targets.
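This sketch adds a record that follows the layout above. It assumes boto3 and the `auditing-logs-<account-id>` bucket created by the stack. The key is derived from the record's `domain` and `dateTime` fields, matching how the sample files are named. The helpers `log_key` and `upload_log` are hypothetical.

```python
"""Upload a log record under the `<domain>/date=YYYY-MM-DD/` layout (sketch)."""
import json


def log_key(record: dict) -> str:
    """Build the S3 key for a record, e.g. products/date=2024-01-19/2024-01-19T13:40:09.json."""
    date = record["dateTime"][:10]  # the "YYYY-MM-DD" prefix of the ISO timestamp
    return f"{record['domain']}/date={date}/{record['dateTime']}.json"


def upload_log(record: dict, bucket: str, client=None) -> str:
    """Serialize `record` as one JSON line, put it in `bucket`, and return its key."""
    if client is None:
        import boto3  # assumed extra dependency, imported lazily for testability
        client = boto3.client("s3")
    key = log_key(record)
    client.put_object(Bucket=bucket, Key=key,
                      Body=(json.dumps(record) + "\n").encode("utf-8"),
                      ContentType="application/json")
    return key
```

A record for a date that already has a directory lands in an existing partition. A new date creates a new `date=` directory, which is when the crawler needs to run again.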

## Destroying the deployment

To destroy the provisioned infrastructure, run the following command:

```shell
$ cdk destroy --all
```

Both buckets are created with `RemovalPolicy.DESTROY` and `auto_delete_objects=True`, so they are deleted together
with their contents (the logs and the query results).

## Running Unit Tests

To invoke the unit tests (from the root project folder):

```
pytest
```

To invoke a specific unit test file, pass the file name as a parameter (wildcards also work, e.g. `pytest tests/unit/*_stack*`):

```
pytest tests/unit/<test_filename>
```
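Here is a hypothetical extra test, not part of this commit, in pytest style. It validates the sample logs: every non-blank line must be a JSON object, its `domain` must match the top-level folder, and its `dateTime` must match the `date=` partition. It assumes pytest runs from the project root, where `log-samples/` lives, and it skips otherwise.

```python
"""Hypothetical pytest module validating the files under log-samples/ (sketch)."""
from __future__ import annotations

import json
from pathlib import Path


def find_sample_problems(root: Path) -> list[str]:
    """Describe every record whose folder layout disagrees with its content."""
    problems = []
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        domain = rel.parts[0]
        # Expected layout: <domain>/date=YYYY-MM-DD/<file>.json
        partition = rel.parts[1] if len(rel.parts) == 3 else ""
        date = partition[len("date="):] if partition.startswith("date=") else None
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # tolerate blank lines between JSON records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"{rel}:{lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"{rel}:{lineno}: expected a JSON object")
                continue
            if record.get("domain") != domain:
                problems.append(f"{rel}:{lineno}: domain {record.get('domain')!r} != folder {domain!r}")
            if date is None or not str(record.get("dateTime", "")).startswith(date):
                problems.append(f"{rel}:{lineno}: dateTime does not match partition {partition!r}")
    return problems


def test_log_samples_are_consistent():
    import pytest  # only needed when this runs under pytest

    root = Path("log-samples")
    if not root.is_dir():
        pytest.skip("run pytest from the project root, where log-samples/ lives")
    assert find_sample_problems(root) == []
```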

```python
import aws_cdk as cdk

from athena_s3_glue.athena_s3_glue_stack import AthenaS3GlueStack

app = cdk.App()

demo_stack = AthenaS3GlueStack(app, "DemoAthenaS3GlueStack")

cdk.Tags.of(demo_stack).add(key='project', value='demo-athena-s3-glue')

app.synth()
```
Empty file.
110 changes: 110 additions & 0 deletions
python/athena-s3-glue/athena_s3_glue/athena_s3_glue_stack.py

```python
from aws_cdk import (
    Stack,
    RemovalPolicy,
    aws_s3 as s3,
    aws_s3_deployment as s3_deployment,
    aws_glue as glue,
    aws_iam as iam,
    aws_athena as athena
)
from constructs import Construct


class AthenaS3GlueStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # creating the bucket where the logs will be placed
        logs_bucket = s3.Bucket(
            self, 'logs-bucket',
            bucket_name=f"auditing-logs-{self.account}",
            removal_policy=RemovalPolicy.DESTROY,
            auto_delete_objects=True
        )

        # creating the bucket where the Athena query results will be placed
        query_output_bucket = s3.Bucket(
            self, 'query-output-bucket',
            bucket_name=f"auditing-analysis-output-{self.account}",
            removal_policy=RemovalPolicy.DESTROY,
            auto_delete_objects=True
        )

        # uploading the sample log files to the logs bucket
        s3_deployment.BucketDeployment(
            self, 'sample-files',
            destination_bucket=logs_bucket,
            sources=[s3_deployment.Source.asset('./log-samples')],
            content_type='application/json',
            retain_on_delete=False
        )

        # creating the Glue Database to serve as our Data Catalog
        glue_database = glue.CfnDatabase(
            self, 'log-database',
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(
                name="log-database"
            ))

        # creating the permissions for the crawler to enrich our Data Catalog
        glue_crawler_role = iam.Role(
            self, 'glue-crawler-role',
            role_name='glue-crawler-role',
            assumed_by=iam.ServicePrincipal(service='glue.amazonaws.com'),
            managed_policies=[
                # Remember to apply the Least Privilege Principle and provide only the permissions needed to the crawler.
                # from_aws_managed_policy_name builds the ARN for the current partition (e.g. aws or aws-cn).
                iam.ManagedPolicy.from_aws_managed_policy_name('AmazonS3FullAccess'),
                iam.ManagedPolicy.from_aws_managed_policy_name('service-role/AWSGlueServiceRole')
            ])

        # creating the Glue Crawler that will automatically populate our Data Catalog. Don't forget to run the crawler
        # as soon as the deployment finishes, otherwise our Data Catalog will be empty. Check out the README for more instructions
        crawler = glue.CfnCrawler(
            self, 'logs-crawler',
            name='logs-crawler',
            database_name=glue_database.database_input.name,
            role=glue_crawler_role.role_name,
            targets=glue.CfnCrawler.TargetsProperty(
                s3_targets=[
                    glue.CfnCrawler.S3TargetProperty(path=f's3://{logs_bucket.bucket_name}/products'),
                    glue.CfnCrawler.S3TargetProperty(path=f's3://{logs_bucket.bucket_name}/users')
                ]
            ))
        # database_name is a plain string, so declare the dependency on the database explicitly
        crawler.add_dependency(glue_database)

        # creating the Athena Workgroup that groups our queries and defines where their results are written
        work_group = athena.CfnWorkGroup(
            self, 'log-auditing-work-group',
            name='log-auditing',
            work_group_configuration=athena.CfnWorkGroup.WorkGroupConfigurationProperty(
                result_configuration=athena.CfnWorkGroup.ResultConfigurationProperty(
                    output_location=f"s3://{query_output_bucket.bucket_name}",
                    encryption_configuration=athena.CfnWorkGroup.EncryptionConfigurationProperty(
                        encryption_option="SSE_S3"
                    ))))

        # creating an example query to fetch all product events by date
        product_events_by_date_query = athena.CfnNamedQuery(
            self, 'product-events-by-date-query',
            database=glue_database.database_input.name,
            work_group=work_group.name,
            name="product-events-by-date",
            query_string="SELECT * FROM \"log-database\".\"products\" WHERE \"date\" = '2024-01-19'")

        # creating an example query to fetch all user events by date
        user_events_by_date_query = athena.CfnNamedQuery(
            self, 'user-events-by-date-query',
            database=glue_database.database_input.name,
            work_group=work_group.name,
            name="user-events-by-date",
            query_string="SELECT * FROM \"log-database\".\"users\" WHERE \"date\" = '2024-01-22'")

        # creating an example query to fetch all events by the user ID
        all_events_by_userid_query = athena.CfnNamedQuery(
            self, 'all-events-by-userId-query',
            database=glue_database.database_input.name,
            work_group=work_group.name,
            name="all-events-by-userId",
            query_string="SELECT * FROM (\n"
                         " SELECT transactionid, userid, username, domain, datetime, action FROM \"log-database\".\"products\" \n"
                         "UNION \n"
                         " SELECT transactionid, userid, username, domain, datetime, action FROM \"log-database\".\"users\" \n"
                         ") WHERE \"userid\" = '123'")

        # the named queries reference the workgroup by its name (a plain string),
        # so declare the dependency explicitly to create the workgroup first
        product_events_by_date_query.add_dependency(work_group)
        user_events_by_date_query.add_dependency(work_group)
        all_events_by_userid_query.add_dependency(work_group)
```

```json
{
  "app": "python3 app.py",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "requirements*.txt",
      "source.bat",
      "**/__init__.py",
      "**/__pycache__",
      "tests"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true,
    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
    "@aws-cdk/aws-efs:denyAnonymousAccess": true,
    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
    "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
    "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
    "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
    "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
    "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
    "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true
  }
}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/products/date=2024-01-19/2024-01-19T13:40:09.json

```json
{"transactionId": "eb8c8930-f675-4ab2-912d-cb2970080dda","userId": "123","userName": "test user","domain": "products","dateTime": "2024-01-19T13:40:09","action": "Create New Product","transactionResult": "Success", "data": {"productId": "04dad1d4-d88f-4db7-81e1-22fec3f202c5"}}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/products/date=2024-01-19/2024-01-19T13:45:09.json

```json
{"transactionId": "425b59c1-0a4d-4776-a213-3d5853721fb2","userId": "123","userName": "test user","domain": "products","dateTime": "2024-01-19T13:45:09","action": "Change Product","transactionResult": "Success","data": {"productId": "04dad1d4-d88f-4db7-81e1-22fec3f202c5","changedFields": {"productName": "new product name","value": 50}}}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/products/date=2024-01-19/2024-01-19T13:51:09.json

```json
{"transactionId": "b8609277-d9cd-4dab-98ea-4219ad8a414d","userId": "456","userName": "test user 2","domain": "products","dateTime": "2024-01-19T13:51:09","action": "Delete Product","transactionResult": "Error", "data": {"productId": "04dad1d4-d88f-4db7-81e1-22fec3f202c5"}}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/products/date=2024-01-20/2024-01-20T13:51:09.json

```json
{"transactionId": "b8609277-d9cd-4dab-98ea-4219ad8a414d","userId": "456","userName": "test user 2","domain": "products","dateTime": "2024-01-20T13:51:09","action": "Delete Product","transactionResult": "Success", "data": {"productId": "04dad1d4-d88f-4db7-81e1-22fec3f202c5"}}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/users/date=2024-01-20/2024-01-20T08:13:33.json

```json
{"transactionId": "edd4916a-da30-4638-b0d4-ebcdbfee7042","userId": "789","userName": "test user 3","domain": "users","dateTime": "2024-01-20T08:13:33","action": "Add User","transactionResult": "Error", "data": {"newUser": {"userId": "000", "userName": "test user 4"}}}
```

3 changes: 3 additions & 0 deletions
python/athena-s3-glue/log-samples/users/date=2024-01-22/2024-01-22T08:13:33.json

```json
{"transactionId": "5522f86a-120b-4b26-9e09-899874558b43","userId": "789","userName": "test user 3","domain": "users","dateTime": "2024-01-22T08:13:33","action": "Add User","transactionResult": "Success", "data": {"newUser": {"userId": "000", "userName": "test user 4"}}}
{"transactionId": "00f46121-6218-4e7f-93f7-e186cf7659ef","userId": "123","userName": "test user","domain": "users","dateTime": "2024-01-22T08:13:40","action": "Add User","transactionResult": "Error", "data": {"newUser": {"userId": "000", "userName": "test user 4"}}}
{"transactionId": "4be4b004-baaf-47c7-b0a6-86c4f81c9b4f","userId": "123","userName": "test user","domain": "users","dateTime": "2024-01-22T08:13:43","action": "Add User","transactionResult": "Error", "data": {"newUser": {"userId": "000", "userName": "test user 4"}}}
```

1 change: 1 addition & 0 deletions
python/athena-s3-glue/log-samples/users/date=2024-01-22/2024-01-22T09:13:33.json

```json
{"transactionId": "c413bd54-caef-482c-86a4-260b72742f52","userId": "000","userName": "test user 4","domain": "users","dateTime": "2024-01-22T09:13:33","action": "Change User","transactionResult": "Success", "data": {"changedFields": {"birthDate": "1970-01-01"}}}
```

```
pytest==6.2.5
```

```
aws-cdk-lib==2.115.0
constructs>=10.0.0,<11.0.0
```
Empty file.
Empty file.