Fix(typescript/codepipeline-glue-deploy): Revamped a dated prescriptive guidance CDK code #1025

Merged
4 changes: 3 additions & 1 deletion .github/workflows/build-pull-request.yml
@@ -98,10 +98,12 @@ jobs:
   echo "- $dir"
 done

+# Change to language directory
+cd ./${{ matrix.language }}
+
 # install CDK CLI from npm if not typescript, so that npx can find it later
 # ts will use the one from the particular cdk app
 if [[ ${{ matrix.language }} != 'typescript' ]]; then
-  cd ./${{ matrix.language }}
   npm install -g aws-cdk
 fi

41 changes: 41 additions & 0 deletions typescript/codepipeline-glue-deploy/README.md
@@ -0,0 +1,41 @@
# AWS Glue job with an AWS CodePipeline CI/CD pipeline

## How to run the CDK TypeScript project

The `cdk.json` file tells the CDK Toolkit how to execute your app.

### Useful commands

* `aws configure` configure access to your AWS account
* `npm run watch` watch for changes and compile
* `npm run test` perform the jest unit tests
* `cdk deploy --parameters glueJob="Glue Job Name"` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk synth` emits the synthesized CloudFormation template

## About this pattern

This pattern demonstrates how you can integrate AWS CodeCommit and AWS CodePipeline with AWS Glue, and use AWS Lambda to launch jobs as soon as a developer pushes changes to a remote AWS CodeCommit repository.

When a developer commits a change to an extract, transform, and load (ETL) repository and pushes it to AWS CodeCommit, a new pipeline execution is invoked. The pipeline initiates a Lambda function that launches an AWS Glue job with these changes. The AWS Glue job performs the ETL task.

This solution is helpful when businesses, developers, and data engineers want to launch jobs as soon as changes are committed and pushed to the target repositories. It helps achieve a higher level of automation and reproducibility, which helps avoid errors during the job launch and lifecycle.

![alt text](image.png)

The process consists of these steps:

1. The developer or data engineer makes a modification in the ETL code, commits, and pushes the change to AWS CodeCommit.
2. The push initiates the pipeline.
3. The pipeline initiates a Lambda function, which calls codecommit:GetFile on the repository and uploads the file to Amazon Simple Storage Service (Amazon S3).
4. The Lambda function launches a new AWS Glue job with the ETL code.
5. The Lambda function finishes the pipeline.
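
The steps above can be sketched with in-memory stand-ins for the AWS services. This is a minimal illustration, not the Lambda function from this pattern: `FakeServices`, `run_deploy_action`, the bucket name, and the artifact key are all hypothetical names chosen for the example, and no AWS API is called.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServices:
    """Hypothetical in-memory stand-ins for the AWS services in the pattern."""
    repo_files: dict                                      # CodeCommit: path -> bytes
    bucket: dict = field(default_factory=dict)            # S3: key -> bytes
    glue_runs: list = field(default_factory=list)         # Glue: started script locations
    pipeline_results: list = field(default_factory=list)  # CodePipeline: reported results


def run_deploy_action(services, filename, artifact_key, bucket_name="example-bucket"):
    """Walk through steps 3-5 for one pipeline execution."""
    try:
        content = services.repo_files[filename]       # step 3: read the file from the repository
        file_key = f"{artifact_key}/{filename}"
        services.bucket[file_key] = content           # step 3: upload it to S3
        services.glue_runs.append(f"s3://{bucket_name}/{file_key}")  # step 4: launch the job
        services.pipeline_results.append("success")   # step 5: finish the pipeline
    except KeyError as err:
        # A missing file is reported back to the pipeline as a failure
        services.pipeline_results.append(f"failure: {err} not found")


services = FakeServices(repo_files={"etl.py": b"print('hello')"})
run_deploy_action(services, "etl.py", "pipeline/Artifact/abc123")
print(services.glue_runs)
print(services.pipeline_results)
```

The point of the sketch is the ordering: the pipeline result is reported only after the script is in S3 and the job has been launched, and any error along the way is reported as a failure instead.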

### Automation and scale

The sample attachment demonstrates how you can integrate AWS Glue with AWS CodePipeline. It provides a baseline example that you can customize or extend for your own use.

### References
* [Adding jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-job.html)
* [Invoke an AWS Lambda function in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/actions-invoke-lambda-function.html)
* [Source action integrations in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/integrations-action-type.html#integrations-source)
6 changes: 6 additions & 0 deletions typescript/codepipeline-glue-deploy/bin/app.ts
@@ -0,0 +1,6 @@
#!/usr/bin/env node
import { App } from 'aws-cdk-lib';
import { CodepipelineGlueDeployStack } from '../lib/codepipeline-glue-deploy-stack';

const app = new App();
new CodepipelineGlueDeployStack(app, 'CodepipelineGlueDeploy', {});
57 changes: 57 additions & 0 deletions typescript/codepipeline-glue-deploy/cdk.json
@@ -0,0 +1,57 @@
{
  "app": "npx ts-node --prefer-ts-exts bin/app.ts",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "**/*.d.ts",
      "**/*.js",
      "tsconfig.json",
      "package*.json",
      "yarn.lock",
      "node_modules",
      "test"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true,
    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true
  }
}
7 changes: 7 additions & 0 deletions typescript/codepipeline-glue-deploy/etl/etl.py
@@ -0,0 +1,7 @@
from awsglue.context import GlueContext
from awsglue.transforms import *
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

print('glueContext:', glueContext)
Binary file added typescript/codepipeline-glue-deploy/image.png
@@ -0,0 +1,105 @@
import json
import os
import base64
from os.path import join

import boto3

s3 = boto3.client('s3')
glue = boto3.client('glue')
pipeline = boto3.client('codepipeline')
codecommit = boto3.client('codecommit')


def lambda_handler(event, context):
    # Extract relevant information from the CodePipeline event
    job = event['CodePipeline.job']
    try:
        data = job['data']
        config = data['actionConfiguration']['configuration']
        user_params = json.loads(config['UserParameters'])

        print(json.dumps(event))

        input_artifacts = data['inputArtifacts']
        source_code_artifact = input_artifacts[0]

        # Get the S3 bucket and key for the source code artifact
        artifact_bucket = source_code_artifact['location']['s3Location']['bucketName']
        artifact_key = source_code_artifact['location']['s3Location']['objectKey']
        filename = os.getenv('FILENAME')
        file_key = join(artifact_key, filename)
        commit_id = source_code_artifact['revision']

        repository_name = os.getenv('REPOSITORY_NAME')
        print('repository_name', repository_name)

        # Retrieve the file content from CodeCommit
        codecommit_resp = codecommit.get_file(
            repositoryName=repository_name,
            commitSpecifier=commit_id,
            filePath=filename
        )
        print('codecommit_resp', codecommit_resp)

        # Upload the file to S3
        s3_resp = s3.put_object(
            Bucket=artifact_bucket,
            Key=file_key,
            Body=codecommit_resp['fileContent']
        )
        print('s3_resp', s3_resp)

        # Check the S3 upload status
        s3_status_code = s3_resp['ResponseMetadata']['HTTPStatusCode']
        if s3_status_code != 200:
            raise Exception(f'Failed to send file to S3. StatusCode={s3_status_code}')

        # Construct the S3 script location
        s3_script_location = f's3://{artifact_bucket}/{file_key}'

        # Construct the Glue job name: the base name from the user parameters,
        # suffixed with the last path segment of the artifact key
        glue_job_name_id = artifact_key.split('/')[-1]
        glue_job_name = f'{user_params["glue_job_name"]}_{glue_job_name_id}'
        print('glue_job_name:', glue_job_name)

        # Set additional Glue job arguments if provided
        default_arguments = {}
        if 'additional_python_modules' in user_params:
            default_arguments['--additional-python-modules'] = user_params['additional_python_modules']

        # Create the Glue job
        create_job_resp = glue.create_job(
            Name=glue_job_name,
            Role=user_params['glue_role'],
            Command={
                'Name': 'glueetl',
                'ScriptLocation': s3_script_location
            },
            DefaultArguments=default_arguments,
            GlueVersion='4.0'
        )
        print('create_job_resp:', create_job_resp)

        # Start the Glue job run
        start_job_run_resp = glue.start_job_run(
            JobName=create_job_resp['Name'],
            Arguments={}
        )
        print('start_job_run_resp:', start_job_run_resp)

        # Report the successful job execution to CodePipeline
        print('submitting successful job')
        pipeline.put_job_success_result(jobId=job['id'])
    except Exception as e:
        # Report the failed job execution to CodePipeline
        print('submitting unsuccessful job: ' + str(e))
        pipeline.put_job_failure_result(
            jobId=job['id'],
            failureDetails={
                'type': 'JobFailed',
                'message': str(e),
                'externalExecutionId': context.aws_request_id
            }
        )
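
The way the handler above derives the script location and the Glue job name from the pipeline event can be checked offline. The following is a minimal sketch under stated assumptions: `derive_job_settings` is a hypothetical helper, every value in `sample_event` (bucket, object key, revision, job name, role) is made up, and only the fields the handler reads are included. It uses `posixpath.join` so the result is the same on any platform, whereas the handler uses `os.path.join`, which behaves identically on the Linux-based Lambda runtime.

```python
import json
import posixpath


def derive_job_settings(event, filename):
    """Return (script_location, glue_job_name) the way the handler computes them."""
    data = event["CodePipeline.job"]["data"]
    # UserParameters arrives as a JSON string, not as a nested object
    user_params = json.loads(data["actionConfiguration"]["configuration"]["UserParameters"])
    s3_location = data["inputArtifacts"][0]["location"]["s3Location"]
    bucket = s3_location["bucketName"]
    artifact_key = s3_location["objectKey"]

    # The script is stored under the artifact key, next to nothing else
    file_key = posixpath.join(artifact_key, filename)
    # The last path segment of the artifact key becomes the job name suffix
    artifact_id = artifact_key.split("/")[-1]
    job_name = f'{user_params["glue_job_name"]}_{artifact_id}'
    return f"s3://{bucket}/{file_key}", job_name


# Hypothetical event, trimmed to the fields used above
sample_event = {
    "CodePipeline.job": {
        "id": "11111111-2222-3333-4444-555555555555",
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "UserParameters": json.dumps(
                        {"glue_job_name": "nightly-etl", "glue_role": "ExampleGlueRole"}
                    )
                }
            },
            "inputArtifacts": [
                {
                    "revision": "0123456789abcdef",
                    "location": {
                        "s3Location": {
                            "bucketName": "example-artifact-bucket",
                            "objectKey": "pipeline/Artifact_S/AbCdEf1",
                        }
                    },
                }
            ],
        },
    }
}

script_location, glue_job_name = derive_job_settings(sample_event, "etl.py")
print(script_location)
print(glue_job_name)
```

Because the suffix comes from the artifact key, each pipeline execution produces a differently named Glue job rather than updating an existing one.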
@@ -0,0 +1,145 @@
import { CfnParameter, Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
import * as codecommit from 'aws-cdk-lib/aws-codecommit';
import { Artifact, Pipeline, PipelineType } from 'aws-cdk-lib/aws-codepipeline';
import { Function, Code, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';
import { CodeCommitSourceAction, LambdaInvokeAction } from 'aws-cdk-lib/aws-codepipeline-actions';
import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3';
import { Key } from 'aws-cdk-lib/aws-kms';
import { Effect, PolicyStatement, Role, ServicePrincipal } from 'aws-cdk-lib/aws-iam';

export class CodepipelineGlueDeployStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Create a CloudFormation parameter for the base name of the Glue jobs
    // that the Lambda function creates
    const glueJob = new CfnParameter(this, 'glueJob', {
      type: 'String',
      description: 'The name of the Glue job',
    });

    // Create a CodeCommit repository for the ETL code and seed it
    // with the contents of the etl directory
    const etlRepository = new codecommit.Repository(this, 'EtlRepository', {
      repositoryName: 'EtlRepository',
      code: codecommit.Code.fromDirectory('etl/'),
      description: 'EtlRepository'
    });

    // Create a KMS key for encrypting the pipeline artifact store
    // with key rotation enabled
    const pipelineArtifactStoreEncryptionKey = new Key(this, 'pipelineArtifactStoreEncryptionKey', {
      removalPolicy: RemovalPolicy.DESTROY,
      enableKeyRotation: true
    });

    // Create an S3 bucket for the pipeline artifact store,
    // encrypted with the KMS key created above,
    // with server access logs enabled and SSL enforced
    const pipelineArtifactStoreBucket = new Bucket(this, 'XXXXXXXXXXXXXXXXXXXXXXXXXXX', {
      removalPolicy: RemovalPolicy.DESTROY,
      encryption: BucketEncryption.KMS,
      encryptionKey: pipelineArtifactStoreEncryptionKey,
      serverAccessLogsPrefix: 'access-logs',
      enforceSSL: true
    });

    // Create a role that Glue can assume; the Lambda function passes
    // this role to the ETL jobs that it creates
    const glueRole = new Role(this, 'GlueRole', {
      assumedBy: new ServicePrincipal('glue.amazonaws.com'),
    });

    // Add the necessary permissions to the Glue role to create and start ETL jobs
    glueRole.addToPrincipalPolicy(
      new PolicyStatement({
        actions: [
          'glue:CreateJob',
          'glue:StartJobRun'
        ],
        effect: Effect.ALLOW,
        resources: ['*']
      })
    );

    // Grant the Glue role the ability to encrypt and decrypt with the pipeline artifact store encryption key
    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(glueRole);
    // Grant the Glue role the ability to read and write to the pipeline artifact store bucket
    pipelineArtifactStoreBucket.grantReadWrite(glueRole);


    // Create a Lambda function to create Glue jobs based on the files
    // in the ETL code repository
    const lambda = new Function(this, 'lambda', {
      code: Code.fromAsset('lambda_etl_launch'),
      handler: 'lambda_etl_launch.lambda_handler',
      runtime: Runtime.PYTHON_3_12,
      environment: {
        'REPOSITORY_NAME': etlRepository.repositoryName,
        'FILENAME': 'etl.py'
      }
    });

    // Add the necessary permissions to the Lambda role to pass the Glue role
    lambda.role?.addToPrincipalPolicy(
      new PolicyStatement({
        actions: [
          'iam:PassRole'
        ],
        effect: Effect.ALLOW,
        resources: [glueRole.roleArn]
      })
    );
    // Add the necessary permissions to the Lambda role to read and write from the pipeline artifact store
    pipelineArtifactStoreBucket.grantReadWrite(lambda.role!);
    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(lambda.role!);

    // Create the artifact that carries the Source stage output to the Deploy stage
    const pipelineArtifactStore = new Artifact();

    // Create a CodePipeline pipeline
    const pipeline = new Pipeline(this, 'Pipeline', {
      pipelineName: 'pipeline',
      artifactBucket: pipelineArtifactStoreBucket,
      enableKeyRotation: true,
      pipelineType: PipelineType.V2,
      stages: [
        {
          stageName: 'Source',
          actions: [
            new CodeCommitSourceAction({
              actionName: 'Source',
              repository: etlRepository,
              branch: 'main',
              output: pipelineArtifactStore,
            })
          ]
        },
        {
          stageName: 'Deploy',
          actions: [
            new LambdaInvokeAction({
              actionName: 'Deploy',
              lambda: lambda,
              inputs: [pipelineArtifactStore],
              userParameters: {
                glue_job_name: glueJob.valueAsString,
                glue_role: glueRole.roleName
              }
            })
          ]
        },
      ]
    });

    // Grant the pipeline role the ability to pull from the ETL repository
    etlRepository.grantPull(pipeline.role);
    // Grant the pipeline role the ability to invoke the Lambda function
    lambda.grantInvoke(pipeline.role);
    // Grant the pipeline role the ability to encrypt and decrypt with the pipeline artifact store encryption key
    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(pipeline.role);
  }
}
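
The `userParameters` map on the Deploy action reaches the Lambda function as a JSON string in the action configuration, which the handler decodes with `json.loads`. The stack only sets `glue_job_name` and `glue_role`, but the handler also honors an optional `additional_python_modules` key. The sketch below shows that mapping in isolation; `build_default_arguments` is a hypothetical helper and the module list is an invented example value.

```python
import json


def build_default_arguments(user_parameters_json):
    """Map the optional user parameter onto Glue's --additional-python-modules argument."""
    user_params = json.loads(user_parameters_json)
    default_arguments = {}
    if "additional_python_modules" in user_params:
        default_arguments["--additional-python-modules"] = user_params["additional_python_modules"]
    return default_arguments


# Parameters as the stack passes them today: no extra modules requested
baseline = json.dumps({"glue_job_name": "nightly-etl", "glue_role": "ExampleGlueRole"})
print(build_default_arguments(baseline))

# Hypothetical extension: request extra Python modules for the Glue job
extended = json.dumps({
    "glue_job_name": "nightly-etl",
    "glue_role": "ExampleGlueRole",
    "additional_python_modules": "pyarrow==14.0.1,requests",
})
print(build_default_arguments(extended))
```

Adding the key to `userParameters` in the stack would be enough to pass extra modules through, since the handler already forwards it to `DefaultArguments`.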