# R in AWS Lambda

AWS Lambda supports a number of different runtimes, representing the programming languages which can be used "out of the box" in Lambda. Unfortunately, the R programming language, utilised by the DiAGRAM application's backend, is not one of these. However, Lambda's custom runtimes can be used to run R in Lambda.

The R package {lambdr} implements an R runtime for Lambda, providing the functionality needed to handle the runtime API endpoints through which new events are received and responses are returned.
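
For orientation, a minimal {lambdr} runtime script looks something like the following sketch (the parity handler is an illustrative example in the style of the package's documentation, not part of DiAGRAM):

```r
# A handler is just an R function; with the default configuration, {lambdr}
# invokes the function named by Lambda's _HANDLER environment variable
parity <- function(number) {
  list(parity = if (number %% 2 == 0) "even" else "odd")
}

# Enter the infinite event-listening loop
lambdr::start_lambda()
```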

The {lambdr} package, along with the backend application code (see src/) and its dependencies, is installed into a container image (see Dockerfile), which is then deployed as a Lambda function.

## TinyTeX

The DiAGRAM application requires a backend capable of rendering and returning parameterised PDF documents via R Markdown, with parameter values supplied by the user's input to the DiAGRAM application.

To render this PDF document, the backend will need an installation of the LaTeX document preparation system.

Typically, a user might install LaTeX via the TeX Live distribution. However, a standard TeX Live installation takes upwards of 4.5 GB of storage, and although Lambda container images can be up to 10 GB in size, in the interest of minimising image build and push times, as well as cold-start times, we should keep our container image as small as possible.
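
TinyTeX offers exactly this: a much smaller TeX Live distribution, managed from R. As a sketch of how it might be installed during the image build (the actual Dockerfile may differ, and the extra LaTeX packages named here are purely illustrative):

```r
# Install a minimal TeX Live distribution via {tinytex}
tinytex::install_tinytex()

# Add only the LaTeX packages the report actually needs (illustrative names)
tinytex::tlmgr_install(c("fancyhdr", "booktabs"))
```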

TinyTeX is used in the backend to provide this minimal LaTeX installation. Be aware of a known issue in which {tinytex} ignores the intermediates_dir argument of rmarkdown::render(). This matters because /tmp is the only writable directory in Lambda, so we must force {tinytex} to write all files there, including intermediate files (.aux, .log, .synctex.gz, ...):

```r
# Ephemeral storage: /tmp is the only writable directory in Lambda
tmp_dir <- "/tmp"

# Work around {tinytex} ignoring intermediates_dir, see
# https://github.com/rstudio/rmarkdown/issues/1615
options(
  tinytex.output_dir = tmp_dir,
  tinytex.engine_args = glue::glue("'--output-directory={tmp_dir}'")
)
```
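
With those options set, the report can be rendered entirely within /tmp. A minimal sketch, assuming a hypothetical report.Rmd and parameter list (intermediates_dir is passed for completeness, although per the issue above it is the options that actually take effect):

```r
# Render the parameterised R Markdown report, keeping all files in /tmp
output_file <- rmarkdown::render(
  input = "report.Rmd",               # hypothetical document name
  params = list(input = user_input),  # hypothetical parameter values
  output_dir = tmp_dir,
  intermediates_dir = tmp_dir
)
```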

## Multiple endpoints

There are a few different approaches to emulating multiple endpoints with AWS Lambda and AWS API Gateway. This Stack Overflow question provides a good overview of the pros and cons of the possible approaches.

Here, an API Gateway Lambda proxy integration is used to route all requests to a single Lambda function. Routing is then performed within R by diagramLambda::handler().
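
As a rough sketch of the idea behind such in-R routing (the routes and helper functions below are hypothetical, not DiAGRAM's actual API):

```r
# Hypothetical dispatch on the request path extracted from the
# API Gateway proxy payload
handle_request <- function(path, body) {
  switch(path,
    "/score"  = compute_score(body),   # hypothetical helper
    "/report" = render_report(body),   # hypothetical helper
    stop("No route matches: ", path)   # default branch of switch()
  )
}
```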

By using a single Lambda function for all requests, any given request is more likely to hit a "warm" Lambda, in turn reducing the frequency of cold-start events.

## Application code

The application code (the actual R backend) lives in the src/ directory and is structured as an R package, {diagramLambda}.

{diagramLambda} contains a bootstrap.sh and a runtime.R file in inst/extdata/runtime/. These files are used to start {lambdr}'s infinite event-listening loop. runtime.R configures {lambdr} to pass the raw event content through to diagramLambda::handler(), and not to serialise any of the responses:

```r
lambdr::start_lambda(
  config = lambdr::lambda_config(
    # Pass payloads through untouched in both directions; custom
    # (de)serialisation is handled inside {diagramLambda} instead
    serialiser = identity,
    deserialiser = identity
  )
)
```

Configuring {lambdr} to use the identity() function for deserialisation allows the custom deserialisation routine diagramLambda::gateway_payload_to_rook() to be applied instead. Likewise, using identity() as the serialiser allows different object types to be returned from the same Lambda function, with custom serialisers for returning PDF documents, PNG images, and CSV data.
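
As an illustration of what such a custom serialiser might look like, here is a minimal sketch for PDF responses (not the actual {diagramLambda} code; API Gateway's proxy integration expects binary bodies to be base64-encoded and flagged as such):

```r
# Hypothetical serialiser: wrap a rendered PDF as an API Gateway proxy response
serialise_pdf <- function(pdf_path) {
  jsonlite::toJSON(
    list(
      statusCode      = 200L,
      headers         = list(`Content-Type` = "application/pdf"),
      isBase64Encoded = TRUE,  # required for binary response bodies
      body            = base64enc::base64encode(pdf_path)
    ),
    auto_unbox = TRUE
  )
}
```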

## Testing Locally

The DiAGRAM API can be tested locally by building and running the Lambda container image. From the directory holding the Dockerfile:

```sh
docker build -t diagram:latest .
```

This image can then be run as:

```sh
docker run -p 9000:8080 --read-only=true --mount type=bind,source="/tmp/",target=/tmp diagram:latest
```

Note that this runs the container in read-only mode but mounts your local /tmp/ directory into the container, simulating the fact that in AWS Lambda /tmp/ is the only writable directory.

Now that the Lambda container is running locally, requests can be sent to it. When deployed, Lambda sits behind API Gateway, and as such expects its requests to be in a particular format; an example of this format can be found at src/inst/extdata/test_data/api_gateway_format.json, which is used in the invocation below.

Requests can now be sent to your locally running container, e.g.:

```sh
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @src/inst/extdata/test_data/api_gateway_format.json
```

## Deployment

The Docker image used by AWS Lambda is updated automatically by a GitHub Action when required. As a developer, you should not typically need to push images to AWS ECR manually.

If you really must push directly to AWS ECR (perhaps to redeploy to an entirely new environment, or to apply a hotfix while GitHub Actions is experiencing downtime), you will need write access to the correct ECR repository, plus an AWS Access Key ID and corresponding AWS Secret Access Key (see the AWS documentation on how to generate these). Add these keys to a profile by running aws configure.

Once you have configured your AWS profile, you can authenticate against ECR with:

```sh
aws ecr get-login-password --region <AWS-REGION-HERE> | docker login --username AWS --password-stdin <AWS-ACCOUNT-ID-HERE>.dkr.ecr.<AWS-REGION-HERE>.amazonaws.com
```

Then, from this directory, you can build and push your image:

```sh
docker build -t <AWS-REPO-NAME-HERE> .
docker tag <AWS-REPO-NAME-HERE>:latest <AWS-ACCOUNT-ID-HERE>.dkr.ecr.<AWS-REGION-HERE>.amazonaws.com/<AWS-REPO-NAME-HERE>:latest
docker push <AWS-ACCOUNT-ID-HERE>.dkr.ecr.<AWS-REGION-HERE>.amazonaws.com/<AWS-REPO-NAME-HERE>:latest
```