Get large objects from S3 by using parallel byte-range fetches/parts to improve performance.

widdix/s3-getobject-accelerator

S3 GetObject Accelerator

Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.

We measured a throughput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this library with these settings: {concurrency: 64}.
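To make the idea of byte-range parts concrete, here is a minimal sketch of how an object could be split into ranges that are then fetched in parallel. The helper `byteRanges` is hypothetical and not part of this library; the 1-based `partNo` and the exact part boundaries are assumptions made for illustration.

```javascript
// Hypothetical helper, not part of s3-getobject-accelerator.
// Splits an object of `lengthInBytes` into inclusive byte ranges of at most
// `partSizeInBytes` each and formats them as HTTP Range header values.
function byteRanges(lengthInBytes, partSizeInBytes) {
  if (!Number.isInteger(lengthInBytes) || lengthInBytes < 0) {
    throw new RangeError('lengthInBytes must be a non-negative integer');
  }
  if (!Number.isInteger(partSizeInBytes) || partSizeInBytes <= 0) {
    throw new RangeError('partSizeInBytes must be a positive integer');
  }
  const ranges = [];
  let partNo = 1; // assumption: parts are numbered starting at 1
  for (let start = 0; start < lengthInBytes; start += partSizeInBytes) {
    // HTTP Range ends are inclusive, hence the -1.
    const end = Math.min(start + partSizeInBytes, lengthInBytes) - 1;
    ranges.push({partNo, start, end, header: `bytes=${start}-${end}`});
    partNo += 1;
  }
  return ranges;
}
```

For a 20-byte object and 8-byte parts, this yields the ranges `bytes=0-7`, `bytes=8-15`, and `bytes=16-19`. Each range can be requested independently, which is what allows several parts to download at the same time.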

Installation

npm i s3-getobject-accelerator

Examples

Compact

const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');

pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);
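If you prefer promises over callbacks, the same pattern could be written with the promise-based `pipeline` from `node:stream/promises`. This is a sketch only: `saveToFile` is a hypothetical helper, and it assumes the stream returned by `readStream()` is a regular Node.js readable stream.

```javascript
const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream/promises');

// Hypothetical helper, not part of s3-getobject-accelerator.
// Pipes any readable stream into a file; the promise resolves once the file
// is fully written and rejects if either stream fails.
async function saveToFile(readable, path) {
  await pipeline(readable, createWriteStream(path));
}

// Possible usage with this library (not executed here):
// const {download} = require('s3-getobject-accelerator');
// saveToFile(download({bucket: 'bucket', key: 'key'}, {concurrency: 4}).readStream(), '/tmp/test')
//   .then(() => console.log('done'), (err) => console.error('something went wrong', err));
```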

More verbose

Get insights into the individual part downloads and write directly to a file, without a stream, if the object is smaller than 1 TiB:

const {createWriteStream} = require('node:fs');
const {download} = require('s3-getobject-accelerator');

const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});

d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});

d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});
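The part events can also be aggregated instead of logged one by one. The following sketch keeps track of which parts are currently downloading and how many have been written. `trackParts` is a hypothetical helper; it only assumes that the object returned by `download()` exposes `on()` with the event names and `{partNo}` payload shown above.

```javascript
// Hypothetical helper, not part of s3-getobject-accelerator.
// Works with any emitter that fires the part events used in the example above.
function trackParts(emitter) {
  const inFlight = new Set(); // part numbers currently being downloaded
  let done = 0; // number of parts written to disk

  emitter.on('part:downloading', ({partNo}) => {
    inFlight.add(partNo);
  });
  emitter.on('part:downloaded', ({partNo}) => {
    inFlight.delete(partNo);
  });
  emitter.on('part:done', () => {
    done += 1;
  });

  return {
    inFlight: () => [...inFlight],
    done: () => done
  };
}

// Possible usage (not executed here):
// const progress = trackParts(d);
// setInterval(() => console.log(progress.inFlight(), progress.done()), 1000);
```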

API

download(s3source, options)

AWS credentials & region

AWS credentials are fetched in the following order:

  1. options.v2AwsSdkCredentials
  2. Environment variables
     • AWS_ACCESS_KEY_ID
     • AWS_SECRET_ACCESS_KEY
     • AWS_SESSION_TOKEN (optional)
  3. IMDSv2

AWS region is fetched in the following order:

  1. Environment variable AWS_REGION
  2. IMDSv2
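The lookup order above can be sketched as a pair of small functions. This illustrates the precedence only and is not the library's implementation: `resolveCredentials` and `resolveRegion` are hypothetical names, the shape of the returned credentials object is an assumption, and the IMDSv2 request itself is left out.

```javascript
// Hypothetical sketch of the lookup order, not the library's actual code.
function resolveCredentials(options = {}, env = process.env) {
  // 1. Explicitly passed credentials win.
  if (options.v2AwsSdkCredentials) {
    return {source: 'options', credentials: options.v2AwsSdkCredentials};
  }
  // 2. Environment variables; the session token is optional.
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return {
      source: 'env',
      credentials: {
        accessKeyId: env.AWS_ACCESS_KEY_ID, // assumed property names
        secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
        sessionToken: env.AWS_SESSION_TOKEN
      }
    };
  }
  // 3. Fall back to IMDSv2; the actual metadata request is omitted here.
  return {source: 'imds', credentials: null};
}

function resolveRegion(env = process.env) {
  // 1. Environment variable, 2. IMDSv2 (request omitted here).
  return env.AWS_REGION
    ? {source: 'env', region: env.AWS_REGION}
    : {source: 'imds', region: null};
}
```

Passing `env` as a parameter keeps the sketch easy to try with different inputs, for example `resolveRegion({AWS_REGION: 'eu-west-1'})`.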

Considerations
