s3-migrator

S3-migrator is a tool that facilitates migrating objects between different buckets and paths on AWS S3 while also updating the state of those objects in a MySQL database.

What makes it different from other S3 tools out there?

S3-migrator isn't the fastest tool out there for copying objects between buckets, although it's pretty fast and handles that use case seamlessly. S3-migrator was built with the intent of moving several million S3 objects between buckets, or between different paths within the same bucket, while keeping the production database up to date with the changes and removing old objects only once it's sure they're both moved and updated in the database, which reduces costs by preventing object duplication.

If you want to know more about the internals, the design decisions, and the architecture, or read about alternative approaches to moving S3 objects while keeping your production database in sync with the changes, please have a look at our Design Document.

How does it compare to other S3 tools?

If you decide to use s3-migrator just for copying objects without updating your database, here's what you should expect from s3-migrator.

Tool          Time to copy 1k objects between S3 buckets
s3cmd         18.60 min
awscli        44.32 sec
s3-migrator   9.44 sec

All tests were done on my personal MacBook over the same connection, so for plain copying I found s3-migrator to be much, much faster than the other alternatives I tried.

Features

  • S3-migrator moves files concurrently, so it's fast.
  • S3-migrator updates your database and changes object paths.
  • S3-migrator treats the database as the single source of truth, so it can resume work from where it stopped.
  • No dependencies: there's no use of external services (e.g. Lambdas, DynamoDB) to keep track of the file migrations; we rely only on the script and the database, which reduces the cost of the migration.
  • S3-migrator implements exponential backoff, so it keeps retrying problems that arise from network delays and rate limiting (see the sketch after this list).
  • S3-migrator is a fail-safe, idempotent script; you can run it multiple times without expecting any problems.
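To make the copy → update database → delete flow concrete, here's a minimal sketch of that loop in Python. This is illustrative only, not the project's actual code: the table and column names match the defaults described below (buckets, bucket_name, id), the env var names are the ones listed under Usage, but the boto3/PyMySQL calls, the new_key() helper, and the backoff parameters are my own assumptions.

# Minimal sketch of the copy -> update DB -> delete flow (illustrative only,
# not the project's actual code). Assumes boto3 + PyMySQL, the default
# `buckets` table with `id`/`bucket_name` columns holding the object path,
# and a hypothetical new_key() helper for the /image/* -> /avatar rewrite.
import os
import time

import boto3
import pymysql

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["aws_access_key_id"],
    aws_secret_access_key=os.environ["aws_secret_access_key"],
)
db = pymysql.connect(
    host=os.environ["DBHost"],
    port=int(os.environ["DBPort"]),
    user=os.environ["DBUser"],
    password=os.environ["DBPassword"],
    database=os.environ["DATABASE"],
)

def new_key(old_key):
    # Hypothetical default path rewrite: image/* -> avatar/*
    return old_key.replace("image/", "avatar/", 1)

def with_backoff(fn, retries=5):
    # Retry with exponential backoff on transient network / rate-limit errors.
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

def migrate_object(row_id, old_key):
    dest_key = new_key(old_key)
    # 1. Copy the object to its new bucket/path.
    with_backoff(lambda: s3.copy_object(
        Bucket=os.environ["bucket_to"], Key=dest_key,
        CopySource={"Bucket": os.environ["bucket_from"], "Key": old_key}))
    # 2. Point the database row at the new path.
    with db.cursor() as cur:
        cur.execute("UPDATE buckets SET bucket_name = %s WHERE id = %s",
                    (dest_key, row_id))
    db.commit()
    # 3. Only after both steps succeed, delete the original object. Every step
    #    is safe to re-run, which is what makes the script idempotent.
    with_backoff(lambda: s3.delete_object(
        Bucket=os.environ["bucket_from"], Key=old_key))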

Requirements

  • Python 3.x, pip, and git
  • AWS credentials with access to the source and destination buckets
  • A MySQL/MariaDB database reachable from wherever you run the migration

Installation

There are multiple ways to install and run S3-migrator.

  • You can use it as a typical Python tool by installing its dependencies, then running it with python s3-migrator

    Steps: First, make sure you have Python 3.x, pip, and git installed on your machine.

    Run the following in your command prompt to install:

    1. git clone https://github.com/0xMH/s3-migrator.git
    2. cd s3-migrator
    3. pip install -r requirements.txt
  • Run it as a container using Docker:

    docker run --name s3-migrator \
    -e bucket_from="" \
    -e bucket_to="" \
    -e aws_access_key_id="" \
    -e aws_secret_access_key="" \
    -e DBUser="" \
    -e DBPassword="" \
    -e DBHost="" \
    -e DBPort="" \
    -e DATABASE="" oxmh/s3-migrator:latest
    
  • Run it on Kubernetes using the Helm chart

    Check the README on helm-s3-migrator

Usage

Currently S3-migrator needs these env vars in order to work:

  • bucket_from=""
  • bucket_to=""
  • aws_access_key_id=""
  • aws_secret_access_key=""
  • DBUser=""
  • DBPassword=""
  • DBHost=""
  • DATABASE=""
  • DBPort=""

You can either specify these env vars in a .env file or export them to your shell environment. S3-migrator isn't configurable from the CLI yet (this is on my TODO, though), so it defaults to working on the buckets table and the bucket_name and id columns.
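For example, a .env file might look like the following; all the values shown here are placeholders to replace with your own:

bucket_from="my-source-bucket"
bucket_to="my-destination-bucket"
aws_access_key_id="AKIA..."
aws_secret_access_key="..."
DBUser="migrator"
DBPassword="..."
DBHost="127.0.0.1"
DBPort="3306"
DATABASE="production"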

Here's a statement that creates the table and columns so you can try it on your MariaDB.

create table buckets
(
	bucket_name char(50) null,
	id int auto_increment,
	constraint buckets_pk
		primary key (id)
);

Also, it currently defaults to moving objects under /image/* in one bucket to /avatar in another.
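For instance, you could seed that table with a couple of test rows under the default path. This assumes bucket_name holds the object path that gets rewritten, which is my reading of the defaults rather than something documented above, and the paths are made up:

-- Test rows (hypothetical object paths under the default /image/* prefix)
insert into buckets (bucket_name) values
	('image/123.png'),
	('image/456.png');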

TODO:

  • Optimize the pagination query with the seek method, because the offset method becomes slow and painful once the OFFSET value gets high (see the sketch after this list)
  • Use a bulk-load approach to write the changes to the database, maybe using either LOAD DATA INFILE or a bulk INSERT statement
  • Make it configurable and add CLI arguments
  • Improve the Helm chart by adding Kubernetes secrets and reading env vars from them
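To illustrate the first item, here's roughly what that change looks like against the default buckets table. This is a generic keyset-pagination sketch rather than code from this repo, and 500000 is just an example cursor value:

-- Offset pagination: the server still reads and discards OFFSET rows,
-- so deep pages get progressively slower.
select id, bucket_name from buckets order by id limit 1000 offset 500000;

-- Seek (keyset) pagination: resume from the last id seen on the previous
-- page, which stays fast no matter how deep into the table you are.
select id, bucket_name from buckets where id > 500000 order by id limit 1000;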

Contributing

Feel free to contribute by making a pull request.
