Amazon S3 is a widely used public cloud storage system. S3 allows a single object (file) to be up to 5TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file of hundreds of GB is not easy using the Web interface. From my experience, it fails frequently. There are various third-party commercial tools that claim to help people upload large files to Amazon S3, and Amazon also provides a Multipart Upload API, which most of these tools are based on.
While these tools are helpful, they are not free, and AWS already provides users a pretty good tool for uploading large files to S3: the open source aws s3 CLI tool from Amazon. From my test, the aws s3 command line tool can achieve more than 7MB/s uploading speed in a shared 100Mbps network, which should be good enough for many situations and network environments. In this post, I will give a tutorial on uploading large files to Amazon S3 with the aws command line tool.
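To put the 7MB/s figure in perspective, here is a rough back-of-the-envelope estimate for the 150 GB example file used later in this post. It is only a sketch: estimate_seconds is a hypothetical helper (not part of the aws CLI), the size is rounded to 150,000 MB, and a constant transfer rate is assumed.

```shell
# estimate_seconds is a hypothetical helper, not part of the aws CLI.
# It divides a size in MB by a sustained rate in MB/s (integer division).
estimate_seconds() {
    size_mb=$1
    rate_mb_s=$2
    echo $(( size_mb / rate_mb_s ))
}

# Assumption: 150 GB is taken as 150,000 MB and the rate stays at 7 MB/s.
secs=$(estimate_seconds 150000 7)
echo "about $(( secs / 3600 )) h $(( secs % 3600 / 60 )) min"
```

Under these assumptions the script prints "about 5 h 57 min", so an upload of this size is roughly a six-hour job at that speed.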
Install aws CLI tool
Assume that you already have a Python environment set up on your computer. You can install aws using the bundled installer:
$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
Try to run aws after installation. If you see output as follows, you should have installed it successfully.
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
aws <command> help
aws <command> <subcommand> help
aws: error: too few arguments
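If you prefer a check that does not end in an error message, something like the following may be handy. It is a minimal sketch: have_cmd is a hypothetical helper name, and the hint about /usr/local/bin assumes the install command shown above was used unchanged.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command
# can be found on the current PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    # Print the installed CLI version.
    aws --version
else
    echo "aws not found; check that /usr/local/bin is on your PATH" >&2
fi
```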
Configure aws tool access
The quickest way to configure the AWS CLI is to run the aws configure command:
$ aws configure
AWS Access Key ID: foo
AWS Secret Access Key: bar
Default region name [us-west-2]: us-west-2
Default output format [None]: json
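For reference, aws configure stores these answers in two small INI-style files, normally ~/.aws/credentials and ~/.aws/config. The sketch below writes equivalent files into a scratch directory (so that nothing under ~/.aws is overwritten) and points the CLI at them through the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables. The values foo and bar are the placeholders from the session above, and conf_dir is just an illustrative variable name.

```shell
# Use a scratch directory so an existing ~/.aws setup is left untouched.
conf_dir="$(mktemp -d)"

# Credentials file: the access key pair under the default profile.
cat > "$conf_dir/credentials" <<'EOF'
[default]
aws_access_key_id = foo
aws_secret_access_key = bar
EOF

# Config file: region and output format under the default profile.
cat > "$conf_dir/config" <<'EOF'
[default]
region = us-west-2
output = json
EOF

# Keep the secret key readable by the current user only.
chmod 600 "$conf_dir/credentials"

# Tell the aws CLI to read these files instead of the ones in ~/.aws.
export AWS_SHARED_CREDENTIALS_FILE="$conf_dir/credentials"
export AWS_CONFIG_FILE="$conf_dir/config"
```

This can be useful on machines where running the interactive prompt is inconvenient, such as a server set up by a script.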
Uploading large files
Finally, the fun part. Here, assume we are uploading the large file ./150GB.data to s3://systut-data-test/store_dir/ (that is, directory store_dir under bucket systut-data-test) and that the bucket and directory are already created on S3. The command is:
$ aws s3 cp ./150GB.data s3://systut-data-test/store_dir/
After it starts to upload the file, it will print a progress message like
Completed 1 part(s) with ... file(s) remaining
at the beginning, and a progress message like the following as it nears the end:
Completed 9896 of 9896 part(s) with 1 file(s) remaining
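The 9896 parts in the message above sit just under the 10,000-part limit that S3 imposes on a multipart upload. If the file is exactly 150 GiB, 9896 parts works out to roughly 15.5 MiB per part. The sketch below computes the smallest part size that keeps a file within that limit. min_part_bytes is a hypothetical helper, and the exact file size is an assumption, since only the file name is given here.

```shell
# min_part_bytes is a hypothetical helper: the smallest part size in bytes
# that keeps a file of the given size within S3's 10,000-part limit.
min_part_bytes() {
    size_bytes=$1
    max_parts=10000
    # Ceiling division, so the last partial part is still counted.
    echo $(( (size_bytes + max_parts - 1) / max_parts ))
}

# Assumption: the file is exactly 150 GiB.
size=$(( 150 * 1024 * 1024 * 1024 ))
part=$(min_part_bytes "$size")
echo "at least $part bytes per part (about $(( part / 1024 / 1024 )) MiB)"
```

Under that assumption it prints "at least 16106128 bytes per part (about 15 MiB)". If you want larger parts than the tool picks on its own, the CLI has a multipart_chunksize setting (for example, aws configure set default.s3.multipart_chunksize 64MB). I did not need to change it for the test in this post.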
After it successfully uploads the file, it will print a message like
upload: ./150GB.data to s3://systut-data-test/store_dir/150GB.data
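As a quick sanity check after the upload, you may want to compare the size of the local file with the size S3 reports for the object. This is only a sketch and not a full integrity check: same_size is a hypothetical helper, and the commented-out aws s3api head-object call assumes the bucket and key from the example above and working credentials.

```shell
# same_size is a hypothetical helper: it succeeds if the local file's byte
# count equals the expected byte count given as the second argument.
same_size() {
    local_bytes=$(wc -c < "$1" | tr -d ' ')
    [ "$local_bytes" = "$2" ]
}

# Example with the file from this post (needs configured credentials):
# remote=$(aws s3api head-object --bucket systut-data-test \
#     --key store_dir/150GB.data --query ContentLength --output text)
# same_size ./150GB.data "$remote" && echo "sizes match"
```

You can also simply list the directory with aws s3 ls s3://systut-data-test/store_dir/ and read the size from the output.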