How to Upload Large Files to Amazon S3 with AWS CLI

By Eric Ma | Nov 29, 2015 | Aug 30, 2020

Amazon S3 is a widely used public cloud storage system. S3 allows an object/file to be up to 5TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file of hundreds of GB is not easy using the Web interface. In my experience, it fails frequently. There are various third-party commercial tools that claim to help people upload large files to Amazon S3, and Amazon also provides a Multipart Upload API, on which most of these tools are based.

While these tools are helpful, they are not free, and AWS already provides users a pretty good tool for uploading large files to S3: the open source aws s3 CLI tool from Amazon. In my test, the aws s3 command line tool achieved an upload speed of more than 7MB/s on a shared 100Mbps network, which should be good enough for many situations and network environments. In this post, I will give a tutorial on uploading large files to Amazon S3 with the aws command line tool.
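To get a feel for what 7MB/s means in practice, here is a rough back-of-the-envelope sketch. The function name estimate_hours is made up for illustration, 1 GB is taken as 1024 MB, and the rate is assumed to stay constant for the whole transfer, which a shared network will not guarantee.

```shell
# estimate_hours SIZE_GB RATE_MB_PER_S
# Prints the estimated transfer time in hours, taking 1 GB as 1024 MB
# and assuming the rate stays constant (an idealization).
estimate_hours() {
    LC_ALL=C awk -v gb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", gb * 1024 / rate / 3600 }'
}

# A 150 GB file at 7 MB/s
estimate_hours 150 7
```

Under these assumptions a 150 GB file takes roughly six hours, so it is worth running the upload from a machine and terminal session that will stay up that long.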

Install aws CLI tool

I assume that you already have a Python environment set up on your computer. You can install the aws tools using the bundled installer:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

Try to run aws after installation. If you see output as follows, you should have installed it successfully.

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: too few arguments
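If you want to check the installation from a script rather than by eye, a small sketch like the following may help. The helper name have_cmd is made up for illustration; it only tests whether a command can be found on PATH.

```shell
# have_cmd NAME
# Succeeds if NAME can be found on PATH (hypothetical helper for illustration).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    # Print the installed version of the CLI
    aws --version
else
    echo "aws not found on PATH; check the -b path given to the installer" >&2
fi
```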

Configure `aws` tool access

The quickest way to configure the AWS CLI is to run the aws configure command:

$ aws configure
AWS Access Key ID: foo
AWS Secret Access Key: bar
Default region name [us-west-2]: us-west-2
Default output format [None]: json

Here, your AWS Access Key ID and AWS Secret Access Key can be found in Your Security Credentials on the AWS Console.
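For reference, aws configure stores these answers in two plain-text files, by default ~/.aws/credentials and ~/.aws/config. The following is a minimal sketch of writing equivalent files by hand. The directory is a throwaway temporary directory chosen for illustration (not the default location), foo and bar are the placeholder keys from above, and the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables tell the CLI where to look.

```shell
# AWS_DIR is a hypothetical location for this sketch; the default is ~/.aws.
AWS_DIR="$(mktemp -d)"

# Write both files in a subshell with a restrictive umask so that
# only the current user can read the secret key.
(
    umask 077
    cat > "$AWS_DIR/credentials" <<'EOF'
[default]
aws_access_key_id = foo
aws_secret_access_key = bar
EOF
    cat > "$AWS_DIR/config" <<'EOF'
[default]
region = us-west-2
output = json
EOF
)

# Point the CLI at the files written above.
export AWS_SHARED_CREDENTIALS_FILE="$AWS_DIR/credentials"
export AWS_CONFIG_FILE="$AWS_DIR/config"
```

This can be handy on a machine where you do not want to answer the interactive prompts, for example in a provisioning script.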

Uploading large files

Finally, the fun part. Here, assume we are uploading the large file ./150GB.data to s3://systut-data-test/store_dir/ (that is, directory store_dir under bucket systut-data-test) and that the bucket and directory are already created on S3. The command is:

$ aws s3 cp ./150GB.data s3://systut-data-test/store_dir/

Once the upload starts, it prints progress messages like

Completed 1 part(s) with ... file(s) remaining

at the beginning, and messages like the following as it nears the end.

Completed 9896 of 9896 part(s) with 1 file(s) remaining

After the file is uploaded successfully, it prints a message like

upload: ./150GB.data to s3://systut-data-test/store_dir/150GB.data
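The part count in the progress messages follows from the file size and the part size the tool uses. As a rough sketch of that arithmetic: S3 allows at most 10,000 parts per multipart upload, and the CLI's documented default chunk size is 8MB, so for a file this large the part size has to be bigger than the default. The function name part_count is made up for illustration, and the sizes below use binary units (GiB/MiB), which is an assumption about how the file was measured.

```shell
# part_count SIZE_BYTES CHUNK_BYTES
# Prints how many parts a multipart upload needs (ceiling division).
part_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

MiB=$((1024 * 1024))
GiB=$((1024 * MiB))

part_count $((150 * GiB)) $((8 * MiB))    # 19200 parts: over the 10,000-part limit
part_count $((150 * GiB)) $((16 * MiB))   # 9600 parts: within the limit

# To choose the part size yourself instead of leaving it to the tool,
# the CLI has an s3 setting for it (not run here):
#   aws configure set default.s3.multipart_chunksize 64MB
```

Fewer, larger parts mean fewer requests, while smaller parts mean less data to resend when a single part fails.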

The aws tool has more commands for working with files on S3. I hope this tutorial helps you get started with it. Check the manual for more details.

8 Comments

Eric Ma says:

Dec 16, 2015 at 4:57 pm

To upload a directory recursively, you may use `aws s3 sync`. For example, to upload the current directory to the bucket my-bucket under the directory my-dir:

$ aws s3 sync . s3://my-bucket/my-dir/

Pedro says:

Jun 25, 2016 at 12:58 am

Hey Eric, is there a parameter available for the above command that would allow me to enforce TLS 1.2 encryption in-transit?

1. Eric Z Ma says:
  
  Jun 30, 2016 at 11:16 am
  
  I am not aware of one. You may need to dig into the source code of aws-cli, which is available at https://github.com/aws/aws-cli, to investigate or write a patch to enforce TLS 1.2.
  
Nhu says:

Aug 12, 2016 at 1:44 pm

How do I sync between an sftp location and an S3 bucket directly?

1. Eric Z Ma says:
  
  Aug 19, 2016 at 4:25 pm
  
  You may consider a solution like this:
  
  1. Mount the sftp location by sshfs http://www.systutorials.com/1505/mounting-remote-folder-through-ssh/ to a local directory.
  
  2. Use the tool in this post to sync the local directory (where the sftp location is mounted) with your S3 bucket.
  
sal says:

Nov 28, 2016 at 7:33 pm

What happens when a large file upload fails?? This is not covered.
I’ve been getting segfaults using the straight cp command, and re-running it will start again from the beginning. On large files this can mean days wasted.

1. Andy says:
  
  Mar 16, 2019 at 6:02 am
  
  Stumbled upon this while looking for solutions to upload large files.
  Check this link: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/
  If your cp process keeps dying, you may want to explicitly break the file apart with the lower-level s3api command set.
  
Narendra says:

Apr 4, 2020 at 7:44 am

How do I upload an image file from my local folder to an S3 bucket via the command prompt?

Please help to provide CLI commands.
