How to Upload Large Files to Amazon S3 with AWS CLI

By Eric Ma | Nov 29, 2015 | Aug 30, 2020

Amazon S3 is a widely used public cloud storage system. S3 allows an object (file) to be up to 5 TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file of hundreds of GB is not easy using the Web interface. From my experience, it fails frequently. There are various third-party commercial tools that claim to help people upload large files to Amazon S3, and Amazon also provides a Multipart Upload API, on which most of these tools are based.

While these tools are helpful, they are not free, and AWS already provides users with a pretty good tool for uploading large files to S3: the open source aws s3 CLI tool from Amazon. In my test, the aws s3 command line tool achieved an upload speed of more than 7 MB/s on a shared 100 Mbps network, which should be good enough for many situations and network environments. In this post, I will give a tutorial on uploading large files to Amazon S3 with the aws command line tool.

Install aws CLI tool

Assuming that you already have a Python environment set up on your computer, you can install the aws tools using the bundled installer:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

Try to run aws after installation. If you see output as follows, you should have installed it successfully.

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: too few arguments
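
If you want to check the installation from a script rather than by eye, one option is to look the command up on PATH before calling it. This is only a sketch: `have_cmd` is a helper name made up for this example, not part of the aws tool, and the hint in the error message assumes the /usr/local/bin/aws location used in the install command above.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    # Print the installed version of the CLI.
    aws --version
else
    echo "aws not found; check that /usr/local/bin is on your PATH" >&2
fi
```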

Configure `aws` tool access

The quickest way to configure the AWS CLI is to run the aws configure command:

$ aws configure
AWS Access Key ID: foo
AWS Secret Access Key: bar
Default region name [us-west-2]: us-west-2
Default output format [None]: json

Here, your AWS Access Key ID and AWS Secret Access Key can be found in Your Security Credentials on the AWS Console.
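
For reference, aws configure stores these answers in two small INI-style files, credentials and config, which normally live under ~/.aws. The sketch below writes equivalent files into a scratch directory and points the CLI at them with the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables. The helper name `write_aws_config` and the scratch directory are assumptions made for illustration, and foo/bar are the placeholder keys from the session above.

```shell
# write_aws_config is a hypothetical helper that writes the two files
# `aws configure` would normally create, into the directory given as $1.
write_aws_config() {
    dir=$1 key_id=$2 secret=$3 region=$4
    mkdir -p "$dir"
    # Create the credentials file with owner-only permissions.
    ( umask 077
      printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
          "$key_id" "$secret" > "$dir/credentials" )
    printf '[default]\nregion = %s\noutput = json\n' "$region" > "$dir/config"
}

# Write into a scratch directory (an assumption for this example) instead of
# ~/.aws, then tell the CLI where to find the files.
demo_dir=${TMPDIR:-/tmp}/aws-config-demo
write_aws_config "$demo_dir" foo bar us-west-2
export AWS_SHARED_CREDENTIALS_FILE="$demo_dir/credentials"
export AWS_CONFIG_FILE="$demo_dir/config"
```

Keeping the files in a separate directory like this can be handy when you want to try a second set of keys without touching your default configuration.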

Uploading large files

Lastly, the fun part. Here, assume we are uploading the large file ./150GB.data to s3://systut-data-test/store_dir/ (that is, directory store_dir under bucket systut-data-test), and that the bucket and directory have already been created on S3. The command is:

$ aws s3 cp ./150GB.data s3://systut-data-test/store_dir/

After it starts to upload the file, it will print the progress message like

Completed 1 part(s) with ... file(s) remaining

at the beginning, and the progress message as follows when it is reaching the end.

Completed 9896 of 9896 part(s) with 1 file(s) remaining

After it successfully uploads the file, it will print a message like

upload: ./150GB.data to s3://systut-data-test/store_dir/150GB.data
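
The part counts in the progress messages come from the multipart upload that the tool performs for large files. S3 limits a multipart upload to 10,000 parts, so the larger the file, the larger each part has to be. As a rough sketch of that arithmetic (the function name `min_part_size_mib` is made up for this example, and it assumes a shell with 64-bit arithmetic), the smallest whole-MiB part size for a given file size can be computed like this:

```shell
# min_part_size_mib is a hypothetical helper: given a file size in bytes, print
# the smallest whole-MiB part size that fits within S3's 10,000-part limit.
min_part_size_mib() {
    size_bytes=$1
    max_parts=10000
    mib=1048576
    # Ceiling division: bytes needed per part, then rounded up to a whole MiB.
    per_part=$(( (size_bytes + max_parts - 1) / max_parts ))
    echo $(( (per_part + mib - 1) / mib ))
}

# A 150 GiB file, similar in size to the example above.
min_part_size_mib $(( 150 * 1024 * 1024 * 1024 ))
```

For a 150 GiB file this prints 16. If you would rather pin the part size yourself than rely on the tool's default behaviour, the CLI has an s3 setting for it, for example: aws configure set default.s3.multipart_chunksize 16MB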

The aws tool has more commands for working with files on S3. I hope this tutorial helps you get started with it. Check the manual for more details.

8 Comments

Eric Ma says:

Dec 16, 2015 at 4:57 pm

To upload a directory recursively, you may use `aws s3 sync`. For example, to upload current directory to my-bucket bucket under dir my-dir:

$ aws s3 sync . s3://my-bucket/my-dir/

Pedro says:

Jun 25, 2016 at 12:58 am

Hey Eric, is there a parameter available for the above command that would allow me to enforce TLS 1.2 encryption in-transit?

1. Eric Z Ma says:
  
  Jun 30, 2016 at 11:16 am
  
  I am not aware of one. You may need to dig into the source code of aws-cli, which is available at https://github.com/aws/aws-cli, to investigate or write a patch that enforces TLS 1.2.
  
Nhu says:

Aug 12, 2016 at 1:44 pm

how do I sync between an sftp location and s3 bucket directly?

1. Eric Z Ma says:
  
  Aug 19, 2016 at 4:25 pm
  
  You may consider a solution like this:
  
  1. Mount the sftp location to a local directory with sshfs (see http://www.systutorials.com/1505/mounting-remote-folder-through-ssh/).
  
  2. Use the tool in this post to sync the local directory (where the sftp location is mounted) with your S3 bucket.
  
sal says:

Nov 28, 2016 at 7:33 pm

What happens when a large file upload fails?? This is not covered.
I’ve been getting segfaults using the straight cp command, and re-running it will start again from the beginning. On large files this can mean days wasted.

1. Andy says:
  
  Mar 16, 2019 at 6:02 am
  
  Stumbled upon this while looking for solutions to upload large files.
  Check this link: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/
  If your cp process keeps dying, you may want to explicitly break the file apart and upload it with the lower-level s3api command set.
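
A rough sketch of that lower-level route, under stated assumptions: the function below splits a file into chunks and only prints the s3api commands for you to review and run by hand, so that a part that fails can be sent again on its own instead of restarting the whole file. `plan_multipart` is a name made up for this example. `<UploadId>` stands for the ID returned by create-multipart-upload. parts.json is a file you would write yourself, listing the ETag and PartNumber of each uploaded part.

```shell
# plan_multipart is a hypothetical helper: split a file into fixed-size chunks
# and print (not run) the s3api commands for a manual multipart upload.
plan_multipart() {
    file=$1 bucket=$2 key=$3 chunk_bytes=$4 work_dir=$5
    mkdir -p "$work_dir"
    # -a 4 gives four-letter suffixes so that many chunks can be named.
    split -a 4 -b "$chunk_bytes" "$file" "$work_dir/part."
    echo "aws s3api create-multipart-upload --bucket $bucket --key $key"
    n=0
    for chunk in "$work_dir"/part.*; do
        [ -e "$chunk" ] || continue   # skip the literal pattern if no chunks exist
        n=$((n + 1))
        echo "aws s3api upload-part --bucket $bucket --key $key --part-number $n --body $chunk --upload-id <UploadId>"
    done
    echo "aws s3api complete-multipart-upload --bucket $bucket --key $key --upload-id <UploadId> --multipart-upload file://parts.json"
}

# Example call (100 MiB chunks); uncomment to print the plan for the file
# used in this post:
# plan_multipart ./150GB.data systut-data-test store_dir/150GB.data 104857600 ./chunks
```

Note that splitting needs as much free disk space as the file itself, since the chunks are copies.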
  
Narendra says:

Apr 4, 2020 at 7:44 am

How do I upload an image file from my local folder to an S3 bucket via the command prompt?

Please help to provide CLI commands.
