Configure aws cli
The aws s3api command is useful for advanced S3 operations, e.g. dealing with object versions.
The following explains how to set this up with our s3.cern.ch endpoint.
Setting up aws
All of the information required to set up aws-cli can be found in the existing .s3cfg file used by s3cmd.
We recommend setting up a separate profile for each OpenStack project:
$> aws configure --profile "${OS_PROJECT_NAME}"
AWS Access Key ID [None]: <project access key>
AWS Secret Access Key [None]: <project secret key>
Default region name [None]:
Default output format [None]:
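A successful aws configure run stores the keys per profile in ~/.aws/credentials. The resulting file looks along these lines (profile name and key values are placeholders):

```ini
[myproject]
aws_access_key_id = <project access key>
aws_secret_access_key = <project secret key>
```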
Listing buckets using aws-cli
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch s3api list-buckets
{
    "Buckets": [
        {
            "Name": "<bucket1>",
            "CreationDate": "<timestamp>"
        },
        {
            ...
        }
    ],
    "Owner": {
        "DisplayName": "<owner>",
        "ID": "<owner id>"
    }
}
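If you only need the bucket names, the JSON response can be post-processed. A minimal sketch that parses a saved sample response with python3 (the file path and sample values are illustrative; a live list-buckets call requires the credentials configured above):

```shell
# Sample list-buckets response saved locally, standing in for a live call
cat > /tmp/list-buckets.json <<'EOF'
{"Buckets": [{"Name": "bucket1"}, {"Name": "bucket2"}], "Owner": {"DisplayName": "owner", "ID": "owner-id"}}
EOF

# Print one bucket name per line
python3 -c 'import json; print("\n".join(b["Name"] for b in json.load(open("/tmp/list-buckets.json"))["Buckets"]))'
```

Note that the AWS CLI can also do this filtering itself via its JMESPath support, e.g. --query 'Buckets[].Name' --output text.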
Deleting a bucket with versioned objects
Buckets with object versioning enabled cannot be deleted until all objects as well as all previous versions of objects have been deleted from the bucket.
We provide here a script to help users make sure all versions of their objects are deleted.
Usage:
$> ./s3-delete-all-object-versions.sh -b <bucket> [-f]
-b: bucket name to be cleaned up
-f: if omitted, the script will simply display a summary of actions. Add -f to execute them.
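The summary-versus-execute behaviour behind the -f flag can be sketched with local sample data (the key/version pairs below are hypothetical; the actual script, which drives the S3 API, is the reference implementation):

```shell
# Hypothetical "key version-id" pairs, standing in for a real object-version listing
cat > /tmp/versions.txt <<'EOF'
file0 v1
file0 v2
file1 v1
EOF

FORCE=0   # mimics the -f flag: 0 = display a summary only, 1 = execute the deletions
while read -r key version; do
  if [ "$FORCE" = "1" ]; then
    echo "deleting $key (version $version)"       # the real script deletes the version here
  else
    echo "would delete $key (version $version)"   # dry run: report the action only
  fi
done < /tmp/versions.txt
```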
Copying files to S3 using aws-cli
Single file cp
The aws
tool provides a cp
command to move files to your s3 bucket:
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 cp <file> s3://<your-bucket>/
upload: ./<file> to s3://<your-bucket>/<file>
Whole directory
Using the --recursive flag, you can transfer a whole directory at a time.
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 cp <your-directory> s3://<your-bucket>/ --recursive
upload: <your-directory>/<file0> to s3://<your-bucket>/<file0>
upload: <your-directory>/<file1> to s3://<your-bucket>/<file1>
...
upload: <your-directory>/<fileN> to s3://<your-bucket>/<fileN>
You can then use aws s3 ls to check that your files have been uploaded properly:
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 ls s3://<your-bucket>/
2019-10-25 11:31:40 <size> <file0>
2019-10-25 11:31:40 <size> <file1>
...
2019-10-25 11:31:40 <size> <fileN>
Additionally, aws s3 cp provides an --exclude flag to filter out files that should not be transferred. The syntax is --exclude "<pattern>", where the pattern uses UNIX-style wildcards (e.g. *) rather than full regular expressions.
Space and Quota utilization
The s3api can also be used to retrieve usage information:
$> aws s3api list-buckets --endpoint-url=https://s3.cern.ch/\?usage --debug 2>&1 | grep \<Usage | sed "s/^b'//" | sed "s/'$//" | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<Usage>
  <Entries></Entries>
  <Summary>
    <QuotaMaxBytes>60473139527680</QuotaMaxBytes>
    <QuotaMaxBuckets>1000</QuotaMaxBuckets>
    <QuotaMaxObjCount>-1</QuotaMaxObjCount>
    <QuotaMaxBytesPerBucket>-1</QuotaMaxBytesPerBucket>
    <QuotaMaxObjCountPerBucket>-1</QuotaMaxObjCountPerBucket>
    <TotalBytes>5082583315976</TotalBytes>
    <TotalBytesRounded>5083388264448</TotalBytesRounded>
    <TotalEntries>376617</TotalEntries>
  </Summary>
  <CapacityUsed>
    <User>
      <Buckets>
        <Entry>
          <Bucket>test-bucket</Bucket>
          <Bytes>735</Bytes>
          <Bytes_Rounded>4096</Bytes_Rounded>
        </Entry>
        ...
      </Buckets>
    </User>
  </CapacityUsed>
</Usage>
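The Summary element is enough to compute how much of the project quota is in use. A minimal sketch that parses a saved copy of the XML with python3 (the file path is illustrative and the sample uses the QuotaMaxBytes/TotalBytes values shown above):

```shell
# Relevant subset of the Usage XML, saved locally
cat > /tmp/usage.xml <<'EOF'
<Usage><Summary><QuotaMaxBytes>60473139527680</QuotaMaxBytes><TotalBytes>5082583315976</TotalBytes></Summary></Usage>
EOF

# Report utilisation as a percentage of the quota
python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse('/tmp/usage.xml').getroot()
quota = int(root.findtext('Summary/QuotaMaxBytes'))
used = int(root.findtext('Summary/TotalBytes'))
print(f"used {used / quota * 100:.1f}% of {quota} quota bytes")
EOF
```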
We provide here a script to retrieve usage information (source code available on GitLab).
The script assumes credentials are configured in any of the ways the AWS CLI expects, usually in ~/.aws/credentials. If multiple profiles are configured, it is possible to use environment variables (as supported by the AWS CLI configuration) to pick the desired one:
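For example, the standard AWS_PROFILE variable selects a profile for every subsequent aws invocation without repeating --profile each time (the profile name below is a placeholder):

```shell
# Select a non-default profile for all subsequent aws commands in this shell
export AWS_PROFILE="myproject"

# Every aws call now uses that profile, e.g.:
# aws --endpoint-url=https://s3.cern.ch s3api list-buckets
```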