Configure aws cli
The aws s3api command is useful for advanced S3 operations, e.g. dealing with object versions.
The following explains how to set this up with our s3.cern.ch endpoint.
Setting up aws
All of the information required to set up aws-cli can be found in the existing .s3cfg file used by s3cmd.
We recommend setting up a separate profile for each OpenStack project:
$> aws configure --profile "${OS_PROJECT_NAME}"
AWS Access Key ID [None]: <project access key>
AWS Secret Access Key [None]: <project secret key>
Default region name [None]:
Default output format [None]:
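A successful aws configure run stores the keys per profile in ~/.aws/credentials. The resulting file looks along these lines (profile name and key values are placeholders):

```ini
[myproject]
aws_access_key_id = <project access key>
aws_secret_access_key = <project secret key>
```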
Listing buckets using aws-cli
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch s3api list-buckets
{
    "Buckets": [
        {
            "Name": "<bucket1>",
            "CreationDate": "<timestamp>"
        },
        {
            ...
        }
    ],
    "Owner": {
        "DisplayName": "<owner>",
        "ID": "<owner id>"
    }
}
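If you only need the bucket names, the JSON response can be post-processed. A minimal sketch that parses a saved sample response with python3 (the file path and sample values are illustrative; a live list-buckets call requires the credentials configured above):

```shell
# Sample list-buckets response saved locally, standing in for a live call
cat > /tmp/list-buckets.json <<'EOF'
{"Buckets": [{"Name": "bucket1"}, {"Name": "bucket2"}], "Owner": {"DisplayName": "owner", "ID": "owner-id"}}
EOF

# Print one bucket name per line
python3 -c 'import json; print("\n".join(b["Name"] for b in json.load(open("/tmp/list-buckets.json"))["Buckets"]))'
```

Note that the AWS CLI can also do this filtering itself via its JMESPath support, e.g. --query 'Buckets[].Name' --output text.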
Deleting a bucket with versioned objects
Buckets with object versioning enabled cannot be deleted until all objects as well as all previous versions of objects have been deleted from the bucket.
We provide here a script to help users make sure all versions of their objects are deleted.
Usage:
$> ./s3-delete-all-object-versions.sh -b <bucket> [-f]
-b: bucket name to be cleaned up
-f: if omitted, the script will simply display a summary of actions. Add -f to execute them.
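The summary-versus-execute behaviour behind the -f flag can be sketched with local sample data (the key/version pairs below are hypothetical; the actual script, which drives the S3 API, is the reference implementation):

```shell
# Hypothetical "key version-id" pairs, standing in for a real object-version listing
cat > /tmp/versions.txt <<'EOF'
file0 v1
file0 v2
file1 v1
EOF

FORCE=0   # mimics the -f flag: 0 = display a summary only, 1 = execute the deletions
while read -r key version; do
  if [ "$FORCE" = "1" ]; then
    echo "deleting $key (version $version)"       # the real script deletes the version here
  else
    echo "would delete $key (version $version)"   # dry run: report the action only
  fi
done < /tmp/versions.txt
```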
Copying files to S3 using aws-cli
Single file cp
The aws
tool provides a cp
command to move files to your s3 bucket:
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 cp <file> s3://<your-bucket>/
upload: ./<file> to s3://<your-bucket>/<file>
Whole directory
Using the --recursive flag, you can transfer a whole directory at a time.
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 cp <your-directory> s3://<your-bucket>/ --recursive
upload: <your-directory>/<file0> to s3://<your-bucket>/<file0>
upload: <your-directory>/<file1> to s3://<your-bucket>/<file1>
...
upload: <your-directory>/<fileN> to s3://<your-bucket>/<fileN>
You can then use aws s3 ls to check that your files have been uploaded properly:
$> aws --profile "${OS_PROJECT_NAME}" --endpoint-url=https://s3.cern.ch/ s3 ls s3://<your-bucket>/
2019-10-25 11:31:40 <size> <file0>
2019-10-25 11:31:40 <size> <file1>
...
2019-10-25 11:31:40 <size> <fileN>
Additionally, aws s3 cp provides an --exclude flag to filter out files that should not be transferred. The syntax is --exclude "<pattern>", where the pattern uses UNIX-style wildcards (e.g. *) rather than full regular expressions.
Space and Quota utilization
The s3api can also be used to retrieve usage information:
$> aws s3api list-buckets --endpoint-url=https://s3.cern.ch/\?usage --debug 2>&1 | grep \<Usage | sed "s/^b'//" | sed "s/'$//" | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<Usage>
  <Entries></Entries>
  <Summary>
    <QuotaMaxBytes>60473139527680</QuotaMaxBytes>
    <QuotaMaxBuckets>1000</QuotaMaxBuckets>
    <QuotaMaxObjCount>-1</QuotaMaxObjCount>
    <QuotaMaxBytesPerBucket>-1</QuotaMaxBytesPerBucket>
    <QuotaMaxObjCountPerBucket>-1</QuotaMaxObjCountPerBucket>
    <TotalBytes>5082583315976</TotalBytes>
    <TotalBytesRounded>5083388264448</TotalBytesRounded>
    <TotalEntries>376617</TotalEntries>
  </Summary>
  <CapacityUsed>
    <User>
      <Buckets>
        <Entry>
          <Bucket>test-bucket</Bucket>
          <Bytes>735</Bytes>
          <Bytes_Rounded>4096</Bytes_Rounded>
        </Entry>
        ...
      </Buckets>
    </User>
  </CapacityUsed>
</Usage>
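The Summary element is enough to compute how much of the project quota is in use. A minimal sketch that parses a saved copy of the XML with python3 (the file path is illustrative and the sample uses the QuotaMaxBytes/TotalBytes values shown above):

```shell
# Relevant subset of the Usage XML, saved locally
cat > /tmp/usage.xml <<'EOF'
<Usage><Summary><QuotaMaxBytes>60473139527680</QuotaMaxBytes><TotalBytes>5082583315976</TotalBytes></Summary></Usage>
EOF

# Report utilisation as a percentage of the quota
python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse('/tmp/usage.xml').getroot()
quota = int(root.findtext('Summary/QuotaMaxBytes'))
used = int(root.findtext('Summary/TotalBytes'))
print(f"used {used / quota * 100:.1f}% of {quota} quota bytes")
EOF
```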
We provide here a script to retrieve usage information (source code available on GitLab).
The script assumes credentials are configured in any of the ways the AWS CLI expects, usually in ~/.aws/credentials. If multiple profiles are configured, it is possible to use environment variables (as supported by the AWS CLI configuration) to pick the desired one:
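For example, the standard AWS_PROFILE variable selects a profile for every subsequent aws invocation without repeating --profile each time (the profile name below is a placeholder):

```shell
# Select a non-default profile for all subsequent aws commands in this shell
export AWS_PROFILE="myproject"

# Every aws call now uses that profile, e.g.:
# aws --endpoint-url=https://s3.cern.ch s3api list-buckets
```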