Using Object Storage as Backup Target

Obtain Access to the S3 Backup Instance

In addition to the main s3.cern.ch instance, a second S3 cluster is available for backup purposes. It is physically installed in a different location.

While the backup cluster provides the same type of object storage service as the main instance, several differences apply:

  • The backup cluster is meant purely for backups; services that need to store objects for live data should use the main production instance instead
  • Access to the backup cluster cannot be obtained via a quota request on OpenStack
  • Administering keypairs (access, secret keys) is not possible via OpenStack
  • Increasing storage quota is not possible via OpenStack

To obtain access, change access credentials, or increase storage quota in the S3 backup cluster, please submit a ticket to the S3 Object Storage Service.

As the backup cluster is not integrated with OpenStack, available and used quota cannot be monitored through OpenStack. A script is provided to retrieve usage information (source code available on GitLab). Please refer to Space and Quota utilization for further details.

Perform S3 to S3 Backups

This section provides instructions for backing up a bucket from the main S3 instance to the S3 backup cluster using rclone. It assumes that access credentials for both clusters are already available and that sufficient quota has been granted on the S3 backup cluster.

Configure rclone

Create an rclone configuration file using the following example, where:

  • [s3-main] is s3.cern.ch, which contains the source data you want to back up.
  • [s3-backup] is s3-fr-prevessin-1.cern.ch, where your data will be backed up to.

[s3-main]
type = s3
provider = Ceph
access_key_id = <ACCESS-KEY-HERE>
secret_access_key = <SECRET-KEY-HERE>
endpoint = https://s3.cern.ch:443

[s3-backup]
type = s3
provider = Ceph
access_key_id = <BACKUP-ACCESS-KEY-HERE>
secret_access_key = <BACKUP-SECRET-KEY-HERE>
endpoint = https://s3-fr-prevessin-1.cern.ch:443

The default storage path for the configuration file is ~/.config/rclone/rclone.conf. Alternatively, the configuration file can be stored in an arbitrary location and passed to rclone with rclone --config <your_config_file> [command].
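
If the configuration has to be generated by a script (for example while provisioning a backup host), the file shown above can be produced with Python's standard configparser module. The following is a minimal sketch: the remote names and endpoints are the ones from the example, while the helper names build_rclone_config and write_rclone_config are hypothetical and not part of rclone.

```python
import configparser
from pathlib import Path


def build_rclone_config(main_keys, backup_keys):
    """Return a ConfigParser holding the two remotes used in this guide.

    main_keys and backup_keys are (access_key_id, secret_access_key) tuples.
    """
    # Interpolation is disabled so that a '%' in a secret key is kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    remotes = {
        "s3-main": ("https://s3.cern.ch:443", main_keys),
        "s3-backup": ("https://s3-fr-prevessin-1.cern.ch:443", backup_keys),
    }
    for name, (endpoint, (access_key, secret_key)) in remotes.items():
        config[name] = {
            "type": "s3",
            "provider": "Ceph",
            "access_key_id": access_key,
            "secret_access_key": secret_key,
            "endpoint": endpoint,
        }
    return config


def write_rclone_config(config, path):
    """Write the configuration readable by the owner only, as it holds secrets."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(mode=0o600, exist_ok=True)
    path.chmod(0o600)  # also tighten permissions if the file already existed
    with path.open("w") as handle:
        config.write(handle)
```

The resulting file can then be written to the default location or to any other path and passed to rclone with --config.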

List source and destination buckets

List the buckets that already exist on both ends with:

$ rclone lsd <source_instance>:
$ rclone lsd <backup_instance>:

For example:

$ rclone lsd s3-main:
          -1 2021-08-12 18:43:13        -1 ebocchi
          -1 2024-04-29 13:49:31        -1 mytestbucket

$ rclone lsd s3-backup:
          -1 2021-03-10 15:26:33        -1 ebocchi-test

Prepare and Perform a Backup

Create a bucket in the S3 backup cluster as backup target:

$ rclone mkdir s3-backup:mytestbackup
$ rclone lsd s3-backup:
          -1 2021-03-10 15:26:33        -1 ebocchi-test
          -1 2024-04-29 16:29:54        -1 mytestbackup

Perform a dry run of the backup operation to make sure it is as expected:

$ rclone copy s3-main:<bucket-name> s3-backup:<bucket-name> --dry-run

Example:

$ rclone copy s3-main:mytestbucket s3-backup:mytestbackup --dry-run
2024/04/29 16:30:22 NOTICE: 0: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 3: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 4: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 5: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 7: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 8: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 9: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 1: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 6: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 2: Skipped copy as --dry-run is set (size 1M)
2024/04/29 16:30:22 NOTICE: 
Transferred:           10M / 10 MBytes, 100%, 65.568 GBytes/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:         0.1s

If you are happy with the output, drop the --dry-run flag:

$ rclone copy s3-main:mytestbucket s3-backup:mytestbackup -v
2024/04/29 16:53:03 INFO  : 0: Copied (new)
2024/04/29 16:53:03 INFO  : 1: Copied (new)
2024/04/29 16:53:03 INFO  : 2: Copied (new)
2024/04/29 16:53:03 INFO  : 3: Copied (new)
2024/04/29 16:53:03 INFO  : 5: Copied (new)
2024/04/29 16:53:03 INFO  : 6: Copied (new)
2024/04/29 16:53:03 INFO  : 7: Copied (new)
2024/04/29 16:53:03 INFO  : 4: Copied (new)
2024/04/29 16:53:03 INFO  : 9: Copied (new)
2024/04/29 16:53:03 INFO  : 8: Copied (new)
2024/04/29 16:53:03 INFO  : 
Transferred:           10M / 10 MBytes, 100%, 25.342 MBytes/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:         0.7s

Various commands (e.g., copy, sync, bisync) and options (--immutable, --checksum, --max-delete, --filter) are available in rclone to achieve different strategies. We recommend checking the official documentation at rclone.org/docs.
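
For recurring backups, the dry-run-then-copy sequence above can be wrapped in a small script. The sketch below only uses the rclone command and flags already shown in this section (copy, --dry-run, -v, --config); the function names are hypothetical. Note that it checks only the exit status of the dry run, whereas the manual procedure relies on a person reading the dry-run output.

```python
import shlex
import subprocess


def rclone_copy_command(source, destination, dry_run=True, config=None):
    """Build the argument list for `rclone copy`, defaulting to a dry run."""
    command = ["rclone"]
    if config is not None:
        command += ["--config", config]
    command += ["copy", source, destination, "-v"]
    if dry_run:
        command.append("--dry-run")
    return command


def run_backup(source, destination, config=None):
    """Run a dry run first; perform the real copy only if the dry run succeeds."""
    for dry_run in (True, False):
        command = rclone_copy_command(source, destination, dry_run, config)
        print("Running:", shlex.join(command))
        # check=True raises CalledProcessError on a non-zero exit status,
        # which stops the loop before the real copy is attempted.
        subprocess.run(command, check=True)
```

With the remotes configured as above, run_backup("s3-main:mytestbucket", "s3-backup:mytestbackup") would reproduce the two commands of the example.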

Advanced Features: Immutable Objects and Object Versioning

S3 Object Storage provides features to make stored objects immutable by leveraging versioning and object locks. Versioning and locks should be set up on a bucket at creation time, not on an existing bucket. Versioning can later be suspended, but existing versions and locks will remain. Please consider carefully whether these features are required by your backup use case.

Versioning enables you to keep multiple variants of an object in the same bucket. When querying a versioned object without specifying a version ID, the most recent version is returned. To retrieve a previous version of the object, its version ID has to be provided.

Object locks provide a mechanism to protect objects against deletion or being overwritten. Several policies can be applied, from least to most secure:

  • Legal hold: Can be enabled on a per-object basis. When enabled, objects cannot be deleted; however, it is possible to remove the legal hold and then proceed with the object deletion.
  • Governance mode: Applies to the whole bucket and defines a policy where objects can be marked for deletion, which is enforced only after a retention period expires. For example, if governance mode is set with a retention of 30 days, a delete request will create a delete marker, but the object will be protected for 30 days. However, it is possible to bypass governance mode (if the user is privileged with the s3:BypassGovernanceRetention permission, which is typically the case for the bucket owner), which ultimately makes it possible to delete the object instantly.
  • Compliance mode: Identical to governance mode, but without the ability to bypass the lock.

For more details, please refer to the upstream S3 protocol documentation.

A bucket with object versioning and locks is equivalent to an ordinary bucket and can be used as a backup target for rclone as shown above.

Configure awscli

Documentation for configuring awscli for CERN S3 can be found here.

In what follows, we configure awscli with two profiles to access the main and backup clusters:

$ aws configure --profile main
        AWS Access Key ID: <ACCESS-KEY-HERE>
        AWS Secret Access Key: <SECRET-KEY-HERE>
        Default region name [None]:
        Default output format [None]: json

$ aws configure --profile backup
        AWS Access Key ID: <BACKUP-ACCESS-KEY-HERE>
        AWS Secret Access Key: <BACKUP-SECRET-KEY-HERE>
        Default region name [None]:
        Default output format [None]: json

Creating a Bucket with Object Versioning and Locks (Compliance Mode)

Here we create a bucket to be used as backup target on the S3 backup instance. The bucket is configured to have:

  • Object versioning
  • Object locks in compliance mode
  • A retention period of 7 days

With awscli:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api create-bucket --bucket mytestbackup-locked --object-lock-enabled-for-bucket
$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api put-object-lock-configuration --bucket mytestbackup-locked --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE", "Days":7}}}'
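
The JSON document passed to --object-lock-configuration is easy to get wrong when quoting it by hand in a shell. As a hedged convenience, it can be generated instead; the helper below is a hypothetical sketch that produces the same structure as the command above for a given mode and retention in days.

```python
import json


def object_lock_configuration(mode, days):
    """Return the JSON document used with put-object-lock-configuration."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unsupported retention mode: {mode!r}")
    if days < 1:
        raise ValueError("retention must be at least one day")
    document = {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }
    return json.dumps(document)


# Same policy as in the example: compliance mode, 7 days of retention.
print(object_lock_configuration("COMPLIANCE", 7))
```

The printed string can be passed as the value of --object-lock-configuration, enclosed in single quotes.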

Verify the desired policy with:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api get-object-lock-configuration --bucket mytestbackup-locked
{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",
                "Days": 7
            }
        }
    }
}

Warning: From now on, all objects uploaded to the bucket are protected, and it will not be possible to delete them until the retention period expires. Be careful before writing massive amounts of data and/or test objects that you may want to delete immediately afterwards.

As an example, we upload a test object to demonstrate that it can be neither deleted nor overwritten.

$ echo "This sentence cannot be deleted" > compliance_test
$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api put-object --bucket mytestbackup-locked --body compliance_test --key compliance_test
{
    "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo",
    "ETag": "\"bac80695f95f53c8b8c68ea650d085be\""
}

Verify the object was written and relevant retention parameters are set:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api head-object --bucket mytestbackup-locked --key compliance_test
{
    "AcceptRanges": "bytes", 
    "ContentType": "binary/octet-stream", 
    "ObjectLockRetainUntilDate": "2024-05-20T14:34:34.479481513Z", 
    "LastModified": "Mon, 13 May 2024 14:34:34 GMT", 
    "ContentLength": 32, 
    "ObjectLockMode": "COMPLIANCE", 
    "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
    "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
    "Metadata": {}
}
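
To check from a script how long an object version remains locked, the ObjectLockRetainUntilDate field can be compared with the current time. The sketch below assumes the timestamp format seen in the output above (UTC, trailing Z, up to nanosecond precision); the function names are hypothetical. Since Python's datetime holds microseconds at most, the fractional part is truncated.

```python
from datetime import datetime, timedelta, timezone


def parse_s3_timestamp(value):
    """Parse a UTC timestamp such as 2024-05-20T14:34:34.479481513Z."""
    body = value.rstrip("Z")
    seconds, _, fraction = body.partition(".")
    parsed = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    # Keep at most six fractional digits (microseconds), padding short ones.
    microseconds = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def remaining_retention(retain_until, now=None):
    """Return how long the object version stays locked (zero if expired)."""
    now = now or datetime.now(timezone.utc)
    remaining = parse_s3_timestamp(retain_until) - now
    return max(remaining, timedelta(0))
```

For the test object above, remaining_retention("2024-05-20T14:34:34.479481513Z") gives the time left before the version can be permanently deleted.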

Attempt Deletion of Protected Objects

Deleting a protected object produces a Delete marker:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api delete-object --bucket mytestbackup-locked --key compliance_test
{
    "VersionId": "w5Zvz69iNr3EKhKcrRThFeC3WtES-o5", 
    "DeleteMarker": true
}

From this point on, attempting to access the object without a version ID results in a 404:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api head-object --bucket mytestbackup-locked --key compliance_test

An error occurred (404) when calling the HeadObject operation: Not Found

Still, it is possible to fetch the object by specifying the desired version ID:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api head-object --bucket mytestbackup-locked --key compliance_test --version-id G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo
{
    "AcceptRanges": "bytes", 
    "ContentType": "binary/octet-stream", 
    "ObjectLockRetainUntilDate": "2024-05-20T14:34:34.479481513Z", 
    "LastModified": "Mon, 13 May 2024 14:34:34 GMT", 
    "ContentLength": 32, 
    "ObjectLockMode": "COMPLIANCE", 
    "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
    "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
    "Metadata": {}
}

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api get-object --bucket mytestbackup-locked --key compliance_test --version-id G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo output
{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "ObjectLockRetainUntilDate": "2024-05-20T14:34:34.479481513Z",
    "LastModified": "Mon, 13 May 2024 14:34:34 GMT",
    "ContentLength": 32,
    "ObjectLockMode": "COMPLIANCE",
    "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo",
    "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"",
    "Metadata": {}
}

$ cat output
This sentence cannot be deleted

If the version ID is unknown, it is possible to list all the available versions with:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api list-object-versions --bucket mytestbackup-locked --key compliance_test
{
    "Name": "mytestbackup-locked", 
    "Versions": [
        {
            "LastModified": "2024-05-13T14:34:34.479Z", 
            "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
            "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
            "StorageClass": "STANDARD", 
            "Key": "compliance_test", 
            "Owner": {
                "DisplayName": "Enrico Bocchi", 
                "ID": "ebocchi"
            }, 
            "IsLatest": false, 
            "Size": 32
        }
    ], 
    "MaxKeys": 1000, 
    "Prefix": "", 
    "KeyMarker": "compliance_test", 
    "DeleteMarkers": [
        {
            "Owner": {
                "DisplayName": "Enrico Bocchi", 
                "ID": "ebocchi"
            }, 
            "IsLatest": true, 
            "VersionId": "w5Zvz69iNr3EKhKcrRThFeC3WtES-o5", 
            "Key": "compliance_test", 
            "LastModified": "2024-05-13T14:37:15.330Z"
        }
    ], 
    "EncodingType": "url", 
    "IsTruncated": false, 
    "VersionIdMarker": ""
}
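
When this listing is processed by a script, the version ID of the delete marker that currently hides an object can be extracted from the parsed JSON. The helper below is a hypothetical sketch; it assumes the response has already been loaded into a dictionary (for example with json.loads on the command output) and relies only on the fields visible above.

```python
def current_delete_marker(listing, key):
    """Return the version ID of the delete marker hiding `key`, or None."""
    # The DeleteMarkers field is absent when no delete marker exists.
    for marker in listing.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            return marker["VersionId"]
    return None


# Trimmed copy of the listing shown above.
listing = {
    "Versions": [
        {"Key": "compliance_test", "IsLatest": False,
         "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo"},
    ],
    "DeleteMarkers": [
        {"Key": "compliance_test", "IsLatest": True,
         "VersionId": "w5Zvz69iNr3EKhKcrRThFeC3WtES-o5"},
    ],
}
print(current_delete_marker(listing, "compliance_test"))
```

The returned ID is the one to pass as --version-id when the delete marker itself has to be removed.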

It is also possible to delete the Delete marker, for example if the object was deleted accidentally. To do so, issue a delete-object command and specify the version ID of the Delete marker as the target version ID:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api delete-object --bucket mytestbackup-locked --key compliance_test --version-id w5Zvz69iNr3EKhKcrRThFeC3WtES-o5
{
    "VersionId": "w5Zvz69iNr3EKhKcrRThFeC3WtES-o5", 
    "DeleteMarker": true
}

As a result, the Delete marker is gone:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api list-object-versions --bucket mytestbackup-locked --key compliance_test
{
    "Name": "mytestbackup-locked", 
    "Versions": [
        {
            "LastModified": "2024-05-13T14:34:34.479Z", 
            "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
            "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
            "StorageClass": "STANDARD", 
            "Key": "compliance_test", 
            "Owner": {
                "DisplayName": "Enrico Bocchi", 
                "ID": "ebocchi"
            }, 
            "IsLatest": true, 
            "Size": 32
        }
    ], 
    "MaxKeys": 1000, 
    "Prefix": "", 
    "KeyMarker": "compliance_test", 
    "EncodingType": "url", 
    "IsTruncated": false, 
    "VersionIdMarker": ""
}

...and it is again possible to fetch the object without specifying a version ID:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api get-object --bucket mytestbackup-locked --key compliance_test output
{
    "AcceptRanges": "bytes", 
    "ContentType": "binary/octet-stream", 
    "ObjectLockRetainUntilDate": "2024-05-20T14:34:34.479481513Z", 
    "LastModified": "Mon, 13 May 2024 14:34:34 GMT", 
    "ContentLength": 32, 
    "ObjectLockMode": "COMPLIANCE", 
    "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
    "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
    "Metadata": {}
}

$ cat output 
This sentence cannot be deleted

Attempt Overwrite of Protected Objects

Here we overwrite a protected object by replacing its payload with the md5sum of the original content:

$ cat output | md5sum | cut -d ' ' -f 1 > compliance_test_md5
$ cat compliance_test_md5
bac80695f95f53c8b8c68ea650d085be
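
The digest printed above is identical to the ETag returned when the object was first uploaded. For single-part, unencrypted uploads the ETag is generally the hex MD5 digest of the payload, which gives a simple way to verify a backed-up object against local data. This is a sketch under that assumption (it does not hold for multipart uploads), with hypothetical function names.

```python
import hashlib


def md5_hex(payload):
    """Return the hex MD5 digest of a bytes payload."""
    return hashlib.md5(payload).hexdigest()


def etag_matches(payload, etag):
    """Compare a payload with an ETag as printed by put-object or head-object."""
    # The ETag is returned wrapped in double quotes, which are stripped here.
    return md5_hex(payload) == etag.strip('"')
```

For instance, etag_matches(open("compliance_test", "rb").read(), etag) could be used with the ETag value taken from the head-object output.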

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api put-object --bucket mytestbackup-locked --body compliance_test_md5 --key compliance_test
{
    "VersionId": "cIhGVEg3yZS2KUl2fCspPO6HoAnG92t",
    "ETag": "\"58aa6314b255ecebbbf9fb9cc2b25929\""
}

The overwrite succeeds, and the current version of the object now contains the md5sum:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api get-object --bucket mytestbackup-locked --key compliance_test /dev/stdout
bac80695f95f53c8b8c68ea650d085be

However, the original content is not lost. It is possible to list all the available object versions:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api list-object-versions --bucket mytestbackup-locked --key compliance_test
{
    "Name": "mytestbackup-locked", 
    "Versions": [
        {
            "LastModified": "2024-05-13T14:59:15.917Z", 
            "VersionId": "cIhGVEg3yZS2KUl2fCspPO6HoAnG92t", 
            "ETag": "\"58aa6314b255ecebbbf9fb9cc2b25929\"", 
            "StorageClass": "STANDARD", 
            "Key": "compliance_test", 
            "Owner": {
                "DisplayName": "Enrico Bocchi", 
                "ID": "ebocchi"
            }, 
            "IsLatest": true, 
            "Size": 33
        }, 
        {
            "LastModified": "2024-05-13T14:34:34.479Z", 
            "VersionId": "G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo", 
            "ETag": "\"bac80695f95f53c8b8c68ea650d085be\"", 
            "StorageClass": "STANDARD", 
            "Key": "compliance_test", 
            "Owner": {
                "DisplayName": "Enrico Bocchi", 
                "ID": "ebocchi"
            }, 
            "IsLatest": false, 
            "Size": 32
        }
    ], 
    "MaxKeys": 1000, 
    "Prefix": "", 
    "KeyMarker": "compliance_test", 
    "EncodingType": "url", 
    "IsTruncated": false, 
    "VersionIdMarker": ""
}
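
To restore the content that was in place before an overwrite, a script needs the version ID that precedes the current one. The sketch below picks it from a parsed listing; the function names are hypothetical, and it assumes that all LastModified values use the same UTC format as above, so that sorting them as strings matches chronological order.

```python
def versions_newest_first(listing, key):
    """Return the versions of `key`, most recent first."""
    versions = [v for v in listing.get("Versions", []) if v["Key"] == key]
    return sorted(versions, key=lambda v: v["LastModified"], reverse=True)


def previous_version_id(listing, key):
    """Return the version ID preceding the newest one, or None if there is none."""
    history = versions_newest_first(listing, key)
    return history[1]["VersionId"] if len(history) > 1 else None
```

The returned ID can be passed as --version-id to get-object to download the earlier content.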

...and retrieve previous ones as needed by specifying the version id:

$ aws --profile backup --endpoint-url=https://s3-fr-prevessin-1.cern.ch s3api get-object --bucket mytestbackup-locked --key compliance_test --version-id G-ARHHwG-5mu-avRrGgvVtgJ3TbtZgo /dev/stdout
This sentence cannot be deleted

Summary of Object Versioning and Locks

When object versioning and locks in compliance mode are enabled on a bucket:

  • Deletions do not take effect immediately; instead, a Delete marker is created
  • Deleted objects can be retrieved by specifying the version ID
  • If an object is marked for deletion, it will eventually be deleted after the defined retention period
  • A Delete marker can be removed by issuing a delete request with the version ID of the Delete marker as a parameter
  • Overwrites are applied, and the newly uploaded version becomes the current version of the object
  • The previous version of the object can be fetched by specifying its version ID in the get-object request
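
The behaviour summarised above can be illustrated with a small in-memory model. This is a toy sketch for reasoning about the semantics only: it is not how the storage backend is implemented, all class and method names are hypothetical, and version IDs are simplified to v1, v2, and so on.

```python
import itertools
from datetime import datetime, timedelta, timezone


class LockedError(Exception):
    """Raised when a locked object version would be permanently deleted."""


class ToyVersionedBucket:
    """In-memory model of a versioned bucket with compliance-mode retention."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self.history = {}  # key -> list of entries, oldest first
        self._ids = itertools.count(1)

    def _append(self, key, body, retain_until):
        entry = {"id": f"v{next(self._ids)}", "body": body,
                 "retain_until": retain_until}
        self.history.setdefault(key, []).append(entry)
        return entry["id"]

    def put(self, key, body, now):
        # Every upload, including an overwrite, creates a new locked version.
        return self._append(key, body, now + self.retention)

    def delete(self, key, now, version_id=None):
        if version_id is None:
            # A plain delete only stacks a delete marker (no body, no lock).
            return self._append(key, None, None)
        entries = self.history.get(key, [])
        entry = next((e for e in entries if e["id"] == version_id), None)
        if entry is None:
            raise KeyError(version_id)
        if entry["retain_until"] is not None and now < entry["retain_until"]:
            raise LockedError(f"{key} {version_id} is locked")
        entries.remove(entry)
        return version_id

    def get(self, key, version_id=None):
        entries = self.history.get(key, [])
        if version_id is None:
            entry = entries[-1] if entries else None
        else:
            entry = next((e for e in entries if e["id"] == version_id), None)
        if entry is None or entry["body"] is None:
            raise KeyError(key)  # plays the role of the 404 response
        return entry["body"]
```

In this model, a delete without a version ID hides the object, deleting the marker by its ID brings the object back, and deleting a data version by its ID fails until the retention period has elapsed.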