Analyzing Object Storage Technologies

Objectives

After completing this section, you should be able to analyze and compare the common technologies for general object storage use cases.

Comparing Object Storage

Domain operators can advise cloud users regarding the different features of object storage APIs and back ends, which may include mixing front and back ends. A domain operator would not be required to configure these back ends, but would understand which configuration has been implemented, and what is appropriate for different application use cases.

Red Hat OpenStack Platform configures Ceph as the back end for Swift by default; however, organizations with an existing investment in other technologies may choose a different configuration. This section compares the features of three popular object storage technologies.

Comparing Swift, Ceph RGW, and Amazon S3

The choice of object storage technology can be affected by several factors, including cost, existing storage investment, and application requirements. However, the decision should also be based on an understanding of the technologies and their features as they pertain to business needs.

The following table compares a few of the features of Swift, Ceph RADOS Gateway, and Amazon Simple Storage Service (S3). Explanations and use cases are presented following the feature table.

Table 6.1. Storage Features

Feature | Swift | Ceph RGW | Amazon S3
--- | --- | --- | ---
Codebase | Python | Mostly C++ | Unknown
Consistency model | Eventual consistency | Strong consistency | Read-after-write for creates, eventually consistent for modifications and deletes
Access | RESTful API | RESTful API | RESTful API
Replication | Y | Y | Y
Object expiration | Y | Y | Y
Encryption | Y | Y | Y

Comparing Data Consistency Models

Eventual consistency offers low latency but may answer read requests with stale data, because not every node may have received the latest update yet. Eventual consistency is a theoretical guarantee that, provided no new updates are made to an entity, all reads of that entity will eventually return the last updated value. Replicas are always available to read, but at any particular moment some replicas may be inconsistent with the latest write on the originating node. Both Amazon S3 and Swift use an eventually consistent model, which scales well for massive quantities of data and multiple geographic regions.

Strong consistency offers up-to-date data, but at the cost of higher latency. With strong consistency, also called immediate consistency, data viewed immediately after an update is consistent for all observers of the entity. To achieve strong consistency, developers compromise on the application's scalability and performance, because data must be locked during update and replication to ensure that no other process can update the same data until the operation is complete. Ceph clusters are based on RADOS and use a strongly consistent model, so any change to data must be replicated to all nodes before it is visible. As the quantity of data and the number of regions increase, replication can take longer to complete, resulting in delays.

Developers have to choose which consistency model best fits their application. For example, many procedures in finance applications require strong consistency, while social media status updates do not. An application with strong consistency must wait for acknowledgments of the update from all replicas before it is allowed to continue to the next procedure, which results in a noticeable delay. Procedures using eventual consistency can continue immediately with other activities, knowing that consistency will be reached at some point.
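The difference between the two models can be illustrated with a toy in-memory simulation. This is a minimal sketch and not how Swift, Ceph, or S3 are implemented; the `Cluster` class and its methods are hypothetical names introduced only for illustration.

```python
class Cluster:
    """Toy cluster of replicas, each a plain dict (hypothetical model)."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to every replica

    def write_strong(self, key, value):
        # Strong consistency: update every replica before acknowledging.
        for replica in self.replicas:
            replica[key] = value

    def write_eventual(self, key, value):
        # Eventual consistency: acknowledge after one replica is updated
        # and queue the update for the others.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def sync(self):
        # Background replication: apply queued writes to all replicas.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)


cluster = Cluster()
cluster.write_strong("balance", 100)
strong_reads = [cluster.read("balance", i) for i in range(3)]

cluster.write_eventual("status", "online")
stale_read = cluster.read("status", 2)  # replica 2 has not seen the write yet
cluster.sync()
fresh_read = cluster.read("status", 2)  # replicas have converged
```

In this model the strong write pays for three updates before returning, while the eventual write returns after one and leaves a window in which a read from another replica misses the new value.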

Comparing Replication Support

All three technologies use replication to ensure high availability of data.

Ceph uses the controlled replication under scalable hashing (CRUSH) algorithm to control replication.
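The key idea behind CRUSH is that any client can compute where an object's replicas live instead of looking the location up in a central table. The sketch below is not CRUSH; it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in to show deterministic, computed placement. The device names and the `place` function are hypothetical.

```python
import hashlib


def place(object_name, devices, replicas=3):
    """Return the devices chosen for an object, ranked by a per-device hash."""
    def score(device):
        digest = hashlib.sha256(f"{device}:{object_name}".encode()).hexdigest()
        return int(digest, 16)

    # Every caller computes the same ranking, so no lookup table is needed.
    return sorted(devices, key=score, reverse=True)[:replicas]


devices = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
first = place("photo.jpg", devices)
second = place("photo.jpg", list(reversed(devices)))  # same answer, any input order
```

Unlike this stand-in, CRUSH also takes the cluster topology into account when choosing locations, which a flat hash ranking cannot express.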

Swift has two classes of replication: one for the databases containing accounts and containers, and one for objects. Objects are replicated using a push model, where a node is responsible for ensuring that any data it contains that should be replicated is available on the appropriate remote nodes.
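The push model can be pictured with a small toy function. This is only an illustration under simplifying assumptions: each node is modeled as a dict mapping an object name to a `(timestamp, data)` pair, and the `push_replicate` helper is hypothetical, not part of Swift.

```python
def push_replicate(local, remotes):
    """Push each local object to any remote that lacks it or holds an older copy.

    Returns the number of copies pushed.
    """
    pushed = 0
    for name, (timestamp, data) in local.items():
        for remote in remotes:
            if name not in remote or remote[name][0] < timestamp:
                remote[name] = (timestamp, data)
                pushed += 1
    return pushed


local = {"a": (2, b"new"), "b": (1, b"x")}
remotes = [{"a": (1, b"old")}, {}]
first_pass = push_replicate(local, remotes)   # copies everything that is missing or stale
second_pass = push_replicate(local, remotes)  # nothing left to push
```

The local node drives the work: remotes never ask for data, they only receive what the sender determines they are missing.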

Amazon S3 allows users to configure replication at the bucket level, either within a region (Same-Region Replication (SRR)) or between regions (Cross-Region Replication (CRR)).

Comparing Encryption Support

Swift has no encryption facility exposed to users; however, it can encrypt data on the server side before storing it. To ensure that your data is protected in transit as well as at rest, encrypt the data before sending it to Swift.

Ceph has three options for encryption; however, the default option is intended only for troubleshooting, not for general use. The remaining two options both encrypt on the server side, with the keys either passed in the request or stored in Barbican.

Amazon S3 includes support for both server-side and client-side encryption. For server-side encryption, you can use S3 managed keys, automatically rotated AES-256 keys, or customer managed keys. On the client side, you can use a customer master key, or embed the key in the application accessing the data.
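Server-side encryption with a key passed in the request corresponds, in S3 API terms, to the SSE-C request headers. The following sketch builds those headers with the standard library only; it does not send a request, and the `sse_c_headers` helper is a hypothetical name. Whether a given deployment accepts these headers depends on how the back end is configured.

```python
import base64
import hashlib
import os


def sse_c_headers(key):
    """Build S3 SSE-C request headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself travels base64-encoded, so the request must use HTTPS.
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        # The MD5 digest lets the server detect a corrupted key.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


key = os.urandom(32)
headers = sse_c_headers(key)
```

The same key must be supplied again on every later read of the object, because the server does not keep it.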

Comparing Object Expiration Support

To have objects expire at some point in the future, Swift supports setting the expiry time to a given date and time, or after a fixed number of seconds. The swift-object-expirer daemon monitors object expiry times, then deletes the objects when the limit is reached.
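In the Swift API, the two forms of expiry are requested with the `X-Delete-At` header (a Unix timestamp) and the `X-Delete-After` header (a number of seconds). The sketch below only builds the header for an upload or metadata update request; the `expiry_headers` helper is a hypothetical name.

```python
import time


def expiry_headers(delete_at=None, delete_after=None):
    """Return the Swift expiry header for an absolute or a relative deadline."""
    if (delete_at is None) == (delete_after is None):
        raise ValueError("set exactly one of delete_at or delete_after")
    if delete_at is not None:
        # Absolute expiry: seconds since the Unix epoch.
        return {"X-Delete-At": str(int(delete_at))}
    # Relative expiry: seconds from the time the request is processed.
    return {"X-Delete-After": str(int(delete_after))}


tomorrow = expiry_headers(delete_at=time.time() + 86400)
one_hour = expiry_headers(delete_after=3600)
```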

Amazon S3 and Ceph RGW also support automatic expiry of objects.

Comparing Object Storage Limits

The maximum size of a single uploaded object in Swift is 5 GiB. Swift also supports segmentation, which allows a large object to be divided into smaller segments that are then uploaded in parallel. Swift supports quotas by user or by container, with containers restricted either by the total size of all objects or by the number of objects.
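As a rough illustration of segmentation, the following sketch computes the byte ranges into which a large object would be split. The 1 GiB default segment size and the `plan_segments` helper are assumptions chosen for the example; only the 5 GiB limit comes from the text above.

```python
GIB = 1024 ** 3
MAX_OBJECT_SIZE = 5 * GIB  # single-upload limit stated above


def plan_segments(total_size, segment_size=GIB):
    """Return (offset, length) pairs covering an object of total_size bytes."""
    if not 0 < segment_size <= MAX_OBJECT_SIZE:
        raise ValueError("segment size must be between 1 byte and 5 GiB")
    return [
        (offset, min(segment_size, total_size - offset))
        for offset in range(0, total_size, segment_size)
    ]


# A 12 GiB + 5 byte object needs twelve full segments and one 5-byte tail.
segments = plan_segments(12 * GIB + 5)
```

Each range can be uploaded independently, which is what makes parallel upload possible.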

Amazon S3 has a maximum object size of 5 TiB. Users are limited to 100 or 1000 buckets, but have no limit on the total storage consumed.

Ceph RGW now includes multi-part upload, allowing for virtually unlimited object sizes. A multi-part upload is restricted to a maximum of 10,000 parts. Ceph storage quotas can be configured by user or by container; however, there are no limits on the number of containers.
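Because the number of parts is capped, the part size has to grow with the object. The arithmetic can be sketched as follows; the `min_part_size` helper is a hypothetical name, and only the 10,000-part cap is taken from the text above.

```python
MAX_PARTS = 10_000  # part limit stated above


def min_part_size(total_size, max_parts=MAX_PARTS):
    """Smallest part size (bytes) that fits total_size into max_parts parts."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_size // max_parts)


TIB = 1024 ** 4
part_size = min_part_size(5 * TIB)              # bytes per part for a 5 TiB object
part_count = -(-(5 * TIB) // part_size)         # parts actually needed at that size
```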

Summarizing the Object Storage Technology Comparison

Swift has high resilience and availability, allowing the addition or replacement of nodes without impact. Swift is a good choice for Red Hat OpenStack Platform deployments that span multiple sites and regions and hold large amounts of data.

Amazon S3 also has high resilience and availability, and is likely the largest object storage service in the world. S3 is available from any Internet-connected machine; however, if your organization is subject to regulatory compliance, you may need to be careful about where your data is replicated.

Ceph is ideally suited to single-site Red Hat OpenStack Platform deployments, supporting block, file, and object storage from the same cluster. Ceph offers a large subset of the Swift and S3 APIs, making it suitable for migrating existing cloud applications to OpenStack.

 

References

Further information is available in the Object Storage section of the Storage Guide for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/storage_guide/index

Further information is available in the Overview section of the Architecture Guide for Red Hat Ceph Storage at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/architecture_guide/index

Amazon S3

Revision: cl110-16.1-4c76154