With enterprises expected to triple the amount of unstructured data they store over the next four years, according to Gartner, many are looking for efficient ways to manage and analyze that data. This trend has sparked a massive shift toward distributed file systems and object storage, which enable enterprises to scale linearly (scale-out) in a cost-effective manner to address their performance and capacity needs.
While both technologies are essential for managing unstructured data, each is a discrete technology with a distinct set of attributes. This post outlines some of the basic differences between object storage and distributed file systems for enterprises currently evaluating next-generation data storage and management options.
In my next post, I’ll dive a bit deeper into a comparison of two flavors of distributed file systems – Clustered Distributed File System (Clustered DFS) and Federated Distributed File System (Federated DFS).
What is a Distributed File System?
Gartner defines distributed file systems as follows:
“Distributed file system storage uses a single parallel file system to cluster multiple storage nodes together, presenting a single namespace and storage pool to provide high bandwidth for multiple hosts in parallel. Data is distributed over multiple nodes in the cluster to handle availability and data protection in a self-healing manner, and cluster both capacity and throughput in a linear manner.”
Like distributed file systems, object storage also distributes data over multiple nodes in order to provide self-healing and linear scaling in capacity and throughput.
But this is where the similarities end.
From a technical standpoint, object storage differs from file systems in three main areas:
- In a file system, files are arranged in a hierarchy of folders, while object storage systems are more like a “key value store,” where objects are arranged in flat buckets.
- File systems are designed to allow for random writes anywhere in the file. Object storage systems only allow atomic replacement of entire objects.
- Object storage systems typically provide eventual consistency, while distributed file systems can support strong or eventual consistency, depending on the vendor. More about that in part two of this blog post.
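To make the first two differences concrete, here is a minimal Python sketch. It is not a real object storage client: `ToyObjectStore` is a hypothetical in-memory stand-in invented for illustration, and the bucket and key names are assumptions. It shows that "folders" in an object store are just key prefixes in a flat bucket, and that changing a few bytes means replacing the whole object, whereas a file can be patched in place at any offset.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store (not a real API)."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: bytes}

    def put(self, bucket, key, data):
        # Whole-object replacement: there is no call to patch bytes in place.
        self._buckets.setdefault(bucket, {})[key] = bytes(data)

    def get(self, bucket, key):
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # "Folders" are only a naming convention: a prefix filter over flat keys.
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))


def overwrite_in_file(path, offset, data):
    """File system style: seek to an offset and overwrite bytes in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def overwrite_in_object(store, bucket, key, offset, data):
    """Object storage style: read the whole object, edit a copy, replace it."""
    buf = bytearray(store.get(bucket, key))
    buf[offset:offset + len(data)] = data
    store.put(bucket, key, buf)


def demo():
    # File system: a real directory hierarchy and an in-place random write.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "reports", "2024"))
        path = os.path.join(root, "reports", "2024", "q1.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")
        overwrite_in_file(path, 6, b"there")
        with open(path, "rb") as f:
            file_result = f.read()

    # Object store: one flat key that merely looks like a path.
    store = ToyObjectStore()
    store.put("docs", "reports/2024/q1.txt", b"hello world")
    overwrite_in_object(store, "docs", "reports/2024/q1.txt", 6, b"there")
    obj_result = store.get("docs", "reports/2024/q1.txt")
    return file_result, obj_result, store.list_keys("docs", "reports/")
```

Both paths end with the same bytes, but the object path had to transfer and rewrite the entire object to change five of them. That cost grows with object size, which is why workloads heavy on small in-place updates tend to fit file systems better.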
Here’s a side-by-side comparison:
| Distributed File System | Object Storage |
| --- | --- |
| Files in Hierarchical Directories | Objects in Flat Buckets |
| POSIX File Operations | REST API |
| Random writes anywhere in file | Atomically replace full objects |
| Strong or Eventual Consistency | Eventual Consistency |
Putting Theory into Practice
As noted, both object storage and distributed file systems are well suited to storing large amounts of unstructured data. Object storage exposes a REST API and is therefore limited to applications designed specifically to support this type of storage. In contrast, distributed file systems expose a traditional file system API, which makes them suitable for any application, including legacy applications that were designed to work over a hierarchical file system.
Distributed file systems offer applications a richer and more general-purpose (but more complex) interface, which supports operations that are not suitable for object storage. Examples include acting as the back end for a database or handling workloads that are heavy on random reads and writes.
Object storage, on the other hand, is better suited to acting as a repository or archive for massive volumes of large files, and it comes at a significantly lower price per gigabyte than a distributed file system.
Now that we’ve established the difference between object storage and distributed file systems, I think we’re ready to look a bit deeper into the world of distributed file systems. My next blog post will explain why these systems come in two flavors – Clustered DFS and Federated DFS – and which flavor is best suited to meet enterprises’ application needs.
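The difference in interfaces can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the endpoint `https://objects.example.com`, the path-style URL layout, and the helper names are hypothetical, and the request is only built, never sent. Real object storage services also require authentication, which is omitted.

```python
import os
import tempfile
import urllib.parse
import urllib.request


def count_lines(path):
    """Legacy-style code: all it needs is a path on a mounted file system."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def build_put_request(endpoint, bucket, key, data):
    """Build (but do not send) an HTTP PUT that uploads a whole object.

    The URL layout endpoint/bucket/key is an assumption for this sketch.
    """
    url = f"{endpoint}/{bucket}/{urllib.parse.quote(key)}"
    return urllib.request.Request(url, data=data, method="PUT")


def demo():
    # File system API: the application is unaware of where the data lives.
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "app.log")
        with open(path, "w", encoding="utf-8") as f:
            f.write("a\nb\nc\n")
        lines = count_lines(path)

    # REST API: the application must be written to speak HTTP to the store.
    req = build_put_request(
        "https://objects.example.com", "logs", "2024/app.log", b"a\nb\nc\n"
    )
    return lines, req.get_method(), req.full_url
```

`count_lines` would run unchanged against a local disk or a mounted distributed file system, while any code that talks to object storage has to be written around HTTP requests like the one `build_put_request` constructs.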