In this article I share almost all of the knowledge we have acquired here at Xentime about storage systems. We have used every link in the evolutionary chain of storage management:
filesystem -> network filesystem -> object storage.


Filesystems

XFS and Ext2/3/4 are all fast and mature filesystems. But when you add disks to increase the space of your LVM volume, you discover that the risk of downtime increases as well. See my article on scalability, where I mention the Dirichlet principle.

To minimize that risk, you have to move to a network filesystem backed by additional servers.

Network Filesystems

The article on network filesystems gives a general understanding of their benefits and disadvantages.

The most annoying issues you will run into with network filesystems are poor scalability and poor fault tolerance. They are even more annoying when you store application assets on the network filesystem and the paths to those assets in an RDBMS. In our case we had 40 physical application servers with data stored on NFS. Every once in a while NFS lost some files, so the user interface looked broken.

Object storage is the next logical step in the evolution of storage systems. It addresses the fault tolerance and scalability issues, provides an application interface, and lets you store metadata as attributes of objects.

The application can talk directly to the object storage, avoiding the redundant application logic that would be needed with an RDBMS. Even if you keep using an RDBMS alongside the object storage, primary keys can now be stored in object metadata.
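To make the idea concrete, here is a minimal sketch using an in-memory stand-in for the object store (the `ObjectStore` class, key names, and metadata fields are illustrative, not any real storage API):

```python
# A minimal in-memory stand-in for an object store, illustrating the idea
# of keeping the RDBMS primary key in the object's metadata instead of
# keeping a path-to-asset mapping in the database.

class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (body, metadata)

    def put(self, key, body, metadata=None):
        self._objects[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self._objects[key]

store = ObjectStore()
# The asset carries its own database primary key as metadata,
# so the application does not need a separate path table.
store.put("avatars/42.png", b"\x89PNG...", metadata={"user-pk": "42"})

body, meta = store.get("avatars/42.png")
print(meta["user-pk"])  # the primary key travels with the object
```

With a real object storage the shape is the same: `put` and `get` become HTTP requests, and the metadata dictionary becomes object attributes.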

Object Storage

We have tested most of the object storage systems on the market.

ZODB with ZEO is too slow. It also uses BerkeleyDB as a backend, and it would be very difficult to restore BerkeleyDB after a failure, as there are no maintenance tools.


was too unstable the last time we checked: it simply segfaulted when we stored 9 million small objects.

CouchDB stores everything in JSON, so binary objects become very large.

Amazon S3 is expensive. Also see article Dangers of the Clouds.

Cassandra has unpredictable performance, like any other bloated system with that many background tasks.

While OpenStack Swift itself is of acceptable quality, unlike other parts of OpenStack, it stops serving objects when one of the three assumed nodes fails. In my view that is a wrong implementation of eventual consistency. Still, we used OpenStack Swift for about two years, until the load became too high and maintenance too difficult.
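What a correct eventually consistent design should do instead can be sketched with a simplified quorum read (this is my own toy model, not Swift's actual code; `N` and `R` are the usual replica-count and read-quorum parameters):

```python
# A simplified quorum-read sketch: with N = 3 replicas and a read quorum
# of R = 2, losing a single node does not stop reads, which is what you
# would expect from a correct eventually consistent implementation.

N, R = 3, 2

def quorum_read(replicas, key):
    """Return a value if at least R live replicas answer, else None."""
    answers = [node[key] for node in replicas if node is not None and key in node]
    if len(answers) >= R:
        # Pick the most common answer; a real system would reconcile
        # versions using timestamps or vector clocks.
        return max(set(answers), key=answers.count)
    return None

replicas = [{"k": "v"}, {"k": "v"}, None]  # one of three nodes is down
print(quorum_read(replicas, "k"))  # 'v' -- a single failure is tolerated
```

Only when a second node fails does the read quorum become unreachable and the read has to fail.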

Riak CS is currently the best on the market. It has predictable memory and performance characteristics, as well as predictable latencies. Applications written on Erlang's Open Telecom Platform (OTP) are known to run for tens of years without human intervention. Riak is built on the latest research in distributed systems and is intended for installations of three or more physical servers. Nevertheless, it is stable and fast.

LeoFS has a poor implementation, as our tests have shown. It uses Erlang's ETS tables for storing metadata, which makes resource consumption unpredictable.

This knowledge should help you avoid mistakes and redundant parts in your software designs.