
Different Ways to Store Large Datasets

Chapter 10 of “Society’s Genome” describes the Seven Tenets of Good Archiving. Tenet One is to use well-defined, non-proprietary standards for archiving. While backups are expected to be read with the same tools that created them, long-term archives must be built to facilitate restoration far into the future, whether 5, 10, 20, or more years from now.

Single-serve coffee pods are a perfect choice for weekends at home (or weekdays, or nights, or…), but if you’re packing emergency supplies, you’ll likely choose coffee that does not require a specific machine or even a power source. Likewise, creating a good archive is about making things easy for the eventual restorer.

LTFS (Linear Tape File System) is a non-proprietary format for tape. When appropriately used in vendor offerings, it allows tape to appear more like NAS (Network-Attached Storage) and, most importantly, enables restoration from the tape independently of the solution that stored the information.
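Because an LTFS volume mounts as an ordinary filesystem, a restore can be as simple as copying files off the mount point with standard tools. The sketch below is a minimal illustration of that idea, assuming a tape has already been mounted (for example with the open-source ltfs utility) at a hypothetical /mnt/ltfs; the paths are placeholders, and no vendor software is involved.

```python
# Minimal restore sketch: once an LTFS tape is mounted, it behaves like any
# other filesystem, so the Python standard library is enough to copy data off.
# LTFS_MOUNT and RESTORE_DIR are assumed, illustrative paths.
import os
import shutil

LTFS_MOUNT = "/mnt/ltfs"        # where the LTFS volume is mounted (assumption)
RESTORE_DIR = "/data/restore"   # local destination for restored files (assumption)

for root, _dirs, files in os.walk(LTFS_MOUNT):
    for name in files:
        src = os.path.join(root, name)
        # Preserve the tape's directory layout under the restore directory
        rel = os.path.relpath(src, LTFS_MOUNT)
        dst = os.path.join(RESTORE_DIR, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy file contents plus timestamps
```

The point is not the script itself but that nothing in it depends on the software that originally wrote the tape.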

Antony Adshead at ComputerWeekly.com talks about a few of the vendors who have made this possible.

“Where Spectra Logic puts object storage in front of LTFS, Nodeum puts the highly scalable Linux journaling file system Ext4 as a means of indexing the content catalogue and metadata for data that resides in the LTFS file system.”
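To make the indexing idea concrete, here is a generic sketch of keeping a small, fast-to-query catalogue on ordinary disk that records what resides on the LTFS-backed store, so searches never have to touch the tape. This is only an illustration of the pattern, not Nodeum’s actual design; the paths and the JSON catalogue format are assumptions.

```python
# Build a lightweight metadata catalogue for files that reside on an LTFS
# mount, and store it on local disk (e.g. an Ext4 volume) for quick lookup.
# LTFS_MOUNT and CATALOGUE are assumed, illustrative paths.
import json
import os

LTFS_MOUNT = "/mnt/ltfs"            # data resides here, on tape (assumption)
CATALOGUE = "/var/catalogue.json"   # index kept on local disk (assumption)

catalogue = []
for root, _dirs, files in os.walk(LTFS_MOUNT):
    for name in files:
        path = os.path.join(root, name)
        info = os.stat(path)
        catalogue.append({
            "path": os.path.relpath(path, LTFS_MOUNT),
            "size": info.st_size,
            "modified": info.st_mtime,
        })

with open(CATALOGUE, "w") as fh:
    json.dump(catalogue, fh, indent=2)  # searchable without mounting the tape
```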

LTFS was developed by IBM, but is now managed by the LTO Consortium, and all the technology specifications are published and freely available – both to current vendors and future restorers.

As noted in the article, these solutions target use cases that involve large datasets which aren’t being used continuously. Healthcare, genomics, broadcast, and video surveillance are all good examples of markets in which this type of data storage and open formatting are in high demand.