Friday, 11 March 2011

File Systems Part One: Home Advantage

"What's a file-system and what's an Ext3? Sit down and let your Uncle Robin explain..."

If you don't know your Ext3 from your Reiser, Robin Catling (of podcast fame) has written two informative articles on file systems. The first appears this month, with part two coming next month.
This article first appeared in Full Circle Magazine Issue #46.

Back in the day, the first electronic computers were only operated by large corporations and government departments. Programs and data could only be loaded straight into memory, because that's all the storage there was. Then engineers got smarter, using stacks of punch-cards and paper tape for programs and output, followed by quarter-inch magnetic tape: all serial-access storage.

Then some bright spark devised a magnetic disk drive, a Direct Access Storage Device (DASD) which could read and write to random locations. That is why they needed a file-system to organize the data and support a Disk Operating System (DOS). Move on a few years; enter the personal computer. When IBM needed a file-system and some way to access it, we got the Microsoft Disk Operating System, MS-DOS. It was not the first or only DOS, but it was the main player for home PCs: like it or loathe it, DOS was the one you used.

Move on a few more years and find the smart computing set (that's us) using Open Source software. Accept the default settings in most installers and get the default file-system. Do anything else and your first issue with Linux is choice. Which one do you choose?
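If you accepted the installer defaults and now wonder which file-system you ended up with, Linux lists every mounted file-system in /proc/mounts, one per line: device, mount point, file-system type, then options. Here is a minimal sketch that reads that layout. The function name parse_mounts and the SAMPLE text are made up for illustration; the sample is not output from a real machine.

```python
def parse_mounts(text):
    """Return {mount_point: fs_type} from text laid out like /proc/mounts."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, file-system type, options...
            _device, mount_point, fs_type = fields[:3]
            table[mount_point] = fs_type
    return table


# Illustrative sample only (assumed layout, invented devices).
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /home ext3 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

for mount_point, fs_type in parse_mounts(SAMPLE).items():
    print(mount_point, fs_type)
```

On a real Linux box you would feed it the live file instead, for example parse_mounts(open("/proc/mounts").read()).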

Dear Diary...

Or more precisely, “Dear Journal...” Most modern file-systems employ journaling. Think of it as a low-level activity log. A file to be updated is first written to the journal and clocked-in, then written to disk when ready, cleared from the journal and clocked-out. If there's any interruption to the normal running of the computer – power cut, catastrophic crash – whilst the file is being written to disk, the file system has the journal entries for all operations not yet completed. If all is good, the operation can complete; if not, there is a log to aid file recovery.
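To make the clock-in, clock-out idea concrete, here is a toy sketch in Python. It is not how any real file-system implements its journal: the ToyJournal class, its method names and the crash_before_write switch are all invented for illustration, and it marks an entry as committed rather than erasing it. The shape is the same, though: log the intent, write the data, mark it done, and replay anything unfinished after a crash.

```python
import json
import os
import tempfile


class ToyJournal:
    """Toy write-ahead journal: log the intent, write the data, mark it done."""

    def __init__(self, journal_path):
        self.journal_path = journal_path

    def _append(self, record):
        # Append one record and force it to disk before carrying on.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")
            journal.flush()
            os.fsync(journal.fileno())

    @staticmethod
    def _write(target, data):
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())

    def write_file(self, target, data, crash_before_write=False):
        # 1. Clock in: record the whole update in the journal first.
        self._append({"op": "begin", "target": target, "data": data})
        if crash_before_write:
            return  # pretend the power failed between journal and disk
        # 2. Write the real file.
        self._write(target, data)
        # 3. Clock out: mark the update as complete.
        self._append({"op": "commit", "target": target})

    def recover(self):
        """Replay every update that was begun but never committed."""
        if not os.path.exists(self.journal_path):
            return []
        pending = {}
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                record = json.loads(line)
                if record["op"] == "begin":
                    pending[record["target"]] = record["data"]
                else:
                    pending.pop(record["target"], None)
        for target, data in pending.items():
            self._write(target, data)
        os.remove(self.journal_path)  # the journal is now clear
        return sorted(pending)


workdir = tempfile.mkdtemp()
journal = ToyJournal(os.path.join(workdir, "journal.log"))
notes = os.path.join(workdir, "notes.txt")

journal.write_file(notes, "hello", crash_before_write=True)  # simulated crash
journal.recover()                                            # replay on "reboot"
with open(notes, encoding="utf-8") as handle:
    print(handle.read())  # prints: hello
```

Because this toy logs the full file contents, it corresponds to the most expensive style of journaling; schemes that log only meta-data would store far less in each record.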

Journaling pays a small disk/processor overhead for added data security. Some file systems reduce the overhead by not writing the full file to the journal, so you will see references to file meta-data, inode, or disk location in their journaling scheme.

Other crucial features of a decent file-system include consistent access controls (or permissions, or authorities, according to your school), aliasing and symbolic links – multiple pointers referencing a single copy of a file.
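The difference between the two kinds of pointer is easiest to see by trying it. Here is a small sketch, assuming a POSIX-style system such as Linux where ordinary users can create both hard links and symbolic links; the function name link_demo and the file names are made up for illustration.

```python
import os
import tempfile


def link_demo(workdir):
    original = os.path.join(workdir, "report.txt")
    hard = os.path.join(workdir, "report-hard.txt")
    soft = os.path.join(workdir, "report-soft.txt")

    with open(original, "w", encoding="utf-8") as handle:
        handle.write("one copy of the data")

    os.link(original, hard)     # hard link: a second name for the same inode
    os.symlink(original, soft)  # symbolic link: a small pointer holding a path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink
    is_symlink = os.path.islink(soft)

    # Remove the original name. The hard link still reaches the data,
    # but the symbolic link now points at a name that no longer exists.
    os.remove(original)
    with open(hard, encoding="utf-8") as handle:
        survives = handle.read()
    dangling = not os.path.exists(soft)

    return same_inode, link_count, is_symlink, survives, dangling


print(link_demo(tempfile.mkdtemp()))
```

Either way there is only ever one copy of the data on disk; the links are just extra ways of finding it.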

Ext Family

We're on the fourth iteration of Ext, or Extended file-system. The original Ext is practically defunct, so Ext2 is the lowest version you'll see in general use. It's a non-journaling file-system, so it's fast, but not as secure as its successors. Since it writes less to disk (and erases less), it remains a good choice for flash memory, USB-sticks and SD-cards, which can only survive a limited number of writes over their lifetime.

Ext3 and Ext4 remain backward compatible with Ext2, with the addition of journaling. They have years of optimizations to improve performance and data security, which is why Ext3 took off with large databases, but not with servers, and why Ext4 finally scored as a good all-rounder. Ext4 has many major improvements over Ext3, such as larger file-system support, faster checking, nanosecond timestamps and journal verification using checksums. It employs a technique known as delayed allocation to reduce file fragmentation, which means it can be used on flash memory and solid-state disks (SSD); however, delayed allocation has potential for data loss. I use Ext4 on all my desktop, laptop and external hard-drives, with a noticeable increase in performance over Ext2 and Ext3. Ext4 is robust and efficient, but lacks some advanced features, such as support for disk snapshots and advanced scalability. Enter the next two contenders...
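The data-loss risk with delayed allocation comes from new data sitting in memory a little longer before it is given a home on disk, so a crash at the wrong moment can leave a freshly saved file empty or incomplete. Programs that care usually defend themselves by writing to a temporary file, forcing it to disk, and only then renaming it over the old one. Here is a minimal sketch of that pattern; the function name safe_replace is invented for illustration, and a truly paranoid version would also sync the containing directory.

```python
import os
import tempfile


def safe_replace(path, data):
    """Write data so that a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one file-system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # ask the kernel to write the data out now
        os.replace(tmp_path, path)     # atomic rename over the old file
    except BaseException:
        # Tidy up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()
settings = os.path.join(workdir, "settings.conf")
safe_replace(settings, "volume=3")
safe_replace(settings, "volume=11")
with open(settings, encoding="utf-8") as handle:
    print(handle.read())  # prints: volume=11
```

The price is an extra forced write for every save, which is exactly the kind of overhead delayed allocation was trying to avoid, so it is a trade-off rather than a free fix.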

Rise and Fall

ReiserFS represented a radical leap forward in 2001, including many features that Ext still cannot implement. Reiser4 improved or completed many features over the initial release. Development since 2004 has been very slow and remains under a cloud after the personal 'legal difficulties' of original developer Hans Reiser. Reiser4 is not supported in the main Linux kernel.

That said, ReiserFS performs well on systems requiring many small writes, say logs and indexes, such as databases and email servers.

Better and Better

BTR-FS - can we agree to pronounce it 'better' and not 'butter'? I don't like butter and I prefer not to keep my files on something slippery that's likely to melt. Thank you. It stands for B-tree File-system, originally developed by Oracle (watch those licensing terms, Open Source fans!). Having similar features to ReiserFS, it trades heavily on enterprise-level features such as drive pooling, on-the-fly snapshots, transparent compression and on-line defragmentation. All the major Linux distros plan to adopt it as the default file system eventually; however, you can't currently use it on a boot partition, only data partitions, so it isn't ready for exclusive use yet. Current performance benchmarks show it slightly slower than Ext4 in many uses, so bank on big-database vendor Oracle fixing that in the next couple of versions. The documentation makes it explicitly clear that it is “not suitable for any uses other than benchmarking and review.”

Best of the Rest

If you can't get enough of file-system acronyms, there's a gaggle of niche flavors which still appear in Linux installers and disk tools:

XFS, from Silicon Graphics: much like Ext, good for large files, but not small; so render-farms and video processing good, databases and email not so good. If you need guaranteed data throughput rates, on-line resizing, built-in quota enforcement and support for file-systems up to 8 exabytes in size, you can find XFS as an install option on many popular Linux distributions. You can tune your system to use variable block sizes, like a sliding scale for efficient use of space or high read-performance.

JFS, from IBM: showing its age now, but a good performer in its day on small drives and files. Find it on older hardware.

ZFS, from Sun Microsystems: think of it as the grand-daddy of BtrFS.


Swap isn’t itself a file system. This is virtual memory without a file system structure, used only by the kernel to write memory pages to disk. It's your swap-file or paging-file for when you run out of physical memory or when you set your computer to hibernate. Most Linux installers will grumble if you try to get through without a Swap partition.

No Country for Old File-systems

That's the round-up, in no technical depth whatsoever, of the common file-systems on our 'home-turf'. If you have the stamina, part two will take you over the border to foreign lands where the file-systems wear funny logos and speak in strange tongues. But if you want to be the Ambassador of Open, or the Emissary of Interoperability, you'll need to recognize the other tribes... RC
