The Anti-Buzz: Defragmenting

by Andrew Emmott on July 27, 2010

in Anti-Buzz, Hardware

The Buzz: You don’t need to worry about defragmenting your hard drive.

The Anti-Buzz: You do need to worry about defragmenting your hard drive.

Why: Even a computer doesn’t like playing 52-card pick up every time you ask it to fetch something.

There are a few out there for whom this might be a sensitive subject. It’s sort of weird that something so basic to computing technology can incite such a religious schism. You will encounter a lot of people who will stress that disk fragmentation is nothing to worry about. It is certainly a less heinous issue than it used to be. It is also oddly situational: I have seen defrags utterly transform a machine’s performance, and I have seen them produce no obvious results.

Here’s the simplest, most honest argument I can make:

Most people use Windows (the OS most affected by fragmentation), and most people don’t defrag often enough or well enough, which causes a gradual decline in performance without the user realizing it.

And of course one of the big-picture arguments pro-defraggers will make is that if your computer is performing badly, you will replace it. If it is performing badly simply because you can’t be bothered to ever give it an oil change, then you are wasting your money buying a new machine every three years.

So, exactly what is fragmentation, how do the various OSes deal with it, and what can you do?

The issue of disk fragmentation is very fertile ground for metaphors and analogies. Many of you have probably heard them before. I’m going to err on the side of literal explanation at the expense of my own creativity. Computers and hard drives are not, in fact, magic. Your data is stored as a configuration of tiny little machine components, taking up real physical space on a real physical hard drive. You write documents. These documents change size. You write new documents. You delete documents. All of this, at some point, manifests as machine components moving around and doing work, manipulating a real physical space.

Your hard drive is the slowest part of your computer, even if you own a sleek new solid state drive, and when it becomes the bottleneck you can see a genuine decrease in overall performance.

So let’s speak figuratively. You have a tiny hard drive. It can hold 30 “blocks” of data. You are writing a novel. You sit down and write the first chapter; let’s say each chapter takes up two blocks. So your hard drive stores this novel at the front of your drive, two blocks followed by 28 unused blocks. The next day you write a long letter, another block. This is placed after your novel (can you already see the problem?), so now you have three used blocks, followed by 27 unused blocks. Imagine it looks something like this.
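
With N for a block of the novel, L for the letter, and a dot for a free block:

    N N L . . . . . . . . . . . . . . . . . . . . . . . . . . .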

Then you work on your novel some more. Three more chapters! Six more blocks. This gets stored after the letter. Now your novel is in separate pieces on your hard drive. Sure, when you open it up in Word it feels like one contiguous unit, but that’s basically your OS hand-waving while it fakes continuity between two different hard drive locations.
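
The drive now looks like this:

    N N L N N N N N N . . . . . . . . . . . . . . . . . . . . .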

At this point a defrag could simply mean moving the letter after the novel and joining the novel together; let’s say we do this. Further, let’s say we anticipate that the novel will grow, though the OS can’t reasonably know how much it is going to grow. Let’s say it tries to keep five open blocks after a file if it can. So the novel takes up eight blocks at the front of your drive, followed by five empty blocks, followed by the one-block letter, followed by 16 free blocks. Each file is still in one piece, easy to get at.
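
After the defrag:

    N N N N N N N N . . . . . L . . . . . . . . . . . . . . . .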

You write another letter. One block. It gets stored five blocks after the first letter. You add a chapter to your novel; now it takes up 10 blocks total, but that’s okay, we left it space to grow.
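
With M for the second letter:

    N N N N N N N N N N . . . L . . . . . M . . . . . . . . . .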

The next day your cousin e-mails you a funny video of his cat falling off the television. It takes up 13 blocks. We don’t have 13 contiguous blocks available. Let’s cram as much as we can into those 10 at the end, and then put the rest as close to that 10 as we can.
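
With V for the video, the first 10 blocks of it land at the end of the drive and the last three land just in front of the second letter:

    N N N N N N N N N N . . . L . . V V V M V V V V V V V V V V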

Things are still okay. Three of your four files are still in one piece. The next day you print and mail both of those letters. To save space you delete them from your hard drive.
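
Now the drive looks like this:

    N N N N N N N N N N . . . . . . V V V . V V V V V V V V V V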

Notice how the cat video is still in pieces. Even worse, those pieces are not in logical order: the tail end of the file sits in front of the beginning. This is the sort of complication your OS is dealing with all the time. I’m going to stop here. The above is a gross oversimplification of what is going on, but it also illustrates the sort of pitfalls that your computer is constantly dealing with.

Most important for the average user, I think, is something a lot of people don’t realize: a “file” is an abstract concept that is made concrete by your operating system. A photo or a text file or a video, while it may be a singular object in your mind, and you may interact with it as if it were one thing, is not necessarily a singular collection of information on your hard drive. It is often fragmented into many pieces, and your computer has to do extra legwork to treat the pieces as one whole.

Creating and deleting files, and adding new data into files, has the gradual effect of fragmenting everything on your hard drive further and further. And your computer is creating, deleting, and changing files more often than you think. It goes beyond things that you directly interact with, like documents, records, and photos. Typically, browsing the web causes all manner of locally cached data to be stored, removed, and changed, and just because all of that gets conceptually tucked away inside a temp directory or some such, that has little bearing on the literal, physical organization imposed on your hard drive.
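
To make that churn concrete, here is a toy Python simulation using a naive first-fit allocator. It is only a sketch: real file systems allocate far more cleverly, and the file sizes and probabilities below are made up for illustration. But the drift toward fragmentation is the same kind of effect.

    import random

    DISK = [None] * 2000      # None = free block, otherwise a file id

    def allocate(file_id, nblocks):
        # First fit: drop each block into the first free slot found.
        for i in range(len(DISK)):
            if nblocks == 0:
                break
            if DISK[i] is None:
                DISK[i] = file_id
                nblocks -= 1

    def delete(file_id):
        for i, b in enumerate(DISK):
            if b == file_id:
                DISK[i] = None

    def fragments(file_id):
        # Count the contiguous runs this file is split into.
        runs, prev = 0, None
        for b in DISK:
            if b == file_id and prev != file_id:
                runs += 1
            prev = b
        return runs

    random.seed(42)
    live = []
    for fid in range(100):
        allocate(fid, random.randint(1, 15))             # create a file
        live.append(fid)
        if random.random() < 0.4:                        # delete an old one
            delete(live.pop(random.randrange(len(live))))
        if live and random.random() < 0.5:               # grow an old one
            allocate(random.choice(live), random.randint(1, 5))

    worst = max(live, key=fragments)
    print("file", worst, "ended up in", fragments(worst), "pieces")

Nothing in that loop is exotic: files get created, some get deleted, some grow. Yet run it and the worst file ends up scattered across many separate pieces.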

In general, fragmentation is even worse in a dental office. Imaging software creates, destroys, and changes very large files (and notice that the larger file in the illustrations above was the one that really started to cause problems). Patient records are similarly non-trivial, and even with server storage, each machine is constantly working with local, temporary versions of everything. The “wear and tear” on your data organization is probably less gradual than it is in other industries.

Another thing to keep in mind is that all this metadata about which files are in how many pieces, of what size, and where, isn’t free. That information takes up physical memory as well as processing time. Again, this is still a gross oversimplification, but the general organizational strategy, and the data that gets stored and used to execute that strategy, are, more or less, what constitutes the “file system,” and it is something that actually exists independently of your operating system. These days operating systems are usually designed to work with only a specific subset of file systems, but there is some interoperability, or at least ways to emulate interoperability.
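
As a sketch of that bookkeeping, here is what a hypothetical extent table for the files in the toy layout might look like (the file names are invented, and the block numbers are 0-indexed). Real file systems keep something similar in spirit, in a much more elaborate form.

    from typing import NamedTuple

    class Extent(NamedTuple):
        start: int     # first block of this fragment
        length: int    # number of contiguous blocks

    # A "file" is really just a name mapped to a list of extents.
    catalog = {
        "novel.doc":     [Extent(0, 10)],
        "letter2.doc":   [Extent(19, 1)],
        "cat_video.avi": [Extent(20, 10), Extent(16, 3)],  # two pieces, out of order
    }

    def blocks_to_read(name):
        # Walking the extent list is the "extra legwork" the OS does to
        # present the scattered pieces to you as one file.
        return [e.start + i for e in catalog[name] for i in range(e.length)]

    print(blocks_to_read("cat_video.avi"))
    # [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 16, 17, 18]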

So why is Windows more susceptible than Linux and Mac? To a point you can just blame bad design and treat it as an issue that Microsoft hasn’t resolved as well as the competition. A more complete explanation is rooted in history. Windows is based on DOS, and olden-days DOS was rigid and less robust than Unix in the name of being simpler on a number of levels. Good, in a lot of ways, for its time, but through an accident of history a complicated, modern, widely used (and widely networked) system has been placed on top of it. DOS/Windows was designed for one user doing only one thing at a time. In practice, of course, nobody’s machine only does one thing at a time (although multitasking is also an illusion perpetrated by your operating system: a single processor can only be doing one thing at a time, it just does a pretty good job of tricking you otherwise when it gets to do three billion things per second).

The reality isn’t that fragmentation is so much worse on a Windows machine, but that Windows’ design causes it to feel the effects of fragmentation more harshly.

Unix systems, by contrast, don’t do a significantly better job of keeping files together and in proper order; rather, the server mentality is to be prepared to handle multiple requests from multiple users at one time. The simple explanation is that Linux/Unix file systems are designed to schedule hard drive accesses in a logical order, so that excessive back-and-forth jumps around the drive are kept to a minimum. Fragmentation happens, it just impacts performance significantly less.

As an example, consider the second-to-last illustration from above, reprinted here:
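
    N N N N N N N N N N . . . L . . V V V M V V V V V V V V V V

(N is the novel, L the first letter, M the second letter, V the video, and dots are free blocks.)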

You want to open up the 2nd letter and the cat video. Unix would figure out which chunks of data it needs to pick up, and in what order they occur, then scoop them all up in one linear pass. So it picks up the end of the video, the letter, and then the first part of the video, delivers them where they need to go, and reorganizes them later. Windows would go to the first part of the video, pick it up, then backtrack to get the end of the video, then move forward again to find the letter. More bluntly, Windows would separate the task into three different accesses to the hard drive, the slowest part of your computer, where other systems would do the prep work needed to keep the number of accesses down to one.
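
Here is a back-of-the-envelope sketch of that difference, using the 0-indexed block numbers from the toy layout. Head travel measured in blocks is a crude stand-in for seek time, and real I/O schedulers are more sophisticated than a single sort, but the gap is the point.

    # Block numbers from the toy 30-block layout (0-indexed): the second
    # letter sits at block 19, the video's first 10 blocks at 20-29, and
    # its last 3 blocks at 16-18.
    letter2 = [19]
    video = list(range(20, 30)) + list(range(16, 19))   # file order: head, then tail

    def head_travel(blocks, start=0):
        # Total distance the read head moves servicing blocks in order.
        pos, travel = start, 0
        for b in blocks:
            travel += abs(b - pos)
            pos = b
        return travel

    naive = video + letter2           # fetch each file in file order
    sweep = sorted(video + letter2)   # sort everything, one linear pass

    print("file-order fetches:", head_travel(naive), "blocks of travel")
    print("single sweep:      ", head_travel(sweep), "blocks of travel")
    # file-order fetches: 45 blocks of travel
    # single sweep:       29 blocks of travel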

Even all this is still a simplification, but I think it illustrates the problem to people who have until now taken the issue of fragmentation as just another “magic” technical detail they don’t really understand.

I’m going to plead a little bit of ignorance about how Mac does it. The claim is that Mac does some extra work up front to prevent fragmentation, which has its advantages and disadvantages. Mac has traditionally been a single-user experience (certainly Mac’s defrag-as-you-go approach could not work in a multi-user server setting: you don’t reorganize the bookshelf while other people are still browsing it), but it is also Unix-based now, so the defrag-prevention measure could even be superfluous. But Mac uses its own file system too. I really don’t know exactly where it stands other than to say that Macintosh is much less impacted by fragmentation than Windows is, or at least, that’s the common claim.

So, chastising Windows for bad design doesn’t really solve anything. Your reality is that your office is probably full of Windows machines, so you take the time to defrag. Nearly everybody will advise you to ignore the default Windows defragger. My recommendation has been the free and widely acclaimed MyDefrag for some time; I’ve recommended it on this blog before. You can read more about it on its own site. The key point is that it doesn’t just defrag: it organizes your files in a way that improves performance beyond simply piecing them back together. You could say that it takes Windows’ disadvantages and turns them into advantages, playing to Windows’ natural patterns and masking the fragmentation issue as much as it can. My prescription is to run MyDefrag with its “Optimize Monthly” option (its biggest, deepest, most time-consuming optimization), and to run it about twice a year, not unlike a standard teeth-cleaning visit.

Hope this has been enlightening, and I will see you next week!
