The hard-disk drive, or HDD, has become the default and most important information storage medium for all modern computing platforms. The huge size of the market during the last decade has seen considerable amounts of money and effort spent on its development, and progress has been so rapid that the latest drives offer capabilities and price/performance ratios completely unanticipated by systems and software designers in the 1980's and early 1990's. For this reason, the interface between the drives and the systems in which they operate does not derive from a logical specification, but is a patchwork of additions and workarounds certain to confuse anyone unacquainted with its historical development.
In spite of these difficulties, the basic operations required to install, configure, and use HDD's are neither complex nor difficult if the basic concepts are clearly understood. However, since most available documentation is intended for a technically literate audience, the average computer user is faced with a situation in which a great deal of irrelevant information must be read and discarded in order to arrive at a clear understanding of the matter.
This document presents the basic geometrical and technical concepts needed to understand HDD's in the framework of a brief history of HDD development. In this way the reader gains an understanding of the idiosyncrasies of the HDD interface, and of why the present non-optimal but technically acceptable situation has arisen. In particular, the partitioning of HDD's is explained in some detail, with reference to specific partitioning utilities available on Open Source computing platforms. Any reader with even moderate technical skills should thus acquire the understanding and confidence needed to manage his or her own HDD's by studying this material.
It should also be emphasized that repartitioning a disk drive cannot permanently damage it. The worst that can happen is that the drive becomes unusable until a valid partition table is rewritten to it. The best way to learn and understand drive partitioning is to practise on a spare or empty drive. If you are planning to reinstall an operating system, set aside an hour or so to play about with the partitioning utilities until you feel confident with them. Time thus spent will be well rewarded in learning a simple but important aspect of computer engineering.
Note: The following text contains several examples of commandline entries, that is, commands typed in a shell or xterm at the commandline. The command prompt is a short character sequence printed by the operating system to the console at the start of every commandline, after which the system waits for user entry. Command prompts used by different systems vary, but typically end with a hash sign (#) for the root user and a dollar sign ($) for other users. In this text, the following command prompts are used:
Root user: #>
Other users: $>
For example, the pwd command issued by an ordinary user is shown as:
$> pwd
and the fdisk command issued by root is shown as:
#> fdisk
CONSIDERATIONS FOR MS DOS & WINDOWS
Ever since the appearance of competitive operating systems for the ISA PC, Microsoft has continually introduced "bugs" intended to prevent its own products from working in cooperation with those from other vendors. A good example was DR-DOS, a superb OS introduced during the 1980s that many preferred to MS-DOS. MS Windows 3.1 ran perfectly on DR-DOS at first, but after a time it began to misbehave. This was due to the inclusion of code in MS Windows that created subtle bugs if the underlying OS was anything other than MS-DOS, a very clever ruse that ruined the parent company of DR-DOS.
Similarly, MS fdisk has "bugs" in it that prevent it from recreating a hard disk partition table that it does not recognize as being of Microsoft origin, so any attempt to repartition a Linux disk using MS fdisk will fail. This can be overcome by writing zeroes to, say, the first 10MB of the disk using the dd command:
#> dd if=/dev/zero of=/dev/hda bs=1024 count=10240
This will wipe out the partition table as well as any file system information to give a "brand new" disk.
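With bs=1024 and count=10240, dd writes 10,240 blocks of 1,024 bytes, or 10,485,760 bytes (10MB), of zeroes to the start of the drive. The Python sketch below does the same thing to a disk image file, which makes it safe to experiment with. The function name zero_start is hypothetical, and pointing it at a real device such as /dev/hda would wipe the partition table exactly as dd does.

```python
import os

BLOCK_SIZE = 1024     # same as dd's bs=1024
BLOCK_COUNT = 10240   # same as dd's count=10240, i.e. 10MB in total


def zero_start(path, block_size=BLOCK_SIZE, count=BLOCK_COUNT):
    """Overwrite the first block_size * count bytes of path with zeroes.

    The file is modified in place, leaving everything after the zeroed
    region untouched, as happens on a real disk device.
    Returns the number of bytes written.
    """
    zero_block = bytes(block_size)
    written = 0
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        for _ in range(count):
            written += f.write(zero_block)
        f.flush()
        os.fsync(f.fileno())   # make sure the zeroes actually reach the disk
    return written
```

Used on a scratch image, for example zero_start("disk.img"), it clears the partition table and any filesystem information at the start of the image, giving the same "brand new" result.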
If you are planning to include MS software on an HDD, the best approach is as follows:
- Install the MS operating system first. If the HDD is brand new or completely blank, proceed with the MS installation. If it has been used before, use the dd command to zero the first few megabytes:
#> dd if=/dev/zero of=/dev/hda bs=1024 count=10240
Then proceed with the MS installation.
- Create an initial MS partition during MS installation. When the installation process prompts for partitioning information, create two partitions:
- The first partition as a primary partition of the desired size. A useful minimum size for modern operating systems (2005) is 4GB - about 2GB for the system and 2GB free for user storage.
- The second partition as a primary partition.
- Complete the MS installation and verify operation.
- Install the FOSS (Free/Open Source Software) operating system.
- Create additional partitions during FOSS installation. See below for a discussion of partitioning schemes. The installation wizard (Disk Druid with RedHat systems) lets you create, edit and delete partitions.
- Mark the MS partition as bootable during boot loader installation. Modern boot loaders (GRUB and LILO) can recognize and boot all variants of MS Windows.
- Complete the FOSS installation and check that all partitions boot as required.
DRIVE CONSTRUCTION & GEOMETRY
An HDD is a magnetic storage device having one or more circular platters (or disks) mounted on an axial spindle driven by an electric motor (Fig 1). The platters are coated with a magnetic substrate similar to those on magnetic recording tape. Magnetic patterns are written to or read from this substrate by means of record/replay heads mounted on a pivoted arm near the edge of the platters, one head per active surface. A head positioning mechanism moves the heads back and forth across the surface of the platters between discrete locations which define tracks on the disk (Fig. 2). Each track consists of a pattern of minute magnetic areas with alternating polarity. Different patterns are used by different manufacturers, but when read and decoded by the head, what emerges is a series of bits - binary digits, or 1's and 0's. This bitstream is further decoded into sectors, each sector having address and data fields. Each platter can therefore be visualized as containing a fixed number of sectors, each sector containing a fixed amount of data (Fig. 3). Let us assume that a drive has a single platter with eighty tracks on each side, each track being divided into 18 sectors, and each sector containing 512 bytes of information. The total amount of data that can be stored on such a disk is therefore:
$$\text{Capacity} = 2 \text{ sides} \times 80 \text{ tracks} \times 18 \text{ sectors} \times 512 \text{ bytes} = 1{,}474{,}560 \text{ bytes}$$
Storage sizes are usually stated in kilobytes or megabytes, using 1,024 ($2^{10}$) bytes per kilobyte and 1,048,576 ($2^{20}$) bytes per megabyte. Dividing the calculated capacity of our drive by 1,024 gives:
$$\text{Capacity} = 1{,}474{,}560 / 1{,}024 = 1{,}440 \text{ kilobytes}$$
This will be recognized as the capacity of the common MS-DOS formatted floppy disk, and the geometry just given is the layout used on such floppies. It is possible to format more tracks, and more sectors per track, but manufacturers will not guarantee the reliability of data stored in this manner. The reason has to do with recording density, the number of magnetic flux reversals that can be reliably imprinted on the substrate coating the surface of the disks. Recording densities are stated in bits per inch, and the very high storage capacity of modern drives is due largely to the high recording densities achieved with the magnetic coatings on the disks' surfaces.
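The capacity arithmetic above is easy to check in code. This is a minimal Python sketch; the constant and function names are illustrative only.

```python
SIDES = 2
TRACKS_PER_SIDE = 80
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512
BYTES_PER_KILOBYTE = 2**10   # 1,024
BYTES_PER_MEGABYTE = 2**20   # 1,048,576


def capacity_bytes(sides, tracks, sectors, sector_size):
    """Formatted capacity when every track holds the same number of sectors."""
    return sides * tracks * sectors * sector_size


total = capacity_bytes(SIDES, TRACKS_PER_SIDE, SECTORS_PER_TRACK, BYTES_PER_SECTOR)
print(total)                          # 1474560
print(total // BYTES_PER_KILOBYTE)    # 1440
print(total / BYTES_PER_MEGABYTE)     # 1.40625
```

The last line shows that the familiar "1.44MB" floppy really holds 1,440 kilobytes, which is only about 1.41 megabytes when a megabyte is taken as 1,048,576 bytes.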
Hard disk drives differ from floppy disks in several particulars. They usually have two, three or more platters instead of a single disk, they use higher recording densities, more tracks and more sectors per track, and they spin at much higher speeds: 3,600 rpm on early drives, and 5,400 or 7,200 rpm on most current ones. However, the most crucial difference is that, in a hard disk drive, the heads never touch the surface of the disks in normal operation. Instead, they 'fly' above it on a thin cushion of air generated by the disk's rotation and the shape of the heads. The flying height is about one tenth the diameter of a human hair, so the disks and heads must be protected within sealed enclosures and assembled in clean rooms, but because there is no mechanical friction, no wear occurs, hence the amazing reliability of these devices. During power-up and power-down the heads are moved to parking areas clear of the disk surfaces, and the only mechanical wear occurs in the spinning of the finely-balanced platter assembly and the sweeping of the head positioning mechanism.
One last point needs to be emphasized. Assume that we have an HDD with six platters, and therefore twelve heads. We could then refer to Platter 1/Head 1/Track 1, Platter 1/Head 1/Track 2, Platter 1/Head 1/Track 3, ... Platter 6/Head 12/Track 1, Platter 6/Head 12/Track 2, ... and so forth. However, it is more convenient to define cylinders. A cylinder consists of all of the tracks which are geometrically aligned - in other words, all of the Track 1's are referred to as Cylinder 1, all of the Track 2's as Cylinder 2, and so on. The reason for this is quite simple - when reading or writing information, the time taken to reposition the heads over different tracks is significant, and it is more efficient to record all of the Track 1's before moving to Track 2 than to move backwards and forwards between tracks. The common floppy disk is therefore spoken of as having 80 cylinders, 2 sides, 18 sectors per track and 512 bytes per sector.
HDD's were in use with mainframe computers from the 1960's, but these early drives were about the size of washing machines, with storage capacities that seem laughable by today's standards. The first HDD's used with personal computers had platters five-and-a-quarter inches in diameter and a capacity of ten megabytes (10MB). The positioning of the heads was done by the CPU via an interface called the ST-506, after the model number of the original Shugart drive. Because different manufacturers used different drive geometries, a set of tables listing the common ones was included in the ROM BIOS. As capacities increased, so too did the variation in geometries, and as the 100MB mark was approached, a better interface appeared, named IDE for Integrated Drive Electronics. This used one of the powerful new microcontrollers within the drive itself to take care of head positioning and other tasks. To simplify matters, the convention of cylinders, sides and sectors was retained, but many of the larger drives translated this into quite different addressing schemes within the drive itself.
Recall that the limit on storage capacity is determined in large part by recording density. If every track holds the same number of sectors, the inner tracks, being shorter than the outer tracks, must have a higher recording density. To put it another way, much of the capacity of the outer tracks is wasted if they use the same number of sectors and bytes-per-sector as the inner tracks. Microcontrollers, like microprocessors, perform calculations very quickly and can easily accommodate a varying number of sectors per track, so the address provided by the CPU to a modern drive bears no physical relation to the actual geometry of the drive, in which the outer tracks have many more sectors than the inner ones. Operating systems such as Unix and Linux have long recognized this, and use what is called linear addressing, in which the drive is assumed to consist simply of a long line of sectors. The CPU therefore requests certain sector numbers, and the drive's internal microcontroller translates these into heads, tracks and sectors without the CPU having to bother with such details. Modern interfaces are usually either EIDE (Enhanced IDE), SCSI (Small Computer System Interface), or one of the new high-speed interfaces for use with the PCI bus. The operating system does not need to know anything about the internal organization of the HDD's it accesses, using instead a standard protocol which is translated by the drive electronics as required.
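The translation between cylinder/head/sector addresses and linear sector numbers is simple arithmetic when the geometry is the logical one reported to the operating system, for example 255 heads and 63 sectors per track. The Python sketch below uses the conventional formula, with cylinders and heads counted from zero and sectors from one. The function names are hypothetical, and the sketch says nothing about how a drive's microcontroller maps sectors onto its real tracks; that mapping is internal to each drive.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a linear sector number.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba: recover (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1
```

For example, chs_to_lba(13, 0, 1, 255, 63) gives 208845. With 255 × 63 = 16,065 sectors per cylinder, cylinder 13 (counting from zero) starts at sector 13 × 16,065. On the floppy described earlier, chs_to_lba(79, 1, 18, 2, 18) gives 2879, the last of its 2,880 sectors.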
There is still one circumstance in which the actual geometry of the HDD is important, and that is during drive partitioning. However, before partitioning can be discussed, it is necessary to know something about drive formatting.
Consider an HDD in the process of retrieving some data. The disk is spinning, the heads move across the surface towards a particular track, slow down, and start reading data from the track. What appears at the output is a meaningless stream of ones and zeroes until a recognizable pattern appears. That recognizable pattern is formed by a low level format, a procedure which places a label at the beginning of each sector, and allows for other information to be inserted in predefined places. The sectors are separated by Inter-Sector Gaps or ISG's, and a sector might typically look something like the following:
|ISG||Head #||Track #||Sector #||Data region||CRC||ISG|
The CRC (Cyclic Redundancy Check) is an error-checking code used to verify data integrity, and the other fields are self-explanatory. Floppy disks can be low-level formatted under Linux with fdformat. Modern HDD's are low-level formatted at the factory and do not normally need it again; the Linux utilities fdisk, sfdisk and cfdisk do not low-level format a drive, but write its partition table. We will come to these in a moment.
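The role of the CRC can be illustrated with a toy sector record. This Python sketch is hypothetical throughout: the field widths are invented, and zlib.crc32 stands in for whatever checking code a particular drive actually uses. Real drives use their own error-checking and error-correcting codes, and hide the on-disk format entirely.

```python
import struct
import zlib

SECTOR_DATA_SIZE = 512

# Hypothetical address fields: 1-byte head, 2-byte track, 1-byte sector number.
HEADER = struct.Struct(">BHB")
CRC = struct.Struct(">I")   # 32-bit check value stored after the data region


def build_sector(head, track, sector, data):
    """Pack the address fields, the data region and a CRC over both."""
    if len(data) != SECTOR_DATA_SIZE:
        raise ValueError("data region must be exactly 512 bytes")
    body = HEADER.pack(head, track, sector) + data
    return body + CRC.pack(zlib.crc32(body))


def read_sector(raw):
    """Return (head, track, sector, data), or raise ValueError if the CRC fails."""
    body = raw[:-CRC.size]
    (stored,) = CRC.unpack(raw[-CRC.size:])
    if zlib.crc32(body) != stored:
        raise ValueError("CRC mismatch: sector is corrupt")
    head, track, sector = HEADER.unpack(body[:HEADER.size])
    return head, track, sector, body[HEADER.size:]
```

Flipping even a single bit anywhere in the address fields or the data region makes read_sector reject the sector, which is how a drive knows to retry a read or report an error.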
The data provided by the low level format are like the street signs in a large city: they tell you where you are at any given place. However, before setting out for a destination, a road-map is needed to find out how to get there. On a hard disk, this corresponds to the high-level format provided by the filesystem. Consider a typical 20GB HDD. It might be described as having 2434 cylinders, 255 heads, 63 sectors per track and 512 bytes per sector. As explained above, these are not "real" numbers (no such drive has 255 heads), but they are translated by the drive controller into whatever is required. Somewhere amongst those 620,670 tracks (2434 cylinders × 255 heads) and roughly thirty-nine million sectors will be the ones containing the data we want. Keeping track of which files are on the disk, and which sectors are allocated to them, is the responsibility of the filesystem, and there are many different filesystems. Typically there is an allocation table of some sort at the beginning of the disk, with entries showing which sectors are in use, which are empty, and which are bad and should not be used. The default filesystem in Linux is ext2, the Second Extended filesystem, although ext3 is now available. MS Windows nowadays uses either the FAT32 or NTFS filesystem, and cannot natively read Linux filesystems. Linux can use a large number of filesystems, perhaps more than any other modern operating system, and these are created with the mkfs utility. The very large numbers we have just considered suggest that it might be more convenient to divide the whole disk up into smaller sections, and this is just what partitioning does.
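The figures quoted for the 20GB drive follow directly from its logical geometry. This short Python calculation uses the 2434/255/63/512 figures from the text. It shows that 620,670 is the number of tracks (cylinders × heads), and that the drive has just over 39 million sectors.

```python
CYLINDERS = 2434
HEADS = 255
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512

tracks = CYLINDERS * HEADS                # one track per head per cylinder
sectors = tracks * SECTORS_PER_TRACK
capacity = sectors * BYTES_PER_SECTOR

print(f"{tracks:,} tracks")               # 620,670 tracks
print(f"{sectors:,} sectors")             # 39,102,210 sectors
print(f"{capacity:,} bytes")              # 20,020,331,520 bytes
print(f"{capacity / 10**9:.1f} GB (decimal), {capacity / 2**30:.1f} GB (binary)")
# 20.0 GB (decimal), 18.6 GB (binary)
```

The gap between the decimal and binary figures is one reason why the capacity printed on a drive's label and the capacity reported by the operating system often disagree.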
Disk partitions are neither essential nor rigidly defined. Any disk can be used as a single large partition, but this is seldom efficient with disks larger than about 8GB. In order to understand why, it is necessary to have some idea of the size of typical data sets. A full installation of RedHat Linux 7.1 occupies about 2300 megabytes, or 2.3GB, and this is typical of modern operating systems with a GUI installed. If we allow about 2000MB for our own data, a 4GB HDD looks to be a generous size. The smallest Linux installation that can comfortably run a GUI occupies about 800MB. If we can make do with 400MB for our own data, one of the older 1.2GB drives is just usable; anything smaller will be restrictive. So what should be done with a modern, inexpensive 40GB drive, ten times that generous 4GB? The answer is to split the drive into several smaller units called partitions, or logical drives, as opposed to the physical drive on which they exist.
Every modern HDD has a small table at the very beginning of its data area listing the number of partitions on it, and how they are organized. By this means it appears to the operating system as a set of independent smaller drives. Each partition is completely independent of the others; they can even - and often do - contain different operating systems. For example, a typical home computer might have the following partitions:
|Partition #1:||Linux boot|
|Partition #2:||Windows 95|
|Partition #3:||Linux root|
|Partition #4:||Linux swap|
|Partition #5:||Large files for both Linux and Windows|
So how big should each partition be? Much depends on how the system is to be used. Assume that you are administering a server for a medium-sized business, a computer with about forty users, only a dozen of whom use it on a regular daily basis. Irregular users log in a few times a week for an hour or two, and some back up their laptops onto your machine in spite of being asked not to do so. Your Linux filesystem has several directories, some of which are:
- /bin This contains basic programs for all users - binaries as they are known. The files themselves never change, and it is unlikely that any will be added.
- /sbin This contains programs for system maintenance and the like, most of which are not to be used by anyone except yourself. Again, it is unlikely to change.
- /usr/bin This contains application programs for all users, and will be added to continually over the life of the installation.
- /home Is where users have their home directories. Regular users have a few dozen files stored here, but their requirements are fairly predictable. On the other hand, when one of the irregulars turns up and dumps his laptop backup into it, storage usage suddenly explodes.
- /tmp Is where all programs store their temporary files. At the start of the day it will be empty, by mid-morning it's often well over a gigabyte, and by five o'clock it's back down to a few megs.
Let's now assume that all of these are in a single 20GB partition. One evening, an irregular drops in just before five o'clock, starts backing up his laptop into his home directory, and while waiting for this to finish, slips down the hall to chat to the new guy who's just figured out how to break into the /sbin directory. The two of them start playing with what they've found, something unexpected happens, the /sbin directory gets corrupted, and the system crashes. As system administrator it's your job to sort out the mess and get things back up before tomorrow. If you'd been smart and split the disk into several partitions, with /bin, /sbin and a few others in a separate write-protected partition, you'd only have to reformat that partition, copy across the program files, and you might be off home before six o'clock. But with a single giant partition, not only are you faced with reformatting and reinstalling the whole system, but you'll have lost all work done since the last backup, which was, let's see ...
The above demonstrates why good partitioning practice is essential in a server environment. For a home-user the situation is very different. The same advantages of isolating different data sets are apparent, but on a daily basis the most practical advantage is that a file system check (or CheckDisk for MS Windows users) only covers a single partition (typically about 4GB) instead of the whole disk, and is therefore much faster. Here is the partitioning scheme used by the writer on his 20GB Linux workstation:
|A comprehensive partitioning system for a 20GB HDD|
|Device||Start||End||Size||Type||Use|
|/dev/hda5||154||663||4.2GB||Logical||Linux #1 (Slackware)|
|/dev/hda6||664||1173||4.2GB||Logical||Linux #2 (Red Hat)|
|/dev/hda8||1183||1276||0.77GB||Logical||Linux #3 (Mandrake)|
|/dev/hda9||1277||2433||9.5GB||Logical||Bulk storage (Large files e.g. music, video)|
A great deal can be learnt by studying this, but some explanation is needed. The first thing to note is that partitioning is done in cylinders: the numbers in the Start and End columns are cylinder numbers. With 255 heads and 63 sectors of 512 bytes, each cylinder holds 8,225,280 bytes, so if you ask for a 100MB partition, the utility will round up to a whole number of cylinders and you might get 107MB instead.
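Cylinder rounding can be sketched in a few lines of Python. The geometry is the 255-head, 63-sector layout used throughout this section. Reading "100MB" as 100 × 1,048,576 bytes is an assumption, although a request of 100,000,000 bytes rounds to the same 13 cylinders.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512
CYLINDER_BYTES = HEADS * SECTORS_PER_TRACK * BYTES_PER_SECTOR   # 8,225,280


def cylinders_for(requested_bytes):
    """Smallest whole number of cylinders that can hold requested_bytes."""
    return -(-requested_bytes // CYLINDER_BYTES)   # integer ceiling division


def allocated_bytes(requested_bytes):
    """Space actually allocated once the request is rounded up to cylinders."""
    return cylinders_for(requested_bytes) * CYLINDER_BYTES


request = 100 * 2**20                  # "100MB" as 104,857,600 bytes
print(cylinders_for(request))          # 13
print(allocated_bytes(request))        # 106928640, about 107 million bytes
```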
The first partition tables were designed for Intel MS-DOS machines, and only allowed four partitions. When this became restrictive, these four were designated Primary Partitions, and any one of them can nowadays be designated as an Extended Partition. However, there can only be one Extended Partition, and by convention it is the last of the Primary Partitions in use. If the first partition is made the Extended Partition and spans the whole disk, there is no room left for other Primary Partitions. In theory the Extended Partition can hold any number of secondary or Logical Partitions, but in practice each operating system has finite limits, which for Linux are 15 partitions in total on SCSI drives and 63 in total on IDE drives. Most modern machines will designate the first partition as an Extended Partition, followed by as many Logical Partitions as are required.
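The four-entry limit comes from the layout of the partition table itself, which lives in the first 512-byte sector of the disk (the Master Boot Record). The table holds four 16-byte entries starting at byte 446, followed by the two signature bytes 0x55 and 0xAA. The Python sketch below reads those four primary entries. The function name and the dictionary it returns are hypothetical. It also does not follow the chain of extended boot records inside an Extended Partition, which is where the Logical Partitions, numbered from 5, are described.

```python
import struct

MBR_SIZE = 512
TABLE_OFFSET = 446                    # the partition table starts here
ENTRY = struct.Struct("<B3sB3sII")    # status, CHS first, type, CHS last, LBA start, sector count
SIGNATURE = b"\x55\xaa"               # bytes 510-511 of a valid MBR
EXTENDED_TYPES = {0x05, 0x0F, 0x85}   # common Extended Partition type ids


def parse_mbr(sector: bytes) -> list:
    """Return the used primary entries (numbered 1-4) from a 512-byte MBR."""
    if len(sector) < MBR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("no valid partition table signature")
    partitions = []
    for slot in range(4):
        status, _, ptype, _, lba_start, count = ENTRY.unpack_from(
            sector, TABLE_OFFSET + slot * ENTRY.size)
        if ptype == 0:                # type 0 marks an unused slot
            continue
        partitions.append({
            "number": slot + 1,
            "bootable": status == 0x80,
            "type": ptype,            # e.g. 0x83 Linux, 0x82 Linux swap
            "extended": ptype in EXTENDED_TYPES,
            "start": lba_start,       # first sector, counted from 0
            "sectors": count,
        })
    return partitions
```

On a Linux system, root could pass it the first sector of a drive, for example open("/dev/hda", "rb").read(512). The sketch only reads, so this cannot change the partition table.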
Although it is possible to install Linux in a single partition, it is best to use at least two, and perhaps three:
- The boot partition. During the boot phase when the machine is powered up, the BIOS on many early Intel motherboards could not access beyond the first 1,024 cylinders. It used to be advisable to have a small boot partition at the start of the disk, followed by the main partitions, but nowadays this is only necessary on older hardware.
- The swap partition. In order to maximize the amount of memory available for programs, Linux uses Swap Space. This is an area of the HDD where portions of programs can be stored temporarily, and it is best implemented as a separate partition. Swap Space can be of any size, but about twice the size of installed RAM is a common rule of thumb.
- The root partition. The main partition, often containing both system and user files.
The scheme shown above separates system and user files. The /home directory is mounted on /dev/hda3, the Working Files partition. All other top-level directories are on one of the Linux partitions. Not only does this allow multiple installations that can be selected at boot time, but each installation can be upgraded or replaced at any time without disturbing the Working Files, and without the need to copy files from one partition to another. At present (2004) the writer's machine has a Slackware 10 installation on /dev/hda5, a RedHat 9 installation on /dev/hda6, and a Mandrake 10.1 installation on /dev/hda8. Not only is such an arrangement very flexible, it simplifies backups (all user files are in the Working Files partition) and isolates user files from a system crash in another partition.
During boot, the appropriate partitions must be mounted correctly. This is controlled by the /etc/fstab file, shown here for booting into /dev/hda5:
|The /etc/fstab file for the above partitioning example|
The first line of the above refers to / - that is, the root filesystem in /dev/hda5. The second line refers to /dev/hda2, the boot partition. /dev/hda1 is not mounted since it contains a different kernel and boot scheme. To see the contents of /etc/fstab on your own machine, enter:
$> cat /etc/fstab
Information about the other lines in /etc/fstab can be obtained from the man pages by invoking:
$> man fstab
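Each non-comment line of /etc/fstab has up to six whitespace-separated fields: the device, the mount point, the filesystem type, the mount options, the dump flag and the fsck pass number (see man fstab). The Python sketch below parses that format. The sample file in it is hypothetical, written for the partition layout described above, and is not the writer's actual /etc/fstab.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str          # first field, e.g. /dev/hda5
    mount_point: str     # second field, e.g. / or /home
    fs_type: str         # third field, e.g. ext3 or swap
    options: List[str]   # fourth field, comma-separated
    dump: int            # fifth field, read by dump(8)
    fsck_pass: int       # sixth field, fsck order at boot (0 = not checked)


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab-format text, skipping blank lines and # comments.

    Missing fifth and sixth fields are treated as 0.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"too few fields in fstab line: {raw!r}")
        fields += ["0"] * (6 - len(fields))   # adds nothing when 6 fields are present
        entries.append(FstabEntry(
            device=fields[0],
            mount_point=fields[1],
            fs_type=fields[2],
            options=fields[3].split(","),
            dump=int(fields[4]),
            fsck_pass=int(fields[5]),
        ))
    return entries


# Hypothetical fstab for the layout described above (booting /dev/hda5).
SAMPLE_FSTAB = """\
# device    mount   type  options   dump pass
/dev/hda5   /       ext3  defaults  1    1
/dev/hda2   /boot   ext2  defaults  1    2
/dev/hda3   /home   ext3  defaults  1    2
/dev/hda7   swap    swap  defaults  0    0
proc        /proc   proc  defaults
"""

for entry in parse_fstab(SAMPLE_FSTAB):
    print(entry.device, entry.mount_point, entry.fs_type)
```

To inspect your own machine instead, pass it the contents of the real file, for example parse_fstab(open("/etc/fstab").read()).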
Some examples of typical partitioning systems for home computers might be useful. The first is a simplification of the one given above, and should be suitable for machines with 128MB of RAM and 20 - 40GB HDD's:
|Typical partitioning for a 20 - 40GB Linux installation|
|Device||Size||Type||Use|
|/dev/hda1||20 - 40GB||Extended||-|
|/dev/hda9||10 - 30GB||Logical||Bulk storage|
A couple of points are worthy of note. Numbering for Logical Partitions always begins from five. The files in the Boot Partition only amount to about 3.5MB in RedHat 7.1, and an 8MB partition is perfectly adequate. However, as of RH 8.0, a minimum of 100MB is recommended for reasons of which the writer is uncertain. If a dual-boot system incorporating Windows is needed for a machine with 256MB RAM, the following would be suitable:
|Typical partitioning for a 20 - 40GB Linux/Windows dual-boot installation|
|Device||Size||Type||Use|
|/dev/hda3||15.7 - 35.7GB||Extended||-|
|/dev/hda8||7 - 27GB||Logical||Bulk storage|
Finally, if a small HDD is to be pressed into service on a machine with 64MB of RAM:
|Typical partitioning for a 1 - 4GB Linux installation|
|Device||Size||Type||Use|
|/dev/hda1||1 - 4GB||Extended||-|
|/dev/hda7||864 - 3,864MB||Logical||Linux|
Three partitioning utilities are commonly available on a Linux platform:
- fdisk This is the basic utility for new installations and routine maintenance, and the best one to begin with.
- cfdisk Has a friendlier interface and is somewhat more powerful, but can only be used on a working installation. It is therefore better for maintenance and upgrades.
- sfdisk A commandline utility intended for use by other programs. Not for beginners.
Because incorrect use of any of these can render your system unusable, it is best to begin slowly and safely. If you have a working Linux installation, enter at the command prompt:
#> fdisk -l
That's a lower-case ' l ', not a one. This will simply list the existing partitions on the default drive, and on the writer's machine produces:
Disk /dev/hda: 255 heads, 63 sectors, 2434 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1        13    104391   83  Linux
/dev/hda2            14        26    104422+  83  Linux
/dev/hda3            27       154   1028160   83  Linux
/dev/hda4           155      2434  18314100    5  Extended
/dev/hda5           155       664   4096543+  83  Linux
/dev/hda6           665      1174   4096543+  83  Linux
/dev/hda7          1175      1183     72261   82  Linux swap
/dev/hda8          1184      1277    755023+  83  Linux
/dev/hda9          1278      2434   9293571   83  Linux
Alternatively, enter:
#> fdisk -lu
to get a listing of sector rather than cylinder numbers:
Disk /dev/hda: 255 heads, 63 sectors, 2434 cylinders
Units = sectors of 1 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1            63    208844    104391   83  Linux
/dev/hda2        208845    417689    104422+  83  Linux
/dev/hda3        417690   2474009   1028160   83  Linux
/dev/hda4       2474010  39102209  18314100    5  Extended
/dev/hda5       2474073  10667159   4096543+  83  Linux
/dev/hda6      10667223  18860309   4096543+  83  Linux
/dev/hda7      18860373  19004894     72261   82  Linux swap
/dev/hda8      19004958  20515004    755023+  83  Linux
/dev/hda9      20515068  39102209   9293571   83  Linux
Nothing too dangerous here. In fact, if you try:
#> fdisk -?
it will complain of an invalid option and print a help message.
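The Blocks column in these listings is the partition size in 1,024-byte blocks, so it can be recomputed from the sector numbers printed by fdisk -lu. In the sector listing above, every entry marked with + has an odd number of 512-byte sectors, consistent with + flagging a left-over half block. The Python sketch below, with hypothetical function names, checks this for rows copied from that listing.

```python
def blocks_column(start: int, end: int, sector_size: int = 512) -> str:
    """Recompute fdisk's Blocks entry from inclusive start/end sector numbers.

    The size is given in 1,024-byte blocks; '+' marks a left-over part block.
    """
    size = (end - start + 1) * sector_size
    whole, leftover = divmod(size, 1024)
    return f"{whole}+" if leftover else str(whole)


def parse_fdisk_row(line: str):
    """Split a partition row of `fdisk -lu` output into its leading fields."""
    fields = line.split()
    if len(fields) > 1 and fields[1] == "*":   # the Boot column shows '*' or nothing
        del fields[1]
    device, start, end, blocks, part_id = fields[:5]
    return device, int(start), int(end), blocks, part_id


listing = """\
/dev/hda1 63 208844 104391 83 Linux
/dev/hda2 208845 417689 104422+ 83 Linux
/dev/hda5 2474073 10667159 4096543+ 83 Linux
/dev/hda7 18860373 19004894 72261 82 Linux swap
"""
for row in listing.splitlines():
    device, start, end, blocks, _ = parse_fdisk_row(row)
    print(device, blocks, blocks_column(start, end))
```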
In order to use fdisk to change the partition table, it must be given a device specifier, typically:
#> fdisk /dev/hda
This brings up a message and a prompt, at which 'm' should be entered to get a menu:
Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)
Command (m for help):
The first thing to note is the 'q' option: quit without writing a new table. Always use this to exit the program unless you are certain that you want to change the partition table. Other options to explore are 'p' to list the existing partition table, 'l' to list the available partition types, and 'x' for a quick look at more detailed functions (this must be entered twice in some versions).
PARTITIONING A DRIVE
In order to repartition a drive, you must have either:
- An installation CD for a Linux distro or other operating system.
- A boot disk and a filesystem root disk.
- A boot/root disk.
A boot disk on its own does not contain a filesystem, but expects to find one somewhere else on the machine. There are several boot/root images available for Linux that contain both a boot system and a root filesystem, all compressed into an image that fits on a single floppy disk. One of the best is Tomsrtbt, available from www.toms.net/rb. If you have any intention of doing real partitioning, get a copy of this, make two root/boot floppies, and boot from each of them a couple of times to ensure that they work and to familiarize yourself with their operation. Being minimal systems they are not particularly friendly, but they can certainly be your best friend in a real emergency.
Tomsrtbt contains both fdisk and mkfs, the first to set up the partition table, the second to create the filesystem proper. If you know how to set up a hard disk from scratch, this simple tool can be used to build a complete Linux installation from the ground up.
However, most Linux users will only do partitioning during installations or upgrades, and the majority of distros offer either fdisk or a custom utility for partitioning. RedHat offers Disk Druid, a simple but effective tool for standard partitioning of all HDD's.
Before beginning an installation or upgrade of any operating system, sit down with pencil and paper and write out the partitioning system you want to use.
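The pencil-and-paper step can also be sketched in Python. The helper below is hypothetical. It assumes the 255-head, 63-sector geometry used throughout this document, lays partitions end to end on whole cylinders starting at cylinder 1, and ignores the distinction between Primary and Logical Partitions. The example sizes (a 100MB boot partition, swap of twice a 256MB RAM, a 4.2GB root partition and the rest as store) follow the guidelines in this document and are not prescriptions.

```python
CYLINDER_BYTES = 255 * 63 * 512     # 8,225,280 bytes per cylinder
MB = 2**20


def plan(total_cylinders, requests):
    """Lay out partitions end to end on whole cylinders, starting at cylinder 1.

    requests is a list of (name, size_in_bytes) pairs; a size of None means
    "the rest of the disk" and is only allowed for the last entry.
    Returns (name, start_cylinder, end_cylinder, allocated_bytes) tuples.
    """
    layout = []
    next_cyl = 1
    for i, (name, size) in enumerate(requests):
        if size is None:
            if i != len(requests) - 1:
                raise ValueError("only the last partition may take the remainder")
            count = total_cylinders - next_cyl + 1
        else:
            count = -(-size // CYLINDER_BYTES)   # round up to whole cylinders
        end = next_cyl + count - 1
        if count <= 0 or end > total_cylinders:
            raise ValueError(f"partition {name!r} does not fit on the disk")
        layout.append((name, next_cyl, end, count * CYLINDER_BYTES))
        next_cyl = end + 1
    return layout


ram = 256 * MB
scheme = [
    ("boot", 100 * MB),         # 100MB boot partition
    ("swap", 2 * ram),          # about twice the installed RAM
    ("root", 4_200_000_000),    # 4.2GB for the system
    ("store", None),            # everything else
]
for name, start, end, size in plan(2434, scheme):
    print(f"{name:6} {start:5} {end:5} {size / 10**9:5.2f} GB")
```

The printed start and end cylinders can be copied straight onto your paper plan and checked against what the installer or fdisk offers.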
If you have not done repartitioning before, but want to experiment with multiple partitions, it is best NOT to use Disk Druid to create them. Instead, create only the three basic partitions and a single large partition for the remainder of the drive. In RedHat these will be:
- boot must be mounted at /boot
- root must be mounted at / - that is, a single forward slash meaning the root directory.
- swap does not need a mount point.
- store should be mounted at /store in the root directory.
Note that the 'store' partition is optional, and can be given any name you prefer. It is possible to leave a portion of the disk unpartitioned, but some partitioning utilities will complain at this. Specify the size of the partitions according to your partitioning scheme. Once installation is complete, the large partition can be split into smaller ones using fdisk or cfdisk without moving partition boundaries.
Once you've completed the install, invoke fdisk, delete the 'store' partition, and create as many new partitions as required in the free space thus created, but without modifying the existing partitions. This is the secret to using fdisk safely: never change the boundaries of partitions you want to keep, either deliberately or inadvertently.
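The rule of never moving the boundaries of partitions you want to keep can be checked mechanically before you enter 'w' in fdisk. This Python sketch is hypothetical. It takes the cylinder ranges of the partitions being kept, the range freed by deleting the store partition, and the proposed new partitions. It then reports any new partition that strays outside the freed space or overlaps another. The cylinder numbers in the example are invented for illustration.

```python
def check_split(kept, freed, new):
    """List problems with a proposed split of a freed region.

    kept  -- (start, end) cylinder ranges of partitions that must not move
    freed -- (start, end) range released by deleting the old partition
    new   -- (start, end) ranges proposed for the new partitions
    All ranges are inclusive, like fdisk's Start and End columns.
    """
    problems = []
    free_start, free_end = freed
    for start, end in new:
        if start > end:
            problems.append(f"{start}-{end}: start is after end")
            continue
        if start < free_start or end > free_end:
            problems.append(f"{start}-{end}: outside freed region {free_start}-{free_end}")
        for k_start, k_end in kept:
            if start <= k_end and k_start <= end:   # inclusive ranges overlap
                problems.append(f"{start}-{end}: overlaps kept partition {k_start}-{k_end}")
    ordered = sorted(new)
    for (a_start, a_end), (b_start, b_end) in zip(ordered, ordered[1:]):
        if b_start <= a_end:
            problems.append(f"{a_start}-{a_end} overlaps {b_start}-{b_end}")
    return problems


# Hypothetical example: keep three partitions, split the freed store area in three.
kept = [(1, 13), (14, 79), (80, 590)]
freed = (591, 2434)
print(check_split(kept, freed, [(591, 1100), (1101, 1610), (1611, 2434)]))   # []
print(check_split(kept, freed, [(585, 1100)]))
```

An empty list means the proposed split leaves every kept partition exactly where it was.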
Note that there are utilities available that can move partition boundaries without losing data, but their results cannot always be guaranteed. It is much better to use a partitioning scheme that can meet your requirements simply by splitting or merging partitions as explained above.
If the above instructions are understood and followed, the reader will be able to make optimum use of the large and inexpensive HDD's now available.