Don't assume that your data is lost just because a hard drive makes horrendous noises, fails to boot, or is reported as failing by diagnostic tools. With a combination of a live CD and some specialist tools, all may not be lost.
Do consider using a professional data recovery service if the data is important. This article is for Debian / Ubuntu; you will need to make appropriate changes for other distributions and package managers.
Step 1: Boot from a live CD
There are many live CDs to choose from, some of them dedicated to this purpose, but to keep the process as simple as possible for me, I used Kubuntu, as I know it well. The following instructions assume that you are using an Ubuntu live CD.
- Boot from the live CD
- Configure networking
- Uncomment repositories in /etc/apt/sources.list - see AptRepositories for guidance
- Run sudo aptitude update (don't bother upgrading though!)
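If you would rather not edit sources.list by hand, the uncommenting can be scripted. This is a minimal sketch, assuming a traditional one-line-per-repository /etc/apt/sources.list (newer releases may keep their repositories elsewhere, in a different format) and GNU sed; the function name uncomment_repos is hypothetical. It keeps a .bak backup and leaves any cdrom: entries alone.

```shell
# Hypothetical helper: uncomment "# deb ..." and "# deb-src ..." lines in a
# sources.list-style file, keeping a backup as FILE.bak (GNU sed -i syntax).
# Lines mentioning cdrom: are left untouched.
uncomment_repos() {
  sed -i.bak '/cdrom:/!s/^#[[:space:]]*\(deb\(-src\)\{0,1\}[[:space:]]\)/\1/' "$1"
}
```

In a root shell (for example after sudo -i), run `uncomment_repos /etc/apt/sources.list`, compare the result with `diff /etc/apt/sources.list.bak /etc/apt/sources.list`, then run aptitude update as above.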
Step 2: Mount a destination drive
Make sure that your destination drive is larger than the source drive: a 500 GB drive may not be large enough to hold a 500 GB image of that drive, because the destination's own filesystem uses some of its space.
Please also note that USB is notoriously slow; in a recent example, it took 13 days to copy 500 GB. Try to use a SATA or eSATA connection, or perhaps USB 3 if you have such a thing.
You need access to a hard drive that can store large files. If you use an external USB drive, make sure that it is not formatted as FAT (often the factory default), because FAT cannot hold files of 4 GB or more. You can also use a mounted NFS share. Mount it ready for action; I will assume that the destination drive has been mounted at /mnt/destination. Ensure that the source (broken) drive is not mounted (it shouldn't be unless you mounted it).
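Before starting, it is worth checking the size and filesystem points above. This is a minimal sketch assuming util-linux (blockdev, findmnt) and a recent GNU coreutils df, as found on an Ubuntu live CD; the function names are hypothetical, and /dev/sda and /mnt/destination are the placeholders used throughout this article.

```shell
# Hypothetical helpers: sanity-check the destination before imaging.

# Succeeds if AVAIL_BYTES ($2) is strictly greater than SRC_BYTES ($1).
has_room() {
  [ "$2" -gt "$1" ]
}

# Succeeds if the filesystem type is a FAT variant, which cannot store
# files of 4 GiB or more.
is_fat() {
  case $1 in
    vfat|msdos|fat) return 0 ;;
    *) return 1 ;;
  esac
}

# check_destination SOURCE_DEVICE DEST_DIR
check_destination() {
  src_bytes=$(sudo blockdev --getsize64 "$1") || return 1
  avail_bytes=$(df --output=avail -B1 "$2" | tail -n 1 | tr -d ' ')
  fstype=$(findmnt -n -o FSTYPE -T "$2")
  if is_fat "$fstype"; then
    echo "$2 is $fstype: files of 4 GiB or more will fail" >&2
    return 1
  fi
  if ! has_room "$src_bytes" "$avail_bytes"; then
    echo "need more than $src_bytes bytes, but $2 has $avail_bytes free" >&2
    return 1
  fi
  echo "ok: $avail_bytes bytes free for a $src_bytes-byte image"
}
```

Usage: `check_destination /dev/sda /mnt/destination`. It prints an "ok" line, or explains on stderr why the destination is unsuitable.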
Step 3: Determine source drive id
You need to find out the device name of the source drive. It will be listed under /dev; if it's your primary drive, it will probably be /dev/sda (or /dev/hda on older systems).
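To see which device is the failing drive, `lsblk -d -o NAME,SIZE,MODEL` lists each whole disk with its size and model, which is usually enough to tell it apart from the live CD and the destination drive. To confirm that it is not mounted, a check along these lines can help. It is a sketch that simply looks for the device name at the start of a line in /proc/mounts; the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if DEVICE, or anything whose name starts with it
# (e.g. /dev/sda1 for /dev/sda, but also /dev/sdaa), is listed as a mounted source.
# The optional second argument replaces /proc/mounts, which makes testing easy.
is_mounted() {
  grep -q "^$1" "${2:-/proc/mounts}"
}
```

Usage: `if is_mounted /dev/sda; then echo "unmount it first"; fi`. If it is mounted, unmount each of its partitions with sudo umount before continuing.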
Step 4: Install GNU ddrescue
Note: for historical* reasons, the package is named gddrescue in Debian and Ubuntu.
$ sudo aptitude install gddrescue
$ man ddrescue
* Debian (and Ubuntu) package names are screwed up here: GNU ddrescue is packaged as gddrescue, while a package called ddrescue does exist in the Debian/Ubuntu repositories but actually contains dd_rescue, an older and less effective program that does the same job. Plenty of potential for disaster there.
Step 5: Run GNU ddrescue
N.B. ddrescue is very slow. I believe it could be dramatically faster with "-b 4k" (or "--block-size=4k") added to all the ddrescue commands below, perhaps taking a tenth of the time, but I have not yet tested this theory, so on your head be it. I intend to try it next time and will update this page afterwards. Before experimenting, check man ddrescue: -b sets the sector (hardware block) size that ddrescue assumes for the input device.
Replace /dev/sda with the actual source drive and /mnt/destination with the actual destination mount point.
$ sudo ddrescue -n /dev/sda /mnt/destination/recovered.img /mnt/destination/recovered.log
The "-n" option should run faster, as it skips over the errors (although it seemed no better to me). Data recovery is not a fast process, and it will probably take a few days (see the N.B. at the beginning of this section). The great thing about ddrescue is that you can abort at any time and resume from where you left off. You can also skip forward by adding the "-i" switch followed by the number of bytes into the disk, e.g. to start from 10 GB:
$ sudo ddrescue -n -i 10000000000 /dev/sda /mnt/destination/recovered.img /mnt/destination/recovered.log
My tip is to keep aborting (Ctrl+C) and skipping forward until you are past the area of the disk that is causing problems. Then, once the bulk of the drive has been recovered, you can go back to the sections you skipped, or just move on to the second pass (see next section). ddrescue will not replace data it has already recovered, so you can do this safely.
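The skip-forward tip lends itself to a small wrapper. This is a sketch only: the helper names are hypothetical, the image and log paths are the ones used in this article, and "10 GB" is taken as 10,000,000,000 bytes, as in the example command.

```shell
# Paths used throughout this article; change them to match your setup.
IMG=/mnt/destination/recovered.img
LOG=/mnt/destination/recovered.log

# Hypothetical helper: decimal gigabytes to bytes (10 -> 10000000000).
gb_to_bytes() {
  echo $(( $1 * 1000000000 ))
}

# Hypothetical helper: run the fast first pass (-n) on SOURCE ($1) starting
# GB ($2) gigabytes into the disk. Press Ctrl+C to abort, then call again with
# a larger GB to skip a bad area; ddrescue will not replace data it has
# already recovered.
rescue_from_gb() {
  sudo ddrescue -n -i "$(gb_to_bytes "$2")" "$1" "$IMG" "$LOG"
}
```

Usage: `rescue_from_gb /dev/sda 10`, and later perhaps `rescue_from_gb /dev/sda 50` if the area after 10 GB is still causing trouble.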
Step 6: Run GNU ddrescue again
You should by now have a full image, albeit with some blank (zeroed) areas. If you decide that you've spent long enough, skip to the next section. Otherwise, run ddrescue again, this time replacing "-n" with "-r 1" (or perhaps "-r 3") so that it retries the bad areas once (or three times) to recover more of the data.
$ sudo ddrescue -r 1 /dev/sda /mnt/destination/recovered.img /mnt/destination/recovered.log
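To help decide whether another pass is worth it, you can total up what the log file (newer ddrescue documentation calls it a mapfile) says is still unreadable. This sketch assumes the layout written by GNU ddrescue: lines starting with # are comments, the first other line records the current position, and each remaining line is a hexadecimal position and size followed by a status character, where + marks rescued data and - marks bad sectors. The function name is hypothetical; check the ddrescue documentation for the exact format your version writes.

```shell
# Hypothetical helper: sum the sizes of all regions in a ddrescue log ($1)
# whose status character equals $2 ("+" rescued, "-" bad sector).
sum_status() {
  total=0
  seen_current=no
  while read -r pos size status rest; do
    case $pos in
      '#'*|'') continue ;;            # skip comments and blank lines
    esac
    if [ "$seen_current" = no ]; then
      seen_current=yes                # first data line is the current position
      continue
    fi
    if [ "$status" = "$2" ]; then
      total=$(( total + $size ))      # sizes are hex, e.g. 0x00000200
    fi
  done < "$1"
  echo "$total"
}
```

Usage: `sum_status /mnt/destination/recovered.log -` prints the number of bytes still marked as bad, and `sum_status /mnt/destination/recovered.log +` the number rescued so far.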
Step 7: Copy the destination image
You don't want to mess up your hard-earned image, so copy it and work on the copy.
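A minimal sketch of this step, using cp and then cmp to confirm that the copy is identical before you work on it. The function name is hypothetical, and the comparison reads both files in full, so it takes a while on a large image.

```shell
# Hypothetical helper: copy an image ($1) to $2 and verify it byte for byte.
copy_image() {
  cp -- "$1" "$2" || return 1
  if cmp -s -- "$1" "$2"; then
    echo "copy verified: $2"
  else
    echo "copy differs from original: $2" >&2
    return 1
  fi
}
```

Usage: `copy_image /mnt/destination/recovered.img /mnt/destination/copy.img`. The destination needs room for a second full-size image.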
Step 8: Install sleuthkit
You need mmls to determine the partition structure of your disk image. This is part of sleuthkit. On the destination PC, install sleuthkit:
$ sudo aptitude install sleuthkit
$ man mmls
Step 9: Run mmls
$ sudo mmls copy.img
DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors
Slot Start End Length Description
00: ----- 0000000000 0000000000 0000000001 Primary Table (#0)
01: ----- 0000000001 0000000062 0000000062 Unallocated
02: 00:00 0000000063 0117195119 0117195057 NTFS (0x07)
03: ----- 0117195120 0117210239 0000015120 Unallocated
Take a note of the Start point of the partition that you wish to access.
Step 10: Calculate Offset
mmls lists every entry in the partition table, including unallocated space. In this example, we want to mount the NTFS partition starting at sector 63. Because mmls reports units of 512-byte sectors, multiply by 512 to get the offset in bytes:
63 x 512 = 32256
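Rather than copying the Start value by hand, the offset can be pulled out of the mmls output. This is a sketch assuming the column layout shown above (slot, group, start, end, length, description) and using the sector size mmls reports on its "Units are in" line, with 512 as a fallback; the function name is hypothetical.

```shell
# Hypothetical helper: read mmls output on stdin and print the byte offset of
# every partition whose line matches the pattern in $1 (e.g. NTFS).
partition_offsets() {
  awk -v pat="$1" '
    BEGIN { unit = 512 }                                  # fallback sector size
    /^Units are in/ { split($4, a, "-"); unit = a[1] }    # "512-byte" -> 512
    $0 ~ pat && $3 ~ /^[0-9]+$/ { printf "%.0f\n", $3 * unit }
  '
}
```

Usage: `sudo mmls copy.img | partition_offsets NTFS` prints 32256 for the example output above.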
Step 11: Attempt to mount partition
For a DOS (FAT) partition, using the offset calculated above:
$ sudo mount -o ro,loop,offset=32256 copy.img mountpoint
For an NTFS partition:
$ sudo aptitude install ntfs-3g
$ sudo mount -t ntfs-3g -o ro,force,loop,offset=32256 copy.img mountpoint
For some reason, the image won't mount with -t ntfs and needs the full ntfs-3g driver, even though we are only mounting read-only. I don't profess to understand why, but ntfs-3g just works.
Step 12: Extracting files from an unmounted disk image
If the image will not mount, the general advice seems to be to copy the image to clean hardware (i.e. a physical disk) and boot from a Windows recovery disk. Failing that, all is not lost: there are a number of tools that will search disk images for files. I played with photorec, but while it recovered loads of cached images from IE, it recovered no more than a handful of proper photos. Foremost, on the other hand, seemed to be much more successful.
Update: a number of people have reported success with photorec, so I suspect that I simply had a buggy version of it.
$ sudo aptitude install foremost
$ foremost -i copy.img -o output-folder
With luck this will give you a folder that looks like this:
drwxr-xr-x 30 root root 4096 2009-01-08 18:04 .
drwxrwxrwx 5 root root 4096 2009-01-08 18:03 ..
-rw-r--r-- 1 root root 888832 2009-01-08 18:15 audit.txt
drwxr-xr-- 2 root root 12288 2009-01-08 18:15 avi
drwxr-xr-- 2 root root 12288 2009-01-08 18:15 bmp
drwxr-xr-- 2 root root 69632 2009-01-08 18:15 dll
drwxr-xr-- 2 root root 4096 2009-01-08 18:10 doc
drwxr-xr-- 2 root root 20480 2009-01-08 18:15 exe
drwxr-xr-- 2 root root 139264 2009-01-08 18:15 gif
drwxr-xr-- 2 root root 20480 2009-01-08 18:15 htm
drwxr-xr-- 2 root root 4096 2009-01-08 18:13 jar
drwxr-xr-- 2 root root 135168 2009-01-08 18:15 jpg
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 mbd
drwxr-xr-- 2 root root 4096 2009-01-08 18:15 mov
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 mpg
drwxr-xr-- 2 root root 4096 2009-01-08 18:14 ole
drwxr-xr-- 2 root root 4096 2009-01-08 18:14 pdf
drwxr-xr-- 2 root root 57344 2009-01-08 18:15 png
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 ppt
drwxr-xr-- 2 root root 4096 2009-01-08 18:14 rar
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 rif
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 sdw
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 sx
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 sxc
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 sxi
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 sxw
drwxr-xr-- 2 root root 4096 2009-01-08 18:04 vis
drwxr-xr-- 2 root root 12288 2009-01-08 18:15 wav
drwxr-xr-- 2 root root 4096 2009-01-08 18:15 wmv
drwxr-xr-- 2 root root 4096 2009-01-08 18:13 xls
drwxr-xr-- 2 root root 4096 2009-01-08 18:14 zip
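With this many type folders, a quick count shows where the recovered files ended up. This is a sketch using find and wc; the function name is hypothetical, and foremost's own audit.txt (in the listing above) gives a more detailed report.

```shell
# Hypothetical helper: print each subfolder of a foremost output folder ($1)
# followed by the number of files it contains.
summarize_recovery() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue     # no subfolders: the glob stays unexpanded
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    printf '%s %s\n' "$(basename "$dir")" "$count"
  done
}
```

Usage: `summarize_recovery output-folder | sort -k2,2nr` lists the busiest folders first.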