A Tale Of An Unnecessary Linux Reinstall

After being moved to its new home, the web server at my office refused to boot. The server runs CentOS 4 and is configured to apply updates daily. It choked during boot, and the last messages shown before the system froze were:

VFS: Cannot open root device "<NULL>" or unknown-block(8,22)
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,22)
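
The two numbers in unknown-block(8,22) are the major and minor device numbers of the root device the kernel tried to use. As a rough aid for reading such messages, here is a minimal sketch that decodes them, assuming the standard Linux numbering in which major 8 covers the first sixteen sd disks with sixteen minors each. The function name decode_sd is my own, not part of any tool.

```shell
# Decode a (major, minor) pair into a device name.
# Only major 8 (sd disks: SCSI/SATA/USB) is handled in this sketch.
decode_sd() {
    major=$1
    minor=$2
    if [ "$major" -ne 8 ]; then
        echo "unsupported major number: $major" >&2
        return 1
    fi
    if [ "$minor" -lt 0 ] || [ "$minor" -gt 255 ]; then
        echo "minor number out of range: $minor" >&2
        return 1
    fi
    # Each disk owns 16 minors: 0 is the whole disk, 1-15 are partitions.
    disk_index=$((minor / 16))
    part=$((minor % 16))
    letter=$(printf '%s' "abcdefghijklmnop" | cut -c "$((disk_index + 1))")
    if [ "$part" -eq 0 ]; then
        echo "/dev/sd$letter"
    else
        echo "/dev/sd$letter$part"
    fi
}
```

Calling decode_sd 8 22 gives /dev/sdb6. That only tells you which device the kernel was aiming at. It says nothing about why the mount failed.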

I immediately suspected that it had something to do with the last kernel downloaded before the failed boot. I rebooted almost a dozen times, choosing every kernel available in the GRUB kernel selection screen, but to no avail. The server choked on the exact same lines shown above.

I was pretty sure that a reinstall was the way to go, even though I've never been a fan of reinstalling an OS to fix kinks in the system. I really thought there was no other way around this.

I grabbed the PLD RescueCD that I always keep near our servers and booted it on the ailing web server. It booted fine, and I proceeded to mount the LVM volume that holds our web data:

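modprobe dm-mod    # load the device-mapper module that LVM depends on
vgchange -ay       # activate all volume groups found on the disks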
mkdir -p /mnt/data
mount /dev/VolGroup00/LogVol00 /mnt/data

Once the relevant data was collected into a tarball, I scp'd it over to another server for safekeeping.
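
I didn't record the exact commands for the archive-and-copy step, so here is a minimal sketch of how it might look, assuming the volume is mounted at /mnt/data as above. The function name make_backup, the archive path, and the destination host are all placeholders of mine.

```shell
# Create a compressed tarball of a directory and check that it is readable.
# Arguments: source directory, output archive path (both chosen by the caller).
# The archive should live outside the source directory.
make_backup() {
    src_dir=$1
    archive=$2
    # -C switches into the source directory so paths in the archive are relative.
    tar -czf "$archive" -C "$src_dir" . || return 1
    # Listing the archive is a cheap sanity check before the disk gets touched.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}

# Hypothetical usage (paths and host are placeholders, not from my notes):
#   make_backup /mnt/data /tmp/webdata.tar.gz
#   scp /tmp/webdata.tar.gz user@backuphost:/srv/backups/
```

Listing the tarball before copying it is a small extra step, but it's worth doing before anything destructive happens to the original disk.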

I then rebooted the web server, this time with the CentOS CD in the drive. Lo and behold, even booting from the CD choked with the exact error message shown earlier! I started wondering: is this a CentOS thing, or is the hard disk drive really screwed?

I swapped the CentOS installation CD for the PLD RescueCD and rebooted again. I decided to remove all partitions from the hard disk and perform the installation on a thoroughly cleaned drive. After rebooting (again) with the CentOS CD, it still choked on the same error message!

All of a sudden, I felt the urge to run a RAM diagnostic on the server. Thankfully, the PLD RescueCD includes Memtest86+. Within seconds the real problem became apparent: the server had a bad memory module! If only I had run the test before wiping my hard disk, it would surely have saved me tons of time.

Hopefully somebody will learn a few important lessons from my experience and not have to go through what I did.

