In attempting to see whether I had the newest versions of cups & avahi, I discovered that my home server was still running dapper and not edgy as I’d thought. So I dist-upgraded to edgy and found the vncserver wouldn’t start automatically; fixed that by adding the changed fontpath in vnc.conf. Next I thought I’d go for the newest and dist-upgrade to feisty (for some reason update-manager wouldn’t recognise the existence of a new version). So I get through the upgrade and all looks to be running fine, then I reboot to make sure the vncserver still comes up and it all stops working…

For some reason I thought it would be a good idea, when reinstalling the machine itself from slack7.1 to xubuntu, to go with reiser as the boot partition. At this point it should be noted that the boot drive is a 9G scsi disk attached to a 29160 controller, so that I could leave the ide channels free for software raid5. After rebooting the system I had apparent reiserfs filesystem errors, so I attempted to repair them using the given tools, yet after 3 sessions of sitting through --rebuild-tree I decided I wasn’t getting anywhere…

My next bright idea was to boot with a knoppix disc, image the partition for safety (and tar up the filesystem with the preserve flag), so I could reformat the drive as ext3, put the files back as they were and hopefully boot that way. This worked to a certain extent, except now I was getting Grub error 17, which is apparently what you get when the partition type ID is not what grub expects. So I tried installing ubuntu 7.04 fresh to solve that; no joy (then I copied my tar file back over that install).

What I’ve discovered is this:

* During the edgy -> feisty upgrade, the raw (standard) device names have been replaced by UUID entries (mapped in blkid.map), and with the move to everything appearing as a scsi device, what *was* my primary scsi boot device of sda is now not, as hd(a-d) are now also sd* devices.

This now (partly) explains why one of my raid5 stripes seems to think that it has a member that is 8Gb and reiserfs as opposed to 300Gb (I’m running lvm2 over raid5: md0+md1 -> lvm0, which is about 800Gb). In fact, finding both my lvm partition and the 300Gb spare partition showing as 7Gb and empty is a bit scary, but that’s for a bit later…
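Out of interest, the UUID mapping feisty now relies on is easy enough to inspect yourself; a quick sketch (the UUID below is made up), assuming the usual blkid tool and the /dev/disk/by-uuid symlinks:

```
# ask a partition what UUID it has been given
sudo blkid /dev/sda1

# or see the whole mapping the other way round
ls -l /dev/disk/by-uuid/

# the new-style fstab entries then look something like this (UUID made up):
# UUID=1234abcd-56ef-78ab-90cd-1234567890ab  /  ext3  defaults,errors=remount-ro  0  1
```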
This change in device types (hd to sd) also explains why grub-install (which I’d tried in order to fix the error 17) no longer works, as the device it points to is not the same physical device as the one defined in grub’s device.map file! That was Friday; this is Tuesday…

So on Saturday I got it all figured out. All drives now go through a scsi emulation layer, however with a mixture of native scsi and ide devices things got a little remapped. When I connected (what was) hda, sda stayed as sda, yet when I connected (what were) hdc & hdd, sda became sdd, with the ide drives (the raid array) becoming sda-c and the cdrom (previously hdb) becoming sg0. So this explains all the confusion. What I eventually figured out, after a bit of head scratching, was that booting with an ubuntu cd would let me set up grub to boot from that drive (3 lines :o) ), and mounting sdd1 would let me change the fstab and so get the machine to boot again! So the initial problem may not have been with the actual reiser filesystem, but with the fact that the partition that had become sda1 wasn’t reiserfs at all…

The next problem was that /dev/md0 & md1 were now remapped to /dev/md/0 & /dev/md/1, so commenting out the entries in /etc/mdadm/mdadm.conf was necessary. The raid arrays themselves seemed to be pretty much ok, thanks to their persistent superblocks, although md0 needed rebuilding because of its identity crisis regarding the drive, or more correctly the filesystem, that was previously sda1. An fsck and a few hours for the rebuild later and we were pretty much all good to go!

I took this opportunity to reorganise the drive bays so that ide drive 1, now in its quiet-drive box (very quiet but quite warm!), was in the top bay, then the cdrom (to ease any strain on the ide cable on its way to the motherboard), and then the seagate cheetah with its fan on the front. I’m now thinking it might be best to put the hot scsi drive back at the top of the case with more airflow around it, as I’ve now twice found the system terminal showing journaling errors and telling me the root filesystem has been remounted read-only, which isn’t a good sign in a recently rebuilt machine. I also found that the cpu fan was caked with dust and that is actually what is causing all the noise, so I’m thinking that this evening will include some time with a screwdriver…

I can’t really explain the stress on Saturday before knowing that my 800Gb of data was safe, and the relief when I found it all safe again. There were about 4 non-important video files with cross-linked clusters on lvm0 (if I hadn’t explained previously, I’m running lvm2 on top of raid5 to ease growing the large storage partition), but if that’s the only problem then that’s a small price to pay.

As this machine is but a 1GHz P3, I’ve disabled gdm from startup, as I only really connect to it via VNC (which works again!) and having 2 X servers seemed a bit of a waste of resources.

All in all a successful operation, as I now have an up-to-date system with a nice-looking minimal (latest xfce) desktop. After the final clean and drive rejigging later today, I should be able to leave this alone until the next drive upgrade (another 750G or even a terabyte drive when they’re more affordable) or O/S update, as long as that doesn’t try to UUID my drives again. Or then again, when, or if, I have a spare couple of hours, I can learn how that works and do it myself… :o)
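For my own future reference (and anyone googling a similar mess), the grub fix from the live cd boiled down to something like the following. The sdd1 device is from my setup above; the (hd0,0) numbering is an assumption, and the find command from the same grub shell will report what grub actually sees:

```
# from the ubuntu live cd: mount the (relocated) boot partition
sudo mount /dev/sdd1 /mnt
# ...and fix the device names in its fstab while it's mounted
sudo nano /mnt/etc/fstab

# then reinstall grub by hand; the "3 lines" from the grub shell.
# ("find /boot/grub/stage1" reports the (hdX,Y) grub thinks the files are on)
sudo grub
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
```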
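The raid side was mostly a case of looking before touching; roughly this, where the member device name is made up for illustration (check mdstat and --detail first for the real one):

```
# see what the kernel actually assembled and what each array thinks it has
cat /proc/mdstat
sudo mdadm --detail /dev/md0

# regenerate ARRAY lines to append to /etc/mdadm/mdadm.conf
# (instead of the stale device-name-based ones I commented out)
sudo mdadm --detail --scan

# re-add the renamed member and watch the rebuild tick over
# (/dev/sdc1 is made up here -- use whatever --detail says is missing)
sudo mdadm /dev/md0 --add /dev/sdc1
watch cat /proc/mdstat
```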
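And for when that 750G (or terabyte) drive eventually arrives, the point of the lvm2-over-raid5 layering is that growing the big partition should look roughly like this; the volume group name is a placeholder, and it assumes the new space arrives as another md device and that lvm0 carries an ext3 filesystem:

```
sudo pvcreate /dev/md2                      # turn the new array into an lvm physical volume
sudo vgextend storage /dev/md2              # add it to the volume group ("storage" is a placeholder)
sudo lvextend -L +700G /dev/storage/lvm0    # grow the logical volume
sudo resize2fs /dev/storage/lvm0            # grow the filesystem (unmount first if online resize isn't supported)
```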