Last week I started the final round of Debian upgrades for the servers I maintain here and there, and it is mostly complete today. I haven't been so lucky with upgrades this time, for a long list of different reasons. In the end, the smoothest upgrades were on the boxes I upgraded around the time etch froze.
natura.oskuro.net is the box serving these pages. It's an old, extremely noisy Pentium 150 which I've been intending to replace for a while now. I started the upgrade early on Thursday, knowing it'd take a while (natura takes its time just reading the dpkg database), and it had apparently finished by the time I was ready to leave the office.
Three issues:
- apache2's enabled modules were forgotten and I had to re-enable them, plus IIRC the default "It works!" site got re-enabled, hiding the blog for a little while.
- bind didn't like some change in one of the default configs and I failed to notice it had not started. After the reboot, the box was unreachable remotely, but that was OK as I had to go visit my dad that night for his birthday, so it was fixed after one hour or so.
- Last, my blog stopped working, and while I thought it had something to do with apache2.2, I had no time for a close look as I had to leave for Lleida this weekend. Yesterday I discovered that, for some reason, the old tar.gz PyBlosxom install didn't like the new python2.4, so I decided it was about time to move to PyBlosxom 1.3, from the Debian package. After the mandatory "It works!" post got published in the gazillion Planets, natura was ready to go.
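The first issue is the kind of thing a before/after comparison would catch. Here is a minimal sketch, assuming the usual Debian apache2 layout where every enabled module has a `<name>.load` symlink under `/etc/apache2/mods-enabled`. The function names and the sample module names are made up for illustration; this is not what I actually ran on natura.

```python
import os

# Assumption: Debian's apache2 layout, where each enabled module has a
# <name>.load symlink in this directory. Adjust if your setup differs.
MODS_ENABLED = "/etc/apache2/mods-enabled"


def enabled_modules(path=MODS_ENABLED):
    """Return the set of module names that have a .load entry in `path`."""
    return {
        name[: -len(".load")]
        for name in os.listdir(path)
        if name.endswith(".load")
    }


def missing_after_upgrade(before, after):
    """Return, sorted, the modules enabled before the upgrade but not after."""
    return sorted(set(before) - set(after))


if __name__ == "__main__":
    # Hypothetical usage: illustrative module names only, not natura's real list.
    before = {"rewrite", "cgi", "alias"}
    after = {"alias"}
    for module in missing_after_upgrade(before, after):
        print("a2enmod", module)
```

The idea would be to save the output of `enabled_modules()` before the dist-upgrade, call it again afterwards, and feed both sets to `missing_after_upgrade()` to get the list of modules to hand back to `a2enmod`.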
The very same night of last Thursday, I decided to dist-upgrade the box which serves the Spanish Debian website mirror. That's the only purpose of the box, so you can imagine the upgrade should have been pretty straightforward. And so it seemed until, in the middle of unpacking, dpkg died with a horrible I/O error, and I was dropped into an unusable remote terminal with no working commands. Fortunately, apache2 was still up and running, and the web service has been working without interruption since the hard drive crash, albeit with no syncs from www-master.
Today, Sergio visited the campus and had a look. It was an XFS crash, which got cleanly repaired using an install CD. We have an empty partition in the box, and will probably move the system to it temporarily, and then back to the RAID, but on ext3. When the box was back online, I just had to resume the upgrade process, make mdadm happy and update lilo.conf before rebooting into the new kernel.
This box uses LILO for some obscure reason I can't remember too clearly anymore. The box has just one partition on an md array, on two SCSI disks on an aic7xxx-based controller. Can anyone give me a hint as to why GRUB would have failed on us back in sarge, and whether any fixes in the etch version would work any better? Using LILO here is error-prone, and basically feels like a step back. Anyway, www.es.debian.org is now back up and running with updated content.
Sindominio.net had its bi-annual upgrading party last Monday, but unfortunately I wasn't able to help much: when I tried to log into the server, I must have caught the system in the middle of some key lib upgrade or something, and again I was locked in an unusable shell which would only segfault. Given my previous experience, I assumed that something had gone wrong and the box would need to be fixed at the console, and after 20 minutes I gave up helping on that front. Quite a long while later, though, I noticed that I was still getting mail from the server. I managed to log in and discovered the upgrade was done, with just a few bits remaining. The major issues were with our pam and ldap setup, plus nscd kept dying, causing quite a lot of mayhem all over the place. Great work from Seajob, Syvic, nogates, apardo and the rest of the people who handled it! With etch, we can finally move back to an official Debian kernel, something we've been longing to do for a long time. The only pending upgrade issue is that we need to move from our old jabber server to either the traditional jabberd 1.x or ejabberd; our current implementation is no longer supported in Debian.
The last of the etch upgrade stories involves Softcatalà's servers. The box was running CentOS 4.4, which was moved away into a subdir just after booting Debian-Installer, and then lobotomised so it would run as a Linux-VServer under a new Debian etch install. I'll probably write more details about it soon, as it could be a possibly less scary alternative to Guillem's debtakeover.
Yay for etch!