Sat, 28 Feb 2009

natura upgraded to lenny

natura.oskuro.net, the home server which still serves this blog, has been suffering hardware problems for some weeks. Apparently the hard drive is failing intermittently, so every now and then the kernel starts spewing out noisy errors about its main disk dying. If I notice this quickly, the box can be rebooted and that normally fixes it for a few more days. But if I don't, it'll end up giving nasty bus errors which make remote logins a challenge. Most processes still work, but the filesystem appears to be gone. It's easy to know what's going on if you visit the blog's URL and get a 404, and in that case I can only phone my father and tell him to press the reset button (I've tried sysrqd, but I need to open the port in the router and haven't had a chance to do that yet).

So it was time to do something about it, and the other day I installed a dirty 40GB drive on the second IDE controller, in case I could find the time to work on it. Being stuck with an endless pharyngitis that doesn't seem to get cured entirely, I've had some time today to look at it. This evening, I was about to transfer the whole system to the new disk (it's half the size of the broken one, and probably slower, but it hopefully has no bad sectors), but I decided to upgrade the system first.

natura was first installed in late 1997 or at the beginning of 1998, using the Debian bo install media on a Pentium 150MHz, and has gone through seven dist-upgrades which, as far as I can remember, have always worked out without major problems.

The upgrade to lenny hasn't been an exception. The server has gradually lost many of the services it once hosted, so there aren't too many left to take care of anymore. All the mail services I set up for my father ended up being deprecated as they started to get used to Hotmail, GMail and so on, and the frequent hardware crashes made me switch them to the Linksys-based DHCP server. In the end, the problems I saw after the upgrade were very similar to what I faced when I upgraded to etch:

Such an ancient install will clearly have old, obsolete packages. I installed apt-show-versions to find out what didn't match my package sources. I found I had every single version of cpp, gcc and g++ from 2.95 to 4.3, and a myriad of obsolete libs. But there were also real gems:

defrag 0.73pjm1-7 installed: No available version in archive
figlet 2.2.1-3 installed: No available version in archive
ipmasqadm 0.4.2-2 installed: No available version in archive
isapnptools 1.26-5 installed: No available version in archive
ms-sys 2.1.0-1 installed: No available version in archive
queso 0.980922b-3 installed: No available version in archive
update 2.11-4 installed: No available version in archive

Spaniards will remember “queso” because it was written by Jordi Murgó and became a classic tool to find out what OS was running on a remote host. “update” was apparently needed to flush your filesystems prior to Linux 2.2.8, and “defrag” is obvious, although it leaves me wondering why it was needed at the time.
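For anyone wanting to dig out similar fossils, here is a minimal sketch of filtering the apt-show-versions listing quoted above. It assumes the output lines look exactly like the ones in this post ("name version installed: No available version in archive"); other versions of the tool may print a different layout. The function name find_orphans is made up for illustration.

```python
import re

# Assumption: lines follow the format quoted in this post, e.g.
#   "defrag 0.73pjm1-7 installed: No available version in archive"
ORPHAN_LINE = re.compile(
    r"^(?P<name>\S+) (?P<version>\S+) installed: "
    r"No available version in archive$"
)


def find_orphans(lines):
    """Return (package, version) pairs that no configured archive provides."""
    orphans = []
    for line in lines:
        match = ORPHAN_LINE.match(line.strip())
        if match:
            orphans.append((match.group("name"), match.group("version")))
    return orphans


# A few lines copied from the listing above.
sample = [
    "figlet 2.2.1-3 installed: No available version in archive",
    "queso 0.980922b-3 installed: No available version in archive",
    "update 2.11-4 installed: No available version in archive",
]

for name, version in find_orphans(sample):
    print(name, version)
```

In practice the lines would come from the tool's output saved to a file, rather than from a hard-coded list.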

With the upgrade done successfully, the next step is trying to get the system transferred to the spare hard drive. For this, I first partitioned it, creating a primary partition using up more or less half of the available space, and set up an LVM volume, leaving some free PEs in the volume group just in case I want to do snapshots in the future, and formatted it using ext3. I then transferred the system to the new disk and now face the boot challenge.

I haven't created a boot partition and that should be a double problem: the BIOS is buggy and will only boot from the first 1024 cylinders, and my root is on LVM and GRUB legacy might not like it (but I'm not sure). However, I've become a big fan of GRUB2, and know I will be able to boot no matter what my BIOS thinks of my disks and regardless of the complex root partition setup I throw at it. The plan is to install GRUB onto the new drive's MBR, and set it up using the ata module, which should allow it to ignore what the BIOS says, and read beyond cylinder 1024 or even boot from CD-ROM. However, this isn't a setup I have tried before, and a single failure will result in me taking a train to fix it on-site.

So, GRUB experts out there, any suggestions? Of course, for now I guess I can install GRUB in the current drive's MBR and make it boot the old kernel using the new system as root, but that's dirty and would just postpone the problem.

Fri, 13 Jun 2008

Upgrade to PyBlosxom 1.4.3

This week I spent some time upgrading PyBlosxom to version 1.4.3. I was still using 1.2, which was probably insecure and buggy. This is the first step in a bigger plan to replace Apache2 with nginx on this server, but that will come later.

I was lucky to find PyBlosxom's author, Will, on IRC at the right time. He kindly answered a few questions and helped solve a few issues with the comments plugin and flavours. So, after a while, I had fixed a few subtle, 4-year-old bugs in my XHTML templates and, more notably, fixed lots of small bits in the RSS feed, which finally makes Liferea and Advogato like my entries.

But the biggest achievement was getting a brand new comments.py plugin from Will, which allows closing comments on entries after an expiration date. Even though I was happily using Mako's Akismet plugin, I was still getting 5 or 6 spams each day on very old entries (favourites being one about Alonso visiting València and one remembering the 70th anniversary of the Spanish Civil War). Well, not any longer.
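The idea behind closing comments by age is simple enough to sketch in a few lines. This is not Will's plugin, just a rough illustration under my own assumptions: the function name comments_open, the 30-day default and the use of the entry's mtime are all made up for the example.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def comments_open(entry_mtime, max_age_days=30, now=None):
    """Return True while an entry is still young enough to accept comments.

    entry_mtime and now are Unix timestamps in seconds; max_age_days is a
    hypothetical setting, not an option of the real comments plugin.
    """
    if now is None:
        now = time.time()
    return (now - entry_mtime) <= max_age_days * SECONDS_PER_DAY


# An entry written 29 days ago is open, one written 31 days ago is closed.
today = time.time()
print(comments_open(today - 29 * SECONDS_PER_DAY, now=today))
print(comments_open(today - 31 * SECONDS_PER_DAY, now=today))
```

The point of the design is that old entries, which attract nearly all the spam, simply stop accepting submissions, so no filtering work is spent on them at all.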

My dear spammers, you can now go pester someone else, or pick new entries pretty quickly before they get closed down. It's been a nice fight, but it's a good time to wish you go away and fuck off. With love, Jordi.

Thank you, Will!

Wed, 25 Apr 2007

natura upgraded to etch

Last week I started the final round of Debian upgrades for the servers I maintain here and there, which is mostly complete today. I haven't been so lucky with upgrades this time, for a long list of different reasons. In the end, the smoothest upgrades were those boxes I upgraded when etch froze or so.

natura.oskuro.net is the box serving these pages. It's an old, extremely noisy Pentium 150 which I've been intending to replace for a while now. I started the upgrade early on Thursday, knowing it'd take a while (natura takes its time just to read the dpkg database), and it had apparently finished when I was ready to leave the office.

Three issues:

That very same Thursday night, I decided to dist-upgrade the box which serves the Spanish Debian website mirror. That's the only purpose of the box, so you can imagine the upgrade should have been pretty straightforward. And so it seemed, until, in the middle of unpacking, dpkg died with a horrible I/O error, and I was dropped into an unusable remote terminal with no working commands. Fortunately, apache2 was still up and running, and the web service has been working without interruption since the hard drive crash, albeit with no syncs from www-master.

Today, Sergio visited the campus and had a look. It was an XFS crash, which got cleanly repaired using an install CD. We have an empty partition in the box, and will probably move the system to it temporarily, and back to the RAID, but on ext3. When the box was back online, I just had to resume the upgrade process, make mdadm happy and update lilo.conf before rebooting into the new kernel.

This box uses LILO for some obscure reason I can't remember too clearly anymore. The box has just one partition on an md array, on two SCSI disks on an aic7xxx-based controller. Can anyone give me a hint as to why GRUB would have failed on us back in sarge, and whether any fixes in the etch version would work any better? Using LILO here is error-prone, and basically feels like a step back. Anyway, www.es.debian.org is now back up and running with updated content.

Sindominio.net had its bi-annual upgrading party last Monday, but unfortunately I wasn't able to help much: when I tried to log into the server, I must have caught the system in the middle of some key lib upgrade or something, and again I was locked in an unusable shell which would only segfault. Given my previous experience, I assumed that something had gone wrong and the box would need to be fixed at the console, and after 20 minutes I gave up helping on that front. Until I noticed, quite a long while later, that I was still getting mail from the server. I managed to log in to discover the upgrade was done, with just a few bits remaining. The major issues were encountered with our pam and ldap setup, plus nscd kept dying, causing quite a lot of mayhem all over the place. Great work from Seajob, Syvic, nogates, apardo and the rest of the people who handled it! With etch, we can finally move back to an official Debian kernel, something we've been longing to do for a long time. The only pending upgrade issue is that we need to move from our old jabber server to either the traditional jabberd 1.x or ejabberd; our current implementation is no longer supported in Debian.

The last of the etch upgrade stories involves Softcatalà's servers. The box was running CentOS 4.4, which was moved away into a subdir just after booting Debian-Installer, and then lobotomised so it would run as a Linux-VServer under a new Debian etch install. I'll probably write more details about it soon though, as it could be a less scary alternative to Guillem's debtakeover.

Yay for etch!

Thu, 19 Oct 2006

Silent home servers

The computer which hosts this blog is a venerable Pentium 150MHz, with 64MB of physical memory and two decently sized disks. It has been running non-stop, mostly without hiccups, for several years, and I'm quite happy with it, even if the processing power is so scarce I've had to tune down some services as Debian has gotten more resource hungry, dist-upgrade after dist-upgrade.

Natura is my 2nd oldest Debian install, dating back to Ham, and after a while it became a home server when it was replaced by an Athlon 700MHz at my father's house. The only hardware incidents are all related to blackouts or storms: two dead disks and one power supply. The CPU died years ago, but I discovered that many months later. I guess it wasn't so necessary. :)

It is time to replace natura, though. The components are aging and they have become quite noisy, despite my attempts to clean up the dust. Lately it is so loud that I can't understand how my dad can actually get work done with that persistent noise in the room. Besides, it'd be good to get just a little bit more CPU power to do a few things that have been postponed for a while now. I have been looking for offerings in the embedded devices market.

I am looking for a device with the following characteristics:

I've found that the Thecus YES Box N2100 is one of the most interesting offerings: 2 Gigabit ethernet ports, two internal SATA HD bays, 3 USB ports... but it is a bit too expensive: 350€ (without disks). tbm also told me to look at some cheaper PowerPC devices, but I forget the name right now.

So, dear Lazyweb, what would you recommend as a natura replacement for a home server?

Sun, 20 Nov 2005

Hard drive failure

On Thursday, I went up to my father's house to pick up my desktop, now that I finally have Internet access at home so I can have permanent access to it from a more convenient place. While I was there, I decided it'd be a good idea to do an upgrade on my dad's box.

While I was doing this, the NFS share of the Debian mirror on the home server that serves this blog stopped working, and I couldn't ssh in, although the console and ICMP appeared to work. Having no monitor attached to that box, I decided to reboot it and resolve the problem in a lame way. Had I known my primary hard drive had decided enough is enough, I would have looked for a monitor instead.

After the reboot, the server started making a very loud and scary noise, which was not new to me. When the BIOS tried to probe the disk, it would not start up and would instead whine like that. When this happened in the past, switching the box off for a few minutes and trying again was enough. Not this time, though; it looked like it was the end.

Horrified by the fact that I had no really useful backup of /etc and /home, I tried over and over to get it working. I plugged the HD into another old AT box (which once served as my GNU Hurd playground), but I still got those scary noises. At least I knew it had nothing to do with the motherboard.

As it was getting late, I decided to take the HD with me, along with another Maxtor of the same size (but not the exact model), to have a look at the office the next day. I thought I'd have to resort to trying to transplant the logic board from the good one to the faulty disk, but before that, I realised I could try mounting it in a USB cage. For some reason or another, this worked, and I quickly saved the two partitions it contained with no read errors.

After this, I realised I had not done the copy as root, so I had lost all my ownerships. I plugged the disk in again, and re-rsynced. It still worked. Relieved, I went back home, copied the data to another old disk, and, I don't remember why, I tried mounting the faulty disk again. I got scary noises even using the USB case, and no other tries have been successful, so I think it's completely dead now. It gave up right after the final mount which saved the data.

The box is now back online, using a Seagate disk I had stored in a drawer and with no notes on it about it having any kind of problem. I suspect I will need to do real backups now, because the drive isn't as safe as it looked...

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }

hda: drive not ready for command
ide0: reset: success

My little P150 needs a 6GB drive. I'll have to find one somewhere.

Thu, 10 Nov 2005

Better not comment

Lately, there's a clear trend in my webserver stats. Since mid September, the top three search strings for oskuro.net are:

  1. Naked men (with like 60% of occurrences)
  2. Fernando Alonso
  3. Skinny dipping

I'm glad I never posted about Leonor...

Tue, 23 Aug 2005

Blogging in Catalan

When I started this blog a year and a half ago, I was maintaining a blog on my team's webpage, in Catalan, to write about my triathlon-oriented stuff. The momentum this webpage had acquired is now mostly lost, and I have no energy to promote its use among my team members once more.

Recently, some Softcatalà people started a Planeta Softcatalà for the blogs of all the organisation's members. If you have a look, you'll see that my blog entries are clearly distinct from those of the rest of my friends in there: I'm the only one with an English blog.

For some time I've been wondering about dividing this blog into two sections, en and ca, and pointing the different planets to the appropriate languages. I think I would still give English posts some priority, but there are some things I'd rather write in Catalan (I think I feel like posting about my recent stay in the Pirineus in Catalan, for example). What do people do with respect to multi-language blogs? Catalan content probably wouldn't be too ok for Planet Debian or Planet GNOME, but would my Catalan readers want to continue reading my English content?

If you follow my blog, your comment is welcome.

Tue, 14 Jun 2005

Comments upgrade

I just upgraded the comments plugin from the PyBlosxom contrib prerelease distribution. You should not find tracebacks so easily in this blog now, and actually submitting comments without an email address won't break it badly anymore. Thanks for the pointer, Will!

Tue, 24 May 2005

Upgrade to pyblosxom 1.2

Today, being on vacation and with little fun stuff to do, I decided to have a look at my old blog spam problem. Lately, I had been using a poor man's spam cleaner for the comment spam, consisting of find and grep combined with an ever-growing list of forbidden patterns, plus rm. This worked well for some time, and the spam problem had become a minor annoyance: I just had to check for non-removed entries every now and then and add those patterns to the regexp.
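For the curious, the poor man's cleaner boils down to something like the following. This is a rough Python equivalent of the find, grep and rm combination, not the script I actually ran; the function name clean_spam, the ".cmt" suffix for comment files and the sample pattern are assumptions made for the example.

```python
import re
from pathlib import Path


def clean_spam(comment_dir, patterns, suffix=".cmt", dry_run=True):
    """Delete comment files whose text matches any forbidden pattern.

    Returns the list of matching paths. With dry_run=True nothing is
    removed, which is a sensible first run for a new pattern list.
    """
    if not patterns:
        # An empty alternation would match every file, so bail out.
        return []
    spam_re = re.compile(
        "|".join(f"(?:{p})" for p in patterns), re.IGNORECASE
    )
    matched = []
    for path in sorted(Path(comment_dir).rglob(f"*{suffix}")):
        text = path.read_text(errors="replace")
        if spam_re.search(text):
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched
```

Something like clean_spam("comments", [r"cheap\s+pills"], dry_run=False) would then do the same job as the cron entry, with the obvious weakness that it only ever catches patterns it has already been told about.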

Yesterday I found out I had something like 3,000 new comments, so I thought my cheap system was broken and it hadn't deleted anything in many days. Nope, it was working correctly according to the logs, but every time it ran it deleted something like 100 files or so. After adding the missing patterns and deleting the thousands of new files, I observed my webserver logs with tail -f for a moment and found I was getting one new comment every two seconds or so. WTF?! Are they generally getting this aggressive everywhere, or is this dude just pissed about my site? I hope the mail to the corresponding abuse@ address works.

As they submitted them quicker than the slow CPU could delete them, I disabled comments temporarily, and looked at installing PyBlosxom 1.2, as people had told me there are improvements against spam in this release.

This site is now running 1.2, but I see nothing spam-oriented in the new comments plugin. Does anyone know what the Nice Way of blocking spam in PyBlosxom is, that is not too expensive CPU-wise? Comments should be working right now.

On another note, the site is crawling today because of the two triathlon pics I posted earlier, which are making people hit MaxClients quite fast.

Mon, 21 Feb 2005

Blog is back

Shortly after posting about referral spam killing my box a few times in two days, things got a lot worse and the box would go down every hour or so. As natura.oskuro.net is, besides a home webserver, a NAT box for my father's Internet connection, having the box more dead than alive was quite unacceptable, and I had to stop Apache2 until I found another place for the blog.

Mako, jacobo (who is back into blogging, for the joy of many in #gpul) and a few others offered temporary hosting for this site while sto and I decide on renting a UML-based box or whatever.

Before moving somewhere else, I tried a few last options on the old, slow box, and it seems PyBlosxom caches are really working, at least for now. Despite having gone through a few spam attacks since Saturday, it looks like the box is cutting it quite ok. mrtg reports a few high load peaks over the night, but nothing that kills it. I used the dbm-based pyblosxom cache driver, and the first difference is that apparently I don't get one process per request anymore, and that alone prevents running out of memory. I've had one case where the blog would be empty, which was fixed by just rm'ing the cache db. If it happens again, I'll try the entrypickle cache driver to see if there's any improvement.
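To give an idea of what a dbm-backed cache buys you, here is a minimal sketch using Python's standard dbm module. It is not PyBlosxom's actual cache driver; the function name cached_render and the choice of keying on entry path plus mtime are my own assumptions for the illustration.

```python
import dbm


def cached_render(cache_path, entry_path, mtime, render):
    """Return the rendered entry, calling render() only on a cache miss.

    The key combines the entry path with its mtime, so an edited entry
    gets a fresh key instead of serving stale content from the cache.
    """
    key = f"{entry_path}:{mtime}".encode("utf-8")
    with dbm.open(cache_path, "c") as db:  # "c": create the db if missing
        if key in db:
            return db[key].decode("utf-8")
        value = render()
        db[key] = value.encode("utf-8")
        return value
```

The expensive rendering step runs once per entry version, and every later request is a single key lookup. Removing the cache file, as I did when the blog came up empty, just forces everything to be rendered again.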

Anyway, even if it still works, it's obvious a Pentium 150MHz is not enough these days, and I will have to find something cheap to host my stuff as soon as possible. In the following days I will finish the migration to a new domain name, which will be a start. oskuro.net doesn't make much sense anymore, and quite probably I will let it expire next year.
