Wed, 25 Apr 2007

natura upgraded to etch

Last week I started the final round of Debian upgrades for the servers I maintain here and there, which is mostly complete today. I haven't been so lucky with upgrades this time, for a long list of different reasons. In the end, the smoothest upgrades were those boxes I upgraded when etch froze or so.

natura.oskuro.net is the box serving these pages. It's an old, extremely noisy Pentium 150 which I've been intending to replace for a while now. I started the upgrade early on Thursday, knowing it'd take a while (natura takes its time just to read the dpkg database), and it had apparently finished when I was ready to leave the office.

Three issues:

That very same Thursday night, I decided to dist-upgrade the box which serves the Spanish Debian website mirror. That's the only purpose of the box, so you can imagine the upgrade should have been pretty straightforward. And so it seemed, until, in the middle of unpacking, dpkg died with a horrible I/O error, and I was dropped into an unusable remote terminal with no working commands. Fortunately, apache2 was still up and running, and the web service has been working without interruption since the hard drive crash, albeit with no syncs from www-master.

Today, Sergio visited the campus and had a look. It was an XFS crash, which got cleanly repaired using an install CD. We have an empty partition in the box, and will probably move the system to it temporarily, and then back to the RAID, but on ext3. When the box was back online, I just had to resume the upgrade process, make mdadm happy and update lilo.conf before rebooting into the new kernel.

This box uses LILO for some obscure reason I can't remember too clearly anymore. The box has just one partition on an md array, on two SCSI disks on an aic7xxx-based controller. Can anyone give me a hint as to why GRUB would have failed on us back in sarge, and whether any fixes in the etch version would work any better? Using LILO here is error-prone, and basically feels like a step back. Anyway, www.es.debian.org is now back up and running with updated content.

Sindominio.net had its bi-annual upgrading party last Monday, but unfortunately I wasn't able to help much: when I tried to log into the server, I must have caught the system in the middle of some key lib upgrade or something, and again I was locked in an unusable shell which would only segfault. Given my previous experience, I assumed that something had gone wrong and the box would need to be fixed at the console, and after 20 minutes I gave up helping on that front. Until I noticed, quite a long while later, that I was still getting mail from the server. I managed to log in and discovered the upgrade was done, with just a few bits remaining. The major issues were encountered with our pam and ldap setup, plus nscd kept dying, causing quite a lot of mayhem all over the place. Great work from Seajob, Syvic, nogates, apardo and the rest of the people who handled it! With etch, we can finally move back to an official Debian kernel, something we've been longing to do for a long time. The only pending upgrade issue is that we need to move from our old jabber server to either the traditional jabberd 1.x or ejabberd; our current implementation is no longer supported in Debian.

The last of the etch upgrade stories involves Softcatalà's servers. The box was running CentOS 4.4, which was moved away into a subdir just after booting Debian-Installer, and then lobotomised so it would run as a Linux-VServer under a new Debian etch install. I'll probably write more details about it soon, as it could be a possibly less scary alternative to Guillem's debtakeover.

Yay for etch!

Thu, 19 Oct 2006

Silent home servers

The computer which hosts this blog is a venerable Pentium 150 MHz, with 64 MB of physical memory and two decently sized disks. It has been running non-stop, mostly without hiccups, for several years, and I'm quite happy with it, even if the processing power is so scarce that I've had to tune down some services as Debian has gotten more resource hungry, dist-upgrade after dist-upgrade.

Natura is my 2nd oldest Debian install, dating back to hamm, and after a while it became a home server when it was replaced by an Athlon 700 MHz at my father's house. The only hardware incidents are all related to blackouts or storms: two dead disks and one power supply. The CPU died years ago, but I discovered that many months later. I guess it wasn't so necessary. :)

It is time to replace natura, though. The components are aging and they have become quite noisy, despite my attempts to clean up the dust. Lately it is so loud that I can't understand how my dad can actually get work done with that persistent noise in the room. Besides, it'd be good to get just a little more CPU power to do a few things that have been postponed for a while now. I have been looking at offerings in the embedded devices market.

I am looking for a device with the following characteristics:

I've found that the Thecus YES Box N2100 is one of the most interesting offerings: 2 Gigabit ethernet ports, two internal SATA HD bays, 3 USB ports... but it is a bit too expensive: 350€ (without disks). tbm also told me to look at some cheaper PowerPC devices, but I've forgotten the name right now.

So, dear Lazyweb, what would you recommend as a natura replacement for a home server?

Sun, 20 Nov 2005

Hard drive failure

On Thursday, I went up to my father's house to pick up my desktop, now that I finally have Internet access at home so I can have permanent access to it from a more convenient place. While I was there, I decided it'd be a good idea to do an upgrade on my dad's box.

While I was doing this, the NFS share of the Debian mirror on the home server that serves this blog stopped working, and I couldn't ssh in, although the console and ICMP appeared to work. Having no monitor attached to that box, I decided to reboot it and resolve the problem in a lame way. Had I known my primary hard drive had decided enough is enough, I would have looked for a monitor instead.

After the reboot, the server started making a very loud and scary noise, which was not new to me. When the BIOS tried to probe the disk, it would not start up and would instead whine like that. When this happened in the past, switching the box off for a few minutes and trying again was enough. Not this time, though; it looked like it was the end.

Horrified by the fact that I had no really useful backup of /etc and /home, I tried over and over to get it working. I plugged the HD into another old AT box (which once served as my GNU Hurd playground), but I still got those scary noises. At least I knew it had nothing to do with the motherboard.

As it was getting late, I decided to take the HD with me, along with another Maxtor of the same size (but not the exact model), to have a look at the office the next day. I thought I'd have to resort to transplanting the logic board from the good one to the faulty disk, but before that, I realised I could try mounting it in a USB cage. For some reason or another, this worked, and I quickly saved the two partitions it contained with no read errors.

After this, I realised I had not done the copy as root, so I had lost all my ownerships. I plugged the disk in again, and re-rsynced. It still worked. Relieved, I went back home, copied the data to another old disk and, I don't remember why, tried mounting the faulty disk again. I got scary noises even using the USB case, and no other tries have been successful, so I think it's completely dead now: it gave up right after the final mount which saved the data.

The box is now back online, using a Seagate disk I had stored in a drawer and with no notes on it about it having any kind of problem. I suspect I will need to do real backups now, because the drive isn't as safe as it looked...

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }

hda: drive not ready for command
ide0: reset: success

My little P150 needs a 6GB drive. I'll have to find one somewhere.

Thu, 10 Nov 2005

Better not comment

Lately, there's a clear trend in my webserver stats. Since mid September, the top three search strings for oskuro.net are:

  1. Naked men (with like 60% of occurrences)
  2. Fernando Alonso
  3. Skinny dipping

I'm glad I never posted about Leonor...

Tue, 23 Aug 2005

Blogging in Catalan

When I started this blog a year and a half ago, I was maintaining a blog on my team's webpage, in Catalan, to write about my triathlon-oriented stuff. The momentum this webpage had acquired is now mostly lost, and I have no energy to promote its use among my team members once more.

Recently, some Softcatalà people started a Planeta Softcatalà for the blogs of all the organisation's members. If you have a look, you'll see that my blog entries are clearly distinct from those of the rest of my friends there: I'm the only one with an English blog.

For some time I've been wondering about dividing this blog into two sections, en and ca, and pointing the different planets to the appropriate languages. I think I would still give English posts some priority, but there are some things I'd rather write in Catalan (I think I feel like posting about my recent stay in the Pirineus in Catalan, for example). What do people do with respect to multi-language blogs? Catalan content probably wouldn't be too ok for Planet Debian or Planet GNOME, but would my Catalan readers want to continue reading my English content?

If you follow my blog, your comment is welcome.

Tue, 14 Jun 2005

Comments upgrade

I just upgraded the comments plugin from the PyBlosxom contrib prerelease distribution. You should not find tracebacks so easily in this blog now, and actually submitting comments without an email address won't break it badly anymore. Thanks for the pointer, will!

Tue, 24 May 2005

Upgrade to pyblosxom 1.2

Today, being on vacation and with little fun stuff to do, I decided to have a look at my old blog spam problem. Lately, I had been using a poor-man's spam cleaner for the comment spam, consisting of combining find and grep with an ever-growing list of forbidden patterns, and rm. This worked well for some time, and the spam problem had become a minor annoyance: I just had to check for non-removed entries every now and then and add those patterns to the regexp.
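
For the curious, the idea behind that cleaner can be sketched in a few lines of Python. This is not the actual find/grep/rm combination I run, just an equivalent under stated assumptions: the pattern list is made up, the function names are hypothetical, and I'm assuming comments live as one file each with a .cmt suffix under a single directory.

```python
import re
from pathlib import Path

# Hypothetical examples; the real list of forbidden patterns is not shown here.
FORBIDDEN_PATTERNS = [r"cheap\s+pills", r"online\s+casino"]


def find_spam_comments(comment_dir, patterns, suffix=".cmt"):
    """Return the comment files whose text matches any forbidden pattern."""
    if not patterns:
        # An empty alternation would match every file, so bail out early.
        return []
    regexp = re.compile("|".join(patterns), re.IGNORECASE)
    spam = []
    # Walk the directory tree (the "find" step).
    for path in sorted(Path(comment_dir).rglob("*" + suffix)):
        text = path.read_text(errors="replace")
        # Search the file contents (the "grep" step).
        if regexp.search(text):
            spam.append(path)
    return spam


def remove_spam_comments(comment_dir, patterns, suffix=".cmt"):
    """Delete matching comment files (the "rm" step) and return the count."""
    spam = find_spam_comments(comment_dir, patterns, suffix)
    for path in spam:
        path.unlink()
    return len(spam)
```

Running find_spam_comments first and eyeballing the result before calling remove_spam_comments is the cautious way to use it, since a sloppy pattern will happily delete legitimate comments too.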

Yesterday I found out I had something like 3,000 new comments, so I thought my cheap system was broken and it hadn't deleted anything in many days. Nope, it was working correctly according to the logs, but every time it ran it deleted something like 100 files or so. After adding the missing patterns and deleting the thousands of new files, I observed my webserver logs with tail -f for a moment and found I was getting one new comment every two seconds or so. WTF?! Are they generally getting this aggressive everywhere, or is this dude just pissed about my site? I hope the mail to the corresponding abuse@ address works.

As they submitted them quicker than the slow CPU could delete them, I disabled comments temporarily, and looked at installing PyBlosxom 1.2, as people had told me there are improvements against spam in this release.

This site is now running 1.2, but I see nothing spam-oriented in the new comments plugin. Does anyone know what the Nice Way of blocking spam in PyBlosxom is, that is not too expensive CPU-wise? Comments should be working right now.

On another note, the site is crawling today because of the two triathlon pics I posted earlier, which are making people hit MaxClients quite fast.

Mon, 21 Feb 2005

Blog is back

Shortly after posting about referral spam killing my box a few times in two days, things got a lot worse and the box would go down every hour or so. As natura.oskuro.net is, besides a home webserver, a NAT box for my father's Internet connection, having the box more dead than alive was quite unacceptable, and I had to stop Apache2 until I found another place for the blog.

Mako, jacobo (who is back into blogging, for the joy of many in #gpul) and a few others offered temporary hosting for this site while sto and I decide on renting a UML-based box or whatever.

Before moving somewhere else, I tried a few last options on the old, slow box, and it seems PyBlosxom caches are really working, at least for now. Despite having gone through a few spam attacks since Saturday, it looks like the box is cutting it quite ok. mrtg reports a few high load peaks over the night, but nothing that kills it. I used the dbm-based pyblosxom cache driver, and the first difference is that apparently I don't get one process per request anymore, and that alone prevents running out of memory. I've had one case where the blog would be empty, which was fixed by just rm'ing the cache db. If it happens again, I'll try the entrypickle cache driver to see if there's any improvement.

Anyway, even if it still works, it's obvious a Pentium 150 MHz is not enough these days, and I will have to find something cheap to host my stuff as soon as possible. In the following days I will finish the migration to a new domain name, which will be a start. oskuro.net doesn't make much sense anymore, and quite probably I will let it expire next year.

Fri, 11 Feb 2005

A few suggestions to parasites

Dear assholes,

If you plan to take advantage of my blog to rank your shitty pharmacy webpage high on Google, take the following into consideration:

Thanks for considering a symbiotic relationship in the future.

So the bastards did it again yesterday, and this time I had to drive to where the box is located and see what was going on at the console. As expected, OOM killer fun, and there was no way I could recover it, even after taking it off the network and trying to SysRq it a bit.

It seems the spammers are trying to take advantage of referral stats now, and hit sites with tons of requests. Every request to /blog on this site means a not so cheap Python process which takes quite some memory, which is a scarce resource on poor natura.oskuro.net. With just a few blog processes going on, the box starts swapping to its death.

I know, I should add limits to my Apache2 configuration, and possibly pyblosxom caches. For now, stopping spamd has helped a bit, as that process alone was sucking ~35% of the memory.

Sat, 15 Jan 2005

One year

I just realised this blog completed its first year online quite recently, after my first stint at Advogato. I wish I had more time to think about interesting stuff to talk about, though. Sometimes, this feels like the Debian GNOME team's announcement board. :)

Tue, 05 Oct 2004

ADSL upgrade

Yesterday I wasn't able to log into my home server from the office, and I assumed the load had skyrocketed again, as happens every now and then. Timing was quite bad because I'm very busy in the evenings these days, but I went to my father's house to see what was going on, and when I got there I saw the server was mostly idling. WTF? Shortly after, I noticed my local named would barely resolve anything, and I couldn't ssh out, as the connection would hang in the middle of the handshake. I started looking at my 3com router configuration, which seemed ok; rebooted the box, nothing changed; started cursing, which changed nothing either... until I realized it was probably a telco thing. I told my father "it'll probably be fixed automagically" and left the house.

When I came back from training, I managed to ssh in and quickly tested the downstream speed. As I suspected, the downtime was caused by Telefónica tweaking our stuff to upgrade the ADSL lines in the area from 256/128 to 512/128. Uplink still sucks, but oh well, we got this for "free" (i.e., we still pay way too much for crappy connections in Spain, but we're slowly getting what the rest of Europe seems to have).

Moreover, today the cable company finally opened up the street and installed their stuff to offer their service. 5 or 6 years late...