natura upgraded to lenny
natura.oskuro.net, the home server which still serves this
blog, has been suffering hardware problems for some weeks. Apparently the
hard drive is failing intermittently, so every now and then the kernel starts spewing
out noisy errors about its main disk dying. If I notice this quickly, it can
be rebooted and that normally fixes it for a few more days. But if I don't,
it'll end up giving nasty bus errors which will make remote
logins a challenge. Most processes still work, but the filesystem appears
to be gone. It's easy to know what's going on if you visit the blog's URL
and get a 404, and in that case I can only phone my father
and tell him to press the reset button (I've tried sysrqd, but I need to
open the port in the router and haven't had a chance to do that yet).
So it was time to do something about it, and the other day I installed a
dirty 40GB drive on the second IDE controller, in case I could find the time
to work on it. Suffering from an endless pharyngitis that doesn't seem
to clear up entirely, I've had some time today to look at it. This evening,
I was about to transfer the whole system to the new disk (it's half
the size of the broken one, and probably slower, but it hopefully has
no bad sectors), but I decided to upgrade the system first.
natura was first installed in late 1997 or at the beginning
of 1998, using the Debian bo install media on a Pentium 150MHz, and
has gone through seven dist-upgrades which, as far as I can remember, have
always worked out without major problems.
The upgrade to lenny hasn't been an exception. The server has
gradually lost many of the services it once hosted, so there aren't too many
services to take care of anymore. All the mail services I set up for my father
ended up being deprecated as they started to get used to Hotmail, GMail and
so on, and the frequent hardware crashes made me switch them to the
Linksys-based DHCP server. In the end, the problems I saw after the upgrade
were very similar to what I faced when I upgraded to etch:
- apache2 restored the 000-default symlink in the sites-enabled dir, which
resulted in my website showing the classic “It works” message for a while.
- Apache's suexec support had moved to its own package, and was very well
documented in the NEWS file, but I somehow failed to notice for some time
and kept wondering what was going on.
- The Python upgrade again affected my Pyblosxom install. Fortunately it was
a minor problem; I just had to add a coding=utf-8 line to the
beginning of my config.py to get it working.
- dhcp3-server apparently restored its init scripts and got
started again. This time, it got removed.
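The coding=utf-8 line mentioned in the list is the source-encoding declaration described in PEP 263, which Python looks for in a comment on the first or second line of a file. A minimal sketch of a check for it follows; the helper name declared_encoding and the sample config contents are made up for illustration and are not taken from the real config.py:

```python
import re

# Pattern from PEP 263: the declaration must be a comment on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source):
    """Return the encoding named in the first two lines of source, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical config.py contents, for illustration only.
fixed = "# coding=utf-8\npy = {}\npy['blog_title'] = 'example'\n"
broken = "py = {}\npy['blog_title'] = 'example'\n"

print(declared_encoding(fixed))
print(declared_encoding(broken))
```

Without the declaration, older Python 2 releases complain as soon as the file contains non-ASCII characters, which is the kind of breakage a blog config with accented text would run into.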
Such an ancient install will clearly have old, obsolete packages. I
installed apt-show-versions to find out what didn't match my
package sources. I found I had every single version of cpp, gcc and g++ from
2.95 to 4.3, and a myriad of obsolete libs. But there were also real gems:
defrag 0.73pjm1-7 installed: No available version in archive
figlet 2.2.1-3 installed: No available version in archive
ipmasqadm 0.4.2-2 installed: No available version in archive
isapnptools 1.26-5 installed: No available version in archive
ms-sys 2.1.0-1 installed: No available version in archive
queso 0.980922b-3 installed: No available version in archive
update 2.11-4 installed: No available version in archive
Spaniards will remember “queso” because it was written by Jordi Murgó and
became a classic tool to find out what OS was running on a remote host.
“update” was apparently needed to flush your filesystems prior to Linux 2.2.8,
and “defrag” is obvious, although it leaves me wondering why it was needed at
the time.
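Picking the orphaned packages out of a long apt-show-versions listing can be scripted. This is a rough sketch that assumes the output format shown above (one package per line, ending in "installed: No available version in archive"); other versions of the tool may print something different, and the function name unavailable_packages is invented for this example:

```python
MARKER = " installed: No available version in archive"


def unavailable_packages(output):
    """Return (name, version) pairs for packages with no archive version."""
    found = []
    for line in output.splitlines():
        line = line.strip()
        if line.endswith(MARKER):
            # What is left after stripping the marker is "name version".
            name, version = line[: -len(MARKER)].split(None, 1)
            found.append((name, version))
    return found


# Three lines copied from the listing in this post.
sample = """\
defrag 0.73pjm1-7 installed: No available version in archive
figlet 2.2.1-3 installed: No available version in archive
queso 0.980922b-3 installed: No available version in archive
"""

for name, version in unavailable_packages(sample):
    print(name, version)
```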
With the upgrade done successfully, the next step is trying to get the system
transferred to the spare hard drive. For this, I first partitioned it, creating
a primary partition using up more or less half of the available space, and
set up an LVM volume, leaving some free PEs in the volume group just in case
I want to do snapshots in the future, and formatted it using ext3. I then
transferred the system to the new disk and now face the boot challenge.
I haven't created a boot partition and that should be a double problem:
the BIOS is buggy and will only boot from the first 1024 cylinders, and my
root is on LVM and GRUB legacy might not like it (but I'm not sure). However,
I've become a big fan of GRUB2, and know I will be able to boot no
matter what my BIOS thinks of my disks and regardless of the complex root
partition setup I throw at it. The plan is to install GRUB onto the new
drive's MBR, and set it up using the ata module, which should
allow it to ignore what the BIOS says, and read beyond cylinder 1024 or even boot
from CD-ROM. However, this is a setup I haven't tried before, and a single
failure will result in me taking a train to fix it on-site.
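To put a number on the 1024-cylinder problem: the limit comes from the cylinder field of the classic BIOS disk interface, and how many bytes that covers depends on the geometry the BIOS reports. The arithmetic below is only a sketch; the 255 heads and 63 sectors per track are the common LBA-assisted translation and an assumption here (natura's BIOS may report something else), and the 20GB figure simply stands for a partition covering about half of the 40GB disk:

```python
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024     # the BIOS limit discussed above
HEADS = 255              # assumed: common LBA-assisted translation
SECTORS_PER_TRACK = 63   # assumed: maximum sectors per track in CHS addressing


def bios_limit_bytes(cylinders=MAX_CYLINDERS, heads=HEADS,
                     sectors=SECTORS_PER_TRACK):
    """Bytes addressable within the given CHS geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES


limit = bios_limit_bytes()
partition_end = 20 * 10**9  # assumed: partition covering about half of 40GB

print(limit)                  # bytes reachable below cylinder 1024
print(partition_end > limit)  # does the partition extend past the limit?
```

With that geometry the BIOS can reach roughly the first 8.4GB, so a kernel stored anywhere on a 20GB LVM-backed root may well sit beyond it, which is why a loader that talks to the disk itself is attractive.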
So, GRUB experts out there, any suggestions? Of course, for now I guess I
can install GRUB in the current drive's MBR and make it boot the old kernel
using the new system as root, but that's dirty and would just postpone the
problem.
00:13 | (comments: 1)
Upgrade to PyBlosxom 1.4.3
This week I spent some time upgrading PyBlosxom to version 1.4.3.
I was still using 1.2, which was probably insecure and buggy. This is the
first step in a bigger plan to replace Apache2 with nginx on this server,
but that will come later.
I was lucky to find PyBlosxom's author, Will, on IRC at the right time;
he kindly answered a few questions and helped solve a few issues with
the comments plugin and flavours. So, after a while, I had fixed a few subtle,
4-year-old bugs in my XHTML templates and, more notably, fixed lots of small
bits in the RSS feed, which finally makes Liferea and
Advogato like my entries.
But the biggest achievement was getting a brand new comments.py
plugin from Will, which allows closing comments
on entries after an expiration date. Even though I was happily using
Mako's Akismet plugin, I was still getting 5 or 6
spam comments each day on very old entries (favourites being one about
Alonso visiting València and one remembering the
70th anniversary of the Spanish Civil War).
Well, not any longer.
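The idea behind closing comments by age is simple enough to sketch. This is not Will's plugin code: the function name comments_open, the 90-day window and the dates are all invented for illustration, and the real comments.py may decide things quite differently:

```python
from datetime import datetime, timedelta


def comments_open(entry_date, now, max_age_days=90):
    """Return True while the entry is younger than max_age_days."""
    return now - entry_date < timedelta(days=max_age_days)


# Hypothetical dates: one recent entry and one very old one.
now = datetime(2008, 1, 1)
print(comments_open(datetime(2007, 12, 1), now))
print(comments_open(datetime(2006, 7, 18), now))
```

A check like this costs next to nothing per request, which matters on a slow box, and it removes exactly the old entries spammers seem to prefer.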
My dear spammers, you can now go pester someone else, or pick new entries
pretty quickly before they get closed down. It's been a nice fight, but it's
a good time to wish you go away and fuck off. With love, Jordi.
Thank you, Will!
16:24 | (comments: 1)
natura upgraded to etch
Last week I started the final round of Debian upgrades for the servers
I maintain here and there, which is mostly complete today. I haven't been so
lucky with upgrades this time, for a long list of different reasons. In the
end, the smoothest upgrades were those boxes I upgraded when etch froze or
so.
natura.oskuro.net is the box serving these pages. It's an old,
extremely noisy Pentium 150 which I've been intending to replace
for a while now. I started the upgrade early on Thursday, knowing it'd take
a while (natura takes its time just to read the dpkg database), and it had
apparently finished when I was ready to leave the office.
Three issues:
- apache2's enabled modules were forgotten and I had to re-enable them,
plus IIRC the default "It works!" site got re-enabled, hiding the blog for a
little while.
- bind didn't like some change in one of the default configs and I failed
to notice it had not started. After the reboot, the box was unreachable
remotely, but that was OK as I had to go visit my dad that night for his
birthday, so it was fixed after one hour or so.
- Last, my blog stopped working, and while I thought it'd have something to
do with apache2.2, I had no time to have a close look as I had to leave for
Lleida this weekend. Yesterday I discovered that for some reason, the old
tar.gz PyBlosxom install wasn't liking the new python2.4, so I decided it was
about time to move to PyBlosxom 1.3, from the Debian package. Not without the
mandatory "It works!" post published in the gazillion Planets, natura was
ready to go.
The very same night of last Thursday, I decided to dist-upgrade the box
which serves the Spanish Debian website mirror. That's
the only purpose of the box, so you can imagine the upgrade should have been
pretty straightforward. And so it seemed, until, in the middle of unpacking,
dpkg died with a horrible I/O error, and I dropped into an
unusable remote terminal with no working commands. Fortunately, apache2 was
still up and running, and the web service has been working without
interruption since the hard drive crash, albeit with no syncs from
www-master.
Today, Sergio visited
the campus and had a look. It was an XFS crash, which got cleanly repaired
using an install CD. We have an empty partition in the box, and will probably
move the system to it temporarily, and back to the RAID, but on ext3. When
the box was back online, I just had to resume the upgrade process, make mdadm
happy and update lilo.conf before rebooting into the new
kernel.
This box uses LILO for some obscure reason I can't remember too clearly
anymore. The box has just one partition on an md array, on two SCSI disks on
an aic7xxx-based controller. Can anyone give me a hint as to why GRUB would
have failed on us back in sarge, and whether any fixes in the etch version
would work any better? Using LILO here is error-prone, and basically feels
like a step back. Anyway, www.es.debian.org
is now back up and running with updated content.
Sindominio.net had its bi-annual
upgrading party last Monday, but unfortunately I wasn't able to help much
as when I tried to log into the server, I must have caught the system in the
middle of some key lib upgrade or something, and again I was locked in an
unusable shell which would only segfault. Given my previous experience, I
assumed that something had gone wrong and the box would need to be fixed at
the console, and after 20 minutes I gave up helping on that front. Until I
noticed, quite a long while later, that I was still getting mail from the
server. I managed to log in to discover the upgrade was done, with just
a few bits remaining. The major issues were encountered with our
PAM and LDAP setup, plus nscd kept dying,
causing quite a lot of mayhem all over the place. Great work from Seajob,
Syvic, nogates, apardo and the rest of the people who handled it! With etch,
we can finally move back to an official Debian kernel, something we've been
longing to do for a long time. The only pending upgrade issue is that we need
to move from our old jabber server to either the traditional
jabberd 1.x or ejabberd; our
current implementation is no longer supported in Debian.
The last of the etch upgrade stories involves
Softcatalà's servers. The box was
running CentOS 4.4, which was moved
away into a subdir just after booting Debian-Installer, and then lobotomised
so it would run as a Linux-VServer
under a new Debian etch install. I'll probably write more details about it
soon though, as it could be a less scary alternative to Guillem's
debtakeover.
Yay for etch!
10:58 | (comments: 2)
Silent home servers
The computer which hosts this blog is a venerable Pentium 150MHz, with
64MB of physical memory and two decently sized disks. It has been running
non-stop mostly without hiccups for several years, and I'm quite happy with
it, even if the processing power is so scarce I've been having to tune down
some services as Debian has gotten more resource hungry, dist-upgrade after
dist-upgrade.
Natura is my second-oldest Debian install, dating back to hamm, and after a
while it became a home server when it was replaced by an Athlon 700MHz at my
father's house. The only hardware incidents are all related to blackouts or
storms: two dead disks and one power supply. The CPU fan died years ago, but I
discovered that many months later. I guess it wasn't so necessary. :)
It is time to replace natura, though. The components are aging and they
have become quite noisy, despite my attempts to clean up the dust. Lately it
is so loud that I can't understand how my dad can actually get work done with
that persistent noise in the room. Besides, it'd be good to get just a little
bit more of CPU power to do a few things that have been postponed for a while
now. I have been looking for offerings in the embedded devices market.
I am looking for a device with the following characteristics:
- Silent: this is a must. If fans aren't involved, that's
great, but I know there are some devices with just one fan for the hard
drives, etc. which are really silent too. Noise is the #1 reason I want to
get a replacement.
- CPU power and RAM: it doesn't need to be too powerful, but of course an
improvement over a Pentium 150MHz is expected. :) The minimal RAM would be
128MB, I guess, and if it's expandable/replaceable, that'd be a big plus.
- Power consumption: I have other boxes around which I haven't used to
replace natura to get more CPU power because I've always assumed that it'd be
hard to match the Pentium's power consumption. As it's up 24/7, I want it to
be good in this area. AFAIK, the devices I'm looking for do quite well there,
though.
- Hard drives: many offerings accept two or four HDs inside the case. I won't
need four, but the possibility of setting up RAID is quite attractive.
- Hackable: should be supported by GNU/Linux, and if d-i does a good job
on it, bonus!
- Price: last, but not least, I'm willing to spend some money on this, but
I probably don't aim for the most expensive devices...
I've found that the Thecus YES Box N2100
is one of the most interesting offerings: 2 Gigabit ethernet ports, two
internal SATA HD bays, 3 USB ports... but it is a bit too expensive: 350€ (without
disks). tbm also told me to look at some
cheaper PowerPC devices, but I've forgotten the name right now.
So, dear Lazyweb, what would you recommend as a natura replacement for
a home server?
17:36 | (comments: 22)
Hard drive failure
On Thursday, I went up to my father's house to pick up my desktop, now that
I finally have Internet access at home so I can have permanent access to it
from a more convenient place. While I was there, I decided it'd be a good idea
to do an upgrade on my dad's box.
While I was doing this, the NFS share of the Debian mirror on the home
server that serves this blog stopped working, and I couldn't ssh in, although
the console and ICMP appeared to work. Having no monitor attached to that box,
I decided to reboot it and resolve the problem in a lame way. Had I known my
primary hard drive had decided enough is enough, I would have looked for a
monitor instead.
After reboot, the server started making a very loud and scary noise, which
was not new to me. When the BIOS tried to probe the disk, it would not start up
and would instead whine like that. When this happened in the past, switching
the box off for a few minutes and trying again was enough. Not this time, though,
it looked like it was the end.
Horrified by the fact that I had no really useful backup of /etc and /home,
I tried over and over to get it working. I plugged the HD into another old AT
box (which once served as my GNU Hurd playground), but I still got those scary
noises. At least I knew it had nothing to do with the motherboard.
As it was getting late, I decided to take the HD with me, along with another
Maxtor of the same size (but not the exact model), to have a look at the office
the next day. I thought I'd have to resort to transplanting the logic board
from the good one to the faulty disk, but before that, I realised I could try
mounting it in a USB cage. For some reason or another, this worked, and I
quickly saved the two partitions it contained with no read errors.
After this, I realised I had not done the copy as root, so I had lost all my
ownerships. I plugged the disk in again, and re-rsynced. It still worked.
Relieved, I went back home, copied the data to another old disk, and, I don't
remember why, I tried mounting the faulty disk again. I got scary noises even
using the USB case, and no other tries have been successful, so I think it's
completely dead now; it gave up just after the final mount which saved the data.
The box is now back online, using a Seagate disk I had stored in a drawer
and with no notes on it about it having any kind of problem. I suspect I will
need to do real backups now, because the drive isn't as safe as it
looked...
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
hda: drive not ready for command
ide0: reset: success
My little P150 needs a 6GB drive. I'll have to find one somewhere.
03:28 | (comments: 1)
Better not comment
Lately, there's a clear trend in my webserver stats. Since mid September,
the top three search strings for oskuro.net are:
- Naked men (with like 60% of occurrences)
- Fernando Alonso
- Skinny dipping
I'm glad I never posted about Leonor...
06:56 | (comments: 1)
Blogging in Catalan
When I started this blog a year and a half ago, I was maintaining a
blog on my team's webpage,
in Catalan, to write about my triathlon-oriented stuff. The momentum this
webpage had acquired is now mostly lost, and I have no energy to promote its
use among my team members once more.
Recently, some Softcatalà people started a Planeta Softcatalà
for the blogs of all the organisation's members. If you have a look, you'll see
that my blog entries are clearly distinct from those of the rest of my friends
in there: I'm the only one with an English blog.
For some time I've been wondering about dividing this blog into two sections,
en and ca, and pointing the different planets to the appropriate
languages. I think I would still give English posts some priority, but there
are some things I'd rather write in Catalan (I think I feel like posting about
my recent stay in the Pirineus in Catalan, for example). What do people
do with respect to multi-language blogs? Catalan content probably wouldn't
be too ok for Planet Debian or Planet GNOME, but would my Catalan
readers want to continue reading my English content?
If you follow my blog, your comment is welcome.
13:13 | (comments: 5)
Comments upgrade
I just upgraded the comments plugin from the PyBlosxom contrib
prerelease distribution. You should not find tracebacks so easily in this blog
now, and actually submitting comments without an email address won't break it
badly anymore. Thanks for the pointer, Will!
18:14 | (comments: 0)
Upgrade to pyblosxom 1.2
Today, being on vacation and with little fun stuff to do, I decided to
have a look at my old blog spam problem. Lately, I had been using a poor man's
spam cleaner for the comment spam, consisting of find and grep combined with
an ever-growing list of forbidden patterns, plus rm. This worked well for
some time, and the spam problem had become a minor annoyance: I just had to
check for non-removed entries every now and then and add those patterns to
the regexp.
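For the curious, the cleaner boils down to something like the following. This is a Python rendering of the idea rather than the original find/grep/rm one-liner; the function name purge_spam, the .cmt file names and the patterns are placeholders made up for this sketch, and the demo runs against a throwaway temporary directory instead of real comment files:

```python
import os
import re
import tempfile


def purge_spam(comment_dir, patterns):
    """Delete files under comment_dir matching any pattern; return their names."""
    regexp = re.compile("|".join(patterns), re.IGNORECASE)
    removed = []
    for root, _dirs, files in os.walk(comment_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            if regexp.search(text):
                os.remove(path)
                removed.append(name)
    return sorted(removed)


def demo():
    """Run the cleaner on two hypothetical comment files in a temp dir."""
    with tempfile.TemporaryDirectory() as comment_dir:
        samples = {
            "ok.cmt": "Nice post, thanks!",
            "spam.cmt": "Cheap pills at example.invalid",
        }
        for name, text in samples.items():
            path = os.path.join(comment_dir, name)
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)
        removed = purge_spam(comment_dir, [r"cheap pills", r"casino"])
        return removed, sorted(os.listdir(comment_dir))


print(demo())
```

The weakness is the same as with the shell version: it only catches what is already on the pattern list, so every new spam run needs a new pattern.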
Yesterday I found out I had something like 3,000 new comments, so I thought
my cheap system was broken and it hadn't deleted anything in many days. Nope,
it was working correctly according to the logs, but every time it ran it deleted
something like 100 files or so. After adding the missing patterns and deleting
the thousands of new files, I observed my webserver logs with tail -f
for a moment and found I was getting one new comment every
two seconds or so. WTF?! Are they generally getting this aggressive everywhere,
or is this dude just pissed about my site? I hope the mail to the corresponding
abuse@ address works.
As they submitted them quicker than the slow CPU could delete them, I
removed comments temporarily, and looked at installing
PyBlosxom 1.2, as people had
told me there are improvements against spam in this release.
This site is now running 1.2, but I see nothing spam-oriented in the new
comments plugin. Does anyone know what the Nice Way of blocking spam in
PyBlosxom is, that is not too expensive CPU-wise? Comments should be
working right now.
On another note, the site is crawling today because of the two triathlon
pics I posted earlier, which are making people hit MaxClients quite fast.
19:47 | (comments: 2)
Blog is back
Shortly after posting about referrer spam killing my box a few times in two
days, things got a lot worse and the box would go down every hour or so. As
natura.oskuro.net is, besides a home webserver, a NAT box for my
father's Internet connection, having the box more dead than alive was quite
unacceptable, and I had to stop Apache2 until I found another place for the
blog.
Mako, jacobo (who is back into blogging, to
the joy of many in #gpul) and a few others offered temporary hosting for this
site while sto and I decide on
renting a UML-based box or whatever.
Before moving somewhere else, I tried a few of the last options at the old,
slow box, and it seems PyBlosxom caches are really working, at least for now.
Despite having gone over a few spam attacks since Saturday, it looks like the
box is cutting it quite ok. mrtg reports a few high load peaks over the night,
but nothing that kills it. I used the dbm-based pyblosxom cache driver, and
the first difference is that apparently I don't get one process per request
anymore, and that alone prevents running out of memory. I've had one case
where the blog would be empty, which was fixed by just rm'ing the cache db.
If it happens again, I'll try with the entrypickle cache driver to see if
there's any improvement.
Anyway, even if it still works, it's obvious a Pentium 150MHz is not enough
these days, and I will have to find something cheap to host my stuff as soon as
possible. In the following days I will finish the migration to a new domain
name, which will be a start. oskuro.net
doesn't make much sense anymore, and quite probably I will let it expire next
year.
18:07 | (comments: 3)