Fri, 08 Jan 2016

Weird VirtIO errors on a jessie KVM host: Fixed!

Yesterday I posted a desperate plea for help as I had no idea where else to look for clues on what was causing random I/O errors on the guests of our jessie KVM host.

Thanks to Michael Herold, who was kind enough to mail me after identifying our problem, now we know os-prober is to blame, triggering the problem on every kernel update on the host, and we have quickly uninstalled it from all our systems.

Thanks Michael! If you are by any chance going to attend FOSDEM, I am so happily going to buy you beers!

Let's hope anyone else wondering what's going on with their filesystems will find the trail to these blog posts to find a quick solution!
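For anyone who cannot simply remove the package, a possible alternative is to stop update-grub from invoking os-prober at all. This is only a sketch, and it rests on two assumptions that are not confirmed above: that the kernel update reaches os-prober through update-grub, and that the host's GRUB 2 honours the GRUB_DISABLE_OS_PROBER setting in /etc/default/grub.

```shell
# /etc/default/grub (fragment)
# What we actually did was remove the package, as root:
#   apt-get purge os-prober
# Assumed alternative: keep the package, but tell GRUB's config
# generator to skip the os-prober step.
GRUB_DISABLE_OS_PROBER=true
# Afterwards, regenerate the GRUB configuration, as root:
#   update-grub
```

We have only verified the removal route ourselves, so treat the setting above as untested on our side.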

Thu, 07 Jan 2016

Weird VirtIO errors on a jessie KVM host running Debian guests

Hi Interwebs! I'm facing a weird issue with one of our servers at work, involving Debian jessie, libvirt and Debian guests using VirtIO drivers. This is a plea for help. :)

Basically, we are getting random VirtIO errors inside our guests, resulting in stuff like this:

[4735406.568235] blk_update_request: I/O error, dev vda, sector 142339584
[4735406.572008] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -5 writing to inode 1184437 (offset 0 size 208896 starting block 17729472)
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729472
[ ... ]
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729481
[4735406.643486] blk_update_request: I/O error, dev vda, sector 142356480
[ ... ]
[4735406.748456] blk_update_request: I/O error, dev vda, sector 38587480
[4735411.020309] Buffer I/O error on dev dm-0, logical block 12640808, lost sync page write
[4735411.055184] Aborting journal on device dm-0-8.
[4735411.056148] Buffer I/O error on dev dm-0, logical block 12615680, lost sync page write
[4735411.057626] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[4735411.057936] Buffer I/O error on dev dm-0, logical block 0, lost sync page write
[4735411.057946] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[4735411.057948] EXT4-fs (dm-0): Remounting filesystem read-only
[4735411.057949] EXT4-fs (dm-0): previous I/O error to superblock detected

(From an Ubuntu 15.04 guest, EXT4 on LVM2)

Or,

Jan 06 03:39:11 titanium kernel: end_request: I/O error, dev vda, sector 1592467904
Jan 06 03:39:11 titanium kernel: EXT4-fs warning (device vda3): ext4_end_bio:317: I/O error -5 writing to inode 31169653 (offset 0 size 0 starting block 199058492)
Jan 06 03:39:11 titanium kernel: Buffer I/O error on device vda3, logical block 198899256
[ ... ]
Jan 06 03:39:12 titanium kernel: Aborting journal on device vda3-8.
Jan 06 03:39:12 titanium kernel: Buffer I/O error on device vda3, logical block 99647488

(From a Debian jessie guest, EXT4 directly on a VirtIO-based block device)
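To get a quick overview of which block devices are hit inside a guest, messages like the ones above can be tallied per device. A minimal sketch, assuming the kernel log is fed on stdin; count_io_errors is a hypothetical helper name, and it only looks at the "I/O error, dev ..." form shared by both excerpts, not at the "Buffer I/O error" lines.

```shell
#!/bin/sh
# Hypothetical helper: count "I/O error, dev <name>" kernel messages
# per block device. Reads kernel log text on stdin.
count_io_errors() {
    # Matches both "blk_update_request: I/O error, dev vda, ..." and
    # "end_request: I/O error, dev vda, ...", keeping only the device name.
    sed -n 's/.*I\/O error, dev \([^ ,]*\),.*/\1/p' | sort | uniq -c
}
```

Typical usage would be something like `dmesg | count_io_errors` or `journalctl -k | count_io_errors` inside the guest.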

When this happens, it affects multiple guests on the host at the same time. Normally the errors are severe enough that the guests end up with a r/o file system, but we've seen a few guests survive with a non-fatal I/O error. The host's dmesg has nothing interesting to see.

We've seen this happen with quite heterogeneous guests:

In short, we haven't seen a clear characteristic in any guest, other than the affected guests being the ones with some sustained I/O load (build machines, cgit servers, PostgreSQL RDBMSs...). Most of the time, guests that just sit there doing nothing with their disks are not affected.

The host is a stock Debian jessie install that manages libvirt-based QEMU guests. All the guests have their block devices using virtio drivers, some of them on spinning media based on LSI RAID (it was a 3ware card before, which we replaced as we were very suspicious about it, but we are getting the same results), and some of them based on PCIe SSD storage. We have 3 other hosts with a similar setup, except they run Debian wheezy (and honestly we're not too keen on upgrading them yet, just in case); none of them has ever shown this kind of problem.

We've been seeing this since last summer, and haven't found a pattern that tells us where these I/O error bugs are coming from. Google isn't revealing other people with a similar problem, and we're finding that quite surprising as our setup is quite basic.

So, dear Interwebs, have you seen this? We could use any comment (on the blog, or in Debian bug #810121, or kernel bug 110441) that clues us on what's to blame here. Thanks in advance!

Update: We finally know what's going on! The problem is gone at long last!

Thu, 07 Aug 2014

A pile of reasons why GNOME should be Debian jessie’s default desktop environment

GNOME has, for some reason or another, always been the default desktop environment in Debian, ever since the installer has been able to install a full desktop environment by default. Release after release, Debian has been shipping different versions of GNOME, first based on the venerable 1.2/1.4 series, then moving to the time-based GNOME 2.x series, and finally to the newly designed 3.4 series for the last stable release, Debian 7 ‘wheezy’.

During the final stages of wheezy’s development, it was pointed out that the first install CD image would no longer hold all of the required packages to install a full GNOME desktop environment. There was lots of discussion surrounding this bug or fact, and there were two major reactions to it. The Debian GNOME team rebuilt some key packages so they would be compressed using xz instead of gzip, saving the few megabytes that were needed to squeeze everything in the first CD. In parallel, the tasksel maintainer decided that switching to Xfce as default desktop was another obvious fix. This change, unannounced and two days before the freeze, was very contested and spurred the usual massive debian-devel threads. In the end, and after a few default desktop flip flops, it was agreed that GNOME would remain as the default for the already frozen wheezy release, and this issue would be revisited later on during jessie’s development.

And indeed, some months ago, Xfce was again reinstated as Debian’s default desktop for jessie as announced:

Change default desktop to xfce.

This will be re-evaluated before jessie is frozen. The evaluation will
start around the point of DebConf (August 2014). If at that point gnome
looks like a better choice, it’ll go back as the default.

Some criteria for that choice will include:

* Popcon numbers for gnome on jessie. If gnome installations continue to
  rise fast enough despite xfce being the default (compared with, say
  kde installations), then we’ll know that users prefer gnome.
  Currently we have no data about how many users would choose gnome when
  it’s not the default. Part of the reason for switching to xfce now
  is to get such data.

* The state of accessability support, particularly for the blind.

* How well the UI works for both new and existing users. Gnome 3
  seems to be adding back many gnome 2 features that existing users
  expect, as well as making some available via addons. If it feels
  comfortable to gnome 2 (and xfce) users, that would go a long way
  toward switching back to it as the default. Meanwhile, Gnome 3 is also
  breaking new ground in its interface; if the interface seems more
  welcoming to new users, or works better on mobile devices, etc, that
  would again point toward switching back.

* Whatever size constraints exist for CD or other images at the time.

--

Hello to all the tech journalists out there. This is pretty boring.
Why don’t you write a story about monads instead?

― Joey Hess in dfca406eb694e0ac00ea04b12fc912237e01c9b5.

Suffice to say that the Debian GNOME team participants have never been thrilled about how the whole issue is being handled, and we’ve been wondering if we should be doing anything about it, or just move along and enjoy the smaller amount of bug reports against GNOME packages that this change would bring us, if it finally made it through to the final release. During our real life meet-ups in FOSDEM and the systemd+GNOME sprint in Antwerp, most members of the team did feel Debian would not be delivering a graphical environment with the polish we think our users deserve, and decided we at least should try to convince the rest of the Debian project and our users that Debian will be best served by shipping GNOME 3.12 by default. Power users, of course, can and know how to get around this default and install KDE, Xfce, Cinnamon, MATE or whatever other choice they have. For the average user, though, we think we should be shipping GNOME by default, and tasksel should revert the above commit again. Some of our reasons are:

In short, we think defaulting to GNOME is the best option for the Debian release, and in contrast, shipping Xfce as the default desktop could mean delivering a desktop experience that has some incomplete or rough edges, and not on par with Debian quality standards for a stable release. We believe tasksel should again revert the change and be uploaded as soon as possible, in order to get people testing images with GNOME the sooner the better, with the freeze only two months away.

We would also like changes of this nature, in the future, not to be announced in a git commit log, but widely discussed in debian-project and the other usual development/decision channels, as happened recently with the change of init system. We will, whichever the final decision is, continue to package GNOME with great care to ensure our users get the best possible desktop experience Debian can offer.

Sun, 29 Jul 2012

GUADEC 2012

I've been in A Coruña for this year's GUADEC since Tuesday night, and it rocked. I did a late registration after my first week at Collabora, which is sponsoring my stay here.

I came one day early to participate, as Debian's representative, at the yearly GNOME Advisory Board meeting, for the first time. It was a positive experience, which helped me get a grasp of the “big picture” of what the GNOME Foundation does. I also had the pleasure of visiting Igalia's awesome offices in the city, and putting faces to many names during the meeting.

I presented an overview of Debian's relation to GNOME, how our packaging team works and what our goals and biggest problems are as a GNOME downstream. We stirred some good debate as some other Advisory Board members share part of our problems. I should be posting a summary of what happened there for debian-project@ldo as soon as I have the time to scribble it.

I've met with GNOME Hispano people I hadn't seen since 2004 or 2006 in the best case, and caught up with many of them. I've also met many GPUL members I had known for over a decade via IRC, but never had met in person, and it was about time. And of course, I've got to know a good number of my new workmates at Collabora, and had fun with them around the conference, the beach and the numerous post-conference events.

Last, but not least, I ended up participating in the GNOME Olympics, substituting Rodrigo in Team B “Core Dumped”, along with Stefano, John, Bastien, Chema and Adam. WE WON, not thanks to me, but the statistics shine: I've won all FreeFA World Cups I've played :P so here's a PROtip: if you want to win next year, be sure to be my team mate, and more importantly, be sure Adam is not your rival. :)

Unfortunately, I'm only attending the core days so tonight I'll be flying back to Madrid on my way home to València. See you next year! A Coruña is a city that has impressed me quite a bit, and I'm looking forward to coming back for some more standard vacation at some point. :)

Sun, 03 Jun 2012

GNOME 3.4 in wheezy

Users of Debian sid will have noticed: the final (and interesting) bits of GNOME 3.4 have landed and if all looks as good as it does now, they should migrate to wheezy in about a week.

3.2 → 3.4 hasn't been as complicated as the previous horrible transition, but still had some complications due to Cogl/Clutter incompatibilities. Other than that, our major problem has been manpower, but this isn't new for many other Debian teams. We've also seen new incarnations of “Linux-only technology is now mandatory” which makes our lives a bit more miserable due to kfreebsd-* and hurd-i386, but for now we've still been able to dodge it. It seems wheezy+1 will be fun in that regard though, and we might need to take drastic approaches.

If all goes well and the current lot (GNOME Shell, Control Center, Settings daemon, Mutter...) transitions without additional problems, we should be wrapping up our transitions for wheezy with Evolution and friends (currently sitting in experimental), and hopefully GDM 3.4.

As we get many questions regarding the status of GDM in Debian, let's add a short note on this. Packaging GDM, at least in its current upstream form, is not a matter of unpacking a new tarball and editing debian/changelog. When Joss works on a new major version, the amount of tweaking to break away from stuff that works on other distros but is not so simple in Debian is outstanding (see, for example, the current unfinished work for GDM 3.2 in our SVN repo). In our case, to handle our GDM defaults, we even need changes to the underlying configuration system, dconf. This evidently takes some effort to do, and unfortunately our GDM expert has had little time for Debian lately, but we're confident we'll end up with a GDM in wheezy that is on par with Debian standards.

We are, as always, reachable at #debian-gnome in the OFTC IRC network. Have fun!

Thu, 29 Mar 2012

GNOME 3.4

The GNOME project released GNOME 3.4 today, the second major update to the GNOME 3 platform. Congrats!

I know there's lots of polish and improvements to some of the major rough edges in GNOME 3.2, but I think that of all changes in this release, Epiphany really stands out, as you can see in blog posts by Xan and Diego.

Work to bring GNOME 3.4 to Debian wheezy users has been underway for a few weeks already, and some bits and pieces have been hitting unstable since the tarballs were released a couple of days ago. We still need more base work to be done before some exciting components like GNOME Shell can hit our archive, and we want to fix as many FTBFS with GLib 2.32 bugs as possible before pushing it to unstable, but all in all, hopefully this time, shepherding a major GNOME release to Debian testing won't be as painful as it was not so long ago. However, we have already identified some fun bits involving clutter, cogl and mutter in our initial analysis, but nothing that hopefully can't be dealt with in a civilised way.

As always, if you think you can help us, we're reachable at #debian-gnome at OFTC!

Thu, 23 Feb 2012

alsaconf

Removing alsaconf was one of the very few rewarding moments of these ten years of taking care of ALSA in Debian.

Not everyone agreed back then, and we still get some retaliation. :)

Date: Thu, 23 Feb 2012 02:59:31 +0100
From: <CENSORED>
To: jordi@debian.org
Subject: sabotage!

the removing of alsaconf without working(!) alternatives  was (AND IS!)  an
act of sabotage against millions of debian/alsa - users who needs stable
productive systems

you and all those proponents of removing this still needed alsaconf - program
will have to take the responsibility in front of an (us-) court for damages in
millions of dollars - amounts (lost man hours) all over the world

only a short while and we will have enough sponsors and witnesses around the
globe (and a very specialised, international labouring bureau of advocates) to
go to the court for prosecution.


we will not tolerate such an betray ("stable"? - do you believe, we're
fools??!!) against broad sections of the population and against the spirit of
free software!

it will be intresting to investigate, in whoms interests you've done so and
who the beneficiaries are ...


L.B.
conductor, publicist, whistleblower

Tue, 31 Jan 2012

GNOME Shell 3.2 in wheezy: a retrospective

When you read this, GNOME Shell 3.2 will (hopefully!) have finally transitioned to Debian’s testing suite.

Planet GNOME readers might think Debian now has outdated versions of software even in their development versions, or the distribution’s development marches at glacial pace. Wheezy GNOME users will finally have a Shell that matches the rest of their GNOME components, something that works with the Shell extensions website and far fewer problems and limitations compared to 3.0.2.

The reality is that GNOME 3.2’s packaging was quite ready back when it was released in late September, but a number of not-so-desirable situations held GNOME Shell from transitioning to testing until today, four months later. So, what happened?

TL;DR: transitioning from GNOME 2 → GNOME 3 is not so easy if you want to keep testing in a sane state, and when you need to deal with dozens of indirectly related packages, for more than 10 architectures… but it shouldn’t take nearly a full year, either…

Let’s go back to the last months of 2010. Debian squeeze is in very deep freeze, and the release team and many Debian developers are focusing on squashing as many release critical bugs as they can, in order to make Debian 6.0 the great release it ended up being. The GNOME project has recently delayed the big launch of GNOME 3.0 again, until March 2011; Debian has already settled on GNOME 2.28 for its release, although it will end up cherry-picking many updates from the 2.30 release modules.

With most of the stabilization work being done, many Debian GNOME team members were at that time working on packaging very early versions of what would end up being GNOME 3.0 technology: GTK+3.0, GNOME Shell, Mutter… and some brave users even tried to use it via the experimental archive.

On February 6th, Debian 6.0 was released, and soon after, on April 6th, GNOME made a huge step forward with the much anticipated release of GNOME 3.0. At that time, Debian developers were busy breaking unstable as much as they could, as is tradition in the weeks following a major release, and the Debian GNOME team was able to start moving some GNOME 3.0 libraries (those which were parallel-installable with their GTK+2.0 versions) to unstable.

However, moving the bulk of GNOME 3.0 to unstable wasn’t so easy. When you start doing that, you need to be sure you’re ready to have all affected packages in a “transitionable” state as soon as possible, to minimise the chances of blocking transitions of unrelated packages via the dependencies they pick up with rebuilds. All the packages involved in a transition need to be ready to go in the same “testing run”, for all supported architectures. When you’re dealing with dozens of GNOME source packages at the same time, many of which introduce new libraries, or worse, introduce incompatible APIs that affect many more unrelated packages, things get hairy, and you need a plan.

So, Joss outlined what a sane approach to this monster transition could look like. The amount of work to do was what we call “fun” on #debian-gnome. In a nutshell, we had to deal with quite a few transitions, starting with having a newer version of libnotify in unstable, and a pre-requisite for that was making sure all the packages using libnotify1 were ready to use the source-incompatible libnotify4, and this meant preparing patches and NMUs for many of our packages, as well as many others not under our control.

Before starting a controlled transition like this one, we had to get an ACK from the release team, who was busy enough handling other huge transitions like Perl 5.12, so by the time we got our own slot, we were well into Summer.

With libnotify done in August, it was time to get our hands dirty with more exciting stuff, like getting Nautilus in testing. This meant bumping a soname and requiring all packages providing Nautilus extensions to migrate to GTK+3.0, or drop the extension entirely, as you can’t mix GTK+2.0 and GTK+3.0 symbols in the same process. However, in GNOME 3.0, automounting code had moved from Nautilus to gnome-settings-daemon, so in order to not break filesystem automounting in testing for an unreasonable amount of time, both Nautilus and g-s-d needed to go in at the same time. The fun thing is that g-s-d dragged glib2.0, gvfs, gnome-control-center, gdm3, gnome-media, gnome-session and gnome-panel into the equation, so this transition needed extra planning and a lot more work than initially expected: migrating all nautilus extensions, plus ensuring all Panel applets had migrated to GTK+3.0 and the new libpanel-applet-4 interface. In short, this was the monster transition we were trying to avoid.

By the time all this mess was sorted out, GNOME 3.2 had been released, and from what users said, it was a lot better than 3.0. We still had no more than a few bits and pieces of 3.0 in testing, and we were working hard to get 3.0 in wheezy. With all the excitement around 3.2, at times it was difficult to explain to outsiders why we were beating a dead 3.0 horse… Going back to our huge transition, it was just a matter of time before all the packages would be built and be ready to enter, on the same run, in testing.

A few weeks later, in early November and after several rounds of mass-bug-filings, fixing unrelated FTBFS, many NMUs, package removal requests and dealing with any possible problem that could block our transition, everything seemed to be set, and our release team magicians had everything in place for the big magic to happen. However, our first clash with the rest of Debian happened a few hours before our victory, in the form of an unannounced ruby-gnome2 upload which reset the count for everyone. It was fun to see the release team trying all sorts of black magic in an attempt to mitigate the damage. Fortunately, after a few tries they managed to fool britney (the script that handles package transitions from unstable to testing) somehow, and the hardest part of the job was done with just one day of delay.

At last, the core of GNOME 3 was in testing, and testing users found out soon after. The rest of the week saw a cascade of hate posts against GNOME 3 in Planet Debian, and personally I didn’t find that especially motivating to keep on working on the rest of GNOME bits. With experimental clear of GNOME 3.0 stuff, we finally were able to focus on packaging whatever GNOME 3.2 components were not already done, and preparing for what should be a plain simple transition of GNOME 3.0 to 3.2.

After our share of wait for a transition slot, as Perl 5.14, ICU and OpenSSL were in the line before us, and after dealing with a minor tracker 0.12 transition, we were ready for our next episode: evolution-data-server.

At first sight, we thought this would be a lot easier, but it still got a bit hairy due to evolution-data-server’s massive soname bumps. We were given our slot just before Christmas, after a few weeks of wait for others to finish their migration rounds, and most of the pack entered wheezy a few days before the new year.

No rejoicing, though, as GNOME Shell 3.2 didn’t make it. First, we discovered it was FTBFS on kFreeBSD architectures, as NetworkManager had been promoted from optional to required, for apparently no good reason, leaving the BSD world out in the cold, including our exotic GNU/kFreeBSD architectures. Now, let’s clarify that I’m a supporter of the Debian kFreeBSD architectures and was really happy to see it accepted as a technology preview in squeeze. However, as you know, GNOME Shell currently requires hardware acceleration to run, a requirement hardly met in kFreeBSD, unless you’re using a DRI1 X driver. We seriously doubted anyone had ever run a GNOME 3 session on kfreebsd-*. However, if it didn’t build, it was a blocker bug for GNOME Shell. We considered creating different meta-packages for kFreeBSD architectures, to conclude it’d be a mess, so our awesome Michael Biebl ended up cooking up a patch that restored the ability to build the Shell without NetworkManager support.

With this out of our way, we just needed to upload Michael’s fix and watch the buildds do their part of the job. Or maybe not?

Enter Iceweasel 9.

In parallel, and with incredibly bad timing, Iceweasel 9.0 was uploaded to Debian the very same day it was released by Mozilla. Again, it greeted us with a nasty surprise: yet another mozjs API change, which made gjs FTBFS, which meant our kFreeBSD fixes would be unusable until someone who knew Gjs’ internals well enough bit the bullet and worked around the new API changes. Again, Michael Biebl tried to be our saviour, but unfortunately wasn’t able to fix all the problems, so we tried to focus on plan B.

Mozilla had released a fork of the mozjs that is included in Firefox, so that embedders would have a bit less of a hard time with these recurrent API changes. This was based on Firefox 4, and was already being packaged by Ubuntu. Gjs would build using this older version just fine, so we just needed to get it in Debian as soon as possible. We just needed to find a sucke^Wvolunteer that would be inclined to maintain the beast. Only after a few weeks did we manage to get Chris Coulson, the Ubuntu packager, to maintain the package directly through the Debian archive via package syncs. However, his package had only been auto-compiled in the three Ubuntu architectures, that is amd64, armel and i386. It’s late January 2012, and we’ve been fighting this war for 10 months.

After getting some help from Michael to get the new package in shape for Debian standards, we were excited to sponsor it for Chris. Duh, after a few days in the NEW fridge, it was rejected by the ftp-masters. The license statement was missing quite a few details, so I went ahead and sacrificed a few hours of my copious free time to get this sorted out. A few days later, mozjs was accepted, but the result was horrible. It was very red. mozjs didn’t build on half of our targets.

Mike Hommey was quick to file a bug and point us to the most obvious fuckups. As he had dealt with this in the past as the Iceweasel maintainer, all of these issues were fixed and patches were ready to be applied verbatim or with minimal changes to our sources. With mozjs finally built successfully (although with severe problems on ia64), we were finally able to rebuild Gjs against it, upload GNOME Shell with our kFreeBSD fixes and wait until today for this mess to be over. Whew.

I can’t say I’ve enjoyed all the stages of this ride. Some bumps on the road were clearly there to test our patience, but it has helped me get back in touch with non-leaf GNOME packaging, which was all I was doing for a while due to being super-busy lately with studies. It also reminds me of the privilege of working side by side with some awesome people, not only Joss, Michael, Sjoerd, Laurent or Gustavo, to name just a few Debian GNOME team members, but also the receptive release team members like Julien or Cyril, and NEW-processing record-breaking ftp-master Luca. Without them, we might be trying to figure out the Nautilus transition since last Summer.

We really hope GNOME 3.4 will be a piece of cake compared to this. ;)

Thu, 29 Sep 2011

Installing GNOME 3 in Debian

The following is a quick HOWTO for the brave Debian users who want to upgrade to GNOME 3. Assuming you have an up to date system running sid, and experimental listed in your APT sources, perform the following complicated steps to end up having a functional GNOME 3 desktop:

apt-get install -t experimental gnome
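For anyone who does not yet have experimental in their APT sources, the entry could be generated along these lines. This is a sketch, not part of the original HOWTO: experimental_entry is a hypothetical helper, and the default mirror URL is an assumption, so use whichever mirror your sid entry already points at.

```shell
# Hypothetical helper: print an APT source line for experimental.
# $1 is the mirror URL; the default below is an assumption.
experimental_entry() {
    printf 'deb %s experimental main\n' "${1:-http://ftp.debian.org/debian}"
}
# As root, write the entry and refresh the package lists:
#   experimental_entry > /etc/apt/sources.list.d/experimental.list
#   apt-get update
```

After that, the -t experimental switch in the command above tells apt-get to prefer experimental when resolving the gnome meta-package.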

Thanks go to Joss for putting together new GNOME 3 meta-packages, and the rest of the Debian GNOME people for months of hard planning and packaging work, and painful testing transition handling.

Before you ask, yeah, not all of GNOME 3.x is in unstable yet, but will soon be, as preceding transitions start clearing the way. And yeah, GNOME 3.2 will come just after the two remaining package sets enter testing. To compensate, you'll find that you have some GNOME leaf packages pending an upgrade to 3.2.0-1 while you read this.

Wed, 13 Jul 2011

Not going to DebConf 11

3 months ago, I was positive I would be attending DebConf 11 in Banja Luka, but as the time to buy tickets and plan the trip came closer, I began to realise I don't have lots and lots of vacation days, and I probably prefer spending them doing something that absolutely rocks my world. I've always enjoyed the Debian conferences when I've been lucky to be there, but last year's experience in the Pyrenees was nothing a DebConf can compare to, and I've decided to spend time seeking similar experiences this summer.

With much regret, because I love meeting the wonderful people that make up Debian and DebConfs, I have to say that after all and once again, I won't make it.
