Much ado about debugging
Recently, an interaction problem between systemd and the kernel was reported. After a calm discussion, developers of both projects found ways in which behavior could be improved and set about coding up the solutions. The technical press was filled with glowing reports on another success of collaborative problem solving... or, perhaps, most of the preceding text is entirely fictional and the systemd "debug flag" problem spiraled out of control in several ways at once.
Actually, that description is not entirely fantasy, if one looks at the problem the right way. It turned out that systemd was using the debug argument from the kernel command line to turn on much of its own debugging output. As Linus Torvalds noted, that is exactly how this flag was intended to be used. But a mistake in the systemd camp caused an assertion to fire, generating so much output that the system was rendered unusable; the end result was an unbootable system. After some discussion, a couple of decisions were made:
- Systemd will stop logging through the kernel once the journald logging daemon is available; that will cause much of that output to be directed elsewhere. There are also patches floating around to cause systemd to recognize systemd.debug, rather than plain debug, as the signal to turn on its own debugging options. If merged into systemd, this change will make it easier to turn on kernel debug output without also enabling systemd's output (something which is already possible, but not in The Way Kernel Developers Have Always Done It).
- The kernel developers have realized that it should not be possible to incapacitate a system by logging too much data from user space. Consequently, some sort of rate limiting will be applied to the /dev/kmsg interface. The proper nature of that limiting and how it will be controlled are still under discussion, but chances are good that some sort of change will find its way into the 3.15 kernel.
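The command-line part of the systemd change comes down to matching an exact, namespaced token rather than a generic word. The article does not show the actual systemd patch, so what follows is only a hypothetical Python sketch: the function names are invented for illustration, and the whitespace splitting deliberately ignores quoted arguments, which the real kernel and systemd parsers do handle.

```python
def cmdline_tokens(cmdline: str) -> list[str]:
    # Simplification (assumption): split on whitespace only; quoted values
    # such as init="/bin/sh -x" are not handled here.
    return cmdline.split()


def kernel_debug_requested(cmdline: str) -> bool:
    # The traditional, generic kernel flag: the bare word "debug".
    return "debug" in cmdline_tokens(cmdline)


def systemd_debug_requested(cmdline: str) -> bool:
    # A namespaced flag that only systemd would act on.
    return "systemd.debug" in cmdline_tokens(cmdline)


if __name__ == "__main__":
    for line in ("root=/dev/sda1 ro debug", "root=/dev/sda1 ro systemd.debug"):
        print(line, kernel_debug_requested(line), systemd_debug_requested(line))
```

Matching whole tokens is what keeps the two flags independent: a plain substring test such as `"debug" in cmdline` would also fire on `systemd.debug`, which would defeat the purpose of the separate name.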
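The kernel-side fix is a rate limiter on messages written from user space. Since its exact form was still under discussion, this is a hypothetical Python sketch rather than kernel code. It uses a simple fixed-window scheme (at most `burst` messages per `interval` seconds, with extra messages dropped and counted), loosely in the spirit of the interval/burst limits the kernel already applies to some of its own messages. `RateLimit`, `write_log`, and the default values are illustrative assumptions.

```python
import time
from typing import Callable, List, Optional


class RateLimit:
    """Allow at most `burst` messages per fixed window of `interval` seconds.

    Hypothetical sketch; names and defaults are illustrative assumptions.
    """

    def __init__(self, interval: float = 5.0, burst: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.burst = burst
        self.clock = clock              # injectable so tests can fake time
        self._begin: Optional[float] = None
        self._printed = 0
        self.missed = 0                 # total messages dropped so far

    def allow(self) -> bool:
        now = self.clock()
        if self._begin is None or now - self._begin >= self.interval:
            # First call, or the current window has expired: start a new one.
            self._begin = now
            self._printed = 0
        if self._printed < self.burst:
            self._printed += 1
            return True
        self.missed += 1
        return False


def write_log(limiter: RateLimit, log: List[str], msg: str) -> bool:
    """Hypothetical stand-in for accepting one user-space log message."""
    if limiter.allow():
        log.append(msg)
        return True
    return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = RateLimit(interval=5.0, burst=3, clock=lambda: fake_now[0])
    log: List[str] = []
    for i in range(100):                # a flood of messages at t = 0
        write_log(limiter, log, f"assertion failed #{i}")
    fake_now[0] = 6.0                   # a new window has started
    write_log(limiter, log, "normal message")
    print(len(log), limiter.missed)     # 4 97
```

A fixed window is the simplest choice; a token bucket would smooth out bursts at window boundaries at the cost of slightly more state. Either way, a runaway writer loses messages instead of taking the whole system down with it.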
In other words, appropriate fixes are being applied on both sides to prevent this kind of problem from recurring. So a reasonable observer might well wonder why the technical press is full of headlines like "Linus Torvalds suspends key Linux developer" and "Open war in Linux world." That comes down to less-than-optimal behavior on both sides of the fence — and even worse behavior in the press.
When Borislav Petkov first encountered this problem, he filed a bug against systemd, asking that its behavior be changed. A little over one hour later, systemd developer Kay Sievers closed it as "NOTABUG," saying that the behavior was expected and that the kernel is not the sole keeper of the debug flag: "Generic terms are generic, not the first user owns them." A lengthy back-and-forth followed, with developers reopening the bug and Kay closing it several times. Eventually the discussion spilled over onto the linux-kernel list when Steven Rostedt proposed hiding the debug flag from user space entirely.
Shockingly, the move to linux-kernel did little to calm the conversation. Eventually Linus announced that he was not interested in accepting any patches from Kay until Kay's pattern of behavior (as seen by Linus) changed. It didn't take that long, though, for things to calm down and for various developers to start looking at real solutions to the problem. As of this writing, that thread has been silent for a few days.
In other words, what we have here is a story that has been seen many times over. A problem turns up that reveals suboptimal behavior by two interacting pieces of software. Developers for both projects are slow to acknowledge that they could be doing things better and point fingers at the other camp. Certain high-profile community members known for their occasionally over-the-top rhetoric live up to their reputations. But once people have some time (measured in hours) to calm down, the problems are fixed and everybody moves on.
That, alas, is not a story that plays well in much of the press. So, instead, various reporters tried to inflate it into some sort of spectacular showdown. The development community was not portrayed in a good light, and perhaps some of that was even deserved. But what was really conveyed by all those articles was that, after all these years, much of the technical press still has a poor (at best) understanding of how free software development communities work.
Proprietary software tends not to be followed by stories like this because the inevitable politics, profanity, and chair-throwing are kept behind closed doors and firewalls. We, instead, do most of it in the open — though flying furniture still tends to be an exceptional occurrence. These events can be fun to watch from a suitable distance and with enough popcorn. But they mean less than the hidden corporate disagreements that we never hear about — and much less than the public accomplishments that we almost never hear about. The 3.15 merge window, ongoing while this debate was happening, has seen (as of this writing) the merging of well over 10,000 changesets from 1100 developers, most of whom are working together smoothly. But none of the press accounts mentioned that.
That's just life in the free software world. Or almost anywhere else, for that matter; where there are people, there will be misunderstandings, blowups, and the occasional failure to immediately recognize a problem. Somehow, we manage to muddle through anyway and create lots of high-quality free software. But that is so normal and mundane that it doesn't qualify for consideration as news.
Much ado about debugging
Posted Apr 8, 2014 19:16 UTC (Tue) by yann.morin.1998 (guest, #54333)
Yann E. MORIN.
Much ado about debugging
Posted Apr 8, 2014 19:34 UTC (Tue) by smoogen (subscriber, #97)
Much ado about debugging
Posted Apr 8, 2014 20:14 UTC (Tue) by danpb (subscriber, #4831)
My more cynical take on it is that most of the press is simply more interested in having hugely sensational headlines, to drive traffic to their sites & thus generate more advertising revenue for themselves, regardless of article accuracy. This is why I pretty much only read LWN for technical news - no sensational click-bait nonsense here :-)
Much ado about debugging
Posted Apr 9, 2014 7:47 UTC (Wed) by jezuch (subscriber, #52988)
My impression as well.
> This is why I pretty much only read LWN for technical news - no sensational click-bait nonsense here :-)
Yep. I saw the sensational headlines and thought "if this is really relevant, LWN will be covering this". So I skipped all the "reporting" and waited patiently for our esteemed editor to pick it up instead :)
Much ado about debugging
Posted Apr 10, 2014 4:42 UTC (Thu) by eean (guest, #50420)
That's giving the journalists way too much credit.
The most cynical I will go is 'It is difficult to get a man to understand something, when his salary depends on his not understanding it.'
Much ado about debugging
Posted Apr 10, 2014 12:39 UTC (Thu) by dag- (guest, #30207)
It must be offensive to anyone doing *real* journalism.
Much ado about debugging
Posted Apr 10, 2014 17:25 UTC (Thu) by emunson (subscriber, #44357)
Much ado about debugging
Posted Apr 12, 2014 20:28 UTC (Sat) by speedster1 (guest, #8143)
(lovely comment, that really cracked me up)
Much ado about debugging
Posted Apr 8, 2014 21:00 UTC (Tue) by marcH (subscriber, #57642)
> Proprietary software tends not to be followed by stories like this because the inevitable politics, profanity, and chair-throwing are kept behind closed doors and firewalls. We, instead, do most of it in the open.
Don't forget most Linux contributors are employed by $BIGCORP. Double the pleasure?
> "Generic terms are generic, not the first user owns them", NOTABUG
In my experience such contempt for users/customers, gross lack of civility and "prima donna" behaviour tend to be less tolerated at $BIGCORP (especially not in writing) and can be escalated if/when they happen. No matter who is technically right or wrong.
> But that is so normal and mundane that it doesn't qualify for consideration as news.
Trains arriving on time...
Much ado about debugging
Posted Apr 8, 2014 22:20 UTC (Tue) by Karellen (subscriber, #67644)
One thing that has not been implemented yet is rate-limiting in... bugzilla.
+1 Insightful.
Every bug tracker should have a cooldown period, let's say a minimum of a day between any two state transitions.
Also, between comments. More than 10 comments total in an hour, or 3 comments by any single user in the same period? Everyone gets limited to, I dunno, one comment every 12 hours, except users created after the bug was created who don't get to comment at all, for 48 hours.
Much ado about debugging
Posted Apr 9, 2014 0:00 UTC (Wed) by samlh (subscriber, #56788)
Much ado about debugging
Posted Apr 9, 2014 7:22 UTC (Wed) by Karellen (subscriber, #67644)
Could you give a bare-bones outline of what social solution you have in mind to prevent hordes of /.ers/Redditors/HNers/LWNers/Phoronixers/etc... spamming the recently linked, instantly-polarising, rage-inducing bug of the day, please? I'd love to start to get to work on that, because it's been a problem for *ages*!
Much ado about debugging
Posted Apr 9, 2014 7:53 UTC (Wed) by blackwood (guest, #44174)
That gets the distraction out of the way and allows me to actually work on the bug. Currently I do this with filters, but that has the downside that I still accidentally stumble over poison when crawling through bugzilla, e.g. for an outstanding regression list review.
Much ado about debugging
Posted Apr 10, 2014 2:37 UTC (Thu) by tterribe (guest, #66972)
No poisonous color, but you can't have everything.
Much ado about debugging
Posted Apr 10, 2014 6:12 UTC (Thu) by blackwood (guest, #44174)
Much ado about debugging
Posted Apr 10, 2014 6:55 UTC (Thu) by marcH (subscriber, #57642)
Is this any different from "unsubscribe"?
A bit rough; does not solve the problem of spam polluting valuable comments.
Much ado about debugging
Posted Apr 10, 2014 8:59 UTC (Thu) by marcH (subscriber, #57642)
Most bugs I read, I found with Google.
Much ado about debugging
Posted Apr 9, 2014 16:25 UTC (Wed) by raven667 (subscriber, #5198)
Much ado about debugging
Posted Apr 9, 2014 22:20 UTC (Wed) by marcH (subscriber, #57642)
No, it does not. A rate-limiter in a bug tracker would have nothing to do with writing, running or pushing code.
> The correct answer is for people to behave better.
Thanks for a good laugh.
> Don't look for technical solutions to social problems.
Sure, and let's also give everyone permission to push to any repo, why not? As long as everyone is told to behave, make sure a code review happened and tests have been run, then what could possibly go wrong?
Much ado about debugging
Posted Apr 9, 2014 23:09 UTC (Wed) by marcH (subscriber, #57642)
> Don't look for technical solutions to social problems.
Oh and we should also let any host send as much SMTP mail as it likes to any other host. As long as everyone behaves responsibly everything should be fine.
Now I even started to think of all the money we could save if we did not have any laws... imagine that, no lawyers!
Starring vs commenting
Posted Apr 10, 2014 8:54 UTC (Thu) by alex (subscriber, #1355)
Starring vs commenting
Posted Apr 10, 2014 13:54 UTC (Thu) by ABCD (subscriber, #53650)
Much ado about debugging
Posted Apr 9, 2014 1:38 UTC (Wed) by PaulWay (guest, #45600)
I don't know, I've seen plenty of contempt for users, customers, and fellow sysadmins and developers in the corporate world. I've also seen subtle insults, stonewalling, the Not Invented Here syndrome, backstabbing, active sabotage and undermining, the Works For Me syndrome, abuse of network privileges to attack others, and plenty more. Even in writing. Even when the entire rest of the team is standing there watching the lead developer lead baseless, ad hominem attacks on someone foolish enough to question them.
If anything, it's worse in private organisations, IMO.
Have fun,
Paul
Much ado about debugging
Posted Apr 9, 2014 6:34 UTC (Wed) by nhippi (guest, #34640)
> In my experience such contempt for users/customers, gross lack of civility and "prima donna" behaviour tend to be less tolerated at $BIGCORP
Depends on which level of the org chart it comes from, sigh.
But this kind of attitude brings back bad memories of the former glibc upstream.
Much ado about debugging
Posted Apr 11, 2014 21:39 UTC (Fri) by hitmark (guest, #34609)
Never mind that Systemd already had a bad rep before this spat happened.
Much ado about debugging
Posted Apr 11, 2014 22:58 UTC (Fri) by dlang (guest, #313)
Much ado about debugging
Posted Apr 11, 2014 23:02 UTC (Fri) by HelloWorld (guest, #56129)
Much ado about debugging
Posted Apr 11, 2014 23:30 UTC (Fri) by dlang (guest, #313)
Much ado about debugging
Posted Apr 12, 2014 7:39 UTC (Sat) by marcH (subscriber, #57642)
Much ado about debugging
Posted Apr 15, 2014 14:58 UTC (Tue) by nix (subscriber, #2304)
Strangely XFree86 appears entirely dead now: no commits for years. But I'm sure these facts are not in any way connected.
Much ado about debugging
Posted Apr 15, 2014 15:40 UTC (Tue) by pizza (subscriber, #46)
XFree86 alienated the majority of their active developers, who forked the codebase into what became Xorg. The generally more responsive attitude (and much greater productivity) of the forkers led to distros abandoning XFree86 in favor of the fork, and the end-users really didn't care one way or another as things continued to JustWork, albeit with more shiny features.
...This is not unlike what happened with OpenOffice, incidentally.
It's also not comparable to GNOME, because the "forks" (e.g. MATE and Cinnamon) rely heavily on the ongoing work/libraries/framework of the GNOME3 developers, and there's a fair amount of cross-pollination going on.
Much ado about debugging
Posted Apr 9, 2014 1:46 UTC (Wed) by ras (subscriber, #33059)
> Shockingly, the move to linux-kernel did little to calm the conversation.
It made my day.
Much ado about debugging
Posted Apr 9, 2014 7:20 UTC (Wed) by rvfh (guest, #31018)
> Developers for both projects are slow to acknowledge that they could be doing things better
Note that Linus, in the answer you link, does not claim the problem is elsewhere. AIUI he is simply saying that Kay should try and fix it wherever it lies rather than pushing back.
...but I may have misunderstood his statement.
Much ado about debugging
Posted Apr 9, 2014 10:26 UTC (Wed) by tomegun (guest, #56697)
I think a subtle point (well, relatively subtle compared to the rest of the debate at least) is still being overlooked, so let me add my two cents:
After all has been said and done, it seems the criticism against the systemd side in all of this boils down to the reaction to Borislav's bug report.
However, it seems people are still overlooking the fact that this was a feature request, and not a bug report about an assert(). The assert was merely used as an example about past problems, and it had indeed been fixed a long time ago (which is why the rest of the discussion on the bug report does not touch on that at all). So when Kay says "that is the expected behavior", he does not mean that hitting asserts in systemd is expected (that is a serious bug), he is saying that if systemd (or the kernel) has serious bugs, the fact that the logs get flooded is currently the expected behavior (rate-limiting would change that).
To sum up, the feature request was: "Change the meaning of 'debug'", and the answer was "No. If you disagree, please take the discussion to the mailing list". Compared to the usual discourse on the LKML this seems pretty civil and reasonable to me :)
Much ado about debugging
Posted Apr 9, 2014 14:00 UTC (Wed) by martin.langhoff (guest, #61417)
The implicit requirement is that it must be possible to completely (if slowly) boot a complete OS to a technically usable desktop. systemd broke this, and Kay initially refused to acknowledge this was a systemd change that broke an important use case.
Once things settled a bit, yes, systemd folks moved to read the global debug flag in a more conservative way so they don't break the OS. That was the fix resulting from this. The kernel folks also moved to rate-limit misbehaving userland programs. That should be a hint or two.
Much ado about debugging
Posted Apr 9, 2014 15:09 UTC (Wed) by tomegun (guest, #56697)
Either way, this was a serious bug which was fixed from our side long before this whole debate erupted. It is my understanding that now also the kernel side of this is being worked on, so it sounds like we'll get something positive out of this mess (user-space should clearly not be able to give the kernel troubles in this way).
If people are still experiencing problems like this, bug reports should be filed, and we should figure out what is causing the problem. The proposed solution of not reading 'debug' from systemd is of course not a solution at all, as it would just make it a bit harder to trigger the problem. Kay's recent change to stop writing to kmsg earlier also does not solve any underlying problem (but at least that change made sense in isolation).
Much ado about debugging
Posted Apr 10, 2014 18:57 UTC (Thu) by tytso (subscriber, #9993)
The bug may have been fixed in systemd's git tree, but it hadn't yet been fixed in the kernel developers' Fedora system. Now, that may have been out of scope for the upstream systemd bug tracker, but there was also a bugzilla entry opened in Fedora's BZ, which was also ignored for weeks on end. I'm not sure whose responsibility it was to make sure a fix that was causing this much pain would get backported into Fedora's systemd package, but one of the reasons why things escalated so badly is because of faults on both sides.
On the kernel side, the original poster could have done a better job making it clear that he was still suffering a problem, and to make it clear that he was looking for a fix to his pain, and then offer one possible solution. The OP focused on the solution, and not the specific problem he was seeing, and the systemd side only reacted to the proposed solution, and not to the underlying problem.
One of the things I've learned is that when you groom a bug tracker and get a proposed solution that doesn't make sense, instead of just NACK'ing the fix and closing it, you should ask why the person is proposing it, and then try to fix the underlying problem for the user.
There are times when I've said, "look, this problem was fixed in e2fsprogs 1.42.9; you need to complain to Ubuntu to get them to update to a newer e2fsprogs package". But the point is that I tell them that it's fixed, and how they can get the fix --- either by bugging their distribution, or compiling their own version of the fixed software.
Are the systemd developers obligated to do this? Of course not. But things would work a whole lot more smoothly if they were to take a bit more of a user-friendly attitude towards their users. That would help a lot when good will and cooperation is required across open source project boundaries. The attitude of, "this is all I am obligated to do, and I won't do an iota beyond that towards making my users' lives easier", is in the end highly self-defeating.
Much ado about debugging
Posted Apr 10, 2014 21:51 UTC (Thu) by tomegun (guest, #56697)
Now that the situation has been clarified, I hope we can all agree that this was pretty much a non-event that was blown way out of proportion.
Much ado about debugging
Posted May 8, 2014 1:14 UTC (Thu) by fest3er (guest, #60379)
Much ado about debugging
Posted Apr 9, 2014 15:28 UTC (Wed) by johannbg (guest, #65743)
Throwing a tantrum, calling contributors prima donnas and whatnot, and threatening to block contributions because of shortcomings in their own process is rather interesting, to say the least.
Much ado about debugging
Posted Apr 9, 2014 16:00 UTC (Wed) by ken (subscriber, #625)
I can't wait until some other program starts to parse the cmdline to turn on debug.
Much ado about debugging
Posted Apr 9, 2014 16:28 UTC (Wed) by johannbg (guest, #65743)
Much ado about debugging
Posted Apr 9, 2014 16:43 UTC (Wed) by ovitters (guest, #27950)
Suggest reading the article, your version is inaccurate.
Much ado about debugging
Posted Apr 9, 2014 22:51 UTC (Wed) by marcH (subscriber, #57642)
What is always perceived as very aggressive is closing a bug only an hour after it was opened and before any kind of discussion had a chance to happen. This is anything but civil, and it is rightly perceived as an insult by the reporter, who spent at best hours and at worst days root-causing and reporting an issue. Who is right or wrong, whether the bug is a feature request or not, whether it should be closed later or not... none of this really matters here; this is only about showing basic respect for the hard work of non-expert users willing to contribute.
I've seen bugs closed this quickly more than once (we all have) and every single time I could chat with the people who did it, it always appeared that they indeed meant to tell the bug submitter to "PFO".
If this bug had been closed the same way but at least a few days later then I bet this story would not have spiralled out so badly and it would not have made the news.
Thus the simple rate-limiting idea. It would very simply force everyone to appear respectful, even the people who are not.
Much ado about debugging
Posted Apr 9, 2014 23:40 UTC (Wed) by dilinger (subscriber, #2867)
When I get involved in a project, typically the first thing I do (after having used the software) is to report some bugs. If my feedback/contributions are welcome, I get more and more involved. If it isn't welcome, then I either go back to being Just A User, or even stop using the software altogether.
Making developers appear respectful might cause me to waste more of my time on a project, only to eventually discover that my contributions are not welcome.
Much ado about debugging
Posted Apr 10, 2014 15:29 UTC (Thu) by ebassi (subscriber, #54855)
> If this bug had been closed the same way but at least a few days later then I bet this story would not have spiralled out that bad and it would not have made the news.
the problem is that if you start leaving bugs open "for a few days" then any bug tracking system (not just Bugzilla) starts spiralling out of control real fast.
you need to aggressively manage your bug tracking system, all the time. bugs cannot stay open just to avoid drama, or to not hurt the feelings of the reporter.
this does not mean that you should not be considerate when replying to the reporter, and point them in the right direction in a helpful manner. leaving bugs open for a day just because, though, is being irresponsible as a maintainer, and comes to bite you in the ass in the long run.
Much ado about debugging
Posted Apr 10, 2014 21:38 UTC (Thu) by marcH (subscriber, #57642)
Yes, aggressive behaviour is indeed the issue here. Together with the belief that you know so much better, and that it's super urgent to close the door to doubt, thought and listening to others.
Much ado about debugging
Posted Apr 9, 2014 20:27 UTC (Wed) by _xhr_ (guest, #92665)
Awesome article. +1 for the calm description of things[TM]!
Much ado about debugging
Posted Apr 9, 2014 23:55 UTC (Wed) by nix (subscriber, #2304)