Atomic context and kernel API design

By Jonathan Corbet
March 25, 2008
An API should refrain from making promises that it cannot keep. A recent episode involving the kernel's in_atomic() macro demonstrates how things can go wrong when a function does not really do what it appears to do. It is also a good excuse to look at an under-documented (but fundamental) aspect of kernel code design.

Kernel code generally runs in one of two fundamental contexts. Process context reigns when the kernel is running directly on behalf of a (usually) user-space process; the code which implements system calls is one example. When the kernel is running in process context, it is allowed to go to sleep if necessary. But when the kernel is running in atomic context, things like sleeping are not allowed. Code which handles hardware and software interrupts is one obvious example of atomic context.

There is more to it than that, though: any kernel function moves into atomic context the moment it acquires a spinlock. Given the way spinlocks are implemented, going to sleep while holding one would be a fatal error; if some other kernel function tried to acquire the same lock, the system would almost certainly deadlock forever.

"Deadlocking forever" tends not to appear on users' wishlists for the kernel, so the kernel developers go out of their way to avoid that situation. To that end, code which is running in atomic context carefully follows a number of rules, including (1) no access to user space, and, crucially, (2) no sleeping. Problems can result, though, when a particular kernel function does not know which context it might be invoked in. The classic example is kmalloc() and friends, which take an explicit argument (GFP_KERNEL or GFP_ATOMIC) specifying whether sleeping is possible or not.
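As a rough illustration of that convention, here is a minimal user-space sketch in C. It is not kernel code: toy_alloc(), struct toy_pool, and the TOY_MAY_SLEEP / TOY_ATOMIC flags are hypothetical names invented for this example, loosely modelled on kmalloc() with GFP_KERNEL and GFP_ATOMIC. The only point is that the allocator acts on what the caller tells it and never tries to work out its own context.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag, loosely modelled on GFP_KERNEL / GFP_ATOMIC. */
enum toy_ctx { TOY_MAY_SLEEP, TOY_ATOMIC };

/* Toy memory pool; all fields are assumptions made for this sketch. */
struct toy_pool {
    size_t free_bytes;   /* memory available right now */
    size_t reclaimable;  /* memory that waiting could free up */
    unsigned int sleeps; /* how often the allocator had to wait */
};

/* The callee never guesses its context; it obeys the caller's flag. */
void *toy_alloc(struct toy_pool *pool, size_t size, enum toy_ctx ctx)
{
    if (pool->free_bytes < size) {
        if (ctx == TOY_ATOMIC)
            return NULL; /* caller cannot sleep: fail immediately */

        /* Caller may sleep: "wait" for reclaim (instant in this model). */
        pool->free_bytes += pool->reclaimable;
        pool->reclaimable = 0;
        pool->sleeps++;
        if (pool->free_bytes < size)
            return NULL; /* still not enough after reclaim */
    }
    pool->free_bytes -= size;
    return malloc(size);
}
```

A caller holding a (modelled) spinlock would pass TOY_ATOMIC and be prepared for a NULL return; a caller in process context would pass TOY_MAY_SLEEP.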

The wish to write code which can work optimally in either context is common, though. Some developers, while trying to write such code, may well stumble across the following definitions from <linux/hardirq.h>:

    /*
     * Are we doing bottom half or hardware interrupt processing?
     * Are we in a softirq context? Interrupt context?
     */
    #define in_irq()	   (hardirq_count())
    #define in_softirq()   (softirq_count())
    #define in_interrupt() (irq_count())

    #define in_atomic()	   ((preempt_count() & ~PREEMPT_ACTIVE) != 0)

It would seem that in_atomic() would fit the bill for any developer trying to decide whether a given bit of code needs to act in an atomic manner at any specific time. A quick grep through the kernel sources shows that, in fact, in_atomic() has been used in quite a few different places for just that purpose. There is only one problem: those uses are almost certainly all wrong.

The in_atomic() macro works by checking whether preemption is disabled, which seems like the right thing to do. Handlers for events like hardware interrupts will disable preemption, but so will the acquisition of a spinlock. So this test appears to catch all of the cases where sleeping would be a bad idea. Certainly a number of people who have looked at this macro have come to that conclusion.

But if preemption has not been configured into the kernel in the first place, the kernel does not raise the "preemption count" when spinlocks are acquired. So, in this situation (which is common - many distributors still do not enable preemption in their kernels), in_atomic() has no way to know if the calling code holds any spinlocks or not. So it will return zero (indicating process context) even when spinlocks are held. And that could lead to kernel code thinking that it is running in process context (and acting accordingly) when, in fact, it is not.
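The failure mode can be seen in a small user-space model. This is a sketch under stated assumptions, not the kernel's implementation: struct toy_cpu and the toy_* functions are hypothetical, the preempt_configured field stands in for the preemption configuration option, the PREEMPT_ACTIVE masking is left out, and no real locking takes place.

```c
#include <stdbool.h>

/* Toy model of one CPU's state; names are invented for this sketch. */
struct toy_cpu {
    bool preempt_configured;    /* stands in for a preemptible kernel build */
    unsigned int preempt_count; /* models preempt_count() */
};

/* Model of spin_lock(): only a preemptible kernel raises the count. */
static void toy_spin_lock(struct toy_cpu *cpu)
{
    if (cpu->preempt_configured)
        cpu->preempt_count++;
    /* actual lock acquisition is omitted in this model */
}

static void toy_spin_unlock(struct toy_cpu *cpu)
{
    if (cpu->preempt_configured)
        cpu->preempt_count--;
}

/* Model of in_atomic(): nonzero count means "atomic". */
static bool toy_in_atomic(const struct toy_cpu *cpu)
{
    return cpu->preempt_count != 0;
}

/* Returns what the modelled in_atomic() reports while a spinlock is held. */
bool reported_atomic_under_lock(bool preempt_configured)
{
    struct toy_cpu cpu = {
        .preempt_configured = preempt_configured,
        .preempt_count = 0,
    };
    bool seen;

    toy_spin_lock(&cpu);
    seen = toy_in_atomic(&cpu);
    toy_spin_unlock(&cpu);
    return seen;
}
```

In this model, reported_atomic_under_lock(true) is true, while reported_atomic_under_lock(false) is false even though the lock is held, which is exactly the case where a caller would wrongly conclude that sleeping is safe.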

Given this problem, one might well wonder why the function exists in the first place, why people are using it, and what developers can really do to get a handle on whether they can sleep or not. Andrew Morton answered the first question in a relatively cryptic way:

in_atomic() is for core kernel use only. Because in special circumstances (ie: kmap_atomic()) we run inc_preempt_count() even on non-preemptible kernels to tell the per-arch fault handler that it was invoked by copy_*_user() inside kmap_atomic(), and it must fail.
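The narrow use described in that quote can be modelled in a few lines of user-space C. Everything here is a hypothetical stand-in (toy_preempt_count, toy_kmap_atomic(), toy_kunmap_atomic(), toy_handle_fault()); the sketch only shows the shape of the trick: the count is raised unconditionally around the atomic mapping, and the fault handler consults it to decide that it must fail rather than sleep.

```c
/* Hypothetical stand-in for the preemption counter. */
static unsigned int toy_preempt_count;

/* Modelled kmap_atomic(): raises the count on every kernel configuration. */
void toy_kmap_atomic(void)
{
    toy_preempt_count++;
}

/* Modelled kunmap_atomic(): drops the count again. */
void toy_kunmap_atomic(void)
{
    toy_preempt_count--;
}

/*
 * Modelled per-arch fault handler. Returns 0 if it may service the
 * fault (possibly sleeping), or -1 if it was invoked inside an atomic
 * mapping and must fail the access instead.
 */
int toy_handle_fault(void)
{
    if (toy_preempt_count != 0)
        return -1; /* inside toy_kmap_atomic(): must not sleep */
    return 0;
}
```

Because the increment here does not depend on the preemption configuration, the check is reliable for this one purpose; the same is not true of spinlocks on a non-preemptible kernel.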

In other words, in_atomic() works in a specific low-level situation, but it was never meant to be used in a wider context. Its placement in hardirq.h next to macros which can be used elsewhere was, thus, almost certainly a mistake. As Alan Stern pointed out, the fact that Linux Device Drivers recommends the use of in_atomic() will not have helped the situation. Your editor recommends that the authors of that book be immediately sacked.

Once these mistakes are cleared up, there is still the question of just how kernel code should decide whether it is running in an atomic context or not. The real answer is that it just can't do that. Quoting Andrew Morton again:

The consistent pattern we use in the kernel is that callers keep track of whether they are running in a schedulable context and, if necessary, they will inform callees about that. Callees don't work it out for themselves.

This pattern is consistent through the kernel - once again, the GFP_ flags example stands out in this regard. But it's also clear that this practice has not been documented to the point that kernel developers understand that things should be done this way. Consider this recent posting from Rusty Russell, who understands these issues better than most:

This flag indicates what the allocator should do when no memory is immediately available: should it wait (sleep) while memory is freed or swapped out (GFP_KERNEL), or should it return NULL immediately (GFP_ATOMIC). And this flag is entirely redundant: kmalloc() itself can figure out whether it is able to sleep or not.

In fact, kmalloc() cannot figure out on its own whether sleeping is allowable or not. It has to be told by the caller. This rule is unlikely to change, so expect a series of in_atomic() removal patches starting with 2.6.26. Once that work is done, the in_atomic() macro can be moved to a safer place where it will not create further confusion.


Atomic context and kernel API design

Posted Mar 25, 2008 18:00 UTC (Tue) by jzbiciak (guest, #5246) [Link]

Hmm... if in_atomic only has one or two narrow, valid uses, perhaps it shouldn't be a macro at
all?  Why not a small, static function in the one source file that uses it?  GCC will still
inline it, but the static declaration would send the signal "not for outside consumption."

Atomic context and kernel API design

Posted Mar 25, 2008 19:04 UTC (Tue) by epa (subscriber, #39769) [Link]

Indeed, you might ask why use macros at all for this kind of thing, when compiler-inlined
functions are just as efficient?

Atomic context and kernel API design

Posted Mar 25, 2008 19:40 UTC (Tue) by IkeTo (subscriber, #2122) [Link]

Perhaps those "one or two narrow, valid uses" are in different files?

Atomic context and kernel API design

Posted Mar 25, 2008 19:59 UTC (Tue) by jzbiciak (guest, #5246) [Link]

Hmmm... it looks like something to be called by "per-arch" files, so even "one use" actually
lives in multiple files. :-)

I guess it belongs in a header, but I still wonder if it shouldn't have a different name.

Atomic context and kernel API design

Posted Mar 25, 2008 22:51 UTC (Tue) by jd (subscriber, #26381) [Link]

Perhaps instead of changing the name, change the macro so that it means what it says at all levels. Different calling files set a different value to some symbol, which indicates to the header which type of atomic context you're thinking about.

Then all you need do is ensure that (a) there's a way to know whether the context is atomic or not, and (b) the same method is used across that file and across any includes that may be brought in by that file.

Alternatively, you could have two parallel kernels, one always atomic, the other always not. Then you'd never need to test at all. Although you would have the communications problem from hell that all parallel solutions suffer from.

Atomic context and kernel API design

Posted Mar 25, 2008 23:54 UTC (Tue) by jzbiciak (guest, #5246) [Link]

If you follow Andrew Morton's reasoning, "am I in atomic" is something a caller should never
ask.  It should be told explicitly, as in the case of kmalloc().

The case where "in_atomic" is getting used "legitimately" is in a fault handler (which should
not be a fastpath pretty much by definition).  It sounds like they're abusing preempt_count to
coax a particular behavior out of the fault handler, rather than just stating the intended
behavior directly.  That doesn't necessarily sound like clean design to me, but rather an
overly clever hack.

I'm sure someone more familiar with this mechanism can explain why it is or is not a good
mechanism.

Own kernel

Posted Mar 25, 2008 19:48 UTC (Tue) by ncm (guest, #165) [Link]

I guess this argues for building your kernels with pre-emption enabled, at least until 2.4.26.

Own kernel

Posted Mar 25, 2008 21:58 UTC (Tue) by cventers (guest, #31465) [Link]

2.4.26 has been out for a while :p

Atomic context and kernel API design

Posted Mar 25, 2008 20:10 UTC (Tue) by vmole (guest, #111) [Link]

> Your editor recommends that the authors of that book be immediately sacked.

I think they should just re-write it in an entirely different style at great expense and at the last minute.

Atomic context and kernel API design

Posted Mar 25, 2008 20:37 UTC (Tue) by dlang (guest, #313) [Link]

What you are missing is that, if I remember correctly, our editor is one of the authors.

Atomic context and kernel API design

Posted Mar 25, 2008 20:46 UTC (Tue) by vmole (guest, #111) [Link]

Uh, no. What you missed is the Python reference.

Atomic context and kernel API design

Posted Mar 25, 2008 21:40 UTC (Tue) by pr1268 (subscriber, #24648) [Link]

I think you both are correct - IIRC our editor was/is an author and there's a Monty Python reference.

I must say, I do admire our editor's integrity in this matter.

Atomic context and kernel API design

Posted Mar 25, 2008 23:52 UTC (Tue) by vomlehn (guest, #45588) [Link]

There are many cases where admitting you are wrong and suggesting a punishment preempts
someone else suggesting a more suitable punishment. I'm suggesting flogging is a suitable
punishment. Well, not really, I was just caught up in the entertainment value of a good
flogging. 

In reality, writing is a severe form of self-flagellation and I am grateful our editor
subjected himself to it. I first bought the Second Edition of Linux Device Drivers, bought the
Third Edition as soon as I saw it was out, and will buy the Fourth Edition as soon as I learn
of its existence.

Atomic context and kernel API design

Posted Mar 25, 2008 20:42 UTC (Tue) by allesfresser (guest, #216) [Link]

Bring in the wonder llama consultants.

Atomic context and kernel API design

Posted Mar 25, 2008 21:33 UTC (Tue) by nix (subscriber, #2304) [Link]

Paging Jeff Minter...

Atomic context and kernel API design

Posted Apr 2, 2008 0:35 UTC (Wed) by roelofs (guest, #2599) [Link]

Please don't forget the majestik møøse...

Atomic context and kernel API design

Posted Mar 26, 2008 3:24 UTC (Wed) by kjp (guest, #39639) [Link]

What the hell does GFP stand for?

What is the meaning of GFP

Posted Mar 26, 2008 5:42 UTC (Wed) by pr1268 (subscriber, #24648) [Link]

My guess is "Get Free Page" (with various qualifier suffixes, e.g. GFP_KERNEL, GFP_ATOMIC, GFP_DMA, etc.).

It's strange that, after nearly an hour of grepping files throughout the entire Linux source tree, I was unable to come up with a meaning. My search took me all over the mm/, include/linux/, Documentation/, and kernel/ directories, but to no avail. The meaning of GFP_KERNEL via a Google search (going five pages deep) was equally elusive!

I suppose there are many kernel developers out there who know its true meaning whilst snickering over my frustrated search...

What is the meaning of GFP

Posted Mar 26, 2008 8:55 UTC (Wed) by dale77 (guest, #1490) [Link]

http://lwn.net/Articles/23042/

Oops. But that is the guy who wrote about that in_atomic ref. Perhaps we need another opinion
;-)

Just kidding. Keep up the good work corbet.

Get Free Page

Posted Mar 26, 2008 10:02 UTC (Wed) by pr1268 (subscriber, #24648) [Link]

I was right! And to think I extrapolated "get free page" from the context of where I found a bunch of GFP_* macros in my search. Thank you, Dale77.

For those with further interest, the GFP_* macros appear to be a set of bitmasks, many of which are defined as bitwise-ORed versions of others (ref. <Linux source>/include/linux/gfp.h). These are used to set parameters for allocating memory (and what to do in case of failure). But, I'm certain a lot of LWN readers already know this.

I have a lot to learn about the internal workings of the Linux kernel, as well as my online search skills--I'm certain a search here at LWN would have led me to that link.

Get Free Page

Posted Mar 26, 2008 12:08 UTC (Wed) by jzbiciak (guest, #5246) [Link]

Google's "site:" feature is very handy. For example, to search just LWN, put "site:lwn.net" at the start of your search string.

What is the meaning of GFP

Posted Mar 31, 2008 7:43 UTC (Mon) by jengelh (subscriber, #33263) [Link]

Well on Windows you get General Protection Faults, in Linux it must therefore be General Fault
Protection :-)

Atomic context and kernel API design

Posted Mar 31, 2008 7:41 UTC (Mon) by jengelh (subscriber, #33263) [Link]

> Atomic context and kernel API design
> By Jonathan Corbet
> March 25, 2008
> [...]
> Your editor recommends that the authors of that book be immediately sacked.

Now who are the authors of that book... :-)

kmalloc cannot figure out whether sleeping is allowable or not

Posted Apr 1, 2008 0:02 UTC (Tue) by rusty (guest, #26) [Link]

> In fact, kmalloc() cannot figure out on its own whether sleeping is
> allowable or not...

You are (of course) correct.  Damn, but it was such a beautiful example; 
the same one I used in 2003 in my OLS keynote, and no one spotted it then 
either.

Thanks,
Rusty.

kmalloc cannot figure out whether sleeping is allowable or not

Posted Apr 2, 2008 0:54 UTC (Wed) by roelofs (guest, #2599) [Link]

> Damn, but it was such a beautiful example; the same one I used in 2003 in my OLS keynote, and no one spotted it then either.

At least you're in good company. ;-)

A somewhat analogous "discovery" involved the thread-(non)safety of the double-checked locking pattern in C++, which various people (possibly including Scott Effective C++ Meyers himself) espoused for some years prior to the publication of an article describing its problems [PDF].

Greg

Atomic context and kernel API design

Posted Apr 7, 2008 7:30 UTC (Mon) by AnotherAnon (guest, #51448) [Link]

Couldn't this be problematic in the case of systems that use intrusion detection? One
technique that I am familiar with is the use of MD5 sums of system binaries, to check if they
have been tampered with. Directly modifying binaries would make this prefetch technique
incompatible with systems that do MD5 checking.

Atomic context and kernel API design

Posted Jul 23, 2014 23:28 UTC (Wed) by xiay (guest, #98011) [Link]

Why not just make spin_lock() increase preempt_count nevertheless?


Copyright © 2008, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds