GCC 4.3.0 exposes a kernel bug

By Jake Edge
March 7, 2008

A change to GCC for a recent release coupled with a kernel bug has created a messy situation, with possible security implications. GCC changed some assumptions about x86 processor flags, in accordance with the ABI standard, that can lead to memory corruption for programs built with GCC 4.3.0. No one has come up with a way to exploit the flaw, at least yet, but it clearly is a problem that needs to be addressed.

The problem revolves around the x86 direction flag (DF), which governs whether block memory operations work forward through memory or backward. The main use for the flag is to support overlapping memory copies, where working backward through memory may be required so that the data being copied does not get overwritten as the copy progresses. Debian hacker Aurélien Jarno reported the problem to linux-kernel on March 5th; it was found when building Steel Bank Common Lisp (SBCL) with the new compiler.
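
To make the role of DF concrete, here is a small C model of what a "rep movsb" block copy does with DF clear and with DF set. It is a sketch, not the full hardware semantics: the names model_rep_movsb() and model_memmove() are hypothetical, the copy works on indices into a single buffer rather than on real registers, and only the byte-at-a-time case is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Software model (not the real instruction) of x86 "rep movsb": copy
 * count bytes within buf, starting at destination index di and source
 * index si. After each byte both indices move by +1 when df == 0 (DF
 * clear, "forward") or by -1 when df == 1 (DF set, "backward"). */
static void model_rep_movsb(unsigned char *buf, ptrdiff_t di, ptrdiff_t si,
                            size_t count, int df)
{
    ptrdiff_t step = df ? -1 : 1;

    for (; count > 0; count--) {
        buf[di] = buf[si];
        di += step;
        si += step;
    }
}

/* memmove()-style copy of n bytes from buf[src] to buf[dst] built on the
 * model. When the destination overlaps the source from above, start at
 * the last byte and run backward with DF set; otherwise run forward. */
static void model_memmove(unsigned char *buf, ptrdiff_t dst, ptrdiff_t src,
                          size_t n)
{
    ptrdiff_t len = (ptrdiff_t)n;

    if (n == 0)
        return;
    if (dst > src && dst < src + len)
        model_rep_movsb(buf, dst + len - 1, src + len - 1, n, 1);
    else
        model_rep_movsb(buf, dst, src, n, 0);
}
```

Copying five bytes from offset 0 to offset 2 of "abcdefgh" with a plain forward copy yields "abababah", because each source byte is overwritten before it is read; the backward copy yields "ababcdeh", the same result the real memmove() produces.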

GCC's most recent release, 4.3.0, assumes that the direction flag has been cleared (i.e. memory operations go in a forward direction) at the entry of each function, as is specified by the ABI (which is, somewhat amusingly, found at sco.com [PDF]). Unfortunately, this clashes with Linux signal handlers, which get called, incorrectly, with the flag in whatever state it was in when the signal occurred. This has the effect of leaking one bit of state from the user space process that was running when the signal occurred to the signal handler, which could be in another process.

That, in itself, is a bug, seemingly with fairly minimal impact. Prior to 4.3, GCC would emit a cld (clear direction flag) opcode before doing inline string or memory operations, so those operations would start from a known state. In 4.3, GCC relies on the ABI mandate that the direction flag is cleared before entry to a function, which means that the kernel needs to arrange that before calling a signal handler. It currently doesn't, but a small patch fixes that.

The window of vulnerability is small, but the problem was observed in practice with SBCL. The sequence of events that would lead to memory corruption is as follows:

  • a user space program does an operation (memmove() for example) that sets DF
  • a signal occurs for some process
  • the kernel calls the signal handler
  • the signal handler does a memmove() in what it thinks is a forward direction
  • the memory is copied in the reverse direction, leading to corruption
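
The sequence above can be simulated in a few lines of C. This is a sketch under stated assumptions: a single df field stands in for the real EFLAGS register, the two handlers represent the same source compiled the pre-4.3 way (an explicit cld before the inline copy) and the GCC 4.3.0 way (trusting the ABI), and deliver_signal() is a hypothetical simplification of the kernel's signal delivery and sigreturn. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the one piece of CPU state involved: EFLAGS.DF. */
struct cpu {
    int df;
};

/* Model of "rep movsb": indices step by +1 with DF clear, -1 with DF set. */
static void rep_movsb(const struct cpu *cpu, unsigned char *buf,
                      ptrdiff_t di, ptrdiff_t si, size_t count)
{
    ptrdiff_t step = cpu->df ? -1 : 1;

    for (; count > 0; count--) {
        buf[di] = buf[si];
        di += step;
        si += step;
    }
}

/* Each handler intends a forward copy of 3 bytes from buf[6] to buf[2]. */
typedef void (*handler_fn)(struct cpu *, unsigned char *);

/* Pre-4.3 code generation: clear DF ("cld") before the inline copy. */
static void handler_pre43(struct cpu *cpu, unsigned char *buf)
{
    cpu->df = 0;
    rep_movsb(cpu, buf, 2, 6, 3);
}

/* GCC 4.3.0 code generation: trust the ABI and assume DF is already clear. */
static void handler_gcc43(struct cpu *cpu, unsigned char *buf)
{
    rep_movsb(cpu, buf, 2, 6, 3);
}

/* Signal delivery: the interrupted DF is saved with the rest of the
 * context, the handler runs, and sigreturn restores the saved value.
 * Only a patched kernel clears DF before entering the handler. */
static void deliver_signal(struct cpu *cpu, unsigned char *buf,
                           handler_fn handler, int kernel_clears_df)
{
    int saved_df = cpu->df;

    if (kernel_clears_df)
        cpu->df = 0;
    handler(cpu, buf);
    cpu->df = saved_df;
}
```

With DF left set by the interrupted program and an unpatched kernel, the 4.3.0-style handler copies backward and clobbers buf[0] and buf[1], which lie below the intended destination, while buf[3] and buf[4] are never written. The pre-4.3 handler, or either handler under a patched kernel, performs the intended forward copy. In every case the interrupted code gets its own DF value back on return.
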
It is hard to see how that could be turned into a security breach, but it would be a mistake to assume that it can't. Other kernel bugs, like the one that allowed the recent vmsplice() exploit, have looked like memory corruption, but were found to be more than that. The DF issue may turn out to be harmless from a security standpoint, but it should not be assumed.

So, now the question is: what to do about it. It is clear that the kernel should not leak the DF state to signal handlers, regardless of what GCC does. It is interesting to note that this behavior is the same (DF is not cleared on entry to a signal handler) on BSD kernels, leading some to claim that it is the ABI that is incorrect and that GCC should revert to its old behavior. Solaris kernels do clear the DF before calling signal handlers. This problem has existed for 15 years; GCC has always emitted code that worked correctly on kernels that did not follow the ABI, until now.

Part of the problem is that there is an enormous number of installed kernels that are vulnerable to this problem, though only programs built with GCC 4.3 are affected. That version of GCC is not yet in widespread use, so the thinking is that GCC should revert its behavior now, before it gets into distributions. As kernels with the fix become more widespread, the "proper" behavior could be restored. The GCC folks don't necessarily see it that way, so it is unclear what will happen.

While it is true that distributors can control which kernel and GCC versions they ship, those aren't the only ways that either GCC or GCC-compiled binaries get installed. It is a bit of a ticking time bomb for random memory corruption at a minimum. Handling those bug reports will be very difficult and time-consuming. While the new behavior of GCC is correct, and the kernel is broken, it would be very helpful to back out this change, perhaps providing the new behavior via a command-line argument for those who are sure their binaries will be running on patched kernels. Some discussion on the gcc-devel list would indicate that a GCC 4.3.0.1 or 4.3.1 may be forthcoming.


Index entries for this article
Kernel: GCC
Kernel: Signal handling



GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:00 UTC (Fri) by flewellyn (subscriber, #5047)

Er, I think the new behavior can be called "correct" only for a very narrow definition of the term. It's true that the ABI does specify that DF should be cleared, but because it's widely known (or should be) among system-level hackers that many important programs do not honor this part of the ABI, GCC should ensure correct behavior in cases where the callers do not.

Okay, so the Linux and BSD kernels were not properly following this part of the ABI, and that's technically incorrect. But the thing is, the GCC developers apparently knew that, and prior to this had fixed it; adherence to standards and specifications is a good thing, but so is not breaking working code. Hopefully they'll revert this change, perhaps with a compile-time warning for the incorrect behavior.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:23 UTC (Fri) by zlynx (guest, #2285)

I don't see why they should revert it.  Maybe make a flag to change the default behavior.

But people are going to have to live with it.  Apparently ICC has been doing this for years
already.  So the problem is already out there on anything ICC has compiled.  Like, say,
commercial versions of MySQL.  Don't those use ICC?  And I think some Linux games were built
with ICC.  And what about Oracle?

I think that having the kernel do the wrong thing and then claiming that GCC has to fix the
problem is just ridiculous.  Especially after all the years of trying to make GCC follow the
ABI standards so that it can interoperate with other compilers and libraries.

Follow the standards, don't make up your own.  That's Microsoft all over, and we open source
types are supposed to hate it.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:37 UTC (Fri) by flewellyn (subscriber, #5047)

There's following the standards, and then there's not communicating about a potentially code-breaking change. Because GCC previously was emitting cld instructions before every inlined function call, we can conclude they knew about the problem existing in the wild. They should have given the Linux and BSD developers a heads up about this.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:55 UTC (Fri) by daney (guest, #24551)

GCC changes every day.  If you are interested in what every change is, you are free to look
at: http://gcc.gnu.org/ml/gcc-cvs/

This particular change to GCC was made many months ago and underwent extensive testing on many
different platforms.  That this bug existed and was exposed only after 4.3.0 was released is
perhaps unfortunate, but you imply that someone knew about the problem and withheld that
information.  That is not the case.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:27 UTC (Fri) by JoeBuck (subscriber, #2330)

From the gcc point of view, it was simply a matter of observing that the compiler was emitting an unneeded instruction: we don't have to clear that register because it's already cleared; the standard says so, and real implementations follow the standard correctly (or so it was thought). The result was that code sequences that use the x86 string instructions are slightly smaller and faster with gcc 4.3.0.

The issue of kernels not following the rules in the case of signal handlers was not noticed until the 4.3.0 release process had already started.

If you think that kernel developers should be notified of every change of this kind, just in case it does something, they'd need to subscribe to the svn commit mailing list, and they'd be overwhelmed with messages describing small changes.

GCC 4.3.0 exposes a kernel bug

Posted Mar 8, 2008 2:55 UTC (Sat) by dlang (guest, #313)

this is a case where the history of the code is needed to tell what's really going on.

did older GCC versions add the instruction because some programmer in the past ran into this
bug and fixed it (in which case the changelog for the commit that introduced this would
theoretically be found), or was the original programmer of this function in GCC exercising
defensive programming by not assuming that other programs leave things in any particular state
(which is what was assumed)?

how large and how many clock cycles does this instruction use? 

GCC 4.3.0 exposes a kernel bug

Posted Mar 8, 2008 19:21 UTC (Sat) by ibukanov (subscriber, #3942)

History may not be relevant here. It could be that in the past GCC was simply not able to
track the state of the control bit when generating the code. As such, the compiler had to
insert the explicit instructions to reset the bit even if it was known that they were not
necessary from ABI point of view.

GCC 4.3.0 exposes a kernel bug

Posted Mar 9, 2008 3:37 UTC (Sun) by dlang (guest, #313)

the history is very relevant. you are listing a third option (very similar to the second one I
listed above). knowing which of these is correct (or if there is a fourth that is correct) is
significant in evaluating what needs to change.

Was it worth the trouble?

Posted Mar 8, 2008 22:46 UTC (Sat) by eru (subscriber, #2753)

> The result was that code sequences that use the x86 string instructions are slightly smaller and faster with gcc 4.3.0.

The eliminated instruction is one byte long, executes very quickly, and string instructions are not very common in most real code anyway. When they occur, they are heavyweight operations, because the sources and count have to be set up into particular registers, and the string instruction itself usually takes much more time than simple instructions. Whether or not the direction flag instruction appears might then change the time of the string operation by perhaps 1% or less. So except for contrived programs that consist almost entirely of these string operations, I suspect it is impossible to measure any execution time reduction in actual programs that could be attributed to this compiler change.

Of course removing a redundant instruction is aesthetically the right thing to do, but in this case I think it does not have practical benefits.

Was it worth the trouble?

Posted Mar 9, 2008 20:07 UTC (Sun) by vonbrand (guest, #4458)

Was it worth the trouble to fix this in the kernel?
Definitely. The kernel must do "the right thing", without regard to any idiocy committed by the programs it is running. Not doing so might open vulnerabilities.
Was it really worth it in GCC?
Not so sure... but the compiler should enforce the relevant ABIs (and is also entitled to assume they are being followed).

Was it worth the trouble?

Posted Mar 19, 2008 9:38 UTC (Wed) by pharm (guest, #22305)

> The eliminated instruction is one byte long, executes very quickly, and string instructions
> are not very common in most real code anyway. When they occur, they are heavyweight
> operations, because the sources and count have to be set up into particular registers, and the
> string instruction itself usually takes much more time than simple instructions. Whether or
> not the direction flag instruction appears might then change the time of the string operation
> by perhaps 1% or less. So except for contrived programs that consist almost entirely of these
> string operations, I suspect it is impossible to measure any execution time reduction in
> actual programs that could be attributed to this compiler change.

Unfortunately, you're wrong. CLD can have a latency of 50+ cycles on some x86 implementations:
that's not an insignificant amount. Plus we're not just talking about "string operations",
we're talking about functions like memset() & memcpy() too, which often use them.

See: http://gcc.gnu.org/ml/gcc/2008-03/msg00360.html for some benchmarks
and http://gcc.gnu.org/ml/gcc/2008-03/msg00404.html for a link to a document which gives a
latency of 52 cycles for CLD.
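
Both positions can be folded into one back-of-the-envelope formula, where c_cld is the cost of CLD and c_op the cost of the string operation it precedes. The 52-cycle CLD figure comes from the document linked above; the 5000-cycle and 100-cycle operation costs are purely hypothetical, chosen to stand for a long string operation and a short inline memset()/memcpy(). The formula is a simplification that ignores pipelining and overlap between instructions.

```latex
\text{overhead} = \frac{c_{\text{cld}}}{c_{\text{cld}} + c_{\text{op}}},
\qquad
\frac{52}{52 + 5000} \approx 1.0\%,
\qquad
\frac{52}{52 + 100} \approx 34\%
```

Under these assumptions the "1% or less" estimate holds for long operations, while a slow CLD matters a great deal for the short copies and fills that compilers inline.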

GCC 4.3.0 exposes a kernel bug

Posted Mar 14, 2008 20:52 UTC (Fri) by giraffedata (guest, #1954)

I don't think what people knew and/or concealed is relevant, but the fact that the behavior has existed for 15 years and exists in countless systems today matters a lot. 15 years of practice is a much stronger standard than any prescriptive document. I say the standard is that DF's value is undefined at entry to a function, and Gcc 4.3.0 fails to conform.

This is a classic dilemma. You can make Gcc right or you can make it work.

If you offered both versions to the public, very few would opt for the "right" one. That's not the last word, of course. I'm sure some people believe the Gcc project has higher goals than giving its users what they want.

But traditionally, prescriptive standards nearly always bow to what actual practice demands.

GCC 4.3.0 exposes a kernel bug

Posted Mar 14, 2008 21:21 UTC (Fri) by zlynx (guest, #2285)

Claiming that it's standard "because GCC does it that way" completely ignores all the other
compilers that do *not* do it that way.

GCC 4.3.0 exposes a kernel bug

Posted Mar 14, 2008 22:26 UTC (Fri) by giraffedata (guest, #1954)

> Claiming that it's standard "because GCC does it that way" completely ignores all the other compilers that do *not* do it that way.

I think you got it backward. I claim it's standard because Linux does it that way. Linux is what violates the prescribed standard.

I also didn't state the de facto standard as precisely as I could have, because Linux clearly should change to clear the DF flag. But Gcc should continue to clear it too, because old Linux exists.

GCC 4.3.0 exposes a kernel bug

Posted Mar 14, 2008 22:42 UTC (Fri) by zlynx (guest, #2285)

> But Gcc should continue to clear it too, because old Linux exists.

This does not buy you anything except slowing down all your code unnecessarily.  Any user
might use a binary built with some other compiler, like the precompiled commercial MySQL
server, or a game.  Software running through Wine is probably built with Visual Studio.  A JIT
like Mono or Java might generate code that doesn't reset DF.  A developer might be using TCC
for ultra-fast compiles.  There is also LLVM: I don't know, but it might not do the DF clear
either.

See what I mean about other compilers?  Do you wish to have every one of them also clear DF on
every function?

GCC 4.3.0 exposes a kernel bug

Posted Mar 15, 2008 0:07 UTC (Sat) by nix (subscriber, #2304)

ICC has apparently never cleared DF. I guess nobody's ever tried compiling 
programs that make heavy use of asynchronous signal handlers with ICC on 
Linux...

GCC 4.3.0 exposes a kernel bug

Posted Mar 15, 2008 2:23 UTC (Sat) by giraffedata (guest, #1954)

OK, I see your point.

*the* standard

Posted Mar 21, 2008 11:29 UTC (Fri) by gvy (guest, #11981)

> 15 years of practice is a much stronger standard
> than any prescriptive document.
...over at sco dotcom. :)

Well, IMHO trying to follow standards in a way which creates artificial and hard-to-debug
problems to the rest of the crowd *is* ignorance too.

GCC 4.3.0 exposes a kernel bug

Posted Mar 24, 2008 12:52 UTC (Mon) by olecom (guest, #42886)

> It is hard to see how that could be turned into a security breach,
> but it would be a mistake to assume that it can't. Other kernel bugs,
> like the one that allowed the recent vmsplice() exploit, have looked
> like memory corruption, but were found to be more than that.

| After a bit more poking around, we discovered how to alter the page
| mappings so that sections of kernel and I/O memory were directly mapped
| into all user address spaces.[2]

[2] Talk about security holes!
(C) 1992 http://valhenson.org/synthesis/SynthesisOS/ch7.html

Not checking userspace-supplied pointers is the most basic security hole in
userspace + kernel memory based systems.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:07 UTC (Fri) by mingo (subscriber, #31122)


Note, the fix has gone upstream today:

-------------->
commit e40cd10ccff3d9fbffd57b93780bee4b7b9bff51
Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Wed Mar 5 19:14:24 2008 +0100

    x86: clear DF before calling signal handler
    
    The Linux kernel currently does not clear the direction flag before
    calling a signal handler, whereas the x86/x86-64 ABI requires that.
    
    Linux had this behavior/bug forever, but this becomes a real problem
    with gcc version 4.3, which assumes that the direction flag is
    correctly cleared at the entry of a function.
    
    This patches changes the setup_frame() functions to clear the
    direction before entering the signal handler.
    
    Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: H. Peter Anvin <hpa@zytor.com>
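
The essence of the fix can be modeled in C. DF is bit 10 of EFLAGS; the structure and function names below are hypothetical stand-ins rather than the kernel's own code, and the real sigreturn path restores only a selected set of flag bits rather than the whole word. What the sketch shows is the ordering: the interrupted flags are saved in the signal frame first, and only the state the handler starts with has DF cleared.

```c
#include <assert.h>

/* DF is bit 10 of the x86 EFLAGS register. */
#define MODEL_EFLAGS_DF (1UL << 10)

/* Hypothetical, heavily simplified stand-ins for the saved user register
 * state of the thread and for the copy of it kept in the signal frame. */
struct model_regs {
    unsigned long flags;
};

struct model_sigframe {
    unsigned long saved_flags;
};

/* Signal setup: record the interrupted flags in the frame first, then clear
 * DF in the state the handler will start with, in the spirit of the patch. */
static void model_setup_frame(struct model_regs *regs,
                              struct model_sigframe *frame)
{
    frame->saved_flags = regs->flags;
    regs->flags &= ~MODEL_EFLAGS_DF;
}

/* sigreturn: put back the interrupted flags, DF included. (The real kernel
 * restores only a fixed subset of flag bits; this model restores all.) */
static void model_sigreturn(struct model_regs *regs,
                            const struct model_sigframe *frame)
{
    regs->flags = frame->saved_flags;
}
```

For example, a thread interrupted with flags 0x602 (DF set) would enter its handler with 0x202 and resume with 0x602 after sigreturn, so the interrupted backward copy continues correctly.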

GCC 4.3.0 exposes a kernel bug

Posted Mar 10, 2008 10:17 UTC (Mon) by csamuel (✭ supporter ✭, #2624)

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:44 UTC (Fri) by daney (guest, #24551)

> This has the effect of leaking one bit of state from the user space
> process that was running when the signal occurred to the signal
> handler, which could be in another process.

The last time I checked, signal handlers are not in a different process from the one receiving
the signal.  They are run on one of the threads of the process that received the signal.  The
bug is that the state leaks from that specific thread into the signal handler running on the
same thread.


GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 21:57 UTC (Fri) by shahms (guest, #8877)

There's no requirement that the current executing process be the one that receives the signal.
For example:

Process 1:
  kill(proc2, SIGHUP);

Process 2:
  signal_handler()

Doing some hand-wavy magic about the scheduler running process 2 immediately when process 1
sends the signal.  Process 1 would have been executing and would get interrupted so that
process 2's signal handler could process the HUP.  If process 1 had changed the DF, it would
"leak" into process 2.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:19 UTC (Fri) by JoeBuck (subscriber, #2330)

We were arguing about this on the gcc list, and it was unclear to me whether the DF flag could actually leak from one process to another (indicating a true security bug, however minor). If a context switch occurs, wouldn't the kernel restore all of process 2's registers before entering the signal handler?

Can a kernel expert confirm or deny this? (At least, for currently deployed kernels)?

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:20 UTC (Fri) by zlynx (guest, #2285)

That scenario requires a context switch, and I believe that the kernel handles that properly.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:23 UTC (Fri) by daney (guest, #24551)

The flag bits come from the thread executing the signal handler, not some other place
(process).
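
Which process runs a handler can be checked directly. The sketch below assumes a POSIX system: it forks a child, has the parent send it SIGHUP as in the scenario above, and has the handler compare getpid() against the child's PID. handler_runs_in_receiver() and on_hup() are illustrative names. It does not observe DF at all; it only shows that the handler executes in the receiving process, which is consistent with the point that the flags it sees belong to that process's own thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 0 = handler has not run, 1 = ran in the receiver, 2 = ran elsewhere. */
static volatile sig_atomic_t handler_result;
static pid_t receiver_pid;

static void on_hup(int sig)
{
    (void)sig;
    /* getpid() is async-signal-safe. */
    handler_result = (getpid() == receiver_pid) ? 1 : 2;
}

/* Returns 1 if the SIGHUP handler ran inside the child that received the
 * signal rather than in the parent that sent it, 0 if not, -1 on error. */
static int handler_runs_in_receiver(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    pid_t child;
    int status, result = -1;

    sa.sa_handler = on_hup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGHUP before fork() so the child cannot miss it. */
    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    child = fork();
    if (child == 0) {
        sigset_t wait_mask;

        receiver_pid = getpid();
        sigemptyset(&wait_mask);
        while (handler_result == 0)
            sigsuspend(&wait_mask); /* unblock SIGHUP and wait atomically */
        _exit(handler_result == 1 ? 0 : 1);
    }
    /* "Process 1" signals "Process 2", then collects its verdict. */
    if (child > 0 && kill(child, SIGHUP) == 0 &&
        waitpid(child, &status, 0) == child)
        result = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGHUP, &old_sa, NULL);
    return result;
}
```

Blocking SIGHUP before fork() and waiting with sigsuspend() avoids the race in which the signal arrives before the child is ready to wait for it.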

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:03 UTC (Fri) by i3839 (guest, #31386)

> The DF issue may turn out to be harmless from a security standpoint,
> but it should not be assumed.

If so, then why is this subscriber-only content?

Subscriber-only

Posted Mar 7, 2008 22:07 UTC (Fri) by corbet (editor, #1)

Are you taking the position that any article which talks about security should be automatically free for all?

By the time any distributor is carrying 4.3.x, the article will have long since become free...

Subscriber-only

Posted Mar 7, 2008 22:39 UTC (Fri) by JoeBuck (subscriber, #2330)

Security updates that advise people to install replacement packages, or that contain other urgent information, would reach people via other channels.

Subscriber-only

Posted Mar 8, 2008 13:59 UTC (Sat) by i3839 (guest, #31386)

No, I'm not. I just thought that 4.3.0 was released, and was already in use by distros, so
that the matter was more urgent than it is. 4.3.0 is the upcoming release and not yet
released, so ignore me.

To clarify, if it's an urgent case I would find it strange if there was a subscribers-only
article with background info, but not a generic article informing people about the problem.
I'm not saying LWN should notify people of every security event, nor open all security related
articles.

GCC 4.3.0 exposes a kernel bug

Posted Mar 8, 2008 12:00 UTC (Sat) by darwish07 (guest, #49520)

This is an __insult__ to LWN's lovely, extremely open way of reporting news with an
extremely low subscription cost.

LWN is not a security advisory website!

GCC 4.3.0 exposes a kernel bug

Posted Mar 9, 2008 17:44 UTC (Sun) by Max.Hyre (subscriber, #1054)

A couple of notes:
  1. This is not a security announcement. It's an analysis of interactions between the kernel and GCC, and
  2. Anyone reading LWN will notice the title GCC 4.3.0 exposes a kernel bug. Those interested can go to any security site to get the details.

GCC 4.3.0 exposes a kernel bug

Posted Mar 9, 2008 20:12 UTC (Sun) by i3839 (guest, #31386)

True, but keep in mind that this is an unusual case, as just upgrading gcc won't fix it. All
applications compiled with gcc 4.3.0 running on Linux or BSD are prone to this potential
problem, including those that are compiled by the users themselves.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:34 UTC (Fri) by johnkarp (guest, #39285)

'Recently released'?

4.3.0 isn't released yet, according to both gcc.gnu.org and ftp.gnu.org.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 22:43 UTC (Fri) by JoeBuck (subscriber, #2330)

It's on gcc.gnu.org, it hasn't hit ftp.gnu.org yet because of several issues (including the appropriate person being on vacation).

In any case, it's out there and final, so any change would need a new version number.

4.3.1

Posted Mar 7, 2008 22:36 UTC (Fri) by JoeBuck (subscriber, #2330)

Under ordinary circumstances, the GCC developers would plan an x.y.1 release to occur about two months after an x.y.0 release, for any values of x and y, as a bug-fix-only release. The only real question is whether issues like this one might lead to putting out 4.3.1 sooner.

In any case, this bug has generated far more discussion than any number of other bugs that have more impact. Unless you have a program that invokes an x86 string move instruction from inside a signal handler, the issue isn't going to affect your program.

Why is this a problem?

Posted Mar 7, 2008 22:46 UTC (Fri) by mgb (guest, #3226)

The last para of Jake's article implies that upgrading GCC or installing GCC 4.3.0 compiled
binaries can cause problems.  If true that would be nasty.

ONE: The zero-cost patch needs to be applied to the kernel source before compiling the kernel
with GCC 4.3.0.

TWO: Don't load 4.3.0-compiled out of tree kernel modules into unpatched kernels.

Color me stupid, but I just don't see how upgrading GCC or installing binaries can trigger this
problem.  Am I missing something?

Why is this a problem?

Posted Mar 7, 2008 22:59 UTC (Fri) by jake (editor, #205)

> Am I missing something?

This problem has nothing to do with building the kernel.  It is in building user space
applications.  So, installing GCC 4.3.0, or a binary built with it, on any released Linux (or,
evidently, BSD) kernel could trigger the problem, at least if the program does memory/string
operations in signal handlers.

jake

Why is this a problem?

Posted Mar 7, 2008 23:20 UTC (Fri) by mgb (guest, #3226)

Thanks Jake.  My bad.  I read "signal handler" and thought "interrupt handler".

Why is this a problem?

Posted Mar 7, 2008 23:58 UTC (Fri) by vonbrand (guest, #4458)

No, it has nothing to do with calling mem<foo> in a signal handler. The point is that said functions may set the DF flag, and before it is reset a signal may come in. The signal handler then could run with the flag set, expecting it to be unset. A few instructions inspect this flag.

Why is this a problem?

Posted Mar 8, 2008 0:27 UTC (Sat) by jake (editor, #205)

> No, it has nothing to do with calling mem<foo> in a signal handler.

What I was trying to say is that it would be a memmove() or something like it in the signal
handler that would be tripped up by DF being set unexpectedly.

jake

Why is this a problem?

Posted Mar 19, 2008 22:33 UTC (Wed) by klossner (subscriber, #30046)

> No, it has nothing to do with calling mem<foo> in a signal handler.

Sure it does.  If the signal handler was compiled with the new GCC, then it will expect
the flag to be clear on entry.  If the flag happens to be set and the signal handler calls
mem<foo>, the copy will go backward.  This can be exploited.

GCC 4.3.0 exposes a kernel bug

Posted Mar 7, 2008 23:59 UTC (Fri) by pleple (guest, #50447)

So the bug was there forever and, if my understanding of the problem is correct, anybody could
make use of it on both Linux and BSD (well, you don't need any compiler to make executable
code). Still, there is no known exploit for it. I know we can't assume that the problem isn't
exploitable, but I would say that an exploit is very improbable. Is this reasoning correct?

GCC 4.3.0 exposes a kernel bug

Posted Mar 8, 2008 1:52 UTC (Sat) by speedster1 (guest, #8143)

The main concern is not that someone will *purposefully* write a program that uses this bug to
clobber its own memory, so the fact that someone could have used a non-standard compiler to
achieve this is not relevant.

The real problem is that the new behavior would open up one more way for programmers to
*accidentally* write a program that corrupts its memory, and this could be the memory
corruption bug that adds up with other bugs in the application itself to result in an exploit.

GCC 4.3.0 exposes a kernel bug

Posted Mar 8, 2008 13:32 UTC (Sat) by roblucid (guest, #48964)

Reverting gcc-4.3 doesn't help matters.

Now this DF bit flaw is known, then anyone can patch a compiler or use 
assembler to attempt to craft an exploit in application code.  The kernel 
is broken, it should not rely on called code to clear the flag, but 
actually ensure the registers are saved, set & restored according to ABI.

Perhaps kernels compiled with gcc < 4.3 can rely on gcc clearing the flag 
when the signal handler is called in a subroutine, in which case it is 
reasonable to argue that back-porting a fix to support gcc-4.3 may not be 
absolutely necessary.  But it's probably as simple to patch stable kernel 
updates with the fix, as it is to detect and warn about a build using 4.3.  
It's not admin-friendly to rely on older kernel source not being built 
with the latest gcc.

Past experience with "apparently unexploitable" flaws, tends to suggest 
that correcting the code is the only safe option.

This is a joke, right? Or did you really misunderstand the simple issue?

Posted Mar 8, 2008 17:02 UTC (Sat) by khim (subscriber, #9252)

If you allow anyone to inject code into your executables, you are hosed already. And if you don't, you cannot exploit this bug. The prerequisites are harsh: a gcc 4.3-compiled privileged daemon and a kernel below 2.6.25...

> Perhaps kernels compiled with gcc < 4.3

Maybe it's a good idea to read the article? It does not matter whether your kernel is compiled with gcc 4.2 or gcc 4.3. The question is about things like login or sshd. They must be compiled with gcc 4.3; only then can you have a problem.

> It's not admin-friendly to rely on older kernel source not being built with the latest gcc.

Yup. But that's the one and only solution. Why? The kernel pushes gcc to the limit, so each kernel supports only a finite range of compilers. "GCC version between x.y.z and x1.y1.z1" was (and is) the only supported mode. If you plan to use kernel 2.6.24 compiled with gcc 4.3, then you should plan to reinstall the system shortly afterwards. It was never supported, and it will not be supported: compiling the kernel with a compiler newer than the kernel is insanity.

> Past experience with "apparently unexploitable" flaws, tends to suggest that correcting the code is the only safe option.

Code is already corrected - now the question is about deployment...

This is a joke, right? Or did you really misunderstand the simple issue?

Posted Mar 11, 2008 9:45 UTC (Tue) by roblucid (guest, #48964)

No it's not.  Don't think many experienced sysadmins would feel happy 
relying on the "privileged daemon compiled with gcc-4.3" as a sound 
foundation for security.

Securing systems means multiple layers and not leaving apparently small 
flaws and leaving a single point of critical failure.  Then when defense 1 
is broken, the next has to be breached too, which buys time when exploits 
become known, or some script kiddie has started an attack and found a hole 
in some service you offer.

If you're running a web-server for example, you don't give a shell out, 
yet your defense only has to fail once, for some web app to permit code to 
be run.  If you run a host with multiple users, with shell access then 
flipping a register which is meant to be cleared, might cause some 
instability and permit an unintentional DoS.

It doesn't matter that an exploit is not clear; the fact that it is not 
absolutely unexploitable argues for patching the kernel, as has been done.  
That we agree on.  My post was not complaining about the kernel or 
gcc, but arguing the futility of patching gcc-4.3 to revert to non-ABI 
behaviour.

Your assumption that gcc-4.3 or another compiler cannot be built by a 
user is wrong, so the logic statement "gcc = 4.3 &  kernel < 2.6.25" 
should actually be the simpler "unpatched kernel < 2.6.25".  You may feel 
that is too pessimistic, but I'm afraid in the real world root does make 
mistakes, so relying on need for root privilege to install the daemon is 
weak security.

If the kernel is compiled with an older gcc, then it may very well be 
clearing the DF bit for the kernel accidentally, and I suspect that the 
reason the kernel wasn't clearing it was because gcc already did it.  
That's crediting the kernel developers with actually testing conformance 
to the ABI.

As for older kernels, getting compiled with unsupported compilers, 
distros have done that frequently in the past, and hobbyist types 
may try the latest gcc and see if it works.  A subtle issue like this is 
exactly the type of thing that falls between the cracks.  I agree that 
folk shouldn't do it, but in the real world people do build "unsupported" 
combinations and as FOSS doesn't come with a legal warranty, your users 
aren't seeing much difference between that and the situation with the 
correct software versions.

You seem to agree that reverting gcc-4.3 and patching the kernel is the 
correct action, and furthermore that "deployment" may be the weak area, so 
why be so condescending?

Please read the article

Posted Mar 11, 2008 13:44 UTC (Tue) by tialaramex (subscriber, #21167)

You've got all sorts of misapprehensions about this.

Firstly, what you seem to be proposing is that somehow this ABI bug would corrupt the OS
kernel. I can assure you that this cannot happen. Linux* doesn't care two hoots about the DF
bit in the flag register of a userspace process, which is all that's being tweaked. However, the
process itself might care, and is entitled to ABI-correct behaviour, so Linux needed to
be patched to reset this flag bit when calling the signal handler in the userspace process.
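The guarantee that the kernel patch adds can be observed from user space. The following is a minimal sketch in C, assuming an x86-64 Linux target and GCC/Clang inline assembly; the function names are hypothetical, not from any library. It sets DF with `std` and then executes `int3`, so the kernel delivers SIGTRAP while DF is still set; the handler records the DF bit (EFLAGS bit 10) it sees on entry. On a kernel carrying the fix (2.6.25 and later) the handler should see 0; an unfixed kernel would hand it the interrupted code's 1. Because EFLAGS is saved in the signal frame, DF is set again after the handler returns, and the trailing `cld` puts it back to the ABI-mandated state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* DF value observed on handler entry; -1 means the handler never ran. */
static volatile sig_atomic_t df_in_handler = -1;

#if defined(__x86_64__)
/* Read EFLAGS.DF (bit 10). The 128-byte adjustment steps over the
 * System V red zone so the push cannot clobber compiler-owned slots. */
static int read_df(void)
{
    unsigned long flags;
    __asm__ volatile("sub $128, %%rsp\n\t"
                     "pushfq\n\t"
                     "popq %0\n\t"
                     "add $128, %%rsp"
                     : "=r"(flags) : : "cc", "memory");
    return (int)((flags >> 10) & 1);
}

static void on_sigtrap(int sig)
{
    (void)sig;
    df_in_handler = read_df(); /* the DF state the kernel handed us */
}
#endif

/* Hypothetical helper: trap while DF is set and report what the handler
 * saw. Returns 0 or 1, or -1 if unsupported or sigaction() fails. */
static int df_seen_by_signal_handler(void)
{
#if defined(__x86_64__)
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, NULL) != 0)
        return -1;
    /* std: set DF; int3: SIGTRAP arrives with DF=1; cld: restore DF=0 */
    __asm__ volatile("std\n\tint3\n\tcld" : : : "cc", "memory");
    return df_in_handler;
#else
    return -1;
#endif
}
```

Note that the process sets DF itself here: nothing in the kernel is touched, which is the point made above. Running this under a debugger will stop at the `int3` instead of reaching the handler.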

Flipping the DF bit is not a privileged operation, it's a normal function of the x86 processor
family. You can add code to your programs to arbitrarily flip DF if you like, and at most
you'll just manage to make it hang or crash.
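To illustrate that DF is ordinary, unprivileged user-space state, here is a minimal sketch in C using GCC/Clang inline assembly, assuming an x86-64 or i386 target. The helper names `df_bit_after_std()` and `df_bit_now()` are hypothetical. The first sets DF with `std`, snapshots EFLAGS, and clears DF again with `cld` inside a single asm statement, so no compiled C code ever runs with DF set; the second reads the flag as ordinary C code sees it.

```c
#include <assert.h>

/* Hypothetical helper (x86 only): set DF with the unprivileged `std`
 * instruction, copy EFLAGS into a register, then clear DF with `cld`
 * before any compiled C code runs. Returns DF (EFLAGS bit 10) as it was
 * while set, or -1 on non-x86 targets. */
static int df_bit_after_std(void)
{
#if defined(__x86_64__)
    unsigned long flags;
    __asm__ volatile("sub $128, %%rsp\n\t" /* step over the red zone */
                     "std\n\t"             /* set DF: no privilege needed */
                     "pushfq\n\t"
                     "popq %0\n\t"
                     "cld\n\t"             /* back to the ABI-mandated DF=0 */
                     "add $128, %%rsp"
                     : "=r"(flags) : : "cc", "memory");
    return (int)((flags >> 10) & 1);
#elif defined(__i386__)
    unsigned long flags;
    __asm__ volatile("std\n\tpushfl\n\tpopl %0\n\tcld"
                     : "=r"(flags) : : "cc", "memory");
    return (int)((flags >> 10) & 1);
#else
    return -1;
#endif
}

/* Hypothetical helper: DF as ordinary compiled C code sees it.
 * The ABI requires this to be 0 on function entry. */
static int df_bit_now(void)
{
#if defined(__x86_64__)
    unsigned long flags;
    __asm__ volatile("sub $128, %%rsp\n\t"
                     "pushfq\n\t"
                     "popq %0\n\t"
                     "add $128, %%rsp"
                     : "=r"(flags) : : "cc", "memory");
    return (int)((flags >> 10) & 1);
#elif defined(__i386__)
    unsigned long flags;
    __asm__ volatile("pushfl\n\tpopl %0" : "=r"(flags) : : "cc", "memory");
    return (int)((flags >> 10) & 1);
#else
    return -1;
#endif
}
```

On x86, `df_bit_after_std()` returns 1 and `df_bit_now()` returns 0. Omitting the `cld` would leave DF set for the following compiled code, which is how a program can make itself hang or crash without affecting anything outside its own process.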

To have a security problem, what would be needed is to run privileged code which relies on
this ABI feature, and then send it signals until it malfunctions. In most cases on Linux your
privileged code was compiled with GCC < 4.3 and thus does not rely on this ABI feature. Code
from vendors compiled with a patched GCC 4.3.x will also not be affected. Daemons and other
privileged processes very rarely provide a mechanism to receive signals from unprivileged
users. Most of them do very little in their signal handlers and thus won't be relying on this
ABI feature even with GCC 4.3.0. So you need a very extraordinary set of circumstances to
achieve anything more than crashing your own programs.

The code in which this bug was originally noticed deliberately causes large numbers of
SIGSEGVs during normal operation and handles them by, AFAICT, calling memmove() from an assembly
language routine. Does that sound like any of the programs running with privileges on your
servers? No, didn't think so.

If you're in the habit of running code created by untrusted users then you don't need this ABI
bug to have problems, indeed it makes no difference at all - you've got a gaping hole in your
security strategy from the start.

* Read NetBSD, FreeBSD, OpenBSD etc. for Linux as you prefer. They all seem to have an
identical bug here, shielded by GCC.

Please read the article

Posted Mar 13, 2008 9:00 UTC (Thu) by roblucid (guest, #48964)

No, I do get that when programs have been compiled with gcc < 4.3 they 
clear the bit.  It's just that applications suffering obscure memory 
corruption when memory operations go wrong, and a leaking of the bit 
between processes on signals, is not something anyone ought to want on a 
system.

My point was that, contrary to the comments made early on, reverting GCC 
behaviour is not a sane option.  Patching the kernel is.

Please read the article

Posted Mar 13, 2008 9:48 UTC (Thu) by khim (subscriber, #9252)

It's just that applications suffering obscure memory corruption when memory operations go wrong

Don't use gcc 4.3 to compile your programs then. Or patch your kernel. Your choice.

and a leaking of the bit between processes on signals, is not something anyone ought to want on a system.

You can only leak the bit from a program to the same program. And if your program does not trust itself, you are hosed anyway.

Once more from the top.
1. Linux, FreeBSD and other kernels provided a kind of "changed ABI": "DF is not guaranteed to be cleared or set when you enter a function" was the change from the official ABI.
2. GCC before 4.3 produced code which worked correctly with this "changed ABI".
3. GCC 4.3 started to rely on an obscure part of the ABI, and this led to crashes (in some obscure programs, but that's not the point).
4. Kernel 2.6.25 fixed the problem, and now it's safe to use GCC 4.3.
But that does not mean anything for GCC 4.2 and kernel 2.6.24! They have used an incorrect ABI all along, but they used it correctly and consistently. Yes, from a formal POV the kernel is wrong and GCC is right, but in reality you can fix either GCC or the kernel, and it does not matter which: GCC 4.2 + kernel 2.6.24 is a 100% secure and internally consistent combination, and so is GCC 4.3 + kernel 2.6.25 (as far as this bug is concerned, of course). End of story.
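Points 1 to 4 reduce to one rule: a combination breaks only when the compiled code relies on DF being clear on entry and the kernel does not provide that guarantee to signal handlers. The compiler that matters is the one that built the code running in the signal handler (libc included). Below is a minimal sketch in C with hypothetical helper names; it keys purely on upstream version numbers, so it deliberately ignores vendor-patched GCC 4.3.x builds and kernel fixes backported to older versions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: does code from this GCC assume DF == 0 on function entry?
 * Per the article, GCC before 4.3 emitted its own `cld` before inline
 * string operations; 4.3.0 relies on the ABI instead. */
static bool gcc_assumes_df_clear(int major, int minor)
{
    return major > 4 || (major == 4 && minor >= 3);
}

/* Hypothetical: does this kernel clear DF before invoking a signal
 * handler? Mainline gained the fix in 2.6.25; backports are ignored. */
static bool kernel_clears_df(int major, int minor, int sub)
{
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return sub >= 25;
}

/* Inconsistent only when the code relies on the guarantee and the
 * kernel does not provide it. */
static bool df_combination_consistent(int gcc_major, int gcc_minor,
                                      int k_major, int k_minor, int k_sub)
{
    return !gcc_assumes_df_clear(gcc_major, gcc_minor) ||
           kernel_clears_df(k_major, k_minor, k_sub);
}
```

For example, `df_combination_consistent(4, 2, 2, 6, 24)` and `df_combination_consistent(4, 3, 2, 6, 25)` both return true, matching the two consistent combinations above, while `df_combination_consistent(4, 3, 2, 6, 24)` returns false.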

How it was found

Posted Mar 10, 2008 20:09 UTC (Mon) by liamh (guest, #4872)

Just a clarification on the discovery history:
"found when building Steel Bank Common Lisp (SBCL) using the new compiler" is not exactly
correct.  The problem was found when building SBCL, which doesn't use gcc but does use
libc6.  A change in Debian unstable introduced a new version of libc6 (2.7-8 ->
2.7-9), and that new version was compiled with gcc 4.3.0.  When SBCL builds with this version of
libc6, it either hangs or spins the CPU at 100%.
http://sourceforge.net/mailarchive/forum.php?thread_name=...

How it was found

Posted Mar 13, 2008 18:41 UTC (Thu) by kmccarty (subscriber, #12085)

Also, for the interested, here is the original entry in the Debian BTS:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=469058

GCC 4.3.0 exposes a kernel bug

Posted Mar 18, 2008 20:40 UTC (Tue) by Max.Hyre (subscriber, #1054)

For a discouragingly accurate discussion of this genus of problems, see Joel Spolsky's analysis of the development of Martian earphones.


Copyright © 2008, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds