GCC and pointer overflows
In summary, the CERT advisory states:
Application developers and vendors of large codebases that cannot be audited for use of the defective length checks are urged to avoiding [sic] the use of gcc versions 4.2 and later.
This advisory has disappointed a number of GCC developers, who feel that their project has been singled out in an unfair way. But the core issue is one that C programmers should be aware of, so a closer look is called for.
To understand this issue, consider the following code fragment:
    char buffer[BUFLEN];
    char *buffer_end = buffer + BUFLEN;
    /* ... */
    unsigned int len;

    if (buffer + len >= buffer_end)
        die_a_gory_death("len is out of range\n");
Here, the programmer is trying to ensure that len (which might come from an untrusted source) fits within the range of buffer. There is a problem, though, in that if len is very large, the addition could cause an overflow, yielding a pointer value which is less than buffer. So a more diligent programmer might check for that case by changing the code to read:
    if (buffer + len >= buffer_end || buffer + len < buffer)
        loud_screaming_panic("len is out of range\n");
This code should catch all cases, ensuring that len is within range. There is only one little problem: recent versions of GCC will optimize out the second test (returning the if statement to the first form shown above), making overflows possible again. So any code which relies upon this kind of test may, in fact, become vulnerable to a buffer overflow attack.
This behavior is allowed by the C standard, which states that, in a correct program, pointer addition will not yield a pointer value outside of the same object. So the compiler can assume that the test for overflow is always false and can thus eliminate it from the expression. It turns out that GCC is not alone in taking advantage of this fact: some research by GCC developers turned up other compilers (including PathScale, xlC, LLVM, TI Code Composer Studio, and Microsoft Visual C++ 2005) which perform the same optimization. So it seems that the GCC developers have a legitimate reason to be upset: CERT would appear to be telling people to avoid their compiler in favor of others - which do exactly the same thing.
The right solution to the problem, of course, is to write code which complies with the C standard. In this case, rather than doing pointer comparisons, the programmer should simply write something like:
    if (len >= BUFLEN)
        launch_photon_torpedoes("buffer overflow attempt thwarted\n");
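Packaged as a function, the whole check might look like the following minimal sketch (the function name, the BUFLEN value, and the use of memcpy() are illustrative, not taken from the advisory):

    #include <string.h>

    #define BUFLEN 256   /* illustrative size; real code has its own */

    /* Hypothetical helper: validate the untrusted length as an index
     * before any pointer arithmetic or copying is done. */
    static void copy_in(char buffer[BUFLEN], const char *src, unsigned int len)
    {
        if (len >= BUFLEN)
            return;              /* reject out-of-range lengths up front */
        memcpy(buffer, src, len);
    }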
There can be no doubt, though, that incorrectly-written code exists. So the addition of this optimization to GCC 4.2 may cause that bad code to open up a vulnerability which was not there before. Given that, one might question whether the optimization is worth it. In response to a statement (from CERT) that, in the interest of security, overflow tests should not be optimized away, Florian Weimer said:
Joe Buck added:
It is clear that the GCC developers see their incentives as strongly pushing toward more aggressive optimization. That kind of optimization often must assume that programs are written correctly; otherwise the compiler is unable to remove code which, in a correctly-written (standard-compliant) program, is unnecessary. So the removal of pointer overflow checks seems unlikely to go away, though it appears that some new warnings will be added to alert programmers to potentially buggy code. The compiler may not stop programmers from shooting themselves in the foot, but it can often warn them that it is about to happen.
GCC and pointer overflows
Posted Apr 16, 2008 19:14 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (4 responses)
Certainly, though, we are interested in producing better warnings to detect bad code of this kind, and I would expect (though I can't promise) that future GCC versions will warn in cases like this.
duh
Posted Apr 16, 2008 19:16 UTC (Wed) by JoeBuck (subscriber, #2330) [Link]
The last paragraph of the article already made my point. Sorry. Of course, there's the caveat that any new warnings have to be tested, can't produce excessive noise, etc.
GCC and pointer overflows
Posted Apr 18, 2008 5:48 UTC (Fri) by jd (guest, #26381) [Link]
In many cases with GCC, optimizations that are implied by something like -O2 or -O3 can be specifically disabled, or additional optimizations can be specifically enabled. If either is true of the pointer arithmetic optimization in the newer GCCs, then this would seem to be a better case to put: the "insecurity" is optional and only there because the user has stipulated (implicitly or explicitly) that it should be.

Alternatively, it might be argued that this shouldn't be in GCC proper (it's not a compiler function, it's a validation function) but in a distinct module which scanned for likely range violations and other candidates for problems - something akin to splint or the Stanford checker but restricted to cases that can be reliably detected. You could then run it independently or as part of the toolchain. A third option would be to modify a source-to-source compiler like OpenIMPACT to alter tests that would be optimized this way into a more correct form that doesn't need optimizing by GCC. A fourth option is to have bounds-checking at runtime be implicit and require that it be explicitly excluded at compile-time. Alternatively, since runtime bounds-checking is already available, state that its exclusion is a voluntary statement on the part of a developer that the validation of pointers is not wanted.

In other words, this seems a non-argument on any meaningful timescale. There are way too many ways for the GCC developers, other members of the open source community, or the developers of potentially suspect code to eliminate the problem one way or another. If there is a potential problem, it is a potential problem by mutual agreement.
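As a concrete way to check the first point, one could compile a minimal test case with and without the relevant flag and compare the generated code. This is only a sketch, and the flag names are assumptions to verify against the manual for the GCC version in use: -fno-strict-overflow is documented as relaxing the strict pointer and signed-overflow assumptions, while -fwrapv makes signed integer overflow wrap.

    /*
     * A minimal test case to see whether a given GCC keeps or drops the
     * wraparound test:
     *
     *     gcc -O2 -S pointer-wrap.c
     *     gcc -O2 -fno-strict-overflow -S pointer-wrap.c
     */
    #define BUFLEN 64

    static char buffer[BUFLEN];
    static char *buffer_end = buffer + BUFLEN;

    int len_is_bad(unsigned int len)
    {
        return buffer + len >= buffer_end || buffer + len < buffer;
    }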
GCC and pointer overflows
Posted Apr 24, 2008 7:15 UTC (Thu) by eliezert (subscriber, #35757) [Link] (1 responses)
Of course it's best if we all stick to the standard, but people do make mistakes. I think the simplest way to minimize this is to have an attribute marking the check as a safety/sanity check, so that the compiler knows:

1. It should not optimize this check out.
2. If it knows that this check will always fail, it can emit a warning.
3. If this check is assuming something improper, it's better to fail the compilation and let the coder look up the error somewhere than to silently drop the test.
GCC and pointer overflows
Posted Apr 24, 2008 20:39 UTC (Thu) by nix (subscriber, #2304) [Link]
You can't (yet) attach attributes to arbitrary expressions, though.
would using casts fix the problem?
Posted Apr 16, 2008 19:23 UTC (Wed) by roskegg (subscriber, #105) [Link] (7 responses)
So if I cast the pointers to long unsigned ints, would the second test THEN be optimized away?
would using casts fix the problem?
Posted Apr 16, 2008 19:31 UTC (Wed) by zeekec (subscriber, #2414) [Link]
Some recommendations from CERT. https://www.securecoding.cert.org/confluence/display/secc...
would using casts fix the problem?
Posted Apr 16, 2008 19:40 UTC (Wed) by mb (subscriber, #50428) [Link]
> So if I cast the pointers to long unsigned ints, would the second test THEN be optimized away?

Some German news site stated a few days ago that it would not optimize it away then. They proposed it as a possible workaround.
would using casts fix the problem?
Posted Apr 16, 2008 19:46 UTC (Wed) by MathFox (guest, #6104) [Link] (4 responses)
I recall that one programmer asked me why

    int main() {
        char a[10];
        char *aend = a + 10;
        printf("%ld\n", (int)aend - (int)a);
    }

did print something different from 10 on a Cray vector computer...
would using casts fix the problem?
Posted Apr 16, 2008 19:50 UTC (Wed) by MathFox (guest, #6104) [Link] (3 responses)
Make the printf line
printf("%lu\n", (unsigned long)aend - (unsigned long)a);
would using casts fix the problem?
Posted Apr 16, 2008 20:27 UTC (Wed) by jengelh (subscriber, #33263) [Link] (2 responses)
Or better yet don't cast at all. %td for printf can take a ptrdiff_t, which is what such a subtraction should usually yield.
would using casts fix the problem?
Posted Apr 16, 2008 21:29 UTC (Wed) by MathFox (guest, #6104) [Link] (1 responses)
I guess that you missed the point completely. aend-a (no casts needed) should give 10 on any platform. The bit pattern in a pointer is implementation defined and that particular platform had (64-bits IIRC) word-based addresses in a 48 bit address space. The low 48 bits of a character pointer were the address of the word containing the character; the high bits addressed the byte in the word. Standard conforming programs don't rely on pointer layout.
would using casts fix the problem?
Posted Apr 17, 2008 18:55 UTC (Thu) by hummassa (guest, #307) [Link]
Yes, aend - a (without cast) _is_ 10 on any platform, but (int)aend - (int)a is _undefined_ behaviour, because the bit pattern when converting from pointer to int is implementation-defined.
GCC and pointer overflows
Posted Apr 16, 2008 19:35 UTC (Wed) by mgb (guest, #3226) [Link] (3 responses)
Dear Sirt, If'n I runs me a shell scrip with one a them thar "rm -rf /usr" in it my sistem ups 'n dyes. This are a super-red top-level secretary alert bug in that so-called "free" soft stuff. Please outlaw all pinko programmes and force everyone to use 'onest God-fearing 'merican for-profit soft-warez. A Well Wisher
GCC and pointer overflows
Posted Apr 16, 2008 19:40 UTC (Wed) by darwish07 (guest, #49520) [Link] (2 responses)
Hehe, don't be so conspiracy-theoretic ;). Yes, the advisory just mentions GCC, but that may simply be a lack of research on the researcher's part, nothing more.
GCC and pointer overflows
Posted Apr 16, 2008 20:54 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (1 responses)
No, a CERT representative has been on the GCC list offering an ever-changing rationale for not mentioning any other compilers.
CERT responds on the GCC mailing list
Posted Apr 17, 2008 3:55 UTC (Thu) by stuart_hc (guest, #9737) [Link]
Here is one example of CERT's rationale for singling out GCC, citing GCC's popularity.
In this email CERT concedes they should change the advisory to mention other compilers, but it doesn't appear that CERT have made that change yet.
GCC and pointer overflows
Posted Apr 16, 2008 19:50 UTC (Wed) by iabervon (subscriber, #722) [Link] (2 responses)
Note that the wrap-around check isn't necessarily sufficient if sizeof(*buffer) > 1, as in:

    int buffer[BUFLEN];
    int *buffer_end = buffer + BUFLEN;
    /* ... */
    unsigned int len;

    if (buffer + len >= buffer_end || buffer + len < buffer)
        loud_screaming_panic("len is out of range\n");

At a bit more than 1/(sizeof int) of MAX_INT, the pointer sum will probably go back through the buffer. For ideal security on this sort of stuff, GCC would generate code where a pointer addition that overflows compares greater than any valid pointer value when compared in the same expression. It should be more efficient than the second check anyway, since it's only checking processor flags on calculations it's doing anyway in a correctly-predicted not-taken branch.
GCC and pointer overflows
Posted Apr 16, 2008 20:58 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (1 responses)
The compiler then would have to insert such a check into every array reference in a loop, and your code would run about 3x slower. After all, they could wrap around the end, right?
But in a properly structured program, such checks are not needed, as pointers are not random, but rather refer to objects that have definite sizes and that do not wrap around the end of memory. If you think that you need such a check, there's something wrong with your program structure, some check that was omitted elsewhere. What matters is the size of the buffer, and you can go beyond its end without wrapping around the end of the address space.
GCC and pointer overflows
Posted Apr 17, 2008 14:06 UTC (Thu) by iabervon (subscriber, #722) [Link]
How so? I was only suggesting that the expression "pointer1 + offset >= pointer2" be treated differently (that is, the magic goes away if you store the LHS in a variable and use the variable). (And, of course, the change would only apply to cases where the LHS overflows the address space, which is clearly undefined, so having odd things matter isn't a problem.)

If you do something with "pointer + offset" other than comparing it with something else, I wouldn't have anything change (what would it do, anyway, without a test and branch in the code?); and in the case where you're doing the comparison, processors generally have a status bit that gets updated in the multiply/shift and add and is pretty much free to have an unlikely jump based on. Except on architectures where the instruction decode significantly limits the pipeline, it should be at most one cycle to test so long as you do each test right after the ALU operation.

C, in general, doesn't make good use of the fact that processors make it trivial to test for overflow, but that doesn't mean that C compilers can't do extra-clever things with this ability in cases where there's undefined behavior associated with it.
GCC and pointer overflows
Posted Apr 16, 2008 21:22 UTC (Wed) by zooko (guest, #2589) [Link] (1 responses)
I painstakingly wrote a macro over the course of years that does this: a macro which evaluates true if the expression (x+y) will result in arithmetic overflow. It also evaluates true if one of the operands is negative and the other is of a type that is too large to fit into a long long (because the result of the addition is not guaranteed in the C89 standard). Treat it as though it were defined something like this:

    bool ADD_WOULD_OVERFLOW({anyinttype} x, {anyinttype} y);

I'm not 100% certain that I got all the edge cases right, but at least it passes my own test suite. One of the key insights to write this macro is this: while "x + y < x" is not guaranteed to be valid (if one of them is signed), "MAX_INT - x < y" is.

    #define ADD_WOULD_OVERFLOW_Styp(x, y, typ) \
        ((((x) > 0) && ((y) > 0) && ((Z_MAX_typ(typ) - (x)) < (y))) || \
         (((x) < 0) && ((y) < 0) && ((Z_MIN_typ(typ) - (x)) > (y))))
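As a concrete illustration of that insight, here is a sketch of the same check specialized to plain int (the function name is made up; the original macro is generic over integer types via its Z_MAX_typ()/Z_MIN_typ() helpers):

    #include <limits.h>
    #include <stdbool.h>

    /* Sketch: true if x + y would overflow the range of int.  This is the
     * "INT_MAX - x < y" trick described above, which never performs the
     * possibly-overflowing addition itself. */
    static bool add_would_overflow_int(int x, int y)
    {
        return (x > 0 && y > 0 && INT_MAX - x < y) ||
               (x < 0 && y < 0 && INT_MIN - x > y);
    }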
GCC and pointer overflows
Posted Apr 16, 2008 22:33 UTC (Wed) by gravious (guest, #7662) [Link]
How about:

    CLEAR_CARRY_FLAG          // arch dep macro
    temp = x + y;
    if (CARRY_FLAG_IS_SET)    // arch dep macro
        naughty_naughty();

Surely an easy way to find out if something WOULD_OVERFLOW is to perform the operation and check whether it DID_OVERFLOW :)
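For unsigned operands the "do it and see" approach can even be written portably, without architecture-dependent carry-flag macros, since unsigned wraparound is well defined; a minimal sketch:

    #include <stdbool.h>

    /* Sketch: unsigned addition is defined to wrap, so the wrapped sum is
     * smaller than either operand exactly when the addition overflowed.
     * (Signed types cannot be handled this way, since signed overflow is
     * undefined behaviour.) */
    static bool did_overflow_unsigned(unsigned int x, unsigned int y)
    {
        unsigned int sum = x + y;
        return sum < x;
    }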
GCC and pointer overflows
Posted Apr 16, 2008 22:49 UTC (Wed) by aegl (subscriber, #37581) [Link] (6 responses)
I'm having some difficulty parsing the rationale by which this optimization is allowed.
"in a correct program, pointer addition will not yield a pointer value outside of the same object"
Is gcc deciding that "buffer + len < buffer" makes this an "incorrect" program, so it can generate any random code (in this case no code at all) because the program is "incorrect" and so deserves to die?
Could gcc get more radical and apply the same logic to the first clause of this test? The declaration of "buffer" is in scope and "buffer_end" is statically initialized and never changed (OK ... it would have to be declared "static" for gcc to be really sure about this, so let's pretend that it was). So a test for "buffer + len >= buffer_end" appears to be just as "incorrect" if you believe that pointer addition will not yield a pointer "outside of the same object".
I'm sure the intent of the "same object" rule is to stop programmers from making assumptions about the proximity of different objects. E.g.
int a[5], b[5], *p = a;
p += 5;
/* should not assume that p now points to "b" */
It seems a bit of a stretch to get from this to outlawing a test like "buffer + len < buffer".
GCC and pointer overflows
Posted Apr 16, 2008 23:20 UTC (Wed) by wahern (subscriber, #37304) [Link] (1 responses)
No, the intent [of the standard] is to support segmented memory architectures. The "one after" rule is a concession for prototypical looping patterns. When you loop in reverse you have to be careful, because there's no "one before" rule. Compilers like TinyCC, in debug mode, will insert instrumentation code which relies on these rules to check buffer under and overflows (not just accesses, but actual pointer arithmetic--it can detect invalid pointer values). So it's smart to stick to the ISO standard, because these rules can cut both ways; that is, they can help not only optimization, but debugging/security, too.
GCC and pointer overflows
Posted Apr 16, 2008 23:31 UTC (Wed) by wahern (subscriber, #37304) [Link]
I should be more clear. Because there's no "one before" rule, then "buffer + len < buffer" will always be false in a correct program. A correct program would not attempt to derive an invalid pointer (let alone access an object through such a pointer). It's worth noting that "buffer" is the actual object, not merely a pointer. This is an instance where we can't forget that arrays and pointers in C really are distinct, even though we can so often treat an array like a pointer. And the compiler in this instance has sufficient information about the object to make the optimization, whereas in many other instances it would not be able to do this (and so people get the wrong impression). This is a good thing.
GCC and pointer overflows
Posted Apr 17, 2008 9:39 UTC (Thu) by kleptog (subscriber, #1183) [Link] (3 responses)
There's a terminology problem here: the program is not incorrect, the programmer has made an incorrect assumption. The assumption is that (buf + len < buf) will be true if len is very large. Besides the fact that the assumption is false if sizeof(*buf) != 1, the GCC team (and other compilers) point out that this assumption is not warranted by the C spec. Stronger still, the C spec allows you to *assume* the test is false, no matter the value of len (assuming len is unsigned btw). That said, I'd love a way to say:

    if (__wraps(buf + len)) die();
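There is no __wraps() in standard C, but something close can be written by hand by doing the arithmetic in uintptr_t, where wraparound is at least defined. A sketch follows; it still assumes a flat address space and an order-preserving pointer-to-integer conversion, which is exactly the sort of implementation detail discussed in the replies below:

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: nonzero if adding len bytes to buf would wrap the address
     * when computed as an unsigned integer (implementation-specific). */
    static int would_wrap(const char *buf, size_t len)
    {
        uintptr_t base = (uintptr_t)buf;
        return base + len < base;
    }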
GCC and pointer overflows
Posted Apr 17, 2008 16:01 UTC (Thu) by wahern (subscriber, #37304) [Link] (2 responses)
    if (~sizeof buf < len) { die(); }

This only works with unsigned values, and there are probably some caveats with width and promotion rules (portable, nonetheless). Also, assuming your environment uses linear addressing, and there's no other funny stuff going on with pointer bits (like the effectively 16 free bits on AMD64--using 48-bit addressing):

    if (~(uintptr_t)buf < len) { die(); }

I believe this should work on Windows and all Unix systems (guaranteed by additional SUSv3 constraints), but I'm not positive.
GCC and pointer overflows
Posted Apr 17, 2008 22:03 UTC (Thu) by jzbiciak (guest, #5246) [Link] (1 responses)
Of course, it fails for dynamically allocated and grown buffers since sizeof() can't tell you the length.
Also, you failed to account for element size. The following should work, though, for arrays of static size:
    if (len > (sizeof(buf) / sizeof(buf[0])))
        die_in_a_fire();
I don't understand why you have the bitwise negation operator in there. Also, len is a length, not a pointer type, so pointer format doesn't matter.
GCC and pointer overflows
Posted Apr 19, 2008 5:51 UTC (Sat) by wahern (subscriber, #37304) [Link]
The question was how to check if arithmetic overflowed/wrapped, not whether an index or length is valid.
GCC and pointer overflows
Posted Apr 16, 2008 22:58 UTC (Wed) by neufeld (subscriber, #9124) [Link]
I once fixed a bug in a very popular application that produced occasional bus errors on the S390 architecture. It turned out to be due to a piece of code of the form "buffer + len". The C standard requires that you be able to create pointers in that form pointing to any element in the allocated length, plus a single entry past the end. It does not guarantee that you can even compute the address of a pointer that lies outside this range. The "ptr + int" calculation itself caused the bus error when the pointer lay near the end of the segment.
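A small sketch of the rule being described, with an illustrative array size: forming a pointer anywhere within the array or one element past its end is guaranteed to work, while merely computing an address beyond that is already undefined, which is what can fault on a machine like the one above.

    char buffer[16];

    void pointer_rules(void)
    {
        char *end = buffer + 16;       /* one past the end: valid to form and compare */
        /* char *bad = buffer + 32; */ /* beyond that: undefined even to compute */
        (void)end;
    }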
GCC and pointer overflows
Posted Apr 16, 2008 23:28 UTC (Wed) by mongenet (guest, #43575) [Link]
The CERT advisory looks bogus.
First, the example uses an int instead of an unsigned int (note that our fine editor uses an unsigned int in his example):

    char *buf;
    int len;

Then, it pretends that "gcc will assume that buf+len >= buf". If len is an int, it may be negative and gcc may obviously not (and does not) assume that buf+len >= buf (of course, buf must point to an object in an array for the pointer arithmetic to be valid).

Then, the length check if (buf+len < buf) is completely bogus: buf+len may point out of the array (including just beyond the end of the array) and that is a bug even without an overflow.
The "Solution" given by CERT has the same bug, but with two ugly cast operators added! A correct solution is of course given by our editor.
GCC and pointer overflows
Posted Apr 17, 2008 0:01 UTC (Thu) by jzbiciak (guest, #5246) [Link] (2 responses)
> pointer addition will not yield a pointer value outside of the same object
Just some pedantry, but I believe you're allowed to point to the first element after the end of an array, as long as you don't dereference that element. Pointer comparisons with such a pointer are supposed to work, too.
That said, the problem is very similar to time comparison, where you have a running time counter that could wrap around, but all measured intervals still fit within the total size of the time-counter type. There, the safe way to do things is to keep the time as an unsigned value, and subtract off the base.
You can't really do that here, because ptrdiff_t really ought to be a signed type. In this case, if you keep "len" in a signed variable, negative lengths are always an error and so you can check for them directly before adding to a pointer base. Positive lengths can always be compared against the buffer size. In either case, you're comparing indices against each other, not pointers.
Comparing against a generated pointer is a premature optimization, and begs for the compiler to slap you. This is especially true when the pointer is to a type for which sizeof(type) > 1. For a large structure, the address arithmetic could wrap around several times, even for relatively small indices. Imagine a structure for which sizeof(type) == 65536. Any length > 32767 causes problems on a 32-bit system. Granted, arrays of such large structures tend to be very rare.
GCC and pointer overflows
Posted Apr 17, 2008 16:23 UTC (Thu) by mb (subscriber, #50428) [Link] (1 responses)
> Any length > 32767 causes problems on a 32-bit system.

Care to explain why? Did you mean "16-bit system"?
GCC and pointer overflows
Posted Apr 17, 2008 17:14 UTC (Thu) by jzbiciak (guest, #5246) [Link]
No, I meant a 32-bit system. If you do the pointer arithmetic first on an array of structs whose size is 65536, then any length > 32767 will give a byte offset that's >= 2^31 away. A length of 65535 will give you a pointer that's one element before the base.
I guess it's an oversimplification to say 32767 specifically. Clearly, any len > 65535 will wrap back around and start to look "legal" again if the element size is 65536. Any len <= 65535 but greater than some number determined by how far the base pointer is from the end of the 32-bit memory map will wrap to give you a pointer that is less than the base. So, the threshold isn't really 32767, but something larger than the length of the array.
That was my main point: When you factor in the size of the indexed type, the threshold at which an index causes the pointer to wrap around the end of the address space and the potential number of wrapping scenarios gets you in trouble. Thus, the only safe way is to compare indices, not pointers.
For example, suppose the compiler didn't do this particular optimization, and you had written the following code on a machine for which sizeof(int)==4:
    extern void foo(int*);

    void b0rken(unsigned int len)
    {
        int mybuf[BUFLEN];
        int *mybuf_end = mybuf + BUFLEN;
        int i;

        if (mybuf + len < mybuf_end && mybuf + len >= mybuf) {
            for (i = 0; i < len; i++)
                mybuf[i] = 42;
        }

        foo(mybuf); /* prevent dead code elimination */
    }
What happens when you pass in 0x40000001 in for len? The computed pointer for mybuf + len will be equal to &mybuf[1], because under the hood, the compiler multiplies len by sizeof(int), giving 0x00000004 after truncating to 32 bits. How many times does the loop iterate, though?
This happens regardless of whether the compiler optimizes away the second test.
Now, there is a chance a different optimization saves your butt here. If the compiler does common subexpression elimination and some amount of forward substitution, it may rewrite that if statement's first test as follows. (Oversimplified, but hopefully it gives you an idea of what the compiler can do to/for you.)
Step 0: Represent pointer arithmetic as bare arithmetic (internally)
mybuf + 4*len >= mybuf_end
Step 1: Forward substitution.
mybuf + 4*len >= mybuf + 4*BUFLEN
Step 2: Remove common terms from both sides. (Assumes no integer overflow with pointer arithmetic--what started this conversation to begin with.)
4*len >= 4*BUFLEN
Step 3: Strength reduction. (Again, assumes no integer overflow with pointer arithmetic.)
len >= BUFLEN
That gets us back to what the test should have been to begin with. There's no guarantee that the compiler will do all of this though. For example, GCC 4.2.0 doesn't seem to optimize the test in this way. I just tried the Code Composer Studio compiler for C64x (v6.0.18), and it seems to do up through step 2 if the array is on the local stack, thereby changing, but not eliminating the bug.
It turns out GCC does yet a different optimization that changes the way in which the bug might manifest. It eliminates the original loop counter entirely, and instead transforms the loop effectively into:
    int *ptr;
    for (ptr = mybuf; ptr != mybuf + len; ptr++)
        *ptr = 42;

This actually has yet another set of safety characteristics.
Moral of the story? Compare indices, not computed pointers, to avoid buffer overflows.
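For completeness, a sketch of the example above rewritten as the moral suggests, with the range check done on the index itself (foo() and the names are reused from the b0rken example; the BUFLEN value is illustrative):

    #define BUFLEN 64          /* illustrative; the original leaves it unspecified */

    extern void foo(int *);

    void fixed(unsigned int len)
    {
        int mybuf[BUFLEN];
        unsigned int i;

        if (len < BUFLEN) {    /* index comparison: cannot wrap, element size irrelevant */
            for (i = 0; i < len; i++)
                mybuf[i] = 42;
        }
        foo(mybuf);            /* prevent dead code elimination */
    }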
Solution
Posted Apr 17, 2008 0:44 UTC (Thu) by clugstj (subscriber, #4020) [Link] (2 responses)
How about this obvious solution: when compiling old crusty code, don't use the latest-and-greatest compiler optimizations? (I'm assuming this one can be turned off.) The older and less recently maintained the code, the more conservative the optimization should be.
Solution
Posted Apr 17, 2008 15:27 UTC (Thu) by proski (subscriber, #104) [Link] (1 responses)
Bugs exist not only in old crusty code. Just because the code was written recently doesn't make it better or safer. Even if the code was developed with the new compiler but was not specifically tested for overflows, it can still have that problem.
Solution
Posted Apr 17, 2008 17:24 UTC (Thu) by clugstj (subscriber, #4020) [Link]
OK, how about this: if you don't trust the code, don't use the latest whiz bang optimizations! The documentation for GCC should specifically warn about optimizations that can do bad things with questionable code. As an example, here is a quote from the "Sun Workshop 6 update 2" man page:

    -xO3  In addition to optimizations performed at the -xO2 level, also
          optimizes references and definitions for external variables.
          This level does not trace the effects of pointer assignments.
          When compiling either device drivers that are not properly
          protected by volatile, or programs that modify external
          variables from within signal handlers, use -xO2.

Let's not penalize GCC because of how some people will use/abuse it.
Since when does GCC *assume* the program to be correct?
Posted Apr 17, 2008 19:38 UTC (Thu) by brouhaha (guest, #1698) [Link] (8 responses)
> This behavior is allowed by the C standard, which states that, in a correct program, pointer addition will not yield a pointer value outside of the same object. So the compiler can assume that the test for overflow is always false and may thus be eliminated from the expression.

GCC can only assume that the test for overflow is always false if it first assumes that the program is correct. Since when does it assume that? If GCC assumes that the program is correct, why does it sometimes generate errors and/or warnings at compile time? Correct programs won't require any error or warning messages at compile time, so there's no point in GCC having any code to check for the conditions that can cause those errors and warnings. Clearly it would be more efficient for GCC to avoid those checks, so they should be eliminated.
Since when does GCC *assume* the program to be correct?
Posted Apr 17, 2008 20:08 UTC (Thu) by nix (subscriber, #2304) [Link] (7 responses)
GCC assumes that a program you feed its C compiler is written in C. If what you feed it cannot be a valid program (e.g. it contains a syntax error), it errors. If it's likely that it's invalid, it often warns. But if what you feed it has one valid interpretation and one which must be invalid, of course it must pick the valid interpretation. Anything else would lead to its being unable to compile valid programs, which would be a standards violation (and bloody annoying).
Since when does GCC *assume* the program to be correct?
Posted Apr 17, 2008 20:39 UTC (Thu) by brouhaha (guest, #1698) [Link] (6 responses)
My point exactly. If the assumption is only that the program is written in C, but NOT that it is a correct C program, then it isn't reasonable to assume that the program's use of pointers meets the constraint defined by the standard, in which a pointer will never be manipulated to point outside its object.
That would be a fine optimization to use when you choose some high level of optimization that includes unsafe optimizations, but it shouldn't happen at -O1.
Since when does GCC *assume* the program to be correct?
Posted Apr 17, 2008 21:45 UTC (Thu) by nix (subscriber, #2304) [Link] (5 responses)
Something which invokes undefined behaviour according to the standard is, in a sense, not written in C. Of course any useful program will do things *beyond* the standard (e.g. relying on POSIX); but relying on things like pointer wraparound is a rather different matter. (Who the hell would ever intentionally rely on such things anyway? It pretty much *has* to be a bug to have such code.)
Since when does GCC *assume* the program to be correct?
Posted Apr 17, 2008 22:58 UTC (Thu) by brouhaha (guest, #1698) [Link] (4 responses)
There are three possibilities:
- GCC does not make assumptions about the validity of the input
- GCC assumes that the input is a C program, but not necessarily a valid one (according to the C standard)
- GCC assumes that the input is a valid C program
A compiler assuming that the input is a valid program is counter to my expectations as a user of the compiler. Unless I explicitly turn on unsafe optimizations, I don't expect it to optimize away any comparisons that I've written, unless it can prove that the condition will always have the same result, based on my actual program code, not on what the C standard says a valid program may or may not do.
I have no problem at all with having such optimizations enabled as part of the unsafe optimizations at higher -O option levels.
Since when does GCC *assume* the program to be correct?
Posted Apr 18, 2008 11:18 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)
The problem is that the boundary between 'a C program, but not necessarily a valid one' and 'not a C program' is questionable. if (a + foo < a) is testing something which, in any C program conforming to the Standard without really weird extensions, must be false. This is every bit as true as if (sizeof (a) < 1) would be. If it decided that, oh, that could be true after all, it's choosing an interpretation which the Standard forbids. ... and if the compiler starts accepting that, what other non-C programs should it start accepting? Perhaps we should spell 'while' differently when the locale is de_DE? Perhaps `mary had a little lamb' is now defined as a valid C program? (Sure, compilers can *do* all this, and GCC does have some extensions chosen explicitly because their syntax is invalid Standard C --- the statement-expression extension springs to mind --- but the barrier to new extensions is very high these days, and in any case that doesn't mean that *anything* people do wrong should be defined as a language extension, especially not when it's as weird and devoid of practical utility as this. Portability to other compilers is important.)
Since when does GCC *assume* the program to be correct?
Posted Apr 18, 2008 19:15 UTC (Fri) by brouhaha (guest, #1698) [Link]
There's a big difference between the (a + foo < a) and (sizeof (a) < 1) cases, which is that the former is something that a programmer is likely to code specifically because he or she knows that the program might (unintentionally) be buggy, in an attempt to catch a bug, while the latter is unlikely to occur at all, and certainly not as something a programmer is likely to deliberately test for.

> If it decided that, oh, that could be true after all, it's choosing an interpretation which the Standard forbids.

Yet it can actually quite easily happen in real programs. NOWHERE does the standard say that a compiler has to optimize away tests that might always have a fixed value for valid programs, but might easily have differing values for buggy programs.

> Perhaps we should spell 'while' differently when the locale is de_DE?

You've lost me here. I don't see any way that a purely semantic error in a program could result in "while" being misspelled, even if the locale is de_DE.
Since when does GCC *assume* the program to be correct?
Posted Apr 18, 2008 16:19 UTC (Fri) by viro (subscriber, #7872) [Link] (1 responses)
The standard is a contract between authors of a program and authors of a C implementation; if a program invokes undefined behaviour, all bets are off. It is allowed to compile the check in question into "check if addition had triggered an overflow; if it had, before bothering with any comparisons do unto luser what Simon would have done on a particularly bad day". It can also turn that into "if addition overflows, take the value of first argument". And optimize according to that. It's not a matter of optimizing your comparisons away; it's a matter of addition having no prescribed semantics in case of overflows, regardless of optimizations.
Since when does GCC *assume* the program to be correct?
Posted Apr 18, 2008 21:00 UTC (Fri) by nix (subscriber, #2304) [Link]
Well said. Also, while in some cases this is a QoI issue for which high-quality implementations will prescribe useful semantics, this isn't such a case. I can't think of any particularly useful semantics for pointer wraparound, especially given that distinct objects have no defined or stable relationships with each other anyway. Operating under the rules of modular arithmetic might have been nice, and perhaps a more recent language would define that...
Is this an useful optimization ??
Posted Apr 18, 2008 16:42 UTC (Fri) by mikov (guest, #33179) [Link] (14 responses)
So, this is similar to this issue:

    int x = ...;
    if (x + CONSTANT < x)
        overflow();

GCC will optimize out the check completely because the standard says so. There is no arguing that GCC is in its right to do the optimization. However there are plenty of circumstances where this code is in fact valid (when the values of x and CONSTANT are controlled and the code is running on a two's complement machine, etc). So, I am concerned about two things:

* Are these optimizations really beneficial? Do situations like the above occur in _correct_ code?
* Don't they make C a less useful language? C is often regarded as high level assembler. When I write something, it should not be up to the compiler to second guess me.
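For reference, the same guard can also be expressed without relying on wraparound at all; a sketch for non-negative values, reusing the overflow() handler from the snippet above (the CONSTANT value and helper name are illustrative):

    #include <limits.h>

    #define CONSTANT 1000      /* illustrative positive value */

    extern void overflow(void);

    static void check_add(int x)
    {
        /* Equivalent guard without provoking signed overflow: x + CONSTANT
         * would exceed INT_MAX exactly when x > INT_MAX - CONSTANT. */
        if (x > INT_MAX - CONSTANT)
            overflow();
    }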
Is this an useful optimization ??
Posted Apr 18, 2008 20:59 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (13 responses)
> Are these optimizations really beneficial? Do situations like the above occur in _correct_ code?

Stuff like x + CONSTANT < x can trivially happen as a result of a complex macro expansion, especially after changes done by different people. But when this happens, I prefer the compiler to warn about it, not to silently optimize the code.
Is this an useful optimization ??
Posted Apr 18, 2008 21:22 UTC (Fri) by mikov (guest, #33179) [Link] (1 responses)
Exactly my point. It is not easy to imagine a case where this is a _desirable_ optimization. It is either a bug, which should be warned about, or it is deliberate. Either way the compiler should not remove the code! I imagine however that the general assumption "int + positive_value is always greater than int" may be useful for complex loop optimizations and the like. It probably allows the compiler to completely ignore integer overflow in cases like "for (int i = 0; i < x; ++i ) {...}".
Is this an useful optimization ??
Posted Apr 18, 2008 22:29 UTC (Fri) by nix (subscriber, #2304) [Link]
That's equally useful for pointer arithmetic. Think arrays, strings, pointer walks, even things like Floyd's algorithm (smart optimizers can even spot the relationship between the tortoise and the hare: allowing for the hare to wrap around would ruin that).
Is this an useful optimization ??
Posted Apr 18, 2008 22:03 UTC (Fri) by giraffedata (guest, #1954) [Link] (10 responses)
I would prefer a warning only if there's an easy way to disable the warning. And Gcc almost never provides a way to say, "don't warn me about this particular piece of code; I've already verified it's supposed to be that way." The things I have to do to stop warning messages often make the code harder to read or slower to execute. But because the warnings are so often valid, I won't accept any code that generates any warnings.
I've used other compilers that at least have the ability to turn off warnings for a certain stretch of source code. And others have fine-grained selection of types of warnings.
> I prefer the compiler to warn about it, not to silently optimize the code.
I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else."
Is this an useful optimization ??
Posted Apr 19, 2008 18:16 UTC (Sat) by mikov (guest, #33179) [Link] (9 responses)
> I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else."
I think you are missing the point that this code is often both valid and intentional. After all, this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc.
I think that it happens to be one of the most efficient ways to check for overflow in some cases. So, it is a trade off - the standard has chosen to declare perfectly valid code as invalid in order to make other optimizations easier.
For example, a similar trade off would be to declare that all aliasing is invalid. Imagine the wild optimizations that would be made possible by that !
Is this an useful optimization ??
Posted Apr 19, 2008 19:19 UTC (Sat) by giraffedata (guest, #1954) [Link]
> I think you are missing the point that this code is often both valid and intentional.
No, I did get that point, and it's orthogonal to my point, which was that the compiler is not optimizing away my code, so need not warn me that it has optimized away my code.
The first point I made in the same post is that I prefer no warning unless there is a way for me to disable it, which is directly derived from your point that the code is often both valid and intentional.
However: I can see your real point is that the code is not only intentional, but is intended to mean something different from what the standard says it means. That, I suppose, means the warning says, "you wrote something that often is intended to say something other than what it really says." I agree that's a valid warning. It's like the warning for "if (a = b)". It's still not a case of the user's code being "optimized away."
> After all this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc.
While quite low level, C is not so low level that integer types wrap around. You're just talking about particular implementations of C. The warning we're talking about warns the programmer that something didn't wrap around as expected.
Is this an useful optimization ??
Posted Apr 20, 2008 19:44 UTC (Sun) by viro (subscriber, #7872) [Link] (7 responses)
a) *Unsigned* integers wrap around; it hadn't been promised for signed ones, and as a matter of fact it hadn't been true for a bunch of implementations, all the way back to the late 70s.

b) Pointer + integer wraparounds are even worse - all sorts of fun things happened in a lot of implementations; e.g. there's nothing to stop the hardware from using upper bits to store extra information of any kind (e.g. ASN) and using an instruction for pointer + int that mangles them instead of a wraparound. With an instruction for comparison that would trap on ASN mismatch. And that doesn't take anything too exotic - e.g. 56-bit address space, 8-bit ASN in upper bits, 32-bit int. Get a carry into bit 56, have both > and < trigger a trap.

The point is, you are not relying on low-level details - you are assuming that all the world is a VAX (OK, x86 these days). And insisting that all implementations reproduce exact details of (undefined) behaviour, no matter how much PITA it would be. BTW, that addition itself can trap on wraparound - and does on real-world architectures.

So you can do that validation in a sane way (compare the untrusted index with the maximal valid value) or, if you want to rely on details of pointer implementation *and* be perverse, play with casting to uintptr_t and using the fact that operations on _that_ are well-defined. It will still break on architectures that do not match your idea of how pointers are represented, but at least you won't be feeding the data to be validated into operations that might do all sorts of interesting things and trying to guess if those things had happened by the outcome.

There's implementation-defined and there's undefined. That kind of wraparound is firmly in the latter class, regardless of any optimizations. C doesn't try to codify _how_ any particular construct with undefined behaviour will screw you - it doesn't even promise to try and have it act consistently on a given target.
Is this an useful optimization ??
Posted Apr 20, 2008 22:39 UTC (Sun) by mikov (guest, #33179) [Link] (6 responses)
As you noted yourself, there is a subtle difference between undefined and implementation-defined behavior. This particular case is clearly an example of *implementation defined*. It behaves in the same way on 99% of all CPUs today and there is nothing undefined about it. Not only that, but it is not an uncommon idiom and can be useful in a number of situations. How can you call it undefined???

You speak about architectures with different pointer representations, etc. I have heard that argument before and I do not buy it. Such implementations mostly do not exist, or if they do, they are not an important target for Standard C today, let alone GCC. (Emphasis on the word "standard".) Considering the direction where hardware is going, such architectures are even less likely to appear in the future. Instructions that trap for any reason are terribly inconvenient for superscalar or OoO execution.

Anyway, I think I may not have been completely clear about my point. I am *not* advocating that the standard should require some particular behavior on integer and pointer overflows. I am however not convinced that it is a good idea to explicitly *prohibit* the behavior which is natural to most computers in existence.

As it is now, Standard C is (ahem) somewhat less than useful for writing portable programs (to put it extremely mildly). It is somewhat surprising that the Standard and (apparently) most compiler implementors are also making it less useful for *non-portable* applications :-)

(I have to admit that years ago I was fascinated with the aesthetic "beauty" of portable C code, or maybe I liked the challenge. Then I came to realize what a horrible waste of time it was. Take Java for example - for all its horrible suckiness, it does a few things right - it defines *precisely* the size of the integer types, the behavior on overflow and the integer remainder of negative numbers. There is a lot of good in x86 being the universal CPU winner - we can finally forget the idiosyncrasies of the seventies ...)
Is this an useful optimization ??
Posted Apr 21, 2008 0:32 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)
Pointer overflow is undefined because the Standard does not define it. That's what undefined *means*. The semantics of some construct are undefined both when the Standard says they are, and when it simply doesn't say anything one way or the other. The semantics of a construct are implementation-defined *only* when the Standard says they are. You can't say `oh, it doesn't say anything about $FOO, therefore it's implementation-defined'.
Is this an useful optimization ??
Posted Apr 21, 2008 1:30 UTC (Mon) by mikov (guest, #33179) [Link] (2 responses)
I understand that. (In fact I used to be intimately familiar with the C99 Standard several years back, before I got disillusioned with it - how can you have any respect for a standard containing "atoi()"? :-) )

I am not discussing the "meaning" of the Standard. The meaning is more than clear, at least in this case, and if you look at my first post in this thread, you will see me explicitly noting that this is a completely valid transformation according to the Standard. However it is also completely valid *not* to perform this transformation. So, this is clearly a QOI issue and I am discussing it as such. I am not convinced that optimizing away operations which are otherwise completely valid and useful, only because the standard says that it is allowed (but not required!), is a good idea.

The Standard is just a document. Fortunately it is not an absolute truth and it does not prevent us from thinking on our own. Plus, it is not particularly inspiring or useful. As you probably know very well, it is impossible to write a useful C application using only the C standard. *Any* C application is non-portable and non-standard compliant. The C Standard is so weak that it is practically useless, except as mental self pleasuring, much as we are doing now.

(BTW, people often delude themselves that they are writing compliant C, when their applications happen to run on a couple of conventional 32-bit architectures, using the same compiler, in a POSIX environment. Ha! Then they get ported to Windows (or the other way around) and it is taken as an absolute proof of compliance. In fact the applications are a mess of special cases, ifdef-s and non-standard assumptions.)
Is this an useful optimization ??
Posted Apr 21, 2008 6:50 UTC (Mon) by nix (subscriber, #2304) [Link] (1 responses)
Ah, right. Well, QoI is in the eye of the beholder: I happen to think that the semantics of pointer overflow are semantics that only a maniac would rely upon intentionally, so the *most* we need is an extension of the existing compiler warning that `comparison is always true' or something along those lines.
Is this an useful optimization ??
Posted Apr 21, 2008 17:32 UTC (Mon) by mikov (guest, #33179) [Link]
I was thinking mostly of integers, not pointers. However I am hard pressed to think of an example where a pointer overflow would not safely wrap around _in practice_. Even with x86 FAR pointers, the offset will safely wrap around. There will be no trap and the comparison will still work.

Are there many architectures, in use today, where pointers are not integers and do not wrap around on overflow? Secondly, do these architectures run Linux and are they supported by GCC? Regardless of the answer to these questions, why would people writing code with zero chance of running on those architectures be maniacs?

The C Standard is not a measurement for good programming. It simply has a set of restrictions to ensure portability between all kinds of weird architectures (I should say it fails in this miserably, but that is beside the point). However portability is not the only, and by far not the most important, measurement of code.
Is this an useful optimization ??
Posted Apr 21, 2008 11:09 UTC (Mon) by wahern (subscriber, #37304) [Link] (1 responses)
Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted.

Relying too heavily on pointer arithmetic in algorithms is not the smartest thing to do. The largest integral type supported on the computer I'm typing on is 64-bit (GCC, long long), but pointers are only 32-bits. Parsing a 5GB ISO Base Media file (aka QuickTime Mov), I can keep track of various information using the unsigned 64-bit integral; if I had written all my algorithms to rely on pointer arithmetic to store or compute offsets, I'd be screwed.

C precisely defines the behavior of overflow for unsigned types. Java's primitives suck because they're all signed; the fact that it wraps (because Java effectively stipulates a two's-complement implementation) is useless. In fact, I can't even remember the last time I used (or at least wanted) a signed type, in any language. Having to deal with that extra dimension is a gigantic headache, and it's worth noting that Java is just as susceptible to arithmetic bugs as C. I'd argue more so, because unwarranted reliance on such behavior invites error, and such reliance is easier to justify/excuse in Java because it so narrowly stipulates such things.

C's integral types are in some ways superior to many other languages' specifically because they're so loosely defined by the spec. Short of transparently supporting big integers, it forces you to focus more on values than representations. That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest the task of constraining range, which is almost always defined by the data, and similar external context. Any code which silently relies on a Java primitive type wrapping is poor code. Using comments is always second to using masks, and other techniques, where the code speaks for itself more clearly than a comment ever could. A better system, of course, would utilize type ranges a la Ada.

Anyhow, I know the parent's point had more to do with pointers, but this just all goes to show that good code doesn't rely on underlying representations, but only the explicit logic of the programmer, and the semantics of the language.
Is this an useful optimization ??
Posted Apr 21, 2008 16:00 UTC (Mon) by mikov (guest, #33179) [Link]
> Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted.
I disagree. A predictable conditional jump is better, simpler and more efficient than a completely unpredictable trap. Even if the trap doesn't have to go through the OS signal handler (which it probably does on most OS-es), it has to save context, etc.
One could argue for adding more convenient instructions for detecting an overflow and making a conditional jump on it. I strongly suspect that instructions that trap are currently a dead end.
Anyway, we know x86 doesn't have instructions that trap on overflow. (Well, except "into", but no C compiler will generate that). Do PPC and ARM have them and are they used often?
> That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest the task of constraining range, which is almost always defined by the data, and similar external context. Any code which silently relies on a Java primitive type wrapping is poor code.
That is not my experience at all. C99 defines "int_least32_t" and the like exactly to address that problem. Many C programmers like to believe that their code doesn't rely on the size of "int", but they are usually wrong. Most pre-C99 code would break horribly if compiled in a 16-bit environment, or where integer widths are not powers of 2, or are not two's complement.
Honestly, I find the notion that one can write code without knowing how wide the integer types are, in a language which doesn't implicitly handle the overflow (unlike Python, Lisp, etc), to be absurd.
I am 100% with you on the unsigned types, though.
I also agree that in practice Java is as susceptible to arithmetic bugs as C. However it is for a different reason than the one you are implying. It is simply because in practice Java and C have _exactly_ the same integer semantics.
Java simply specifies things which 90% of the C programmers mistakenly take for granted. Wrap-around on overflow, truncating integer division, etc.