Striking gold in binutils
This article was made possible by LWN subscribers. If you appreciate our content, please buy a subscription and make the next set of articles possible.
A new linker is not generally something that arouses much interest outside of the hardcore development community—or even inside it—unless it provides something especially eye-opening. A newly released linker called gold has just that kind of feature, though: it runs up to five times as fast as its competition. For developers who do a lot of compile-link-test cycles, that kind of performance increase can significantly improve their efficiency.
Linking is an integral part of code development, but it can be invisible, as it is often invoked by the compiler. The sidebar accompanying this article is meant for non-developers or those in need of a refresher about linker operation. For those who want to know even more, the author of gold, Ian Lance Taylor, has a twenty-part series about linker internals on his weblog, starting with this entry.
For Linux systems, the GNU Compiler Collection (GCC) has been the workhorse by providing a complete toolchain to build programs in a number of different languages. It uses the ld linker from the binutils collection. With the announcement that gold has been added to binutils, there are now two choices for linking GCC-compiled programs.
A linker overview
For non-developers, a quick overview of the process that turns source code into executable programs may be helpful. Compilers are programs that turn C—or other high-level languages—into object code. Linkers then collect up object code and produce an executable. Usually the linker will not only operate on object code created from a project's source, but will also reference libraries of object code—the C runtime library libc for example. From those objects, the linker creates an executable program that a user can invoke from the command line. The linker allows program code in one file to refer to a code or data object in another file or library. It arranges that those references are usable at run time by substituting an address for the reference to an object. This "links" the two properly in the executable. Things get more complicated when considering shared libraries, where the library code is shared by multiple concurrent executables, but this gives a rough outline of the basics of linker operation.
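The address-substitution step described above can be sketched in miniature. The following toy "linker" is purely illustrative: the object files are invented Python dicts (real linkers work on ELF sections and relocation records), but the two passes mirror what a real linker does, laying out each object at a load address, building a global symbol table, and then patching every cross-file reference with a final address:

```python
# Toy illustration of symbol resolution: each "object file" provides some
# symbols (at offsets into its code) and references to external symbols.
# All names and the dict format here are invented for illustration.

objects = {
    "main.o": {"size": 0x40, "defines": {"main": 0x00},
               "references": ["printf_wrapper"]},
    "util.o": {"size": 0x20, "defines": {"printf_wrapper": 0x04},
               "references": []},
}

def link(objects, base=0x400000):
    symtab, layout, addr = {}, {}, base
    # Pass 1: assign each object a load address; record every defined
    # symbol's final address in a global symbol table.
    for name, obj in objects.items():
        layout[name] = addr
        for sym, off in obj["defines"].items():
            if sym in symtab:
                raise ValueError("duplicate symbol: " + sym)
            symtab[sym] = addr + off
        addr += obj["size"]
    # Pass 2: resolve every reference against the global symbol table,
    # producing the address that gets patched into the executable.
    relocs = {}
    for name, obj in objects.items():
        for sym in obj["references"]:
            if sym not in symtab:
                raise ValueError("undefined reference to " + sym)
            relocs[(name, sym)] = symtab[sym]
    return symtab, relocs

symtab, relocs = link(objects)
print(hex(symtab["main"]))                        # 0x400000
print(hex(relocs[("main.o", "printf_wrapper")]))  # 0x400044
```

An unresolved reference here raises "undefined reference to …", the same failure a real linker reports when an object refers to a symbol no other input provides.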
The intent is for gold to be a complete drop-in replacement for ld—though it is not quite there yet. It currently lacks support for some command-line options, and Linux kernels linked with it do not boot, but those things will come. It also supports only x86 and x86_64 targets, but for many linking jobs, gold seems to be working well. The speed seems to be very enticing to some developers, with Bryan O'Sullivan saying:
Performance was definitely the goal that Taylor set for gold's development. It supports only ELF (Executable and Linking Format) objects and runs only on UNIX-like operating systems. Supporting a single object/executable format, along with a fresh start and an explicit performance goal, is part of why gold outperforms ld.
Tom Tromey likes the looks of the code:
Because the implementation is geared for speed, Taylor used techniques that may confuse some. He has some concerns about the maintainability of his implementation:
Overall, it seems to be getting a nice reception from the community, with O'Sullivan commenting that he is "looking forward to the point where gold entirely supplants the existing binutils linker. I expect that won't take too long, once Mozilla and KDE developers find out about the performance boost." Once gold gets to that point, Taylor is already thinking about concurrent linking (running the compiler and linker at the same time) as the next big step.
There are two other ongoing projects that are working with the greater GCC ecosystem in interesting ways: quagmire and ggx. Quagmire is an effort to replace the GNU configure and build system—consisting of autoconf, automake, and libtool—with something that depends solely on GNU make. Currently, that system uses various combinations of the shell, m4, and portable makefiles to make the building and installation of programs easy—the famous "./configure; make" command line. The tools were written that way to try to ensure that users did not need to install additional packages to configure and build GNU tools. Quagmire, which has its roots in a posting by Taylor, recognizes that GNU make is ubiquitous, so basing a system around it makes a great deal of sense.
The ggx project is Anthony Green's step-by-step procedure to create an entire toolchain that can build programs for a processor architecture that he is creating as a thought experiment. The basic idea is to design the instruction set based on the needs of the compiler, in this case GCC, rather than the needs of the hardware designers. He is using GCC's ability to be retargeted for new architectures, along with its simulation capabilities, to create a CPU that he can write programs for. As of this writing, he has a "hello world" program working, along with large chunks of the GCC test suite passing. Well worth a look.
Your teaser is too conservative
Posted Mar 26, 2008 17:37 UTC (Wed) by JoeBuck (subscriber, #2330) [Link]
You write "up to five times as fast", but I'm consistently seeing factors above five. For the program I'm currently living with (build, debug, test, change, build, etc), I get a speedup ratio of 5.9 with gold, and that's with some libraries on an NFS mount. With all files on a local disk, it's even faster. Ian, you're a hero.
A bit premature
Posted Mar 27, 2008 14:18 UTC (Thu) by clugstj (subscriber, #4020) [Link]
Before we declare him a linker god, we need to remember that what he has links x86 ELF (imperfectly). This is a far cry from a replacement for GNU ld which links dozens of processors' code in at least a half dozen different file formats. A 5x speedup is nice, but it's not nearly as amazing when you consider how little of the previous tool's functionality it has.
A bit premature
Posted Mar 27, 2008 14:59 UTC (Thu) by pj (subscriber, #4506) [Link]
Still, ld should be optimized for the common case, which is x86 ELF, and it's clearly not. Times change, the 'common case' changes, and tools should keep up.
A bit premature
Posted Mar 28, 2008 0:17 UTC (Fri) by giraffedata (guest, #1954) [Link]
Besides, Ian doesn't have to produce an actual replacement for ld or smite ld to the ground to be a hero or a linker god. If he can produce a replacement for ld on x86 and x86_64 with ELF, that's godly all by itself.

For a great many people, there is no effective difference between a linker that works on x86 and x86_64 with ELF and a linker that works on dozens of processors and half a dozen file formats.
A bit premature
Posted Mar 28, 2008 16:20 UTC (Fri) by landley (guest, #6789) [Link]
> This is a far cry from a replacement for GNU ld which links dozens of
> processors' code in at least a half dozen different file formats.

1) Isn't the unix way "do one thing and do it well"?

2) Shouldn't Linux Weekly News be most interested in the tools and formats used by _Linux_? (Is it still interesting to have an ld mode to produce a.out code? I tried to produce a static a.out binary with gcc 4.1.2 and couldn't figure out how in the first hour of trying, and google didn't pull anything up either. How interesting is the ability to produce other binary formats last used by discontinued Hewlett-Packard minicomputers from 1986?)

3) The way "binflat" files are created is to make an ELF file, then have a second tool produce a second file from the ELF file. Same for the kernel producing zImage files from the ELF-format vmlinux. Much of what GNU ld is doing may not actually be a good idea...
A bit premature
Posted Mar 28, 2008 22:43 UTC (Fri) by nix (subscriber, #2304) [Link]
GNU ld *does* only one thing: it links. It doesn't actually know much about object file formats (as you doubtless know, that job is left to libbfd). ld's real problem is *age*: its design predates ELF, and it shows. Its design meshes quite well with COFF, but who uses that anymore? (And I use the ihex ld target on a fairly regular basis. Maybe there are other ways to do the same thing, but it works for me...)
Who uses COFF?
Posted Apr 5, 2008 14:22 UTC (Sat) by anton (subscriber, #25547) [Link]
Last I looked, Windows and Tru64 Unix (possibly also other proprietary Unices).
A bit premature
Posted Mar 30, 2008 21:40 UTC (Sun) by AJWM (guest, #15888) [Link]
>> This is a far cry from a replacement for GNU ld which links dozens of
>> processors' code in at least a half dozen different file formats.

> 1) Isn't the unix way "do one thing and do it well"?

Yes, but remember, GNU's Not Unix. While it's often an improvement, it does have some idiosyncrasies that drive me crazy (like the insistence on having man pages that basically say 'see the info document', for example).
A bit premature
Posted Mar 31, 2008 0:12 UTC (Mon) by nix (subscriber, #2304) [Link]
Er, most GNU software has had better manpages than that for many years (derived automatically from the texinfo, just as the info is).
Striking gold in binutils
Posted Mar 26, 2008 17:37 UTC (Wed) by joey (guest, #328) [Link]
I feel for the people developing software where a 5x linker speedup is valuable. Really... I've watched your code build natively on some slow arm machines where linking took 20 hours. Will this really be a big win for most of us? I hope not. Things that take significant time to link tend to be big messes. Now automake and libtool -- that's slow for all of us..
Striking gold in binutils
Posted Mar 26, 2008 17:53 UTC (Wed) by elanthis (guest, #6227) [Link]
C++ programs tend to generate a huge number of symbols. Things like Firefox and OpenOffice.org are pure hell to build. There's also the issue of simply large programs written in other languages, or collections of programs. Shaving 2 seconds off link time may not seem like much, but when you're compiling several dozen executables in one go, those 2 seconds add up really quickly.
Striking gold in binutils
Posted Mar 26, 2008 19:18 UTC (Wed) by mto (guest, #24123) [Link]
Won't this, in the long run, affect load times for applications? Dynamically linked applications need to be linked twice: once right after compile, and a second time at run time. If gold could eventually replace ld.so and run 5 times faster, the impact would be pretty big...
Striking gold in binutils
Posted Mar 26, 2008 20:44 UTC (Wed) by nix (subscriber, #2304) [Link]
It can't replace ld.so. The dynamic and static linkers have quite different (although related) jobs, and have very different goals as well (e.g. performance is utterly paramount for ld.so but less so for ld). The only system which has even *tried* to merge the two is AIX, and I think even it gave up in the end.
Striking gold in binutils
Posted Mar 26, 2008 19:58 UTC (Wed) by nix (subscriber, #2304) [Link]
It's not just 'a huge number of symbols'; it's 'a huge number of symbols with very long names differing only in their last few characters'. This also proved to be a worst-case for a lot of dynamic linkers...
Striking gold in binutils
Posted Mar 27, 2008 7:17 UTC (Thu) by mjthayer (guest, #39183) [Link]
Is this really the case? I once tried hacking up ld.so to do the lookup backwards (it is actually possible without doing a strlen for every comparison) and I could see no difference in performance, based on loading OpenOffice with both linkers and enabling the built-in linker profiling. Of course, I may have messed up something else in the process...
ld.so is different beast
Posted Mar 27, 2008 9:43 UTC (Thu) by khim (subscriber, #9252) [Link]
You can read about what goes on there in Drepper's article. Scroll down to "The GNU-style hash table".
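For the curious, the hash functions at issue are short enough to sketch. The following is a rough Python transcription of the two schemes Drepper's article contrasts: the classic SysV ELF hash used by the old .hash section, and the much simpler GNU-style hash (seed 5381, multiply by 33) used by .gnu.hash. This is an illustrative sketch, not a substitute for the glibc source:

```python
def sysv_elf_hash(name: str) -> int:
    # Classic ELF hash from the SysV gABI, used by the original
    # .hash section: shift in each byte, then fold the top nibble back.
    h = 0
    for c in name.encode():
        h = (h << 4) + c
        g = h & 0xf0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xffffffff
    return h

def gnu_hash(name: str) -> int:
    # GNU-style hash used by the .gnu.hash section:
    # h = h * 33 + c, starting from 5381 (a Bernstein-style hash).
    h = 5381
    for c in name.encode():
        h = (h * 33 + c) & 0xffffffff
    return h

print(gnu_hash(""))   # 5381
print(gnu_hash("a"))  # 177670
```

The GNU scheme wins partly on the function itself and partly on the surrounding data structure (a Bloom filter plus sorted buckets), which the sketch above does not attempt to show.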
ld.so is different beast
Posted Mar 27, 2008 10:16 UTC (Thu) by mjthayer (guest, #39183) [Link]
That article was the reason I tried it in the first place :)
Striking gold in binutils
Posted Mar 27, 2008 10:46 UTC (Thu) by nix (subscriber, #2304) [Link]
Hm, interesting. I'll try it at some point (probably with part of KDE: OOo takes too damn long to build ;} ) and see if I can make it go slow ;}
Striking gold in binutils
Posted Mar 27, 2008 11:05 UTC (Thu) by mjthayer (guest, #39183) [Link]
No need to rebuild anything to try out a new dynamic linker, methinks...
Striking gold in binutils
Posted Mar 28, 2008 21:27 UTC (Fri) by nix (subscriber, #2304) [Link]
I need to rebuild it to add back a non DT_GNU_HASH :)
Striking gold in binutils
Posted Mar 28, 2008 16:26 UTC (Fri) by landley (guest, #6789) [Link]
A friend of mine who still bothers with C++ explained to me once how C++ compilers used to use a linker optimization where the name mangling would put the innermost identifiers first, and the outermost identifiers last. That way if you had these two symbols:

    class1.class2.class3.class4.member1
    class1.class2.class3.class4.member2

By comparing "member1" vs "member2" first, your string match would figure out inequality faster. If you go the other way, your string matches have to go through lots of common namespace for every symbol before coming to the unique parts, and with BigLongMixedCaseNames this can get fairly ridiculous. Now, which way did the Intel Itanium C++ spec specify that name mangling had to occur? The long way that links slowly, of course. And everybody else picked up the Itanium C++ spec because nobody else bothered to write up a standard for this part of the language.
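The cost difference is easy to see: a string comparison bails out at the first differing byte, so where that byte sits in the name matters. A quick sketch, using the comment's dotted names above (made-up notation, not real mangled symbols):

```python
def mismatch_index(a: str, b: str) -> int:
    # Number of leading characters a strcmp-style comparison must
    # examine before it can declare the two strings unequal.
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

# Outermost-first order (the direction the comment attributes to the
# Itanium spec): the long shared namespace prefix comes first, so every
# comparison crawls through it.
slow_a = "class1.class2.class3.class4.member1"
slow_b = "class1.class2.class3.class4.member2"

# Innermost-first order: the unique member name leads, so the
# comparison is usually settled within the first few bytes.
fast_a = "member1.class4.class3.class2.class1"
fast_b = "member2.class4.class3.class2.class1"

print(mismatch_index(slow_a, slow_b))  # 34
print(mismatch_index(fast_a, fast_b))  # 6
```

With a symbol table full of near-identical long prefixes, that per-comparison difference multiplies across every hash-bucket probe the linker makes.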
Striking gold in binutils
Posted Mar 26, 2008 19:35 UTC (Wed) by aleXXX (subscriber, #2742) [Link]
> Will this really be a big win for most of us? I hope not. Things that
> take significant time to link tend to be big messes.

Hmm, if you have applications which link to a big number of libraries with many symbols (e.g. KDE), linking does take quite long, so improving this may be really nice. I'll give it a try.

> Now automake and libtool -- that's slow for all of us..

Just use cmake, it does it all and also gets rid of libtool :-)

Alex
Striking gold in binutils
Posted Mar 26, 2008 17:45 UTC (Wed) by quotemstr (subscriber, #45331) [Link]
Why was ld so slow in the first place? A speedup of five times, IMHO, indicates that the fundamental algorithm used by GNU ld was wrong, not that the same algorithm is implemented better in gold.
Striking gold in binutils
Posted Mar 26, 2008 17:54 UTC (Wed) by nix (subscriber, #2304) [Link]
Ian talks about this in his linker tutorial, but, yes, basically GNU ld works inside out and upside down ;} it wasn't originally designed for ELF and it shows.
Algorithm Complexity
Posted Mar 29, 2008 8:16 UTC (Sat) by pkolloch (subscriber, #21709) [Link]
That's funny. For me a factor of five actually suggests that the implementation quality has substantially improved, not the algorithm. If gold used an algorithm with better complexity, you should be able to find larger examples with a larger factor. Maybe that's the case, but the "factor five" statement alone would not tell you that.
More on Linkers and Loaders.
Posted Mar 26, 2008 17:48 UTC (Wed) by dfarning (guest, #24102) [Link]
I am providing a link to the online copy of John Levine's book, Linkers and Loaders. It provides good background information on the topic: http://www.iecc.com/linker/

Dave
More on Linkers and Loaders.
Posted Mar 26, 2008 18:28 UTC (Wed) by deweerdt (subscriber, #18159) [Link]
Ian's linkers series is worth a read too: http://www.airs.com/blog/archives/38
Blogs not useful for documenting complex information!
Posted Mar 26, 2008 17:56 UTC (Wed) by cruff (subscriber, #7201) [Link]
I followed the two links to blog entries about gold and ggx with interest, but ran head-on into a brick wall where it appears to be impossible to quickly read just the links about each project without wading through the rest of the irrelevant blog entries. Are there any links to pages that consolidate the relevant project info?
Blogs not useful for documenting complex information!
Posted Mar 26, 2008 18:24 UTC (Wed) by jake (editor, #205) [Link]
> Are there any links to pages that consolidate the relevant project info?

The first ggx link is a summary page that refers to all the blog entries. I haven't found the equivalent for Ian's linker series. I certainly agree that the blog format is very bad for trying to follow along.

jake
Blogs not useful for documenting complex information!
Posted Mar 26, 2008 19:59 UTC (Wed) by nix (subscriber, #2304) [Link]
That's because the whole of Ian's linker series is almost-never-documented gold dust. Read it all, you know you want to...
Blogs not useful for documenting complex information!
Posted Mar 27, 2008 0:41 UTC (Thu) by Ringding (guest, #34316) [Link]
In Ian's case it's quite easy to access the whole series because the integer numbers at the end of the URLs are consecutive.
C++????
Posted Mar 26, 2008 18:38 UTC (Wed) by sylware (guest, #35259) [Link]
gold is C++... C++ for a core toolchain program?? Well... -->trash.
C++????
Posted Mar 26, 2008 18:54 UTC (Wed) by ncm (guest, #165) [Link]
You seem to be confusing C++ with Java.
whatever
Posted Mar 26, 2008 19:00 UTC (Wed) by JoeBuck (subscriber, #2330) [Link]
Keep your 5x-slower linker then; it's C after all.
whatever
Posted Mar 26, 2008 20:04 UTC (Wed) by nix (subscriber, #2304) [Link]
sylware seems to have something against all languages other than C (and perhaps some flavour of assembler as well). I wonder if (s)he knows that most large C programs (including libbfd) are in many ways object-oriented...
whatever
Posted Mar 26, 2008 20:28 UTC (Wed) by elanthis (guest, #6227) [Link]
A lot of "old beards" dislike C++ for a variety of (mostly) no longer accurate reasons. Keep in mind I'm a huge proponent of C++ and use it for nearly all of my non-Web projects. It generally isn't about object-oriented programming, but more about low-level technical or political issues that were once an actual problem.

C++ fractured a lot during its pre-standardization days. This led to a lot of issues with porting code between different compilers. Even after standardization, it's taken a long while for certain vendors (*cough* Microsoft) to catch up and play nice, and there are still a few corner cases where certain compilers don't meet the spec. This really isn't a real concern anymore, though.

C++ had a non-standard ABI on early Linux systems, leading to constant breakage of C++ programs. Upgrading one's system would often lead to many or sometimes even all C++ programs no longer working. This hasn't happened in many, many years, and is unlikely to ever happen again given that there is a standardized C++ ABI that all major compiler vendors follow.

C++ suffers from a massively huge specification that no one compiler implements bug-free, and compiler updates often include standards-compliance fixes that break old non-compliant (but perfectly functional) code. Upgrading your compiler can sometimes result in old code no longer compiling. This is still true today, unfortunately: the GCC 4.3 release has broken a lot of software, and even though the fixes are generally extremely simple to make, they're still annoying. This is only an issue if you aren't 100% sure you're writing standard-compliant code; unfortunately, few of us can be, given how big that spec is. Valid fear.

C++ can include some (negligible) performance regressions over C when using certain C++ features. However, these features really have no analog in C at all, and the few attempts at providing similar features in C often result in horrifically ugly and difficult-to-use APIs, or in even worse performance than C++'s implementation. These features include exceptions and RTTI. Both of these features can be turned off in almost every compiler, so apps which don't really need them do not need to take the performance hit and can still take advantage of C++'s other features.

C++ makes it easier to write bad code that masquerades as good code. That is, C++ allows you to hide what your code is doing behind operator overloading, function/method overloading, virtual vs non-virtual method calls, and so on. C would require the coder to be explicit about what he is doing, and looking at the code makes it obvious what is going on without having to look up every header to figure out which methods are virtual or which global operators or overloaded functions have been defined. Basically, this boils down to there being a lot of shitty programmers, C being less friendly to shitty programmers, and more of those shitty programmers flocking to C++ than to C. Not an issue when you're dealing with someone who is truly good at software engineering, like the gold author.

C++ does nothing that C can't do. Anything you can write in C++ you can write in C, and make it just as efficient, possibly even more so in some circumstances. This argument is mostly bogus because it overlooks the fact that writing most of those things in C is way, way harder than writing them in C++ (getting the performance of templatized STL container classes out of C without manually maintaining a metric crap load of almost identical code is not really feasible). It's also bogus because, for those few things where doing it "the C way" (eschewing the standard "C++ way", like using a custom container for a specific data set that allows for very specific optimization techniques that the STL can't use) makes sense, C++ is entirely capable of doing it "the C way." Still, it's pretty common to see old beards claim C++ is pointless or meritless, even if it is easily refuted.

C++ has additional runtime requirements that make it less suitable for low-level tools and libraries. A C program can get by with just libc. A C++ program needs some extra libraries, for things like iostreams, the new/delete operators, exception handling, and so on. As with exceptions in general, these features may be done without to avoid the additional dependencies, although the C++ compiler will still link them in anyway by default. These dependencies really don't have any true negative impact (not like some of the big GUI apps, C and C++ alike, that require dozens of shared libraries to run... hello GTK/GNOME), but there is a fear among some of having any kind of dependency that isn't 100% mandatory. If the dependency really is an issue it is avoidable without giving up C++'s other features, and the fear is pretty pointless anymore, so this reason is bunk.

So, there are a few valid reasons to possibly avoid C++ for low-level work. Some of those can be worked around with a little effort. In general, given that complete OS kernels have been written in C++ with no ill effect, I think it's safe to say that this kind of fear of C++ in 2008 is greatly unjustified. Old beards aren't likely to change their tune now any more than they were a decade ago, though. :)
+5, Insightful
Posted Mar 26, 2008 20:48 UTC (Wed) by nix (subscriber, #2304) [Link]
I think I agree with everything you said there, but you said it *very* well.
+5, Insightful
Posted Mar 26, 2008 21:08 UTC (Wed) by JoeF (guest, #4486) [Link]
Ditto. I use C++ a lot, for my day job, and I agree with everything elanthis said. Before my current project, I had to maintain a complex program written almost completely in C. That turned out to be a nightmare, and it actually was fairly well-written code. In my current project, I make use of a lot of STL functionality. Implementing that in C would be hell...
yet another +5
Posted Mar 26, 2008 22:14 UTC (Wed) by pr1268 (subscriber, #24648) [Link]
Another C++ fanboy here who really likes Elanthis' comments.
I write lots of personal utility programs in C, and yes, the C code is usually leaner and more spry than equivalent C++ code. But, it's hard to ignore just how well C++ satisfies virtually every programming paradigm in existence. For me, trying some new paradigm or language feature set in C++ is usually a fruitful academic exercise where I come away so much more enlightened.
I wrote a C++ program recently that had all kinds of libc calls (stat, mkdir, opendir, and readdir, to name a few) sitting next to standard library vectors and fstreams. It might have looked like a Frankenstein of a program suffering from a source language identity crisis, but I'll be damned if it didn't run perfectly and efficiently (using existing GCC and binutils). And this "hybrid" C/C++ program ran several orders of magnitude faster than my hand-coded linked list version it replaced (but let's not go to why my linked list version sucked - suffice it to say that code was a brown paper bag experience). ;-)
I do understand Sylware's sentiments towards C++ (even if I don't agree with them) - after all, Linus feels the same way. It's all about using the right tool for the right job. With C, you get a well-stocked hand-carry tool box. For many folks, that's perfectly adequate. With C++, you get the entire Sears Craftsman catalog.
I'm actively looking forward to seeing more of Gold and its performance gains. Kudos to the developers.
yet another +5
Posted Mar 27, 2008 4:15 UTC (Thu) by wahern (subscriber, #37304) [Link]
If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#? I agree that C++ is in some respects a "better C". But I don't want a "better C". I like C because it's simple, portable, and the lowest common denominator. I can compile most of my own libraries and applications on Linux, *BSD, and Win32 w/o so much as changing a compiler option. If I want a complex language, there are better ones than C++; the cost/benefit tradeoff is superior. When C++ gets garbage collection and closures, call me. (And Boehm better not be on the line.) If I have to put up w/ the extra baggage (language complexity, superfluous libraries, porting headaches), I demand more bang for my buck.
yet another +5
Posted Mar 27, 2008 7:09 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#?
I can't find in my earlier post where I said I didn't use Java. In fact, I do from time to time. My earlier post was about why I'm such a fan of C++ - and how I can still enjoy all the benefits of both C and C++, even in the same program.
However, I take exception to your implication that Java provides competitive "sugar" to C++. Consider the following example of reading an integer from a file:
C++
    int x;
    ifstream in_file;
    in_file.open("foo.txt");
    assert(in_file);
    in_file >> x;
Java
    int x;
    String line;
    try {
        FileReader fr = new FileReader("foo.txt");
        BufferedReader br = new BufferedReader(fr);
        line = br.readLine();
        x = Integer.parseInt(line);
    } catch (Exception e) {
    }
Not more sugar, IMO. Yes, I do realize that Java has to abstract a lot of file I/O due to the fact that it supports multi-byte character sets on dozens of architectures with different byte orders and file systems, thus explaining the syntactic "salt". But, still, even the C version is pretty simple:
C
    int x;
    FILE* in_file;
    in_file = fopen("foo.txt", "r");
    assert(in_file);
    fscanf(in_file, "%d", &x);
But again, I'm not trying to dismiss Java, only to provide a counter example of where Java fails to provide any programming benefit over C++. Actually, the C and C++ examples above would likely only work on ASCII or UTF-8 filesystems, but Java's UTF-16 support is native to the language [1]. So, Java gets to tell C and C++ what "portability" means (even if the programmer has to dig through 67 layers of abstraction to accomplish what he/she set out to do). ;-)
As for C#, well, despite my own thoughts about Microsoft getting in the way, I just don't see why C# even needs to exist. Microsoft was actively marketing Java development suites and compilers in the mid- and late-1990s (Visual J++, anyone?), adding their own APIs and language features, until Sun Microsystems had the guts to stand up to MS and tell them to stop violating the terms of Sun's license (with a successful lawsuit). It was all sour grapes for MS afterwards, so they just had to go run out and create their own Java imitation. C'mon, Microsoft, you already had legions of Visual Basic programmers! Why go out and create a whole new language when there are so many already out there? Was it because Visual Basic wasn't "Java-like" enough?
Not that I think C# is a bad language; it does have some interesting features. But, I get this strange feeling that C# skills will be useless come five years from now, just like Visual Basic skills are in much less demand than they were five years ago.
As for coding C#, well, I abandoned MS Windows four years ago for a single-boot Linux. And the only native C# compiler for Linux I know of is Mono. Which causes anomalous behavior on my Slackware machines. Come to find out Mono has library dependency issues with my current toolchain and dynamic linker. In other words, I experienced DLL HELL in Linux [2], simply by installing Mono (version 1.24) in Slackware 12. How ironic this is considering I'm trying to write Windows applications in Linux. I suppose the gratuitous DLL hell is all part of the Microsoft "experience" I'm supposed to get whilst writing C# code. No thank you.
I mostly agree with the rest of your post - I do indeed like C's portability (hey, it was one of the core motivations for creating the language to begin with), and I also like its efficiency. I can't say that adding garbage collection to C++ would be worthwhile; even Bjarne Stroustrup labored over whether to include garbage collection in C++ back in the early 1980s.
My personal thoughts are that garbage collection built-in to the programming language would send several mixed messages to programmers using that language:
- "I don't have to worry about freeing up allocated memory, since the GC will do it for me. This activity doesn't slow down my program, does it?"
- "Since I don't have to worry about keeping track of run-time allocations (why bother? The GC will clean up my mess for me!), I can indulge my code by generously overestimating allocation space. Who cares about the mess to be cleaned up after the party? That's what the GC is for."
- "I've been writing frugal, tidy, and efficient code for years by judiciously managing my run-time allocations. How come all of the sudden I don't have control over this anymore? What if I don't agree with the GC strategy? Can't I disable it completely and manage my own memory usage? After all, I'm quite good at it."
(And Boehm better not be on the line.)
I LOLed at your comment. I downloaded a source tarball several months ago (I forget which program/application) that had a Boehm GC dependency. I thought to myself, C++ doesn't have garbage collection for a reason, but why does this particular project feel that it needs to slather a layer of protection over the code? Are the programmers lazy? Or, do they not know how to use new and delete?
When C++ gets garbage collection and closures, call me.
Perhaps some of your garbage collection needs could be met by using the C++ standard library container classes (e.g., vector, list, deque, etc.). Or, more appropriately stated, your need for a GC to begin with could be eliminated. But, perhaps that's a discussion for another time. I've rambled on long enough, it's been fun pontificating and bloviating (to quote one of my graduate professors). :-)
[1] C++ does have explicit support for multi-byte character sets with its wchar_t type. I don't know what kind of support C has in this regard, or if it supports multi-byte characters at all.
[2] I'll openly admit that Microsoft unfairly receives the brunt of user frustration over DLL hell when in reality the basic concept of library dependency hell is a Unix creation which predates DLL files by several years.
yet another +5
Posted Mar 27, 2008 10:54 UTC (Thu) by nix (subscriber, #2304) [Link]
GC support in the language has one major advantage over not having it: if the compiler and GC layer cooperate, the language can do type-accurate garbage collection. That's pretty much impossible with a 'guess if this bit pattern is a pointer' implementation like Boehm. (But still, why GC in a C/C++ program? Easy: sometimes, the lifespan of allocated regions is complex enough that you don't want to track it in your own code. A lot of large C/C++ systems have garbage collectors in them, often hand-rolled. GCC does, for instance, and while its effect on data locality slowed GCC down a lot, it *also* wiped out huge piles of otherwise-intractable bugs. In my own coding I find that Apache-style mempools and disciplined use of ADTs eliminates most of the need for GC while retaining the nice object-lifecycle benefits of C/C++, so I can use RAII without fear. Losing that pattern in Java is a major reason why I try to avoid the language: in effect Java eliminates memory leaks only to replace them with other sorts of resource leak because you can't use RAII to clean them up for you...)
Python
Posted Mar 27, 2008 12:43 UTC (Thu) by ernstp (guest, #13694) [Link]
Sorry, completely off topic, I just had to post this. Python:

    int(file("foo.txt").read())

:-P
Python
Posted Mar 27, 2008 13:21 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
Show-off!
You forgot to catch the exception of the file not opening. Where's your deadParrot() error-handling function?
;-)
Python
Posted Mar 27, 2008 17:46 UTC (Thu) by cwarner (guest, #47176) [Link]
How far we've come.. how far.
Ruby
Posted Mar 28, 2008 1:15 UTC (Fri) by Tuxie (guest, #47191) [Link]
sorry, I had to :-)

    x = File.read("foo.txt").to_i rescue deadParrot
Ruby
Posted Mar 28, 2008 16:08 UTC (Fri) by alkandratsenka (guest, #50390) [Link]
Reading the whole file into memory just to parse an int from its first line is very funny :) You'll need a longer version like this:

    (File.open('foo.txt') {|f| f.readline}).to_i rescue deadParrot
c++ vs c
Posted Mar 27, 2008 14:51 UTC (Thu) by jimparis (guest, #38647) [Link]
> C++
>
>     int x;
>     ifstream in_file;
>     in_file.open("foo.txt");
>     assert(in_file);
>     in_file >> x;
>
> C
>
>     int x;
>     FILE* in_file;
>     in_file = fopen("foo.txt", "r");
>     assert(in_file);
>     fscanf(in_file, "%d", &x);

Here's something that really bugs me about C++. Where's the documentation? With C, "man fopen", "man assert", and "man fscanf" give me all the info I need. With C++, I suppose some manual page for ifstream would be most appropriate, but I don't seem to have it. Which package is that in? Or must I resort to Google searches every time? Of course, even if I did have C++ man pages, deciphering "in_file >> x" still requires that I track backwards to figure out the types of "in_file" and/or "x" (yay operator overloading!)
c++ vs c
Posted Mar 27, 2008 15:13 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
I suppose some manual page for ifstream would be most appropriate, but I don't seem to have it.
All Glibc standard library functions have man pages (I'm unsure whether these came before or after the shell functions' man pages). I think this might be related to the founding philosophy that C is supposed to be portable, and the man pages were a convenient way of distributing documentation on the system call interfaces without having to decipher C code you've never seen before (not impossible, but time-consuming).
I can't recall ever seeing a C++ man page, but then again, the whole language standard was in limbo up until its 1998 ISO standardization. Not sure why they don't exist nowadays, but perhaps Stroustrup would prefer that you buy his book instead (stupid conspiracy theory). Some of the top links in Google searches for various C++ functions and standard library classes are quite decent (IMO).
Personally, I recommend anyone trying to "dive into" C++ go find a used C++ textbook. Just be sure to get one dated more recent than 1998 (because older C++ texts are rife with code that predates the ISO standard).
c++ man pages on gcc.gnu.org
Posted Mar 27, 2008 17:01 UTC (Thu) by bkoz (guest, #4027) [Link]
See: http://gcc-ca.internet.bs/libstdc++/doxygen/ I believe some os vendors (debian, I think) package these. -benjamin
c++ vs c
Posted Mar 28, 2008 12:00 UTC (Fri) by cortana (subscriber, #24596) [Link]
Indeed, man pages are not really suitable for C++ (and many other languages) for the reasons you state.
If you are on a Debian system, run: apt-cache -n search libstdc++ doc and install one of those packages. Then check out its directory in /usr/share/doc.
The docs are also online at http://gcc.gnu.org/onlinedocs/libstdc++/.
A very nice quick reference to iostreams and the STL can be found at http://cppreference.com/.
I have to say I don't really prefer the man pages for C development because often they contain outdated or just plain incorrect information. I prefer to use the glibc manual directly for reference.
Use of assert
Posted Mar 30, 2008 4:53 UTC (Sun) by pjm (guest, #2080) [Link]
Incidentally, please don't use or encourage the use of assert for checking for I/O errors or other can-happen runtime conditions. Such checks will disappear when compiled with -DNDEBUG (or conversely make it impractical to compile with -DNDEBUG, thus discouraging use of assertions), and they fail to give a meaningful error message. That should be:

    if (!in_file) { perror("foo.txt"); exit(EXIT_FAILURE); }
Use of assert
Posted Mar 31, 2008 21:05 UTC (Mon) by pr1268 (subscriber, #24648) [Link]
Well, I had to level the playing field somewhat since Java forces me to put all that code in a try/catch block...
But yeah, I generally do what you recommend in C/C++.
The metric is SPEED not just easier development
Posted Mar 27, 2008 10:03 UTC (Thu) by khim (subscriber, #9252) [Link]
If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#?
Java and C# use virtual machines and thus are slower. End of story. C is closer to the metal, but suffers from a human problem: it's not feasible to generate 10,000 specializations by hand. You need some metaprogramming. If you take a look at really fast "C libraries" (like FFTW or ATLAS) you'll find that while they include a bunch of .c files, those .c files are not the source! They are themselves generated by some automatic process. C++ allows you to do something similar without using yet another specialized system (STL and especially Boost are a big help, but simple template metaprogramming works as well in simple cases). Thus in practice C++ programs written by good programmers are faster than C programs (if you turn off RTTI and exceptions, of course). AFAICS this was the reason for C++ usage in gold, too.
Of course it's very easy to misuse C++, too...
The metric is SPEED not just easier development
Posted Mar 27, 2008 10:56 UTC (Thu) by nix (subscriber, #2304) [Link]
One point: RTTI and exception handling don't slow down C++ programs anymore, except if dynamic_cast<> is used or exceptions are thrown, and those are things which if you implemented them yourself you'd have a lot of trouble making as efficient as the compiler's implementation (I doubt that you *can* make them as efficient or reliable without compiler support).
Since WHEN?
Posted Mar 28, 2008 9:26 UTC (Fri) by khim (subscriber, #9252) [Link]
Last time we checked (GCC 4.1.x), removing -fno-rtti and/or -fno-exceptions made real-world programs 5-10% slower (up to 15% combined). What changed in GCC 4.2 and/or GCC 4.3???
If you DO need RTTI and/or exceptions, of course it's better to use the compiler-provided ones than to write your own, but if not... For things like gold, abort() is a perfectly usable alternative to exceptions...
Since WHEN?
Posted Mar 28, 2008 21:26 UTC (Fri) by nix (subscriber, #2304) [Link]
I think I need to profile this, then, because exception frames should be very nearly free to set up and (non-throw) tear down, certainly not as expensive as 15%. This wasn't on an sjlj target, was it? 'cos they're *so* last millennium.
The metric is SPEED not just easier development
Posted Apr 2, 2008 11:12 UTC (Wed) by dvdeug (subscriber, #10998) [Link]
Java the programming language doesn't use a virtual machine any more than C does. It happens to usually be implemented using a virtual machine, but there are native compilers, like gcj. Furthermore, the coding matters a lot more than the language, and the language can frequently simplify the coding.
+5, Insightful
Posted Mar 28, 2008 0:56 UTC (Fri) by man_ls (guest, #15091) [Link]
Another +5 here. Only a small detail bothers me:

> C being less friendly to shitty programmers

Having seen lots of horrible C code, I think that shitty programmers feel as confident obfuscating C, C++, or Java code. Just the liberal use of global variables and gotos can get as bad as the worst class hierarchy.
You missed the point
Posted Mar 26, 2008 22:41 UTC (Wed) by sylware (guest, #35259) [Link]
Anything that makes a system tool depend on more than the complexity of a C compiler should be trashed, period. Why? This is my limit for containment of the size/complexity of the system software stack. I won't go any further. As you perfectly put forward, a C++ compiler, even a non-optimizing one, is hell on earth to code compared to a C compiler. A linker as a system C++ program would damage the size/complexity of the system software stack. My conclusion is horribly simple: gold has to go in the trash, and its coder should work on speeding up the properly (namely C) coded ld (oprofile?).
What a narrow-minded viewpoint!
Posted Mar 26, 2008 22:55 UTC (Wed) by felixfix (subscriber, #242) [Link]
I don't have a list, but I am sure you use tools every day that were developed in some language other than C. Perl and Python come to mind, but at any rate, restricting your tool chain to C-based code is not possible nowadays. It isn't just narrow-minded to want that, it is burying your head in the sand to pretend it is possible.
You missed the point
Posted Mar 27, 2008 0:47 UTC (Thu) by ncm (guest, #165) [Link]
If sylware thinks he understands Gcc's C compiler, it can only be because he hasn't looked at it in a long, long time.
You missed the point
Posted Mar 27, 2008 0:49 UTC (Thu) by epa (subscriber, #39769) [Link]
gcc has an extremely complex codebase. Your argument would seem to suggest replacing it with a simpler C compiler such as pcc, in order to reduce the total complexity of the 'system software stack'. Similarly you should be running Minix or another kernel that is simple enough one person can read and understand all the code. And I assume you have no truck with the horribly baroque autoconf/automake/libtool rat's nest. From what I've read, gold is much simpler than the overly-general ld implementation it substitutes for. Of course, part of this simplicity is because it is written in a higher-level language. Often this is a worthwhile tradeoff - after all the compiler only has to be written and debugged once. Were this not the case, all programs would be in assembly.
You missed the point
Posted Mar 27, 2008 2:12 UTC (Thu) by elanthis (guest, #6227) [Link]
Your argument makes no sense. The C++ compiler stack is part of GCC, and the C++ portions of the compiler make up such a small percentage of the complexity of the rest of the stack as to not be worth mentioning. Plus, I'm fairly sure (might be wrong) that the current versions of GCC have merged the C and C++ parsers. The complexity of C++ does not mean that you get huge, unwieldy compilers. It just means that you have trouble implementing the spec 100% accurately. It's no different from a protocol like HTTP. HTTP (esp. v1.1) is actually pretty hard to implement fully correctly. A lot of HTTP servers get it wrong, as do a lot of clients, and thus there are certain combinations of HTTP server and client that just don't work together. Despite this, a fully correct HTTP server or client implementation is still pretty short, sweet, and easy to read. HTTP doesn't force ugly code; it's just not as simple a protocol as one might think. You can think of C++ the same way. It doesn't require that much extra effort on top of C to write a compiler for, but trying to make sure you cover 100% of the spec is harder than you might think, given how very little C++ adds to the C language.
You missed the point too
Posted Mar 27, 2008 2:31 UTC (Thu) by sylware (guest, #35259) [Link]
If I need to rewrite a non-optimizing C compiler from scratch, it's easy. Look at tcc, and I have had plenty of students whose "small and simple" project was writing a C compiler. Of course, when we bring C++ to the table, you hear "insane", "crazy", "far too complex", etc... I was referring to *that* complexity, not the current complexity of the best ultra-super-optimizing compiler, which is gcc.
You missed the point too
Posted Mar 27, 2008 11:04 UTC (Thu) by nix (subscriber, #2304) [Link]
But a random C compiler reimplementation isn't capable of compiling most of the other parts of the stack in any case. GCC can be compiled with just about anything that supports ISO C, but you'll need to reproduce a lot of GCC's (largely undocumented) foibles and language extensions before you can compile, say, the Linux kernel with it. I don't really see why the complexity of the *compiler* is relevant anyway. It's not as if GCC is going to abruptly go away or stop working, so its complexity doesn't negatively impact you at all.
You missed the point too
Posted Mar 28, 2008 17:20 UTC (Fri) by landley (guest, #6789) [Link]
Actually, I'm working on making tinycc (a project derived from tcc) compile the rest of the stack, including the kernel. I have rather a lot of work left to do, of course. :) http://landley.net/code/tinycc I find gold interesting, but not useful in my case because tinycc's linker is built-in. (I'm reorganizing the code to work as a "swiss army knife" executable ala busybox, but that's not in -pre2. Maybe -pre3.) As for the kernel, the linux-tiny project is working on getting that more modular so we need to select less of it... Rob
You missed the point
Posted Mar 27, 2008 19:17 UTC (Thu) by tjc (guest, #137) [Link]
> I'm fairly sure (might be wrong) that the current versions of GCC have merged the C and C++ parsers.

The C parser was rewritten in gcc 4.1, and I *think* it's still separate from the C++ parser.
You missed the point
Posted Mar 28, 2008 22:25 UTC (Fri) by nix (subscriber, #2304) [Link]
Yes. It's not separate from the *Objective C* parser. (The similarity is that, like the C++ parser, the C parser has now made the transition from bison to a hand-rolled parser.)
You missed the point
Posted Mar 27, 2008 4:59 UTC (Thu) by artem (subscriber, #51262) [Link]
Well, then you should really stay away from programming for some time. Guess what language is used for the tools used in designing computer chips? (For reference, see http://www.research.att.com/~bs/applications.html, and scroll down to the bullet labeled 'Intel'.)
You missed the point
Posted Mar 27, 2008 5:50 UTC (Thu) by lysse (guest, #3190) [Link]
colorForth? ;)
You missed the point
Posted Mar 27, 2008 7:37 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
Google has written a memory allocator library (to compete with the Glibc 2.3 equivalent, ptmalloc2), in C++.
Now, my understanding of the memory allocator is that this is a library whose run-time efficiency should be unquestioned. This is code that interfaces with the kernel nearly continuously. Accordingly, C++ would not have been my first choice of programming language in which to implement this (I would have chosen C, but don't mind me--I've never written a memory allocator before!).
But, Google's allocator library appears to have improved performance over the incumbent Glibc ptmalloc2 in certain scenarios, according to the graphs near the bottom of that page. And to think this was accomplished with C++ (I'm assuming that the Glibc ptmalloc2 is written in C, but I do ask someone to correct me if I'm wrong).
You missed the point
Posted Mar 27, 2008 11:05 UTC (Thu) by nix (subscriber, #2304) [Link]
Actually the memory allocator largely interfaces with itself and its userspace callers. Its interface with the kernel is restricted to the occasional sbrk() and mmap() calls.
You missed the point
Posted Mar 28, 2008 3:45 UTC (Fri) by pflugstad (subscriber, #224) [Link]
Also, TCMalloc doesn't return memory to the kernel at all, while GNU libc's does.
You missed the point
Posted Mar 27, 2008 5:49 UTC (Thu) by lysse (guest, #3190) [Link]
How dare you be this dismissive of *anyone's* work without an alternative to offer? Especially on that flimsiest of pretexts, ideology? You want something done *your* way, do it yourself. Otherwise, take what you're offered. Telling someone that they have to junk what they've done is bad enough when they're only as far as having it working; when they're handily trouncing what they aim to replace, telling them that their replacement isn't "good enough" - because their choice of implementation language doesn't satisfy *your aesthetic tastes* - only exposes *you* as the fool you are. (Unfortunately, yours is the voice of the majority, and humanity is doomed.)
Being dismissive of another's work
Posted Mar 28, 2008 0:02 UTC (Fri) by giraffedata (guest, #1954) [Link]
How dare you be this dismissive of *anyone's* work without an alternative to offer?
The way I read it, sylware did offer an alternative: classic binutils 'ld'. He says even as slow as it is, it's better because of the C++ issue.
You want something done *your* way, do it yourself. Otherwise, take what you're offered.
Surely you don't mean that. We all take a third option all the time: do without.
You missed the point
Posted Mar 27, 2008 10:59 UTC (Thu) by nix (subscriber, #2304) [Link]
Um, ld's *algorithms* are wrong, and the wrongness is deeply embedded. The only way to make it as fast as gold is to rewrite it from scratch. Ian did that, and preferred to do it in C++. Feel free to rewrite it yourself, in whatever language you prefer. When you make something faster and easier to maintain than gold, come back to us.
Good point!
Posted Mar 27, 2008 15:31 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
Agreed. If you choose C over C++ merely because C++ is "slow", "bloated", or "inefficient" then don't complain any further until you've rewritten all your applications in assembly language! Then we'll talk about efficient code.
Now, if you choose C over C++ because you're more comfortable, familiar, or experienced at it, then fine, but don't start making unsubstantiated generalizations about how C++ is slow, bloated, inefficient, etc. C++ isn't nearly as bloated or slow as it might have been a number of years ago. And, the Gold linker may improve this even further.
You missed the point, again.
Posted Apr 2, 2008 17:56 UTC (Wed) by sylware (guest, #35259) [Link]
You are missing the point full throttle, reread my posts.
You missed the point, again.
Posted Apr 2, 2008 19:51 UTC (Wed) by nix (subscriber, #2304) [Link]
Ah, the last refuge of the erroneous. Sorry, the onus is on *you*: you're the one making the exaggerated claims.
C++ incompatibility history
Posted Mar 26, 2008 23:03 UTC (Wed) by jreiser (subscriber, #11027) [Link]
The C++ version incompatibility and interoperability nightmare was still very much alive only TWO years ago.

> C++ had a non-standard ABI on early Linux systems, leading to constant breakage of C++ programs. Upgrading one's system would often lead to many or sometimes even all C++ programs no longer working. This hasn't happened in many, many years, ...
Fedora Core 3 was still leading edge in August 2005. Its then-current software updates had gcc-c++-3.4.4-2 and compat-gcc-c++-8-3.3.4.2 because there were incompatibilities between 3.3.4 and 3.4.4. In September 2005, the newly-issued Fedora Core 4 had gcc-4.0.1-4 which was again incompatible with 3.4.4. Fedora Core 5 was released in March 2006, finally signalling that FC3 truly had ridden into history.
C++ incompatibility history
Posted Mar 27, 2008 0:27 UTC (Thu) by solid_liq (guest, #51147) [Link]
That was a gcc problem. gcc had ABI compatibility problems leading into the 4.0 release with C too, not just C++. Also, Fedora is not the baseline standard.
C++ incompatibility history
Posted Mar 28, 2008 0:12 UTC (Fri) by giraffedata (guest, #1954) [Link]
It's also important to note that the echoes of those historical compatibility problems can be with us for a long time. I try to use C++, but it's a trial, because I have systems that have roots going back to Gcc 2 days. There is no switch on these systems I can flip to recompile every program and library with a current compiler. A C++ program compiled with Gcc 3 will not work with an existing C++ library, and vice versa. So when I write new C++ code, I compile with Gcc 2, and probably always will. I recently learned, painfully, that current 'ld' has a similar compatibility problem -- something to do with throwing an exception across object modules.
It's worth noting that none of these problems exist with C. I.e. the zero-headache alternative for me is to use C.
C++ incompatibility history
Posted Mar 28, 2008 3:49 UTC (Fri) by pflugstad (subscriber, #224) [Link]
Heh - I'm still forced to use a EGCS 1.1 cross compiler for a certain embedded OS I work with. Talk about painful. Even more so: it's running under an _old_ version of cygwin on windows (and if you know much about cygwin, you know the old versions had lots of interesting bugs and multiple versions on the same system don't play nice together, so it ends up trashing the more modern cygwin installs)... sigh... Sorry, just had to whine...
Not quite
Posted Mar 27, 2008 0:44 UTC (Thu) by ncm (guest, #165) [Link]
As useful as I find C++, some of the above is not right. There is no standard ABI for C++. G++ (in different versions) has two in common use, with a third coming soon; MSVC++ has others. (Other compilers tend to copy one or the other of Gcc's or MSVC++'s, depending on target.) What is different now is that people have learned to include version numbers in the names of library files and library packages, so one rarely tries to link to a library built with the wrong compiler.
C++ code can be substantially faster than the best macro-obscurified C code, even without fancy template tricks. The reason is, believe it or don't, exceptions. Checking return status codes at each level in C (besides obscuring code logic!) is slower than leaving the stack to be unwound by compiler-generated code in the (unlikely) case of an exception.
Shitty programmers are more likely to code in C++ not because they're drawn to it, particularly, but because C++ is what everybody uses in Windows-land, and that's where most of them come from. That could be taken as a slight on typical Windows development habits, but it's really more a matter of the law of big numbers.
The only valid reason to consider C++ unsuitable for some particular "low-level" application is if the environment it must be linked/loaded into was built with a C compiler, and lacks the minimal support needed for, e.g., exception handling. An example is the Linux kernel. There's no reason Linux couldn't all be compiled and linked with G++ -- modulo some undisciplined use of C++ keywords as identifiers -- and then C++ drivers would be fine. However, it would be unwise to throw an exception in many contexts there.
Finally, the instability introduced in Gcc-4.x has a lot more to do with the optimizer than with changes to C++ or its implementation. That instability affected C programs (including the Linux kernel) as much as C++ programs.
None of these affect the conclusion, of course.
Not quite
Posted Mar 27, 2008 4:51 UTC (Thu) by wahern (subscriber, #37304) [Link]
Your theory about C++ exceptions being more performant than a comparable C pattern doesn't pan out. It's a similar argument to the one the Java folk give: "Java *can* be faster, because you can do code optimization on-the-fly." The extra tooling that C++ must put into function prologs and epilogs for stack unwinding--and which is mandated by the various ABIs--as a practical matter adds at least as much work, and usually more. There are tables to index into--often from within another function which must be called, and maybe using a pointer dereference. Any one of those can add up to several register comparisons. I don't know how function inlining affects exception tooling, but I imagine the relative losses only increase. For the rare instance where you really need to fine-tune a block or routine, both C and C++ suffice. I once shaved 20% runtime by changing a single line--a loop to a GCC built-in; it was in C but would've applied equally to C++. In reality, C applications will be moderately faster. But in most cases we're comparing apples to oranges because, for instance, many people prefer exceptions. If they improve _your_ ability to engineer better solutions, and don't hinder others, there is no other justification required. I don't understand why people try so hard to prove that some feature "tastes better and is less fattening".
Imaginary losses
Posted Mar 27, 2008 6:16 UTC (Thu) by ncm (guest, #165) [Link]
Can you identify any of this "extra tooling" in assembly output from the compiler? Or are you just making it up? You can "imagine" all the "relative losses" you like, but that has nothing to do with the facts. What is factual is that the extra code each programmer must insert in C code to return error codes, to check error codes, and to dispatch based on error codes compiles to actual instructions that must be executed on every function return. When errors are reported by exception, instead, none of those instructions are executed unless an error occurs. The difference has been measured as high as 15%. Now, 15% isn't very much in Moore's Law country, but it's not negligible. It's not a reason to choose one language over another, but it puts the lie to made-up claims that C++ code is slower than C.
Imaginary losses
Posted Mar 27, 2008 7:20 UTC (Thu) by alankila (guest, #47141) [Link]
I'm not sure what kind of code has been used to benchmark that, but assuming the C++ compiler has to insert some low-level call such as malloc() into the generated code to handle the new operator (or whatever), it will have to check the return code from malloc just the same as the programmer using the C compiler. In general, I suspect C code doesn't execute error paths a lot. In the malloc example there is practically nothing to do but die if it fails. So you'd expect the C++ and C code to perform pretty much the same instructions -- both would do the call, both would test for error, and in case of no error they would move forward to the next user construct. In case of error, the C program would do something the programmer wrote, and the C++ program would do whatever magic is required to raise an exception (hopefully without further memory allocations, of course). After this point, things do diverge a lot, but I think in most cases there are no errors to handle. Therefore, it would seem to me that both should perform identically, unless error returns are a common, expected result, in which case you'd have to write dispatch logic in C to deal with each error type (normally an O(log N) switch-case statement, I'd guess) while the C++ compiler would probably generate code to figure out which exception handler should receive the exception. Somehow I do get the feeling that C should win in this comparison. After all, it's testing the bits of one integer, while C++ has to test exception class hierarchies. In light of this, it seems ludicrous to claim that C error handlers cost a lot of code that needs to be run all the time, but somehow C++ exceptions are "free".
Imaginary losses
Posted Mar 27, 2008 7:56 UTC (Thu) by njs (subscriber, #40338) [Link]
malloc isn't the example to think of here, because yeah, usually you just abort. And the problem isn't that first if statement, where you detect the error in the first place. The problem is that in well-written C code, practically *every* function call has some sort of error checking wrapped around it, because errors in that function need to be detected and propagated back up the stack. It's the propagating that really hurts, because you have to do it with if statements, and if statements are expensive. Compare C:
    error_t foo() {
        char *blah;
        error_t e = bar(&blah);
        if (e) return e;
        e = baz();
        if (e) { free(blah); return e; }
        /* ... */
    }

versus C++:
    void foo() {
        std::string blah = bar();
        baz();
        /* ... */
    }

One might think that the C++ code has "hidden" if statements; for old C++ compilers, that was true. Modern compilers, though, use Extreme Cleverness to avoid that sort of thing. (If you're curious about the details, just g++ -S some simple programs and see.)
Imaginary losses
Posted Mar 27, 2008 20:40 UTC (Thu) by pphaneuf (guest, #23480) [Link]
You get a once-per-function setup and teardown that registers destructors (and is thus amortized better and better the longer the function). If there are no destructors, it can simply be left out. Even a "just crash" approach involves one test and branch per possible failure point.
On modern processors, branches are expensive, due to mis-prediction. I suspect that's one of the reasons profile-driven optimizers can be so good: they can figure out which side of a branch is more likely. In the case of error handling, which branch is more likely would be readily obvious to a human, but is harder for a compiler to determine (see the annotations available in GCC, used by the Linux kernel code).
The code size increases with error-handling code, often with "dead spots" that get jumped over when there is no error, which on today's faster and faster machines means increased instruction cache usage, less locality, and so on.
I don't doubt that when they happen, C++ exceptions might be more expensive, but the thing with exceptions is that they don't happen often, and thus that's the most interesting case.
Imaginary losses
Posted Mar 27, 2008 20:14 UTC (Thu) by wahern (subscriber, #37304) [Link]
Modern C++ exceptions might be conceptually zero-cost, but they are not less work than comparable C code. The difference is in how the stack is prepared to call the function. There is, evidently, a small fixed cost in every C++ function call which offsets the lack of a test+jump after the call. I admit I was unfamiliar w/ the details of modern exception handling, but I'm glad you forced my hand, because if anything we're cutting through some hyperbole.
Also, the error handling pattern in my C code doesn't duplicate as much code as the straw man examples posted here. I'm perfectly capable of using "goto" to jump to a common error handling block within a function, achieving something similar to the range table method of placing the error handling logic outside of the main execution flow. And I do this most of the time, because it just makes sense, and I get, IMO, better readability than exceptions, because there are fewer syntactic blocks to obscure my code. (I admit, that's highly subjective.)
Here's the example you requested. I used GCC -- gcc version 4.0.1 (Apple Inc. build 5465) -- with -O2 optimization. To compile:

    [cc|c++] -S -O2 -o ex.S ex.c -DCPLUSPLUS=[0|1]
    #if CPLUSPLUS
    #include <iostream>
    #include <cstdlib>

    void noargs(int i)
    {
        if (i > 1)
            throw i;
        return /* void */;
    }

    int main(int argc, char *argv[])
    {
        try {
            noargs(argc);
        } catch (int e) {
            _Exit(e);
        }
        return 0;
    }
    #else
    #include <stdio.h>
    #include <stdlib.h>

    int noargs(int i)
    {
        if (i > 1)
            return i;
        return 0;
    }

    int main(int argc, char *argv[])
    {
        int e;
        if (0 != (e = noargs(argc))) {
            _Exit(e);
        }
        return 0;
    }
    #endif
Simple, straightforward code. Let us count the number of instructions from main() to our call to noargs(), and from the return from noargs() to leaving main().
C++ output:
    .globl _main
    _main:
    LFB1481:
        pushl   %ebp
    LCFI4:
        movl    %esp, %ebp
    LCFI5:
        pushl   %esi
    LCFI6:
        subl    $20, %esp
    LCFI7:
        movl    8(%ebp), %eax
        movl    %eax, (%esp)
    LEHB0:
        call    __Z6noargsi
    LEHE0:
        addl    $20, %esp
        xorl    %eax, %eax
        popl    %esi
        leave
        ret
On the "fast-path", we have 12 instructions for C++.
Now, plain C:
.globl _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    movl    8(%ebp), %eax
    movl    %eax, (%esp)
    call    _noargs
    testl   %eax, %eax
    jne     L10
    leave
    xorl    %eax, %eax
    ret
And in C, we have... 11 instructions. Well, well! And I'm being charitable, because in fact there are additional instructions for noargs() which increase the disparity: 8 in C, 12 in C++. That makes the total count 19 to 24, but for simplicity's sake, I'm happy to keep things confined to the caller.
Explain to me how this is a poor example. I'm willing to entertain you, and I by no means believe that this little example is conclusive. But, it seems pretty telling to me. I admit, I'm surprised how close they are. Indeed, if anybody suggested to me that C++ exceptions introduced too much of a runtime cost, I'd set them straight. But if they looked me straight in the eye and told me unequivocally that they were faster, I'd show them the door.
Imaginary losses
Posted Mar 27, 2008 20:56 UTC (Thu) by pphaneuf (guest, #23480) [Link]
From my experience, the more common thing is not really try/catch, but letting the exception bubble up. Basically, you just want to clean up and tell your caller something went wrong.
We'll agree that if there is cleanup to do, it's probably equally there in C and in C++, right? The "big saving" in C++ is in the case where you just clean up and let the exception bubble up. And if a function has no cleanup to do, no error-handling code goes into that function at all!
As they say, the fastest way to do something is to not do it.
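A minimal sketch of the "clean up and bubble up" point made above (all names are invented for illustration): the intermediate function has no try/catch at all; its local object's destructor runs during stack unwinding, so cleanup still happens even though the exception just passes through.

```cpp
#include <stdexcept>

// Illustrative RAII holder: its destructor records that cleanup ran.
struct Resource {
    bool *released;
    explicit Resource(bool *flag) : released(flag) {}
    ~Resource() { *released = true; }   // runs on normal exit AND on throw
};

void might_fail(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// No error-handling code here at all: the exception bubbles through,
// and ~Resource still performs the cleanup during unwinding.
void intermediate(bool fail, bool *released) {
    Resource r(released);
    might_fail(fail);
}

// Only the top level catches; returns whether cleanup happened.
bool demo(bool fail) {
    bool released = false;
    try {
        intermediate(fail, &released);
    } catch (const std::runtime_error &) {
        /* handled once, at the top */
    }
    return released;
}
```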
Imaginary losses
Posted Mar 27, 2008 21:24 UTC (Thu) by wahern (subscriber, #37304) [Link]
Hmmm, good point. So, if you don't throw from an intermediate function, you compound the savings. Well... I guess I'll just call "uncle" at this point. I personally don't like exceptions, specifically because in my experience letting errors "bubble up" usually means that much error context is lost, and the programmer gets into the habit of not rigorously handling errors (that's why, I guess, I didn't think about that pattern). But, in a discussion like this that's inapplicable.
Imaginary losses
Posted Mar 27, 2008 22:15 UTC (Thu) by pphaneuf (guest, #23480) [Link]
My theory is that you do something about it where you can. If you can't think of something useful to work around the problem, then just let it bubble up, maybe someone who knows better will take care of it, and if not, it'll be the same as an assert.
That's clearly sensible in a lot of cases, because otherwise there would be no such thing as error statuses, they'd just all "handle the errors".
I also quite prefer the default failure mode of a programmer failing to handle an error to be a loud BANG than silently going forward...
Imaginary losses
Posted Mar 27, 2008 21:04 UTC (Thu) by wahern (subscriber, #37304) [Link]
I forgot to test multiple calls in the same try{} block. Indeed, for every additional back-to-back call, C needs an additional two instructions (test+jump). So, for moderately long functions w/ a single try{} block and lots of calls to some small set of functions, I can see C++ being faster. The trick is that you don't want the fixed costs to exceed the gains, of course. In the above example, C++ pulls ahead at the 4th call to noargs(). It would be an interesting exercise to count the number of function definitions and function calls in my code, and multiply by the respective differences between C and C++. But it seems complicated by the treatment of blocks in C++. I can see how in some tests C++ came out 15% ahead, though. In any event, there is indeed a fixed cost to C++ exceptions. There might not be a prologue, but the epilogue is invariably longer for functions and, apparently, some blocks.
Not quite
Posted Mar 27, 2008 16:13 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]
Most C++ implementations use range tables for exception handling today, so no extra code is needed in the function prologue or the non-exception epilogue. The possibility of a callee throwing can constrain optimisation of the caller, but so does explicit error checking.
Not quite
Posted Mar 27, 2008 20:35 UTC (Thu) by wahern (subscriber, #37304) [Link]
From my limited research, it seems the constraint is much greater in C++, because C++ must preserve stack state (minimally, the mere existence of an activation record), whereas in C a compiler can obliterate any evidence of a function call, no matter whether or how the return value is used. Granted, I'm not aware of what kind of requirements the C++ standard mandates; certainly I'd bet that in non-conforming mode a compiler could cheat in this respect. I'd like to hear some analysis on this.

Inlining in general, though, is actually important, because in C one of the biggest fixed costs you have to keep in mind is the function call. As shown in my example elsewhere in this thread, there's comparatively quite a lot of work to maintain the stack. This is, of course, a big deal in most other languages, too. If you've ever written much PerlXS (and peered behind the macros), at some point it dawns on you how much work is being done to maintain the stack--it's incredible! The fixed costs of maintaining call state in Perl dwarf most everything else--excepting I/O or process control--including manipulation of dynamically typed objects.
Not quite
Posted Mar 27, 2008 22:28 UTC (Thu) by ncm (guest, #165) [Link]
For the record, nothing about exceptions in the C++ standard or common implementation methods interferes with inlining. In practice, the body of an inlined function is just merged into the body of whatever non-inline function it's expanded in. The only place where exceptions interfere with optimization is in that the state of a function context at a call site must be discoverable by the stack unwinder, so it can know which objects' destructors have to run. In practice this means that calls in short-circuited expressions, e.g. "if (a() && b()) ...", sometimes also set a flag: "if (a() && ((_f=1),b())) ...". This only happens if b() returns an object with a destructor, i.e. rarely.
an "old beard" ?
Posted Mar 27, 2008 1:22 UTC (Thu) by tialaramex (subscriber, #21167) [Link]
I guess I'm an "old beard". It's strange to hear that; maybe I should be more pleased than I am. I have an old edition of Stroustrup's book; unlike K&R it is dusty and lives on the bottom shelf alongside other technical works that proved useless or unreadable. I must say that, as a beginning programmer with some experience of C++ when I bought it, it was disappointing. A triumph of ego and verbosity, even.

Well, on the one hand you're right: after literally decades of work C++ has more or less matured into a language that you can use to write software for the real world without incurring significantly more pain than C. The stable ABI in particular took a lot longer to arrive than it had any reason to, and longer than you've really allowed in your description. But that maturity comes with a lot of caveats. It was already arguably too easy to write C that you couldn't understand, thus making it unmaintainable; C++ provides any number of features which make that worse, and nearly every beginner text seems to emphasise these features as benefits. The result is that a new "C++ programmer" is often pumping out a mixture of pseudo-code masquerading as program code and Perl-style unmaintainable gobbledegook.

By the time C++ was being invented we already knew that the challenge wasn't adding more expressiveness (though given a time machine maybe I'd add namespaces and possibly a weak "class" concept to C89), but delivering more maintainability. The author of this program has already expressed his doubts about the maintainability of his code. Is that inevitable in a C++ program? No, but the language definitely isn't helping. No-one, so far as I can see, is claiming that C++ actually made it significantly easier to write this linker (except perhaps in the sense that the author prefers C++ and he was writing it) or that its performance benefits are in any way linked to the choice of language.
So it's understandable that there's concern that we're going to get ourselves an abandoned and unmaintainable piece of software in the core of the tool chain. Maybe one of the people who feels more strongly than me (and has more spare time, it's 0100 and I'm still working) will implement the same approach in C and eliminate the perceived problem.
an "old beard" ?
Posted Mar 27, 2008 2:33 UTC (Thu) by felixfix (subscriber, #242) [Link]
The problem many of us commenters have with sylware's comment is that it is ludicrous to expect a modern system to rely only upon C and avoid anything fancier. I commented on that above -- but I want to say a little more in response to this.

I avoid C++ like the plague for my own purposes, for, I think, pretty much the same reasons -- it is far more complex for not much gain. It's been a long time since I did anything with C++, so maybe these comments will be rusty too, but two bad memories come back. One is malloc/free in C, as horrible as they are and easy to abuse, turning into three pairs of calls (malloc/free, new/delete, and array new/array delete) -- mix them and have an instant mess -- not just in the two new ways, but in the additional traps of mixing them up. How can that be considered progress? The other bad memory was needing to specify virtual if you wanted to override methods -- what is the point of object orientation if overriding isn't the default? The entire language struck me as more of a rushed standard to beat other OO versions of C, and then one pile of rushed patches to the spec after another.

Nevertheless, some people get along better with C++ than others do with C. Forcing everyone to write in C will simply result in more bad matchups between personalities, projects, and tools. Old beards like UNIX because it is a system of tools which allows users and developers to pick the right combination for them and the job. If forcing everybody to use C were the answer, it wouldn't be UNIX; it would merely be Microsoft "my way or the highway", but on the C highway instead of the C++ highway.
C++ features and performance
Posted Mar 27, 2008 5:34 UTC (Thu) by zlynx (guest, #2285) [Link]
Two of the problems you mention are caused because C++ is all about performance. Any language feature that will slow things down requires the programmer to explicitly use it. Array new is a separate function because it has to store the number of elements in the memory allocation. Making new and array new the same would waste a size_t for every alloc. Virtual has to be specified because it slows down the function calls. I have personal recent experience with virtual. I used virtual functions in an interface base class. After I got done profiling, I did explicit calls to class functions instead of virtuals in the derived classes and one use of dynamic_cast and then using the pointers with a template function. The code was over 20 times faster. The real killer wasn't the indirect jump, it was how virtuals block inlining and most compiler optimization, since it can't know what function will really be called.
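The devirtualization trick described above can be sketched roughly like this (a hedged illustration with invented names, not the commenter's actual code): cast to the concrete type once with dynamic_cast, then make direct, inlinable calls instead of one virtual dispatch per iteration.

```cpp
// Polymorphic interface: every value() call goes through the vtable.
struct Base {
    virtual ~Base() {}
    virtual int value() const = 0;
};

struct Derived : Base {
    int value() const { return 42; }        // virtual override
    int value_direct() const { return 42; } // non-virtual twin, inlinable
};

// One indirect call per iteration; the compiler can't inline value().
int sum_virtual(const Base &b, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += b.value();
    return total;
}

// Pay for dynamic_cast once, then every call is direct and inlinable.
int sum_devirtualized(const Base &b, int n) {
    if (const Derived *d = dynamic_cast<const Derived *>(&b)) {
        int total = 0;
        for (int i = 0; i < n; ++i)
            total += d->value_direct();
        return total;
    }
    return sum_virtual(b, n);   // fallback: unknown concrete type
}
```

The speedup the commenter reports comes less from skipping the indirect jump than from re-enabling inlining and the optimizations that follow from it.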
Making up falsehoods
Posted Mar 27, 2008 7:01 UTC (Thu) by ncm (guest, #165) [Link]
What the commenters above are expressing is simply fear of learning. (Read a modern C++ program, and you won't find any calls to "new" or "delete", array or otherwise. "Virtual" isn't the default because it's only rarely the right thing. It's been years since I typed the keyword "virtual" in my own C++ code.) If you don't chafe at the lack of expressiveness in C, Java, or C#, it can only be because you're not trying to express much of anything. Ian chose C++ not just because he "prefers" it (whatever that means). He chose it because it's demonstrably better at expressing complex solutions to complex problems. For an easy problem, any old language will do. Most problems are easy, and most programmers spend their lives solving one easy problem after another. They can use whatever toy language they first encountered, and never learn another thing. People drawn to hard problems want the sharpest tool they can get. Right now C++98 is that tool. (C++09 will be a sharper tool.) Nothing else even comes close, and nothing on the horizon looks like it will. That's too bad, because a language equally powerful but a tenth as complex ought to be possible, but none seems to be forthcoming yet.
Making up falsehoods
Posted Mar 28, 2008 2:34 UTC (Fri) by wahern (subscriber, #37304) [Link]
You don't chafe at the lack of lexical scoping? Nested methods? You don't pine for proper tail recursion?
People complain that C++ doesn't have threads built in, and the standard retort is invariably that the next standard will provide built-in mutexes and other sugar. But that's not expressive. If you want expressive concurrency, check out Limbo's channels or Erlang's messages. Just because you can approximate something doesn't mean you've captured its expressiveness.
And how is a language where the majority of the operators and modifiers are English keywords, wrapped in an endless series of punctuation marks, expressive? Reading overly wrought C++ code can be like reading a court reporter's short-hand, except you can never be sure what the proper translation is from one compilation unit to the next--certainly not from one project to the next. And if you keep it clean, you're not exercising all those expressive features.
Combine GCC extensions like the typeof() operator and statement expressions, and/or some M4 pre-processing, and you end up with code only nominally less readable than template definitions, yet just as type-safe.
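As a rough illustration of that GCC-extension style (GCC/Clang only -- `__typeof__` and statement expressions are not standard C or C++; the macro here is a common textbook example, not code from this thread): a type-generic MAX that evaluates each argument exactly once and preserves its type.

```cpp
// GNU extension: the ({ ... }) statement expression yields the value of
// its last statement, and __typeof__ captures each argument's type, so
// MAX(a, b) is type-safe and free of double evaluation.
#define MAX(a, b) ({                \
    __typeof__(a) _a = (a);         \
    __typeof__(b) _b = (b);         \
    _a > _b ? _a : _b;              \
})

int max_int(int x, int y) { return MAX(x, y); }
double max_dbl(double x, double y) { return MAX(x, y); }
```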
The first step in solving a complex problem is to reduce it to a set of simple problems. You then choose the tools which best solve the simple problems. (Of course, most problems are really simple, so it's sensible to use any one general-purpose language.) I see these gargantuan C++ projects, and I think to myself that C++ is more of a plague than anything else. Some people tout KIO as the greatest thing since sliced bread; but I'm sitting on the sidelines, thinking I wish my non-KDE applications could benefit from that. Some "solution", that feat of super-charged object orientation, walled up behind a language that says "my way or the highway". That kind of expressiveness is light years behind fopen()--usable in C, C++, Perl, Awk, Java, C#, Lua, TCL, Python, Haskell, Ada, and nearly any other language one could find on their system.
People claim that C++ is "multi-paradigmatic". Oddly, it fails to provide the most useful and expressive alternative paradigms out there. And with all the nice shiny features, even all the tried-and-true paradigms--like process modularization--are too often left out in the cold. If you've got the world's greatest set of hammers, everything looks like a nail.
If you like C++, great. It is... a language. Just like any other, except a little more practical than most, and much more widely used (due in large part to near universality in the Microsoft community). I don't use C++, because I routinely use more than a half-dozen other languages--and the one that binds them all: C.
Making up falsehoods
Posted Mar 29, 2008 7:56 UTC (Sat) by ncm (guest, #165) [Link]
It's no crime not to like any particular language. (Lord knows I loathe my share of them.) When you have to invent one falsehood after another to justify your dislike, though, it makes you look silly, or dishonest. I know of plenty to dislike in C++; the problems tend to be inconveniences in features entirely lacking in other languages. But, for the record...

Lexical scoping, we have. Tail recursion? Pass; over twenty-five years using C and C++, every time I have been tempted to leave a tail recursion in my code, changing it to explicit looping has turned out to make the code clearer and more maintainable. (Certainly there are languages that do or would benefit from it, but C and C++ seem not to need it.) Nested "methods"? Coming, more or less, in C++09. Also coming are mutexes and atomics, but more important is that they can be built into whatever higher-level apparatus you prefer. The great lesson of C was that anything you can build into a library, you are better off without in the core language. (For C, particularly, that was I/O. People forget how revolutionary that was considered at the time.) C++09 adds enormous power for library writers, to enable easier-to-use libraries, which will in turn make the language markedly more competitive against scripting languages.
Making up falsehoods
Posted Apr 3, 2008 18:05 UTC (Thu) by jchrist (guest, #14782) [Link]
I'm genuinely curious. If "modern C++" programs don't use new/delete or virtual, how do you dynamically allocate/deallocate memory? What do you use instead of virtual functions?
C++ new and delete
Posted Apr 7, 2008 0:49 UTC (Mon) by pr1268 (subscriber, #24648) [Link]
how do you dynamically allocate/deallocate memory?
A big push for "modern" C++ programming is to use the standard library's container classes, e.g., vector, list, deque, etc. instead of arrays created with new and delete.
The primary rationale for using dynamically-allocated memory in the first place is to defer until execution time reserving only as much space as is needed (since memory is a scarce resource). The C++ containers have a rich set of methods which allow array-like operation while managing the dynamic allocation (and subsequent releasing) of resources at the library level (instead of making the programmer do it him/herself with new and delete).
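A small sketch of that container-over-raw-new style (the function is invented for illustration): std::vector sizes itself at run time and releases its storage automatically, so there is no new[]/delete[] pair to mismatch or leak.

```cpp
#include <vector>
#include <cstddef>

// Sum of 1^2 + 2^2 + ... + n^2, with the dynamic array managed entirely
// by std::vector rather than by hand-written new[]/delete[].
int sum_squares(int n) {
    std::vector<int> squares;   // no explicit allocation or deallocation
    squares.reserve(n);         // reserve only as much space as needed
    for (int i = 1; i <= n; ++i)
        squares.push_back(i * i);

    int total = 0;
    for (std::size_t i = 0; i < squares.size(); ++i)
        total += squares[i];
    return total;
}   // ~vector frees the storage here, even if an exception is thrown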
I can't speak for why virtual methods are seldom described in literature, but my intuitive perception is that they're this philosophical concept in computer science (object-oriented programming in particular) whose existence programmers are expected to know but avoid using unless no other practical solution exists--kind of like recursion.
I see I have to be more explicit
Posted Mar 29, 2008 7:21 UTC (Sat) by felixfix (subscriber, #242) [Link]
You all seem to be concentrating on the limbs of the trees I mentioned, ignoring the entire forest. I know perfectly well why C++ has all those warts, speed. But languages progress by more than merely adding hundreds of new steam gauges (to use a quaint hardware engineers' term) to an engine to allow ever finer control of it. Anyone who is familiar with radial engines, those wonderful noisy big round engines on WW II planes, will understand my comparison of C as the R2800 and C++ as the Wright 3350 (hope I got that "right") which was the complex culmination of radial engine technology. What was needed was not such a finicky beast but a new way of thinking, the jet engine, especially augmented with a glass cockpit which only shows you those instrument readings you need to know, such as only engine temps which are out of spec. C++ is horrible because it added complexity without benefit. It has all the hallmarks of kluge piled on top of kluge, patched and held together by duct tape. 3 times as many malloc call pairs and the additional trap of mixing them up? I doubt C++ gained triple the benefit, and to say that the compiler, even while knowing the types, could not choose the right calls based on type, beggars the imagination. I would be embarrassed to rely on such a sorry excuse for an excuse. If virtual should only be used when absolutely necessary, then it was designed wrongly. That is what I was complaining about, not the rationale for each additional complication. All these complicated additional features screamed loud and clear that the basic design was flawed beyond redemption; it's too bad no one was brave enough to say the emperor's clothes were darned to nothingness. That is why C++ disgusts me and I avoid it like the plague.
I see I have to be more explicit
Posted Mar 31, 2008 2:51 UTC (Mon) by ncm (guest, #165) [Link]
Ignorance is a poor basis for criticism. If you don't understand the purpose for any given feature, you will be ill-equipped to evaluate the benefits available from using it. By all means avoid C++ if you like. Most programmers are not intelligent enough to use the language effectively, and should stick to solving easy problems with trivial languages, and leave the hard work to professionals.
an "old beard" ?
Posted Mar 27, 2008 10:17 UTC (Thu) by jschrod (subscriber, #1646) [Link]
> By the time C++ was being invented we already knew that the challenge
> wasn't adding more expressiveness (though given a time machine maybe I'd
> add namespaces and possibly a weak "class" concept to C89), but delivering
> more maintainability. The author of this program has already expressed his
> doubts about the maintainability of his code.

If you read the article, this was not because he used C++. He was unsure whether other people would understand his finite state automaton, which needs to be understood as a whole to see what the code does. FSAs tend to distribute functionality into small pieces of code where the connections are often not easily visible. Many programmers can't handle that properly.
an "old beard" ?
Posted Mar 27, 2008 12:10 UTC (Thu) by tialaramex (subscriber, #21167) [Link]
I did read the article, and I agree that C++ doesn't inherently make his state transition stuff harder to understand. But it also doesn't help. That's all. I guess there are languages which would, but I don't know if they're also suitable for the low-down bit mangling the linker does elsewhere.
finite state machines
Posted Mar 29, 2008 1:33 UTC (Sat) by man_ls (guest, #15091) [Link]
Not that I know of. Finite state machines are actually hard to code and read in any language, so your argument (that C++ somehow made gold more difficult) sounds like a red herring to me. Oddly enough, it seems this is an area where graphical programming should help: state diagrams (or flowcharts) can really help you understand a state machine. But apart from some of the new BPM tooling, which covers a similar but different problem space, that idea hasn't flown either.
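To see why FSM code is hard to follow in any language, consider this minimal invented example: the logic lives in a transition table rather than in ordinary control flow, so you have to mentally replay the diagram to understand it. This machine accepts strings of the form a+b+ (one or more 'a's followed by one or more 'b's).

```cpp
// States of a tiny recognizer for the language a+b+ .
enum State { START, SAW_A, SAW_B, REJECT };

// One transition per (state, input) pair; the "program" is this table.
State step(State s, char c) {
    switch (s) {
    case START: return c == 'a' ? SAW_A : REJECT;
    case SAW_A: return c == 'a' ? SAW_A : (c == 'b' ? SAW_B : REJECT);
    case SAW_B: return c == 'b' ? SAW_B : REJECT;
    default:    return REJECT;
    }
}

// Drive the machine over the input; accept iff we end in SAW_B.
bool accepts(const char *input) {
    State s = START;
    for (const char *p = input; *p; ++p)
        s = step(s, *p);
    return s == SAW_B;
}
```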
finite state machines
Posted Mar 29, 2008 7:30 UTC (Sat) by salimma (subscriber, #34460) [Link]
FSMs are trivial in languages with tail-call optimization (Lisp et al., ML, Haskell... even Lua!). It is true, though, that most of these languages are not geared towards low-level bit manipulation. C--, perhaps: it's a C-like language used by the Haskell team as their intermediate language, a sort of souped-up assembler.
finite state machines
Posted Mar 29, 2008 12:39 UTC (Sat) by man_ls (guest, #15091) [Link]
Oops, you are right: FSMs are indeed easier to implement in those languages. My big mouth again. They are still pretty hard to follow and understand, which is of course the concern of the author of gold. But maybe even this can be alleviated with functional programming; at the 7th Annual ICFP Programming Contest, functional languages were shown to be an order of magnitude better at writing input for FSMs than imperative languages. I'm not so sure any longer.
So thanks for the clarification, and please disregard my earlier uninformed comment about a red herring.
No-one?
Posted Mar 27, 2008 10:18 UTC (Thu) by khim (subscriber, #9252) [Link]
> No-one, so far as I can see, is claiming that C++ actually made it significantly easier to write this linker.

Huh? What is this, then: http://www.airs.com/blog/archives/56 ? "The new linker I am working on, gold, is written in C++. One of the attractions was to use template specialization to do efficient byte swapping." Of course that's not the only place where gold uses C++ to speed things up, but it looks like you are complaining about C++ usage without looking at the code, without any research, "just in principle"... Not very constructive at all...
No-one?
Posted Mar 27, 2008 11:46 UTC (Thu) by tialaramex (subscriber, #21167) [Link]
Actually that appears to be the author arguing that his program runs faster in part because it avoids byte swapping on the native architecture, a trivial optimisation which he's managed to manufacture into some C++ templates. In practice, looking at the sample code, templates seem to deliver about the same maintainability as endian-swap macros, which is what an equivalent C program (not the existing GNU ld) would use. I don't see a claim that it was actually /easier/ than macros, do you? Just that this optimisation is worth having, despite the complexity compared to GNU ld. Keep in mind I used to write my own programs in C++, and I stopped, so I'm not arguing from ignorance or out of some misguided attempt to avoid learning something new. I made an explicit choice to stop, and I think my choice was justified. At the time a lot of people told me it was a mistake, but some of them have since stopped too. But I don't want to waste my whole week arguing with language fanatics of any stripe. Code is king, and anyway the main consumer for a faster linker is huge C++ programs, and this faster linker is written in C++, so that should mean the people who care are already in a position to look after it. Good luck, and if gold becomes the default please don't break the trivial cases needed for linking my C programs.
C++ problem
Posted Mar 27, 2008 16:16 UTC (Thu) by ikm (subscriber, #493) [Link]
In its essence, C++ just takes all the common concepts pretty much any complex C program uses and makes them part of the language itself, offloading the associated implementation costs from the shoulders of the programmer onto the compiler. C programmers do the same things by hand: C++ inheritance (in the form of one structure having another as its first member, sometimes even with a macro to do upcasts, like e.g. the offsetof macro in the Linux kernel), member functions (ordinary functions that take a pointer to the associated structure as their first parameter), virtual functions (structures holding pointers to functions, which in essence is building vtables by hand), constructor and destructor functions which must be carefully called every time the associated structure is created or deleted; instead of namespaces, they use long common prefixes. If we set aside C++'s other concepts, namely templates and exceptions/RTTI, we see that C++ is just automating things that would otherwise have to be done the same way manually, saving people from the lengthy and error-prone routine they'd have to go through in bare C. The end result is supposed to be exactly the same, since the compiler pretty much does exactly what a human would do; it just does it very pragmatically and systematically. I have to admit that when I first saw the Linux kernel sources, I was amazed to see very strict C++ code carefully and duly implemented in bare C.

From there, it was quite easy to see the actual problem C++ has. It is not performance at all, or compatibility problems, or whatever else people usually like to attribute to it; it's just the complexity and the associated incorrect uses and/or abuses of power, resulting in bloat, inefficiencies and incorrect runtime behavior.
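The hand-rolled "vtable" pattern described above might look like this hedged sketch (all names invented; this is plain C style that also compiles as C++): a structure carries a pointer to a table of function pointers, and a dispatch helper plays the role of a virtual call.

```cpp
/* The hand-built "vtable": one function-pointer slot per "virtual". */
struct animal_ops {
    int (*legs)(void);
};

/* The "base class": first member is the vtable pointer. */
struct animal {
    const struct animal_ops *ops;
    const char *name;
};

static int dog_legs(void)  { return 4; }
static int bird_legs(void) { return 2; }

/* One ops table per "derived class", filled in by hand. */
static const struct animal_ops dog_ops  = { dog_legs };
static const struct animal_ops bird_ops = { bird_legs };

/* The "virtual call": dispatch through the function pointer,
 * exactly what a C++ compiler would generate for a->legs(). */
static int animal_legs(const struct animal *a) {
    return a->ops->legs();
}
```

Everything the C++ compiler does automatically -- building the table, wiring up the pointer, dispatching through it -- is done here explicitly, which is precisely the comment's point.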
To program in C++, one has to actually understand all the background work the compiler does, e.g., calling constructors/destructors in the right places and in the right order, building vtables and using them, passing the 'this' pointer to member functions, adjusting pointers when doing casts, and so on. Templates and the STL further complicate that by emitting large amounts of code at compile time and making heavy use of dynamic memory at runtime, with no apparent notice to the unwary programmer. In essence, C++ has a cost beyond just increased compile time. Thing is, this cost is not technical. If the Linux kernel were implemented in C++, it would have had far fewer developers, or would otherwise have been turned into dysfunctional bloatware in no time. I myself would never start a project in bare C if I didn't have to. I even program microcontrollers in C++ if a capable compiler is available (i.e. a gcc port). But I understand that this approach does not always work for all people. It's simply just too complex, and therefore harder to do right. Many people would seem to sacrifice other resources to make it simpler, and in the case of C that means doing everything manually, by hand.
C++????
Posted Mar 26, 2008 19:05 UTC (Wed) by and (guest, #2883) [Link]
> gold is C++... C++ for a core toolchain program?? Well... -->trash.

He who codes decides. Come up with something better in C if you can!
Optimization
Posted Mar 26, 2008 19:16 UTC (Wed) by paragw (guest, #45306) [Link]
What about optimizations - will the new linker be capable of doing all the optimizations that GNU LD does?
Optimization
Posted Mar 27, 2008 0:14 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]
Which optimizations do you have in mind? It already handles the basic ones, like COMDAT.
Optimization
Posted Mar 27, 2008 1:05 UTC (Thu) by paragw (guest, #45306) [Link]
I wasn't sure about what optimizations GNU LD does -- only that there is a documented -O switch to tell it to optimize, which warns that optimizing takes a lot of time. So I assumed it must be doing significant optimizations. I've got to read up on possible linker optimizations -- an interesting exercise. (Someone from Sun claimed gld does not do half the optimizations that the Sun linker does -- that may be specific to SPARC.)
Optimization
Posted Mar 27, 2008 17:18 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]
GNU ld's -O switch only affects shared library generation; the optimization is described in this LWN article. I don't know whether gold does anything similar.
GCC Incremental Compiler
Posted Mar 26, 2008 19:18 UTC (Wed) by mjw (subscriber, #16740) [Link]
Another interesting development around the GNU Toolchain/GCC is http://gcc.gnu.org/wiki/IncrementalCompiler which would nicely combine with the concurrent linker work. Some more background can be found in this interview: http://spindazzle.org/greenblog/index.php?/archives/74-In... And Tom Tromey keeps a blog about it: http://tromey.com/blog/?cat=15
The famous "./configure; make"
Posted Mar 27, 2008 0:34 UTC (Thu) by stuart_hc (guest, #9737) [Link]
the famous "./configure; make" command line
This should more correctly be ./configure && make
in order to prevent an existing Makefile from surprising you: with ";", make runs even when ./configure fails, happily using whatever stale Makefile is already there.
Stuart
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 0:50 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
# ./configure && make && make install && (cd /usr/local/bin && find . -type f | xargs file | grep "not stripped" | cut -d':' -f1 | xargs strip --strip-unneeded) && (cd /usr/local/lib && find . -type f | xargs file | grep "not stripped" | cut -d':' -f1 | xargs strip --strip-unneeded) && ldconfig && sync
Anyone want to guess which Linux distro I use?
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 1:31 UTC (Thu) by stevenj (guest, #421) [Link]
Why not just make install-strip?
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 1:50 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
Not all source packages have install-strip. But, yes, I usually do look for that target after the ./configure script is finished.
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 15:47 UTC (Thu) by rvfh (guest, #31018) [Link]
one where people compile as root... and I wouldn't recommend it.
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 18:15 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
What's the harm in compiling source code as root? I've heard of some urban myth where compiling the Linux kernel as root used to throw a segmentation fault, but that was fixed ages ago. Granted, my example above is not compiling kernel source, but rather some user-space application.
FWIW I run that chain of commands in a Konsole window, logged into KDE as a non-privileged user, and a su - in the terminal. I'm unsure of any (obvious) security issues in this configuration as I'm sitting right there at the physical computer.
And, just the other day I attempted to compile kernel 2.6.24.4 as non-privileged user and got some weird errors during the first-running scripts (i.e. after make menuconfig). Sure, I'll admit I might be doing it "wrong" (or in a fashion you don't recommend), but how might I do it "right"? Thanks!
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 20:17 UTC (Thu) by zlynx (guest, #2285) [Link]
Compiling as root has some hazards because of how complicated configure and make scripts can be. It isn't that there might be a hidden trojan in the code (after all, you're going to run make install anyway, right?), but that the build scripts might accidentally do bad things. For example, I once saw a makefile try to delete and rebuild some system library because it was listed as a dependency that got make's automatic target rules all excited. It only tried that on certain machine configurations, never the developer's, so he never saw the problem.
The famous "./configure; make" - my custom version
Posted Mar 27, 2008 21:04 UTC (Thu) by felixrabe (guest, #50514) [Link]
Well, you probably don't run "grep 'rm -rf' configure" every time before you compile something, right? ...
The famous "./configure; make" - my custom version
Posted Mar 28, 2008 0:02 UTC (Fri) by pr1268 (subscriber, #24648) [Link]
Well, you probably don't run grep 'rm -rf' configure every time before you compile something, right? ...
I can't say I've ever done that. Nor have I experienced anything remotely similar to zlynx's scenarios.
Although I do understand the risks of overusing the root account, I only build sources I consider trustworthy (usually tarball projects downloaded from SourceForge or KDE), but even these reputable source repositories may let some malicious code slip by...
I also admit I'm "going against the grain" in the context of accepted make/build practices; I'll try to figure out how to adjust permissions and/or ownership such that I can do ./configure and make as an unprivileged user...
The famous "./configure; make" - my custom version
Posted Mar 28, 2008 22:22 UTC (Fri) by nix (subscriber, #2304) [Link]
chown your build tree to the unprivileged user, build as normal, `make install' as root or (if you're paranoid) use `fakeroot' or some similar package to install and then deal with the resulting miniature root tree using GNU stow or something similar. (Rarely (very rarely) one encounters packages that write stuff to the build tree during the `make install' process, so you might need to be root to *remove* it if you chose to do the final installation as root.)
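The staged-install half of that workflow can be demonstrated with a throwaway makefile (the scratch directory and the trivial install rule are invented for the demonstration; a real package's own install target honors DESTDIR the same way):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# printf so the recipe lines get the literal tabs make requires:
printf 'prefix = /usr/local\ninstall:\n\tmkdir -p $(DESTDIR)$(prefix)/bin\n\ttouch $(DESTDIR)$(prefix)/bin/hello\n' > Makefile

# As the unprivileged user, install into a private stage tree rather
# than the live system:
make install DESTDIR="$work/stage"
ls "$work/stage/usr/local/bin"

# Only the final copy out of the stage tree into / needs root (or can
# be faked with fakeroot, or managed with GNU stow).
```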
The famous "./configure; make" - my custom version
Posted Mar 28, 2008 13:47 UTC (Fri) by DonDiego (guest, #24141) [Link]
The code need not be intentionally malicious. Just imagine that a Makefile contains a line like

    rm -rf $(VARIABLE)/path/to/somewhere

Now if $(VARIABLE) happens to be empty (perhaps only in your nonstandard configuration and not on the developer's machine), pray that there is nothing important below /path/to/somewhere ... That's just a simple example; it's easy to come up with more. It's not so much about protection against malice as protection against accidents. Accidents do happen; it's a fact of life. If you want to drive without a seatbelt, all I can wish you is good luck...
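The failure mode is mechanical enough to show in a few lines. PREFIX here is a made-up stand-in for the makefile variable, and the rm is only echoed, never run:

```shell
#!/bin/sh
PREFIX=""    # empty only in some unlucky configuration

# Unguarded: the path collapses to an absolute /path/to/somewhere.
echo "would run: rm -rf ${PREFIX}/path/to/somewhere"

# Guarded: ${var:?msg} aborts the expansion when the variable is unset
# or empty, so rm would never see the runaway path.  Run in a subshell
# so the demonstration itself survives the abort:
( echo "rm -rf ${PREFIX:?PREFIX is empty}/path/to/somewhere" ) 2>/dev/null \
    || echo "guard tripped, nothing deleted"
```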
Striking gold in binutils
Posted Mar 29, 2008 10:52 UTC (Sat) by stock (guest, #5849) [Link]
hehe what a gas, I have been told by the real old diehard UNIX men that gcc was a total shit of banana republic code, as its C++ part was of utterly inferior quality. With the arrival of Ian's gold linker, gcc as a C/C++ package has finally moved up in the ranks. And it was about time too, as the impact of open source and the obliged use of gcc made many people puke. Now instead of naming it gold, integrate it properly inside the gcc "compiler suite" and make it the default when compiling C++ code. But that's where the problem is, as many source trees are hybrid code of .c and .cpp files.

As to the linker problem when compiling Linux vmlinux kernels, I'd like to suggest that Torvalds et al. could solve this inside the "Makefile" of the Linux kernel source tree. Some time ago I wanted to check out how a vmlinuz binary actually was created by doing every command by hand. It took me more than two days to distill the commands by hand from the various Kbuild and related patchworks and custom utils.

Robert
--
Robert M. Stockmann - RHCE
Network Engineer - UNIX/Linux Specialist
crashrecovery.org  stock@stokkie.net
Striking gold in binutils
Posted Mar 29, 2008 13:44 UTC (Sat) by nix (subscriber, #2304) [Link]
Your comment belongs on slashdot, not here. Let's see how many falsehoods I can eliminate before ncm gets here and vapourises you.

Firstly, your 'old diehard UNIX men' are clueless. GCC relied on a lot of ad-hoc algorithms at one point: that did not make it a 'total shit of banana republic code' by any standard. G++ was one of the first C++ compilers written, I think the first that did not emit C code: so it certainly wasn't of 'utter inferior quality' then. GCC as a whole went through a bad patch in the early-to-mid-90s, when maintenance largely stalled: but once egcs started (has it been ten years already?) G++ was one of the first parts to improve. It's definitely not 'utterly inferior' now; in fact it's one of the better compilers out there, with an internal architecture that's gone from being crufty as anything and full of hidden dependencies to being, well, much less crufty over the last few years (cleaning up code that old is an achievement). Its only remaining hole is optimization: especially on machines with small register files, vendor compilers could often out-optimize it. This, too, is steadily being fixed.

Introducing a new linker, no matter how hotshot (and gold *is* a damn good piece of work), doesn't make the *compiler* better or worse in any way. I can't fathom the misunderstandings that could lead you to believe that switching linkers depending on the source language in use has any merit at all. The most integration that is likely to happen seems to be to have the linker running in parallel with the compiler, accepting object code and incrementally linking it as the compiler disgorges it. Anything more than that seems pointless.

Makefiles are the wrong place to solve linker script problems. The problem is that GNU ld was fundamentally driven by linker scripts, and gold is not (as well as that the Linux kernel constructs some seriously contorted ELF files: OS kernels don't have to stick to the rules, as they are generally loaded by special-purpose code, not by ELF loaders).

The obvious way to see what commands are executed when running make is to use 'make -n', but if some of the commands are themselves makes you'll need to do something else. It's probably simplest just to add a couple of printf()s to make itself, or strace the whole thing with "-ff -e trace=process". Digging the commands out of the makefile by hand is madness.
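For the non-recursive case, 'make -n' by itself is enough; a toy example (the scratch directory and one-rule makefile are invented for the demonstration):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# printf keeps the literal tab that make requires before the recipe:
printf 'all:\n\techo building the target\n' > Makefile

# -n (--just-print) prints each recipe command instead of running it:
make -n

# For recursive builds, tracing the process-related syscalls catches
# the commands the sub-makes actually execute:
#   strace -ff -e trace=process -o build.trace make
```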
Striking gold in binutils
Posted Apr 3, 2008 19:03 UTC (Thu) by stock (guest, #5849) [Link]
"Digging the commands out of the makefile by hand is madness."

You bet, but IF you would do such a thing, you would understand that on a 64-bit platform we are actually booting a 32bit binary:

    ld -m elf_i386 -Ttext 0x100000 \
        -e startup_32 -m elf_i386 \
        arch/x86_64/boot/compressed/head.o \
        arch/x86_64/boot/compressed/misc.o \
        arch/x86_64/boot/compressed/piggy.o \
        -o arch/x86_64/boot/compressed/vmlinux

Apparently we are actually booting a 32bit binary on our x86_64 platform. But that doesn't matter as long as we call our real kernel piggy.o:

    ld -m elf_i386 -r --format binary --oformat elf32-i386 \
        -T arch/x86_64/boot/compressed/vmlinux.scr \
        arch/x86_64/boot/compressed/vmlinux.bin.gz \
        -o arch/x86_64/boot/compressed/piggy.o

where vmlinux.bin.gz is created by the following command:

    gzip -f -9 < arch/x86_64/boot/compressed/vmlinux.bin \
        > arch/x86_64/boot/compressed/vmlinux.bin.gz

where vmlinux.bin is created by running a custom tool called objcopy:

    objcopy -O binary -R .note -R .comment -S vmlinux \
        arch/x86_64/boot/compressed/vmlinux.bin

where finally vmlinux is that ELF 64-bit LSB executable:

    ld -m elf_x86_64 -e stext -T arch/x86_64/kernel/vmlinux.lds.s \
        arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o \
        arch/x86_64/kernel/init_task.o init/built-in.o \
        --start-group usr/built-in.o arch/x86_64/kernel/built-in.o \
        arch/x86_64/mm/built-in.o arch/x86_64/ia32/built-in.o \
        kernel/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o \
        security/built-in.o crypto/built-in.o lib/lib.a \
        arch/x86_64/lib/lib.a lib/built-in.o arch/x86_64/lib/built-in.o \
        drivers/built-in.o sound/built-in.o arch/x86_64/pci/built-in.o \
        net/built-in.o --end-group .tmp_kallsyms2.o -o vmlinux

    -rwxr-xr-x 1 root root 7478601 Nov 21 14:48 vmlinux
    vmlinux: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV),
    statically linked, not stripped

Cheers,

Robert
--
Robert M. Stockmann - RHCE
Network Engineer - UNIX/Linux Specialist
crashrecovery.org  stock@stokkie.net
Striking gold in binutils
Posted Apr 3, 2008 20:31 UTC (Thu) by nix (subscriber, #2304) [Link]
"IF you would do such a thing, you would understand that on a 64-bit platform we are actually booting a 32bit binary"

Actually, the only things that differ between the elf_x86_64 and elf_i386 emulations are details of the GOT and PLT (which don't exist for the kernel normally), details of relocation processing (which isn't carried out for the kernel), and the start address (which the kernel overrides anyway in its custom linker script). So since they do the same thing as far as the kernel's concerned, let's pick the one that everyone's got because it's present even in older x86 binutils and always present in newer ones, and that's elf_i386.
(At least that's my understanding.)
"where vmlinux.bin is created by running a custom tool called objcopy:"

Not only is objcopy not a 'custom' tool but part of the GNU binutils; it also predates Linux (although before about 1993 it was called 'copy'). (The renaming was done by, oh look, Ian Lance Taylor. He's been involved in this area for a long, long time. ;} )
(The objcopy run is anyway hardly crucial: like most of the stuff before the final link, it's just a size optimization.)
Striking gold in binutils
Posted Apr 2, 2008 6:51 UTC (Wed) by ncm (guest, #165) [Link]
Oh, and consider yourself vaporized.
Striking gold in binutils
Posted Apr 2, 2008 8:34 UTC (Wed) by nix (subscriber, #2304) [Link]
I am not worthy. ;}
A ToC of the 20 part linker essay
Posted Apr 7, 2008 6:28 UTC (Mon) by JesseW (subscriber, #41816) [Link]
Since I couldn't find any well-linked ToC of Ian's 20-part essay on linkers either on his blog, or here, I decided to post one. (And yes, I know the post URLs are consecutive numbers; nevertheless...)
I compiled the titles mainly from Ian's section titles, as Ian just referred to the parts by number.
And now, the author of gold, Ian Lance Taylor's 20 part Linker posts...
- Introduction, personal history, first half of what's-a-linker
- What's-a-linker: Dynamic linking, linker data types, linker operation
- Address spaces, Object file formats
- Shared Libraries
- More Shared Libraries -- specifically, linker implementation; ELF Symbols
- Relocations, Position Dependent Shared Libraries
- Thread Local Storage (TLS) optimization
- ELF Segments and Sections
- Symbol Versions, Relaxation optimization
- Parallel linking
- Archive format
- Symbol resolution
- Symbol resolution from the user's point of view; Static Linking vs. Dynamic Linking
- Link time optimization, aka Whole Program optimization; Initialization Code
- COMDAT sections
- C++ Template Instantiation, Exception Frames
- Warning Symbols
- Incremental Linking
- __start and __stop Symbols, Byte Swapping
- Last post; Update on gold's status
I release this message (the ToC and comments) into the public domain, no right reserved. Use it, copy it, perform it, create derivative works with no restrictions and without any further permission from me.
A ToC of the 20 part linker essay
Posted Apr 7, 2008 14:37 UTC (Mon) by nix (subscriber, #2304) [Link]
I do like the idea of performing it. A public reading of a table of contents! I bet it'll be popular ;)
A ToC of the 20 part linker essay
Posted Sep 23, 2013 19:00 UTC (Mon) by cataliniacob (guest, #91150) [Link]
Here's a Calibre recipe that creates an e-book from the whole series:
https://github.com/cataliniacob/calibre-recipes/blob/mast...
To use it, run "ebook-convert ian-taylor-linker.recipe linkers.epub"
Thanks to JesseW and ncm for the tips to read it.
A ToC of the 20 part linker essay
Posted Sep 26, 2013 17:03 UTC (Thu) by nix (subscriber, #2304) [Link]