Tornado and Grand Central Dispatch: a quick look
Two traditionally proprietary companies made open source releases recently: Facebook released a Python-based web server and application framework called Tornado, and Apple released a thread-pool management system called Grand Central Dispatch. It is not the first open source code release for either company, but both projects are worth examining. Tornado is designed to suit specific types of web applications and is reportedly very fast, while Grand Central Dispatch may cause some developers to re-think task-parallelism.
This Tornado serves you
Tornado is actually a product of FriendFeed, the social-networking aggregator acquired by Facebook in August. It consists of a web server and surrounding framework (all written in Python), tailored to handle a very large number of established, open connections. The web server component (tornado.httpserver, which underlies the tornado.web framework) is "non-blocking", meaning that it is event-driven, designed around the Linux kernel's epoll facility, and can thus maintain large numbers of open TCP sockets without tying up excessive memory and without large numbers of threads.
Event-driven web servers like Tornado are single-threaded; each thread can manage potentially thousands of open connections as long as the application does not block while it waits for data from a socket. Instead of dedicating a thread to each connection, the thread asks the kernel which sockets are ready and services those in turn. Additional connections can be handled by running multiple server processes on SMP systems. In contrast, traditional web servers are blocked from handling additional connections while they wait for I/O, or must spawn additional threads to handle additional connections at the cost of context-switching and increased memory use.
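The single-threaded, event-driven model can be sketched in C directly on top of epoll. This is only an illustration of the technique, not Tornado's code (Tornado is written in Python); the function name serve_with_epoll, the NCONN_MAX limit, and the use of socketpairs as stand-ins for client connections are all assumptions made for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCONN_MAX  64   /* hypothetical limit for this demo */
#define MAX_EVENTS 16

/* Serve nconn stand-in "connections" (socketpairs) from a single thread.
 * Returns the number of connections that delivered data, or -1 on a
 * setup failure. nconn must be between 1 and NCONN_MAX. */
int serve_with_epoll(int nconn)
{
    assert(nconn > 0 && nconn <= NCONN_MAX);

    int pairs[NCONN_MAX][2];
    int opened = 0, served = 0, failed = 0;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (int i = 0; i < nconn; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pairs[i]) < 0) {
            failed = 1;
            break;
        }
        opened++;

        /* Ask the kernel to report when this socket becomes readable. */
        struct epoll_event ev = { .events = EPOLLIN,
                                  .data = { .fd = pairs[i][0] } };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pairs[i][0], &ev) < 0) {
            failed = 1;
            break;
        }

        /* The "client" end sends one byte. */
        if (write(pairs[i][1], "x", 1) != 1) {
            failed = 1;
            break;
        }
    }

    /* Event loop: one thread, no per-connection thread or blocking read. */
    while (!failed && served < nconn) {
        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, 1000);
        if (n <= 0)
            break;                      /* timeout or error */
        for (int j = 0; j < n; j++) {
            char buf[16];
            if (read(events[j].data.fd, buf, sizeof buf) > 0)
                served++;
            /* Stop watching this socket so it is counted only once. */
            epoll_ctl(epfd, EPOLL_CTL_DEL, events[j].data.fd, NULL);
        }
    }

    for (int i = 0; i < opened; i++) {
        close(pairs[i][0]);
        close(pairs[i][1]);
    }
    close(epfd);
    return failed ? -1 : served;
}
```

Calling serve_with_epoll(8) sets up eight connections and handles all of them from the calling thread; a real server would keep the sockets registered and loop indefinitely rather than removing each one after its first read.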
In addition to the web server itself, the Tornado release includes a suite of modules used to build web applications, including XHTML, JSON, and URL decoding, a MySQL database wrapper, a localization and translation module, a Python templating engine, an HTTP client, and an authentication engine. The authentication engine supports third-party schemes such as OAuth and OpenID, plus site-specific schemes used by Facebook, Yahoo, and Twitter.
The Tornado code is hosted on GitHub and is available under the Apache 2.0 license. Tornado works with Python 2.5 and 2.6, and requires PycURL and a working JSON library. Documentation is available on tornadoweb.org, and a live demo "chat" application is running on http://chan.friendfeed.com:8888/.
FriendFeed's Bret Taylor announced the release on his blog, comparing Tornado to web.py and Google webapp. He claims that in Apache Benchmark tests, Tornado was able to handle four times the number of requests per second (or more) of competing frameworks, including web.py, Django, and CherryPy.
Taylor's post, and the subsequent discussion, sparked some controversy among users and developers of the Twisted framework, who objected to disparaging comments about Twisted's code maturity and suitability. Twisted founder Glyph Lefkowitz posted a lengthy response addressing the claims made about Twisted but, overall, approving of the Tornado release itself. Matt Heitzenroder posted his own head-to-head performance tests that show Tornado beating Twisted.web, but not dramatically.
Aside from performance numbers, many in the open source community seemed impressed by what Tornado offers — a simple framework for building "long polling" web applications, including support for everything from templating to cookie management to localization in a single package. Since Tornado has proven itself viable as the framework underlying FriendFeed, it is likely to pick up a significant following as an open source project.
Invisible threads
Apple's Grand Central Dispatch (GCD) is an operating system-level feature that debuted in the recent release of OS X 10.6 ("Snow Leopard"). GCD is essentially a mechanism that allows application developers to parallelize their code while letting the OS worry about intelligently managing the threads. GCD determines the maximum number of concurrent threads for the system and manages the queues for all running applications. Thus the application developer only needs to write GCD-capable code and can trust the OS to take optimal advantage of multiple cores and multiple processors.
Apple's source code release consists of the Apache-licensed user space API library libdispatch and changes to the XNU kernel, Apple's open source Mach-based kernel common to OS X and Darwin. The XNU changes reportedly improve performance of the event notification interface Kqueue. GCD's block-based API also relies on a non-standard extension to C, C++, and Objective-C known as "blocks", so blocks support in the compiler is a prerequisite for application developers wishing to use GCD in that form (libdispatch also provides function-pointer variants such as dispatch_apply_f). Run-time support for blocks with the LLVM compilers is provided through the compiler-rt project.
Because GCD abstracts thread creation from the application developer, it is most similar to OpenMP or Intel's Threading Building Blocks (TBB). All three allow the developer to designate portions of code as "tasks" to be parallelized in some fashion. GCD is different in that it leverages a language feature (blocks) rather than the preprocessor directives of OpenMP or templates of TBB. In addition, TBB is limited to C++, though OpenMP is available for C, C++, and Fortran.
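For contrast with the block-based approach, here is a minimal sketch of the OpenMP style, in which a directive annotates an otherwise ordinary loop. The function name omp_sum_squares is hypothetical. A compiler invoked without OpenMP support enabled simply ignores the pragma and runs the loop serially, producing the same result.

```c
#include <assert.h>

/* Sum of i*i for i in [0, count). With OpenMP enabled (for example
 * gcc -fopenmp) the iterations are divided among threads and the partial
 * sums are combined by the reduction clause; without it the pragma is
 * ignored and the loop runs serially. */
unsigned long omp_sum_squares(int count)
{
    assert(count >= 0);

    unsigned long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < count; i++)
        sum += (unsigned long)i * (unsigned long)i;
    return sum;
}
```

The loop body stays where it was and the parallelism is requested from outside it, whereas GCD asks the developer to hand the body over as a separate unit of work.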
Blocks are essentially inline-defined, anonymous functions. They are designated by a caret (^) in place of a function name, take arguments like any function, and can optionally return a value. Blocks are different in that they have read-only access to variables from their parent scope (a feature similar to "closures" in languages such as Ruby). Consequently, in replacing a for loop with GCD's parallel equivalent, dispatch_apply, the developer can write a block containing the loop's contents without the hassle of passing extra arguments to it just to access variables that were available to the loop.
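The read-only access to parent-scope variables can be pictured, in plain C, as a function pointer bundled with a private copy of each captured variable. The sketch below is a hand-written emulation for illustration only; the struct and function names are hypothetical and this is not how Apple's blocks runtime is actually laid out.

```c
#include <assert.h>

/* Stand-in for what a block literal such as ^{ return i; } conceptually
 * carries: code to run plus a copy of the captured variable. */
struct int_block {
    int (*invoke)(const struct int_block *self);
    int captured_i;     /* copied when the "block" is created */
};

/* The block body: it can read the captured copy but, through the const
 * pointer, cannot modify it. */
static int int_block_invoke(const struct int_block *self)
{
    return self->captured_i;
}

/* "Creating the block" copies the current value of i. */
struct int_block make_int_block(int i)
{
    struct int_block b = { .invoke = int_block_invoke, .captured_i = i };
    return b;
}

/* Capture i while it is 42, then change i to 7 before invoking. */
int capture_demo(void)
{
    int i = 42;
    struct int_block b = make_int_block(i);

    i = 7;              /* does not affect the copy held by b */
    assert(i == 7);

    return b.invoke(&b);
}
```

capture_demo() returns 42 rather than 7, because the value was copied at creation time. With real blocks the compiler generates this bookkeeping itself, which is what spares the developer from passing the extra arguments by hand.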
Adapted from Apple's Concurrency Programming Guide, the following example loop iterates count times:
for (i = 0; i < count; i++) {
    printf("%u\n", i);
}
which could be expressed as a block ready for GCD as follows:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply(count, queue, ^(size_t i) {
    printf("%zu\n", i);
});
When executed, GCD creates count tasks, one for each iteration of the block, and places them on a task queue. GCD makes a default queue available through dispatch_get_global_queue(), but developers can create private queues if they wish, for example to serialize access to a shared data structure. In this parallelized for-loop example the iterations may run concurrently and in any order, although dispatch_apply itself does not return until all of them have finished; for tasks that are queued asynchronously, GCD provides several mechanisms for monitoring completion, such as callbacks and semaphores.
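The idea of handing loop iterations to a pool of worker threads can be sketched without libdispatch, using POSIX threads. This is an emulation under stated assumptions, not GCD: apply_f, NWORKERS, and the helper names are hypothetical, the pool size is fixed at compile time (GCD sizes its pool itself, system-wide), and the signature only loosely mirrors libdispatch's function-pointer variant dispatch_apply_f.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NWORKERS 4      /* assumption: fixed pool size for the demo */

struct apply_job {
    size_t count;                       /* number of iterations */
    atomic_size_t next;                 /* next unclaimed index */
    void *context;                      /* passed to every call */
    void (*work)(void *context, size_t index);
};

/* Each worker repeatedly claims the next index until none remain. */
static void *apply_worker(void *arg)
{
    struct apply_job *job = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&job->next, 1);
        if (i >= job->count)
            return NULL;
        job->work(job->context, i);
    }
}

/* Run work(context, i) for every i in [0, count), possibly in parallel
 * and in any order. Returns only after all iterations have finished. */
void apply_f(size_t count, void *context, void (*work)(void *, size_t))
{
    assert(work != NULL);

    struct apply_job job = { .count = count, .context = context,
                             .work = work };
    atomic_init(&job.next, 0);

    pthread_t tids[NWORKERS];
    int started = 0;
    for (int t = 0; t < NWORKERS; t++)
        if (pthread_create(&tids[started], NULL, apply_worker, &job) == 0)
            started++;

    apply_worker(&job);     /* the caller helps; also covers started == 0 */
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
}

/* Example loop body: each iteration writes only its own slot. */
static void square_into(void *context, size_t i)
{
    size_t *out = context;
    out[i] = i * i;
}

size_t sum_of_squares(size_t count)
{
    size_t *out = calloc(count ? count : 1, sizeof *out);
    if (out == NULL)
        return 0;

    apply_f(count, out, square_into);

    size_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += out[i];
    free(out);
    return sum;
}
```

Because each iteration touches only its own array element, no locking is needed. The context pointer is exactly the extra argument that a block would capture implicitly.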
Apple provides a basic introduction to GCD and programming with blocks on its developers' site. In addition, the OS X scientific research community at MacResearch.org has a detailed tutorial complete with GCD examples and the equivalent code written for OpenMP. MacResearch.org has basic performance numbers posted for its tutorial code, and Apple has posted a benchmarking sample that compares GCD against serialized code and native POSIX threads.
So far, GCD is only implemented for Mac OS X, but reaction from the developer community has been positive. Having the operating system worry about the details of thread pool management seems like a winning idea; most of the discussion on Mac forums has revolved around the wisdom of relying on a language extension such as blocks. Ars Technica commented on places where Linux could benefit from a native GCD implementation, such as in higher-level frameworks like QtConcurrent, but noted that use of the Apache license limits integration to projects using GPL version 3 and later.
Impact
Apple and Facebook have a history of making periodic releases of code projects under open source terms, even though both enjoy a reputation for maintaining "walled gardens" around their core products. As is predictable when large proprietary companies release open source code, considerable energy has been expended on the web speculating as to what each company hoped to "gain" from the release. A leading theory for GCD is that Apple hopes to further the adoption of blocks into standard C and C++, but no consensus has yet emerged for why Tornado was released.
In fact, neither Tornado nor GCD has made major waves in the open source community, but if the initial reaction is a good indicator, both are solid and valuable products. GCD is the likelier of the two to stir up passionate debate going forward, as fully assimilating it into mainstream Linux would require touching not one but two of the fundamental pillars of the community: the kernel and the compiler. Although LLVM has its fans, the Linux community is still predominantly a GCC ecosystem. Pushing Apple code into the Linux kernel and into GCC won't happen lightly.
Index entries for this article: GuestArticles (Willis, Nathan)
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 18:25 UTC (Wed) by ctwise (guest, #10952) [Link]
are-not-gcd.html
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 19:27 UTC (Wed) by mgedmin (subscriber, #34497) [Link]
Your link is not working; here's what it should be: GCD doesn't require blocks.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 19, 2009 17:48 UTC (Sat) by cma (guest, #49905) [Link]
Does GCD have something similar to syslets/threadlets? http://lwn.net/Articles/221913/
Tornado and Grand Central Dispatch: a quick look
Posted Sep 24, 2009 13:27 UTC (Thu) by cowsandmilk (guest, #55475) [Link]
Showing an example where it uses blocks is silly.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 18:33 UTC (Wed) by wahern (subscriber, #37304) [Link]
It's just a small step, then, to allow full read-write access without requiring the programmer to qualify the storage type. The compiler could detect expressions--and does, I'm sure--which modify the object, and automagically change its storage type. But they decided not to do this, I think, because they wanted the restrictions on the object to be explicit (and documented in code), and not as a side-effect of how the Block happens to handle the object.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 19:30 UTC (Wed) by mgedmin (subscriber, #34497) [Link]
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 20:37 UTC (Wed) by wahern (subscriber, #37304) [Link]
int i;
int *p = &i;
void (^block)(void) = ^{ *p = 0; };
I'm fairly certain (but you can read the documentation yourself), that such code is fine. I think you might be confusing (or rather, not distinguishing):
int *const p;
with
const int *p;
and/or
const int *const p;
The first is effectively what you get from the perspective of the Block's scope. Note also that, by default, the semantics of the Block's "closure" are copy-by-value, not by-reference. This is the reason for the read-only restriction. So that:
int i = 42;
void (^block)(void) = ^{ printf("%d", i); };
i = 7;
printf("%d", i); // yields 7
block(); // yields 42
You'd get the expected result--both printing 7--with:
__block int i = 42;
But this is all explained, more-or-less, in Apple's documentation. What I'm still unclear of is how the run-time handles the collection of __block-storage objects. It uses some sort of reference counting on the Block objects, and the __block-storage objects are referenced by the Block, presumably. But how it manages to decrement the reference count, I'm not sure of. It might not do it at all, so that in order for a Block to remain valid outside of the scope it was defined in you have to use the API, i.e. Block_copy, etc.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 22:05 UTC (Wed) by wahern (subscriber, #37304) [Link]
int i;
int *p = &i;
void (^block)(void) = ^{ int *const local_p = p; *local_p = 0; };
// is the equivalent of ^{ *p = 0; }
Finally, I'm fairly certain now that in order to return a Block from a function you have to manually use Block_copy() and Block_release(). So, there's no magic garbage collection going on under the hood, though there is some magic in the optimization of moving a Block from auto storage to dynamic storage--thus the __block qualifier and rules.
Implication being, that if you are passed a Block object and aren't going to use it exclusively from the current scope (e.g. the code flow falls back into an event loop, storing the Block for later use), you need to manage it with Block_copy() and Block_release(), similar to malloc()/free(). And I'm sure libdispatch/GCD does exactly this when passed a Block.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 8:54 UTC (Thu) by mgedmin (subscriber, #34497) [Link]
Tornado and Grand Central Dispatch: a quick look
Posted Sep 18, 2009 10:39 UTC (Fri) by madcoder (guest, #30027) [Link]
What I'm still unclear of is how the run-time handles the collection of __block-storage objects. It uses some sort of reference counting on the Block objects, and the __block-storage objects are referenced by the Block, presumably. But how it manages to decrement the reference count, I'm not sure of. It might not do it at all, so that in order for a Block to remain valid outside of the scope it was defined in you have to use the API, i.e. Block_copy, etc.
Well, there is a Block_release() function for that. And actually, wrt __block variables, it stores a reference (pointer) to the variable on stack until you do your first Block_copy, then (if that happens) it moves the __block variable to the heap, alongside the copied block. IOW the __block variable address may change between two runs of the same block, that's why __block is incompatible with static.
Blocks in C, not C++
Posted Sep 16, 2009 19:03 UTC (Wed) by ncm (guest, #165) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 19:15 UTC (Wed) by ncm (guest, #165) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 19:25 UTC (Wed) by cry_regarder (subscriber, #50545) [Link]
Cry
Blocks in C, not C++
Posted Sep 16, 2009 21:36 UTC (Wed) by elanthis (guest, #6227) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 22:39 UTC (Wed) by ncm (guest, #165) [Link]
Blocks in C, not C++
Posted Sep 17, 2009 17:17 UTC (Thu) by cry_regarder (subscriber, #50545) [Link]
\begin{pedantic}
NOT "C++0xA". The "x" is a wild card which under normal expectations would be replaced with an element of [0-9]. The joke is that, since "C++0x" is taking so long, we (they) will keep the implied promise of a first decade delivery by extending the membership of the wildcard to [0-F] (hex).
That means that if you replace the "x" with an "A", you write it as "C++0A", not as "C++0xA"!
In any case (as pointed out elsewhere), the name is not the real name for the language. It is a joke. The "official" non-real name will surely be of the form "C++1x".
\end{pedantic}
Cry
next C++ standard
Posted Sep 17, 2009 19:17 UTC (Thu) by man_ls (guest, #15091) [Link]
I think that the joke is funnier if you interpret the "0x" as the hexadecimal prefix in C and, well, C++; so "0xA" is "10" in legal C. Therefore decimal "C++09" is followed by hexadecimal "C++0xA".
next C++ standard
Posted Sep 18, 2009 11:41 UTC (Fri) by liljencrantz (guest, #28458) [Link]
next C++ standard
Posted Sep 27, 2009 22:23 UTC (Sun) by bluss (subscriber, #47454) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 20:45 UTC (Wed) by atai (subscriber, #10977) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 20:51 UTC (Wed) by nteon (subscriber, #53899) [Link]
blocks support built-in I believe. Clang as well. To actually _use_ blocks,
you need the BlocksRuntime shared library, available as part of compiler-rt [1]
(which builds and runs on Linux as of last week). Unfortunately I don't think
it's packaged anywhere yet.
Blocks in C, not C++
Posted Sep 16, 2009 20:54 UTC (Wed) by drag (guest, #31333) [Link]
Clang/LLVM does not support anything but C. For C++ you'd need GCC proper or GCC/LLVM, I believe.
Blocks in C, not C++
Posted Sep 17, 2009 2:12 UTC (Thu) by jdahlin (subscriber, #14990) [Link]
GCC, but enough to compile a couple of small test programs. I'm confident
clang will deliver a C++ compiler reasonably compatible with GCC sometime
next year.
Blocks in C, not C++
Posted Sep 16, 2009 21:54 UTC (Wed) by ncm (guest, #165) [Link]
Blocks in C, not C++
Posted Sep 16, 2009 22:46 UTC (Wed) by ncm (guest, #165) [Link]
Blocks in C, not C++
Posted Sep 17, 2009 1:29 UTC (Thu) by busterb (subscriber, #560) [Link]
X's most-supported language.
Blocks in C, not C++
Posted Sep 18, 2009 8:01 UTC (Fri) by marcH (subscriber, #57642) [Link]
Here is a list of non-standard GCC extensions:
http://www.ibm.com/developerworks/linux/library/l-gcc-hacks/
You can probably find a better list; this one took me only 15 seconds to find.
This Tornado
Posted Sep 16, 2009 19:21 UTC (Wed) by ncm (guest, #165) [Link]
The title appears to be a reference to the Neko Case song from the phenomenally excellent album Middle Cyclone, "This Tornado Loves You", some songs from which she performs again at http://opbmusic.org/performances/102-Neko-Case
This Tornado
Posted Sep 16, 2009 23:02 UTC (Wed) by leoc (guest, #39773) [Link]
You can never have too much Neko.
This Tornado
Posted Sep 17, 2009 15:45 UTC (Thu) by n8willis (subscriber, #43041) [Link]
Nate
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 21:08 UTC (Wed) by nteon (subscriber, #53899) [Link]
see any 'Apple code' looking for inclusion in the Linux kernel.
It might also be worth noting that some enterprising fellows have GCD running
on FreeBSD already. For the Linux 'port' there is talk of using libevent
(which abstracts kqueue and epoll).
I've been interested in working on this, but am somewhat put off by the fact
that it is Apache licensed (knowing that no matter how well it works, license
restrictions will prevent people from playing with it). Are there guides or
advice on how to start a similar project that would be API compatible but
legally under the GPL? Would this even be possible if I've looked through
the source for GCD (which I have)?
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 21:44 UTC (Wed) by MathFox (guest, #6104) [Link]
Are there guides or advice on how to start a similar project that would be API compatible but legally under the GPL? Would this even be possible if I've looked through the source for GCD (which I have)?
In general, one can independently re-implement a public interface without any obligation to the original designer of the interface. The keyword is independently and that's why in professional re-implementation settings one uses a clean-room methodology. (See http://en.wikipedia.org/wiki/Clean_room_design).
If you have read the source code it's better that you stay on the specification side of things and avoid implementing... OTOH, I think that Apple in this case will only object to very obvious code lifting, as they Open Sourced the code already. (IIRC under a GPLv3 compatible license)
Tornado and Grand Central Dispatch: a quick look
Posted Oct 6, 2009 9:56 UTC (Tue) by trasz (guest, #45786) [Link]
is a system library, there is nothing preventing its use by programs under any kind of license,
including GPL.
Tornado and Grand Central Dispatch: a quick look
Posted Oct 6, 2009 14:11 UTC (Tue) by nix (subscriber, #2304) [Link]
Linux distributors, which means it'll remain unused by Linux distributions
(because what's the point of a system library that you can't link to GPLed
code?).
It should just have been unencumbered BSD, dammit.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 16, 2009 21:54 UTC (Wed) by ballombe (subscriber, #9523) [Link]
If an error occurs inside a library you link to in an OpenMP thread, you generally have no way of recovering.
Is GCD better in that regard?
(By contrast the POSIX thread API lets you do a pthread_exit() with a marker value that will notify the caller of the error and also provides a way to cancel the remaining threads)
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 0:43 UTC (Thu) by njs (guest, #40338) [Link]
I had the strong impression that OpenMP was a language feature comparable to blocks. It uses #pragmas for its syntax (which is a little awkward), but #pragma isn't interpreted by the preprocessor; it's just an escape hatch for asking the compiler to invoke some language extensions without actually changing its parser.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 3:50 UTC (Thu) by zooko (guest, #2589) [Link]
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 19:42 UTC (Thu) by zooko (guest, #2589) [Link]
rather than a tad slower. http://www.apparatusproject.org/blog/2009/09/twisted-web-vs-tornado-part-deux/
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 11:41 UTC (Thu) by zmower (subscriber, #3005) [Link]
So there's another entry in the "solve the multiprocessor problem" but it only runs on MacOS and has a GPL incompatible license. Pfft.
Maybe the free software world might be better off looking towards a mature cross-platform solution that's already part of gcc? It's called the Ada runtime system (overview here). From Ada, creating a number of concurrent tasks is as easy as declaring an array of task types. Protected types (implicit locking) address the concurrent access problem. There are rendezvous for synchronous inter-task communication and protected types with guards for asynchronous inter-task communication.
Add in simple interfacing to C, exception handling, strong typing, OO programming, packages, a standard component library, streams and a sane generics system. Why would you use anything else? ;-)
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 13:13 UTC (Thu) by rsidd (subscriber, #2582) [Link]
If the majority of free software licences are GPL-incompatible, the problem is with the GPL, not with all the other licences.
As for "only runs on MacOS:" I predict that won't be true very long. And I believe that's why Apple released the source.
Tornado and Grand Central Dispatch: a quick look
Posted Sep 17, 2009 15:45 UTC (Thu) by n8willis (subscriber, #43041) [Link]
Nate