Simpler syslets

By Jonathan Corbet
December 10, 2007

Syslets are a proposed mechanism which would allow any system call to be invoked in an asynchronous manner; this technique promises a more comprehensive and simpler asynchronous I/O mechanism and much more - once all of the pesky little details can be worked out. A while back, Zach Brown let it be known that he had taken over the ongoing development of the syslets patch set; things have been relatively quiet since then. But Zach has just returned with a new syslets patch which shows where this idea is going.

This version of the patch removes much of the functionality seen in previous postings. The ability to load simple programs into the kernel for asynchronous execution is now gone, as is the "threadlet" mechanism for asynchronous execution of user-space functions. Instead, syslets have gone back to their roots: a mechanism for running a single system call without blocking.

As had been foreshadowed in other discussions, syslets now use the indirect() system call mechanism. An application wanting to perform an asynchronous system call fills in a syslet_args structure describing how the asynchronous execution is to be handled; the application then calls indirect() to make it happen. If the system call can run without blocking, indirect() simply returns with the final status. If blocking is required, the kernel will (as with previous versions of this patch) return to user space in a separate process while the original process waits for things to complete. Upon completion, the final status is stored in user-space memory and the application is notified in an interesting way.

The syslet_args structure looks like this:

    struct syslet_args {
        u64 completion_ring_ptr;
        u64 caller_data;
        struct syslet_frame frame;
    };

The completion_ring_ptr field contains a pointer to a circular buffer stored in user space. The head of the buffer is defined this way:

    struct syslet_ring {
        u32 kernel_head;
        u32 user_tail;
        u32 elements;
        u32 wait_group;
        struct syslet_completion comp[0];
    };

Here, kernel_head is the index of the next completion ring entry to be filled in by the kernel, and user_tail is the next entry to be consumed by the application. If the two are equal, the ring is empty. The elements field says how many entries can be stored in the ring; it must be a power of two. The kernel uses wait_group as a way of locating a wait queue internally when the application waits on syslet completion; your editor suspects that this part of the API may not survive into the final version.
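As a rough illustration of these ring semantics, here is a user-space sketch of how an application might drain completions. It is not taken from the patch: the helper names (ring_alloc(), ring_empty(), ring_pop(), fake_kernel_complete()) are hypothetical, and the code assumes that both indices are stored already wrapped into the range 0 to elements-1, which the text above does not specify. A real consumer would also need memory barriers, since the kernel fills the ring concurrently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space mirrors of the structures described in the article. */
struct syslet_completion {
    uint64_t status;
    uint64_t caller_data;
};

struct syslet_ring {
    uint32_t kernel_head;   /* next entry the kernel will fill in */
    uint32_t user_tail;     /* next entry the application will consume */
    uint32_t elements;      /* ring capacity; must be a power of two */
    uint32_t wait_group;
    struct syslet_completion comp[];
};

/* Hypothetical helper: allocate a zeroed ring with the given capacity. */
static struct syslet_ring *ring_alloc(uint32_t elements)
{
    /* Enforce the power-of-two requirement. */
    assert(elements != 0 && (elements & (elements - 1)) == 0);

    struct syslet_ring *ring =
        calloc(1, sizeof(*ring) + elements * sizeof(ring->comp[0]));
    if (ring)
        ring->elements = elements;
    return ring;
}

/* The ring is empty when the two indices are equal. */
static int ring_empty(const struct syslet_ring *ring)
{
    return ring->kernel_head == ring->user_tail;
}

/*
 * Hypothetical helper: copy out one completion and advance user_tail.
 * Returns 1 if an entry was consumed, 0 if the ring was empty.
 * ASSUMPTION: indices wrap at `elements`; the power-of-two size lets
 * the wrap be done with a mask.
 */
static int ring_pop(struct syslet_ring *ring, struct syslet_completion *out)
{
    if (ring_empty(ring))
        return 0;
    *out = ring->comp[ring->user_tail];
    ring->user_tail = (ring->user_tail + 1) & (ring->elements - 1);
    return 1;
}

/* Stand-in for the kernel side, for demonstration only. */
static void fake_kernel_complete(struct syslet_ring *ring,
                                 uint64_t status, uint64_t caller_data)
{
    ring->comp[ring->kernel_head].status = status;
    ring->comp[ring->kernel_head].caller_data = caller_data;
    ring->kernel_head = (ring->kernel_head + 1) & (ring->elements - 1);
}
```

The power-of-two restriction is what allows the index wrap to be a single mask operation rather than a division.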

Finally, the completion status values themselves live in the array of syslet_completion structures, which look like this:

    struct syslet_completion {
        u64 status;
        u64 caller_data;
    };

When a syslet completes, the final return code is stored in status, while caller_data is set to the value that the application supplied in the field of the same name in the syslet_args structure when the syslet was started.

There is one field of syslet_args which has not been discussed yet: frame. The definition of this structure is architecture-dependent; for the x86 architecture it is:

    struct syslet_frame {
        u64 ip;
        u64 sp;
    };

These values are used when the syslet completes. After the kernel stores the completion status in the ring buffer, it will call the function whose address is stored in ip, using the stack pointer found in sp. This call serves as a sort of instant, asynchronous notification to the application that the syslet is done. It's worth noting that this call is performed in the original process - the one in which the syslet was executed - rather than in the new process used to return to user space when the syslet blocked. This function also has nothing to return to, so, after doing its job, it should simply exit.

So, to review, here is how a user-space application will use syslets to invoke a system call asynchronously:

1. The completion ring is established and initialized in user space.
2. A stack is allocated for the notification function, and the syslet_args structure is filled in with the relevant information.
3. A call is made to indirect() to get the syslet going.
4. If the system call of interest is able to complete without blocking, the return value is passed directly back to user space from indirect() and the call is complete.
5. Otherwise, once the system call blocks, execution switches to a new process which returns to user space. An ESYSLETPENDING error is returned in this case.
6. Once the system call completes, the kernel stores the return value in the completion ring and calls the notification function in the original process.
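The second step, allocating a notification stack and filling in syslet_args, might look something like the following sketch. It is not code from the patch: syslet_args_prepare(), on_syslet_done() and NOTIFY_STACK_SIZE are hypothetical names, the stack size is arbitrary, and the 16-byte alignment of sp is an assumption based on common x86-64 practice. The call to indirect() itself is left out, since its calling convention is still in flux.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space mirrors of the structures described in the article (x86). */
struct syslet_frame {
    uint64_t ip;
    uint64_t sp;
};

struct syslet_args {
    uint64_t completion_ring_ptr;
    uint64_t caller_data;
    struct syslet_frame frame;
};

#define NOTIFY_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/*
 * Hypothetical notification function. A real one would look at the
 * completion ring and then exit rather than return, since it has
 * nothing to return to. It is never invoked in this sketch.
 */
static void on_syslet_done(void)
{
}

/*
 * Hypothetical helper: allocate a stack for the notification function
 * and fill in *args. Returns 0 on success, -1 if allocation fails.
 * The caller owns *stack_out and must free it once the syslet is done.
 */
static int syslet_args_prepare(struct syslet_args *args, void *ring,
                               uint64_t cookie, void (*notify)(void),
                               void **stack_out)
{
    assert(args && notify && stack_out);

    void *stack = malloc(NOTIFY_STACK_SIZE);
    if (!stack)
        return -1;

    args->completion_ring_ptr = (uint64_t)(uintptr_t)ring;
    args->caller_data = cookie;     /* echoed back in the completion entry */
    args->frame.ip = (uint64_t)(uintptr_t)notify;
    /* x86 stacks grow downward, so sp starts at the high end of the
     * allocation, rounded down to a 16-byte boundary (an assumption). */
    args->frame.sp = ((uint64_t)(uintptr_t)stack + NOTIFY_STACK_SIZE)
                     & ~(uint64_t)15;

    *stack_out = stack;
    return 0;
}
```

Storing a pointer or an index in caller_data is one way for the application to match each completion entry with the request that produced it.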

Should the application wish to stop and wait for any outstanding syslets to complete, it can make use of a new system call:

    int syslet_ring_wait(struct syslet_ring *ring, unsigned long user_idx);

Here, ring is the pointer to the completion ring, and user_idx is the value of the user_tail index as seen by the process. Providing the tail as an argument to syslet_ring_wait() prevents problems with race conditions which might come about if a syslet completes after the application has decided to wait. This call will return once there is at least one completion in the ring.
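To see why passing the tail helps, consider a toy model of the check that syslet_ring_wait() presumably performs. This is an assumption about the semantics, in the style of a futex compare-and-sleep, and not code from the patch; wait_should_sleep() and demo_race() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Just the two indices from the head of the completion ring. */
struct ring_indices {
    uint32_t kernel_head;
    uint32_t user_tail;
};

/*
 * Toy model of the assumed kernel-side check: sleep only if the ring
 * is still empty relative to the tail value the caller observed.
 * Returns 1 if the caller would sleep, 0 if the call returns at once.
 */
static int wait_should_sleep(const struct ring_indices *ring,
                             uint32_t user_idx)
{
    assert(ring != 0);
    return ring->kernel_head == user_idx;
}

/*
 * Walk through the problematic interleaving: the application sees an
 * empty ring and decides to wait, but a syslet completes before the
 * wait call is made. Returns whether the application would sleep.
 */
static int demo_race(void)
{
    struct ring_indices ring = { 0, 0 };

    uint32_t seen_tail = ring.user_tail;  /* app: ring looks empty */
    ring.kernel_head = 1;                 /* a completion arrives */

    /* The head no longer matches the tail the app saw, so no sleep. */
    return wait_should_sleep(&ring, seen_tail);
}
```

Without the user_idx argument, a call made after the completion had already arrived could sleep even though there is an entry waiting to be consumed.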

The real purpose of this set of patches is to try to nail down the user-space API for syslets; it is clear that there is still some work to be done. For example, there is no way, currently, for an application to use indirect() to simultaneously launch a syslet and (as was the original purpose for indirect()) provide additional arguments to the target system call. In fact, the means for determining which of the two is being done looks dangerously brittle. As Zach has already noted, the calling convention needs to be changed to make the syslet functionality and the addition of arguments orthogonal.

There are a number of other questions which need to be answered - Zach has supplied a few of them with the patch. Interaction with ptrace() is unclear, resource management issues abound, and so on. Zach is clearly looking for feedback on these issues:

I'm particularly interested in hearing from people who are trying to use syslets in their applications. This will involve awkward wrappers instead of glibc calls for now, and your machine may explode, but hopefully the chance to influence the design of syslets would make it worth the effort.

So, the message is clear: anybody who is interested in how this interface will look would be well advised to pay attention to it now.

Simpler syslets

Posted Dec 10, 2007 23:01 UTC (Mon) by josh (subscriber, #17465)

This new architecture seems to remove one of the key advantages of the previous syslet
interfaces: the ability to run many system calls without incurring system call overhead for
each one.

Simpler syslets

Posted Dec 10, 2007 23:10 UTC (Mon) by nix (subscriber, #2304)

Also, because the completion function is called in the original 
process --- well, what do the old and new tasks share? If they share as 
little as POSIX processes, the completion function is going to have to 
resort to something nasty like a pipe read by the other process just to 
communicate with itself.

Simpler syslets

Posted Dec 10, 2007 23:31 UTC (Mon) by i3839 (guest, #31386)

I think that's what the ugly syslet_ring_wait() is meant for.

I see a lot of unnecessary complexity, where should I drop my feedback, lkml or lwn? I'll mail
it to lkml tomorrow...

Simpler syslets

Posted Dec 11, 2007 6:39 UTC (Tue) by butlerm (subscriber, #13312)

In the current patch, the parent and child share virtually everything (VM, FS, files, etc) - i.e. the child task is essentially a thread, not an independent POSIX process. I can hardly imagine why anyone would want to do anything else.

Also, the more sophisticated functionality is not being abandoned, it is just being put off in the name of initial simplicity. The announcement has the details.

Simpler syslets

Posted Dec 11, 2007 0:13 UTC (Tue) by jamesh (guest, #1159)

So invoking a syslet may or may not result in the program continuing as a new process.  What
happens to child processes in this case?  If they stay attached to the syslet process, the
main program won't be able to check their exit status.

No ordinary process

Posted Dec 11, 2007 6:44 UTC (Tue) by butlerm (subscriber, #13312)

A Linux "process" is a rather more generic concept than a POSIX process. Here we are talking about something more like a thread. See above.

Simpler syslets

Posted Dec 11, 2007 1:47 UTC (Tue) by seanMcGrath (guest, #1563)

typo:
"The completion ring is established an initialized in user space."  
"an" = "and".

Is this not the lisp "future" function?

Posted Dec 13, 2007 13:07 UTC (Thu) by davecb (subscriber, #1574)

At least one experimental lisp had a "future"
function, which took a complete function call
and returned a function to call at some convenient
later time to get the results of the function.
In pseudo-c

void *later = future(function_to_run, (args_to_pass));

Future would start the function running 
asynchronously with code to catch its
results.

The caller to future would eventually call "later"
and if the results were already there, would return
with them.  If not, it would block until they
were available.

I found this elegant, and note that it separates
indirection and threadlet-ing.

--dave

Is this not the lisp "future" function?

Posted Dec 21, 2007 15:35 UTC (Fri) by ringerc (subscriber, #3071)

That's rather similar to how Java-style threading is done in the case where the "master"
thread and the child task share no data. Presumably that's always the case in lisp, given its
functional design.

Sure, the Java approach is uglier and more verbose, but the principle remains practically the
same. You can use a callback or (with the TrollTech Qt approach) event in the event loop to
detect completion. Or you can just poll for completion by testing an instance variable of the
thread subclass.

http://java.sun.com/j2se/1.3/docs/api/java/lang/Thread.html

A similar approach can be used in C++ with Qt. It's really rather nice, and makes threading
quite sane for launching independent deferred calls that should produce a result "later".

Another amazing idea

Posted Dec 20, 2007 10:29 UTC (Thu) by jfj (guest, #37917)

Handle blocking by continuing to run the userspace in a different process!

These guys are full of bright ideas. First cgcc, now this.
And then they say OSS has no innovation. We are talking about Creators, Inventors and
Innovators (TM) here. Not just some people stealing ideas.