Interview: the return of the realtime preemption tree

By Jonathan Corbet
February 16, 2009
The realtime preemption project is a longstanding effort to provide deterministic response times in a general-purpose kernel. Much code resulting from this work has been merged into the mainline kernel over the last few years, and a number of vendors are shipping commercial products based upon it. But, for the last year or so, progress toward getting the rest of the realtime work into the mainline has slowed.

On February 11, realtime developers Thomas Gleixner and Ingo Molnar resurfaced with the announcement of a new realtime preemption tree and a newly reinvigorated development effort. Your editor asked them if they would be willing to answer a few questions about this work; their response went well beyond the call of duty. Read on for a detailed look at where the realtime preemption tree stands and what's likely to happen in the near future.

LWN: The 2.6.29-rc4-rt1 announcement notes that you're coming off a 1.5-year sabbatical. Why did you step away from the RT patches for so long? Have you been hanging out on the beach in the meantime? :)

Thomas: We spent a marvelous time at the x86 lagoon, a place with an extreme contrast of antiquities and modern art. :)

Seriously, we underestimated the amount of work which was necessary to bring the unified x86 architecture into shape. Nothing to complain about; it definitely was and still is a worthwhile effort and I would not hesitate longer than a fraction of a second to do it again.

Ingo: Yeah, hanging out on the beach for almost two years was well-deserved for both of us. We met Linus there and it was all fun and laughter, with free beach cocktails, pretty sunsets and camp fires. [ All paid for by the nice folks from Microsoft btw. - those guys sure know how to please a Linux kernel hacker! ;-) ]

So what has brought you back to the realtime work at this time?

Thomas: Boredom and nostalgia :) In fact I never lost track of the real time work since we took over x86 maintenance, but my effort was restricted to decoding hard-to-solve problems and making sure that the patches were kept in usable shape. Right now I have the feeling that we need to put more development effort into preempt-rt again to keep its upstream visibility and make progress on merging the remaining parts.

The most important reason for returning was of course our editor's challenge in The Grumpy Editor's guide to 2009: "The realtime patch set will be mostly merged by the end of the year..."

Ingo: When we left for the x86 land more than 1.5 years ago, the -rt patch queue was a huge pile of patches that changed hundreds of critical kernel files and introduced/touched ten thousand new lines of code. Fast-forward 1.5 years and the -rt patch queue is a humungous pile of patches that changes nearly a thousand critical kernel files and introduces/touches twenty to thirty thousand lines of code. So we thought that, while the project was growing nicely - it is useful and obviously people love it - the direction of growth was a bit off, and this particular area needed some help.

Initially it started as a thought experiment of ours: how much time and effort would it take to port the most stable -rt patch (.26-rt15) to the .29-tip tree and could we get it to boot? Turns out we are very poor at thought experiments (just like we are pretty bad at keeping patch queues small), so we had to go and settle the argument via some hands-on hacking. Porting the queue was serious fun, it even booted after a few dozen fixes, and the result was the .29-rt1 release.

Maintaining the x86 tree for such a long time and doing many difficult conceptual modernizations in that area was also very helpful when porting the -rt patch queue to the latest mainline.

Most of the code it touched and most of the conflicts that came up looked strangely familiar to us, as if those upstream changes went through our trees =;-)

(It's certainly nothing compared to the beach experience though, so we are still looking at returning for a few months to a Hawaii cruise.)

How well does the realtime code work at this point? What do you think are the largest remaining issues to be tackled?

Thomas: The realtime code has reached quite a stable state. The 2.6.24/26 based versions can definitely be considered production ready. I spent a lot of time sorting out a huge number of details in those versions to make them production stable. Still, we need to refactor a lot of the patches and look for mainline-acceptable solutions for some of the realtime-related changes.

Ingo: To me, what settled quite a bit of the "do we need -rt in mainline" questions was the spin-mutex enhancements it got. Prior to that there were a handful of pretty pathological workload scenarios where -rt performance tanked relative to mainline. With that, it's all pretty comparable.

The patch splitup and patch quality have improved too, and the queue we ported actually builds and boots at just about every bisection point, so it's pretty usable. A fair deal of patches fell out of the .26 queue because they went upstream in the meantime: tracing patches, scheduler patches, dyntick/hrtimer patches, etc.

It all looks a lot less scary now than it did 1.5 years ago - although the total size is still considerable, so there's definitely still a ton of work to do.

What are your current thoughts with regard to merging this work into the mainline?

Thomas: First of all we want to integrate the -rt patches into our -tip git repository, which makes it easier to keep -rt in sync with the ongoing mainline development. The next steps are to gradually refactor the patches, either by rewriting them or, preferably, by pulling in the work which was done in Steven's git-rt tree, then split out the parts which are ready and merge them upstream step by step.

Ingo: IMO the key thought here is to move the -rt tree 'ahead of the upstream development curve' again, and to make it the frontier of Linux R&D. With a 2.6.26 basis that was arguably hard to do. With a partly-2.6.30 basis (which the -tip tree really is) it's a lot more ahead of the curve, and there are a lot more opportunities to merge -rt bits into upstream bits wherever there's accidental upstream activity that we could hang -rt related cleanups and changes onto. We jumped almost four full kernel releases, which moves -rt across a year's worth of upstream development - and keeps it at that leading edge.

Another factor is that most of the top -rt contributors are also -tip contributors so there's strong synergy.

The -tip tree also undergoes serious automated stabilization and productization efforts, so it's a good basis for development _and_ for practical daily use. For example there were no build failures reported against .29-rt1, and most of the other failures that were reported were non-fatal as well and were quickly fixed. One of the main things we learned in the past 1.5 years was how to keep a tree stable against a wild, dangerous looking flux of modifications.

YMMV ;-)

Thomas once told me about a scheme to patch rtmutex locks into/out of the kernel at boot time, allowing distributors to ship a single kernel which can run in either realtime or "normal" mode. Is that still something that you're working on?

Thomas: We are not working on that right now, but it is still on the list of things which need to be investigated.

Ingo: That still sounds like an interesting feature, but it's pretty hard to pull it off. We used to have something rather close to that, a few years ago: a runtime switch that turned the rtmutex code back into spinning code. It was fragile and hard to maintain and eventually we dropped it.

Ideally it should be done not at boot time but at run time - via the stop-machine-run mechanism or so. [Extended perhaps with hibernation bits that force each task into hitting user mode, so that all locks in the system are released.]

It's really hard to implement, and it is definitely not for the faint-hearted.
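
[For illustration, here is a toy user-space sketch of the idea - not kernel code, and all names are hypothetical. It shows one locking front-end whose backend can be flipped between a spinning and a sleeping implementation. The hard part described above is the flip itself, which is only safe while no lock is held anywhere - hence the stop-machine/quiescing ideas:]

    /* Toy user-space illustration (not kernel code; all names hypothetical)
     * of a lock API with a run-time switchable backend. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>

    struct mylock {
        atomic_flag held;               /* backend 1: spinning */
        pthread_mutex_t mutex;          /* backend 2: sleeping, rtmutex-like */
    };
    #define MYLOCK_INIT { ATOMIC_FLAG_INIT, PTHREAD_MUTEX_INITIALIZER }

    struct lock_ops {
        void (*lock)(struct mylock *l);
        void (*unlock)(struct mylock *l);
    };

    static void spin_lock_impl(struct mylock *l)
    {
        while (atomic_flag_test_and_set(&l->held))
            sched_yield();              /* busy-wait, never sleeps */
    }
    static void spin_unlock_impl(struct mylock *l)
    {
        atomic_flag_clear(&l->held);
    }
    static void sleep_lock_impl(struct mylock *l)
    {
        pthread_mutex_lock(&l->mutex);  /* may block and sleep */
    }
    static void sleep_unlock_impl(struct mylock *l)
    {
        pthread_mutex_unlock(&l->mutex);
    }

    static const struct lock_ops spin_ops  = { spin_lock_impl,  spin_unlock_impl };
    static const struct lock_ops sleep_ops = { sleep_lock_impl, sleep_unlock_impl };

    /* The global mode switch. The fragile part: flipping it while any lock
     * is held means a lock taken through one backend would be released
     * through the other - which is why the whole system would have to be
     * quiesced (stop-machine style) before switching. */
    static _Atomic(const struct lock_ops *) mode = &spin_ops;

    void my_lock(struct mylock *l)   { atomic_load(&mode)->lock(l); }
    void my_unlock(struct mylock *l) { atomic_load(&mode)->unlock(l); }
    void switch_to_rt_mode(void)     { atomic_store(&mode, &sleep_ops); }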

The RT-preempt code would appear to be one of the biggest exceptions to the "upstream first" rule, which urges code to be merged into the mainline before being shipped to customers. How has that worked out in this case? Are there times when it is good to keep shipping code out of the mainline for such a long time?

Thomas: It is an exception which was only acceptable because preempt-rt does not introduce new user space APIs. It just changes the run time behaviour of the kernel to a deterministic mode.

All changes which are user space API related (e.g. PI futexes) were merged into mainline before they got shipped to customers via preempt-rt and all bug fixes and improvements of mainline code were sent upstream immediately. Preempt-rt was never a detached project which did not care about mainline.

When we started preempt-rt there was huge demand on the customer side - both enterprise and embedded - for an in-kernel realtime solution. The dual-kernel approaches of RTAI, RT-Linux and Xenomai never had a chance of being accepted into the mainline, and handling a dual-kernel environment has never been an easy task. With preempt-rt you just switch the kernel under a stock mainline user-space environment and voilà, your application behaves as you would expect - most of the time :) Dual-kernel environments require different libraries and different APIs, and you cannot run the same binary on a non-rt-enabled kernel. Debugging preempt-rt based realtime applications is exactly the same as debugging non-realtime applications.

While we never had doubts that it would be possible to turn Linux into a real time OS, it was clear from the very beginning that it would be a long way until the last bits and pieces got merged. The first question Ingo asked me when I contacted him in the early days of preempt-rt was: "Are you sure that you want to touch every part of the kernel while working on preempt-rt?" This question was absolutely legitimate; in the first days of preempt-rt we really did touch every part of the kernel, due to problems which were mostly locking and preemption related. The fixes have been merged upstream, and especially in the locking area mainline improved hugely due to lock debugging, the conversion to mutexes, etc., and a generally better awareness of locking and preemption semantics.

preempt-rt has always been a great breeding ground for fundamental changes in the kernel, and so far quite a large part of the preempt-rt development has been integrated into the mainline: PI-futexes, high-resolution timers ... I hope we can keep that up and soon provide more interesting technological changes which originally emerged from the preempt-rt efforts.

Ingo: Preempt-rt turns the kernel's scheduling, lock handling and interrupt handling code upside down, so there was no realistic way to merge it all upstream without having had some actual field feedback. It is also unique in that you need _all_ those changes to have the new kernel behavior - there's no real gradual approach to the -rt concept itself. That adds up to a bit of a catch-22: you don't get it upstream without field use, and you don't get field use without it being upstream.

Deterministic execution is a major niche, one which was not effectively covered by the mainstream kernel before. It's perhaps the last major technological niche in existence that the stock upstream kernel does not yet handle, and it's no wonder that the last one standing is in that position for conceptually hard reasons.

In short: all the easy technologies are upstream already ;-)

Nevertheless we strictly got all user-ABI changes upstream first: PI-futexes in particular. The rest of -rt is "just" a new kernel option that magically turns kernel execution into deterministic mode.

Where would be the best starting point for a developer who wishes to contribute to this effort?

Thomas: Nothing special with the realtime patches. Just kernel development as usual: get the patches, apply them, run them on your machine and test. If problems arise, provide bug reports or try to fix them yourself and send patches. Read through the code and start providing improvements, cleanups ...

Ingo: Beyond the "try it yourself, follow the discussions, and go wherever your heart tells you to go" suggestion, there are a few random areas that might need more attention:

  • Big Kernel Lock removal. It's critical for -rt. We still have the tip:core/kill-the-BKL branch, and if someone is interested it would be nice to drive that effort forward. A lot of nice help-zap-the-BKL patches went upstream recently (such as the device-open patches), so we are in a pretty good position to try the kill-the-BKL final hammer approach too.

    [I have just done a (raw!) refresh and conflict resolution merge of that tree to v2.6.29-rc5. Interested people can find it at:

          git pull \
            git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git \
            core/kill-the-BKL
    
    Warning: it might not even build. ]

  • Look at Steve's git-rt tree and split out and gradually merge bits. A fair deal of stuff has been cleaned up there and it would be nice to preserve that work.

  • Latency measurements and tooling. Go try the latency tracer, the function graph tracer and ftrace in general. Try to find delays in apps caused by the kernel (or caused by the app itself), and think about whether the kernel's built-in tools could be improved.

  • Try Thomas's cyclictest utility and try to trace and improve those worst-case latencies. A nice target would be to push the worst-case latencies on a contemporary PC below 10 microseconds. We were down to about 13 microseconds with a hack that threaded the timer IRQ with .29-rt1, so it's possible to go below 10 microseconds, I think. (See the sketch after this list for the shape of such a measurement loop.)

  • And of course: just try to improve the mainline kernel - that will improve the -rt kernel too, by definition :-)
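
[As a concrete reference for the cyclictest item above: the core of such a measurement is a loop that sleeps to an absolute deadline and records how late it wakes up. Below is a minimal sketch of that loop - heavily simplified, with illustrative numbers; the real cyclictest (from the rt-tests package) does much more:]

    /* Minimal sketch of a cyclictest-style measurement loop.
     * Build with: gcc -O2 -o latsketch latsketch.c -lrt
     * Run as root so the RT-priority and mlockall() calls succeed. */
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define NSEC_PER_SEC 1000000000L
    #define INTERVAL_NS     1000000L    /* 1 ms period */

    static void ts_add(struct timespec *t, long ns)
    {
        t->tv_nsec += ns;
        while (t->tv_nsec >= NSEC_PER_SEC) {
            t->tv_nsec -= NSEC_PER_SEC;
            t->tv_sec++;
        }
    }

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 80 };
        struct timespec next, now;
        long max_lat = 0, lat;
        int i;

        sched_setscheduler(0, SCHED_FIFO, &sp);  /* real-time priority */
        mlockall(MCL_CURRENT | MCL_FUTURE);      /* avoid page faults */

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (i = 0; i < 10000; i++) {
            ts_add(&next, INTERVAL_NS);          /* absolute deadline: no drift */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);
            lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                + (now.tv_nsec - next.tv_nsec);
            if (lat > max_lat)
                max_lat = lat;
        }
        printf("worst-case wakeup latency: %ld ns\n", max_lat);
        return 0;
    }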

But as usual, follow your own path. Independent, critical thinking is a lot more valuable than follow-the-crowd behavior. [As long as it ends up producing patches (not flamewars) that is ;-)]

And by all means, start small and seek feedback on lkml early and often. Being a good and useful kernel developer is not an attribute but a process, and good processes always need time, many gradual steps and a feedback loop to thrive.

Many thanks to Thomas and Ingo for taking the time to answer (in detail!) this long list of questions.

at last!

Posted Feb 16, 2009 21:27 UTC (Mon) by nettings (subscriber, #429) [Link]

wow. this is major news for the linux audio crowd. i took the liberty of posting a subscriber's link to a public mailing list, i hope that's ok. thanks for this coverage!

at last!

Posted Feb 17, 2009 5:04 UTC (Tue) by quotemstr (subscriber, #45331) [Link]

Erm, I would have asked before sending out that link.

subscriber links

Posted Feb 17, 2009 5:14 UTC (Tue) by garrison (subscriber, #39220) [Link]

From the LWN FAQ:

Where is it appropriate to post a subscriber link?

Almost anywhere. Private mail, messages to project mailing lists, and weblog entries are all appropriate. As long as people do not use subscriber links as a way to defeat our attempts to gain subscribers, we are happy to see them shared.

Interview: the return of the realtime preemption tree

Posted Feb 17, 2009 11:42 UTC (Tue) by mtk77 (guest, #6040) [Link]

Interesting article. But I don't quite follow "All paid for by the nice folks from Microsoft btw". Is Ingo working there now?

Interview: the return of the realtime preemption tree

Posted Feb 17, 2009 13:37 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

He has been working for Red Hat for a long time now. He is obviously just joking.

Interview: the return of the realtime preemption tree

Posted Feb 17, 2009 14:25 UTC (Tue) by nix (subscriber, #2304) [Link]

That's just what you think. In fact Ingo is an emissary of the Dark Side,
but this goes unknown as any attempt to discuss it on public fora leads to
argh argh aaarrrghhh...

Interview: the return of the realtime preemption tree

Posted Feb 17, 2009 16:26 UTC (Tue) by drag (guest, #31333) [Link]

Whatever.

He is just throwing a hint out to Microsoft. Basically he is saying:

"Hey Microsoft, if you want stop Linux domination all you have to do is keep all the Linux developers fat, happy, drunk, rich, and hand out beach homes like candy"

Interview: the return of the realtime preemption tree

Posted Feb 18, 2009 1:11 UTC (Wed) by SEJeff (guest, #51588) [Link]

Good point, then instead of being paid to work on Linux, a lot of them
would just do it for fun... Oh wait, a lot of them ALREADY work on Linux
for fun. Hmmmm Microsoft is in a heap of trouble then.

Whatever?

Posted Feb 18, 2009 19:54 UTC (Wed) by man_ls (guest, #15091) [Link]

What do you mean, "whatever"? So despite all his good work (and his personal friendship with Linus), Ingo is just a minion of Microsoft? This is completely outrageous! We should discuss this issue because argh argh aaarrrghhh...

Another Hard real time Linux

Posted Feb 18, 2009 21:51 UTC (Wed) by razb (guest, #43424) [Link]

Hello
I have written a piece of software called the offline scheduler (offsched). It is based on Linux's ability to offload a running processor. What I do is very simple:
1. Offload a processor.
2. Let this processor wander in my hook.

Currently, I have written a 1-microsecond timer. I will be happy for any criticism.
It is my master's work.

http://sos-linux.svn.sourceforge.net/viewvc/sos-linux/off...

Raz

Another Hard real time Linux

Posted Feb 19, 2009 6:44 UTC (Thu) by i3839 (guest, #31386) [Link]

If I understood your idea correctly, you basically just run some very limited kernel code on a dedicated core, with all unrelated interrupts etc. disabled?

That seems so limited that it doesn't have much practical use. The biggest problems are that it can't run user code and that any communication with the other cores easily breaks the real-time guarantee.

Examples:

- 1us accurate timer.
Standard gettimeofday gives me that. The system has plenty of very accurate timers; the problem is transferring that info fast enough to where it's needed.

- Firewall/routing/etc. offloading.
This is totally unrelated to real time. Basically it wastes one whole core on doing that, instead of letting that core also do other things, and adds extra communication overhead between cores/subsystems (you still need to get the packets from somewhere and tell which ones go where, etc.). It seems the same can be achieved by pinning the NIC interrupts to one core and giving all network-related stuff the highest priority.

You basically replace standard processes with very limited kernel code running on dedicated core. I don't say this is a bad idea in itself, but for this to make sense you want to have many (independent, low power) cores. I suspect that PC hardware isn't very suitable for this, because too much is shared by cores. It probably makes more sense for embedded systems, but even there it's questionable because of the kernel code only limitation.

What's the advantage of offsched compared to running a user space process at real-time priority pinned on a core with interrupts disabled?

Or in other words, what problem does your approach solve?
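
[For concreteness: the user-space setup the question refers to - a real-time-priority process pinned to one core - is only a few system calls on Linux. A minimal sketch, error handling omitted; steering interrupts away from that core is then done separately, e.g. via /proc/irq/*/smp_affinity or the isolcpus= boot parameter:]

    /* Pin the calling process to one CPU and give it RT priority.
     * Minimal sketch: error handling omitted, needs root. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/mman.h>

    static void make_pinned_rt(int cpu, int prio)
    {
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = prio };

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        sched_setaffinity(0, sizeof(set), &set); /* pin to one core */
        sched_setscheduler(0, SCHED_FIFO, &sp);  /* real-time priority */
        mlockall(MCL_CURRENT | MCL_FUTURE);      /* no page faults */
    }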

Another Hard real time Linux

Posted Feb 19, 2009 8:34 UTC (Thu) by razb (guest, #43424) [Link]

>If I understood your idea correctly, you basically just run some very limited kernel code on a dedicated core with all unrelated interrupts etc. disabled?

Correct, but the only limitations are:
1. Accessing ***vmalloc*** space ***directly***: you can access any kmalloc'ed address directly, and access vmalloc'ed space by walking the pages. What I mean is that you can access everything.
2. Being unable to kmalloc.
3. Being unable to free memory (for example: kfree).

>- 1us accurate timer.
>Standard gettimeofday gives me that. The system has plenty very accurate timers, the problem is transferring that info fast enough to where it's needed.
gettimeofday is not a timer, it is a clock. Try to schedule a task to be run T microseconds from now: you will skew, and the more tasks there are, the more it will skew.

>- Firewall/routing/etc. offloading.
>This is totally real-time unrelated. Basically it wastes one whole core on doing that instead of letting that core do also other things, and adds extra communication overhead between cores/subsystems (still need to get the packets from somewhere and tell which ones go where etc). It seems the same can be achieved by pinning the NIC interrupts to one core and >giving all network related stuff highest priority.

First, you are correct: it is unrelated to real time. Offsched is not just for real-time use, but for many other things. Having high ingest traffic means you will probably enable NAPI, and NAPI disables incoming interrupts to reduce interrupt overhead; but even with NAPI you may get your system jammed, worst of all even by unrelated traffic. Offsched suggests another approach: containing incoming traffic to a single core or more. This way cpu0, the main operating system processor, will not be at risk. Also, in regard to the waste of processors, again you are correct; but offsched is not meant to be used on your laptop, but on appliances with several cores, which, unfortunately, never achieve linear speed-up.

>You basically replace standard processes with very limited kernel code running on dedicated core. I don't say this is a bad idea in itself, but for this to make sense you want to have many (independent, low power) cores. I suspect that PC hardware isn't very suitable for this, because too much is shared by cores. It probably makes more sense for embedded systems, but even there it's questionable because of the kernel code only limitation.
You can access any facility in the kernel. You can send or receive packets. And I do it successfully on AMD and Intel machines.

>What's the advantage of offsched compared to running a user space process at real-time priority pinned on a core with interrupts disabled?
You cannot run user space with interrupts disabled. So you probably meant kernel space, and it would look something like this:
cli
foo()
sti
But you will fail: a processor must walk through a quiescent state; if you try this, you will have RCU starvation, and I have been there... :) One of my papers explains that.

>Or in other words, what problem does your approach solve?
I merely suggest a different approach to real time and security for machines with several cores or hyper-threading.
I am using offsched on my appliances for network work.

Another Hard real time Linux

Posted Feb 19, 2009 10:12 UTC (Thu) by i3839 (guest, #31386) [Link]

> correct. but it is limited only to:
> 1. accessing ***vmalloc**** space ***directly*** . You can access any
> kmalloc'ed address directly , and access vmalloc'ed space by walking
> on the pages. what I mean is that you can access everything.
> 2. unable to kmalloc
> 3. unable to free memory. ( For example : kfree ).

What's dangerous about accessing vmalloced space directly if it's pinned? Or did I misunderstand?

> You can access any facility in the kernel. you can send or receive
> packets. and I do it on AMD-Intel machines successfully.

Though those facilities may not access vmalloc space directly, nor allocate/free memory? Seems very fragile, because you can't know if they will in the future (assuming you audited all the code that may be executed by those facilities, which is a lot of tricky work).

How can you send and receive packets if you can't allocate the space needed for them? Not with the standard networking stack, can you?

> gettimeofday is not a timer, it is a clock. try and schedule a task to
> be run T microseconds from now, you will skew, and the more tasks, it
> will skew more.

Right, totally different, sorry. But you only run one task, so the timer is just a more efficient way of not doing anything in the meantime?

> even with NAPI you may get your system to be jammed, and worst of all
> even with unrelated traffic, offsched suggests another approach of
> containing incoming traffic to a single or more cores. This way cpu0,
> the main operating system processor, will not be at risk.

This is a generic problem: Any (user or kernel) process can use too many resources, slowing down the machine as a whole. Offsched doesn't solve that at all, except for some explicit kernel cases which are 'ported' to offsched, which is a lot of work.

Realtime preemption, on the other hand, tries to solve this problem in a more generic way.

And moving networking to offsched may contain the damage to one core, but it doesn't solve the real problem, e.g. sshing into the box doesn't work quicker or better in any way. If the NIC generates more packets than can be handled, the right solution is to drop some early. Basically what you always do in an overload situation: Don't try to do everything, drop some stuff.

Now the nasty thing is that it's hard to see the difference between a DoS and just a very high load.

Besides, handling the network packets with all cores instead of one may be the difference between being DoSed and just slowed down.

> you cannot run user space with interrupts disabled. So you probably
> meant kernel space, and it will look something like this:

Bad wording on my part, sorry. No, I meant that all interrupt handlers are executed on other cores than the "special" one, and the few that would happen anyway are disabled semi-permanently. (The scheduling clock can be disabled because a rt task is running and no involuntary scheduling should happen. Easier now with dynticks though.)

Basically moving the special kernel task running on that core to a special user space task running on that core. Or at least add it as an option. Add some special syscalls or character drivers to do the more esoteric stuff and voila, all done.

> but you will fail.
> a processor must walk trough a quiescent state ; if you try it, you will
> have RCU starvation, and I have been there... :) . one of my papers
> explains that.

This problem is still there though. But it seems like a minor adjustment to RCU to teach it that some cores should be ignored, or to keep track if some cores did any RCU stuff at all (perhaps it already does that now, didn't check).

All in all what you more or less have is standard Linux kernel besides a special mini-RT-OS, running on a separate core. Only, you extend the current kernel to include the functionality of that RT-OS, and use other bits and pieces of the kernel when convenient. This is better than a totally separate RT-OS, but still comes with the disadvantages of one: Very limited and communication with the rest of the system is tricky. If done well it's a small step forwards, but why not think bigger and try to solve the tougher problems?

Another Hard real time Linux

Posted Feb 20, 2009 22:19 UTC (Fri) by razb (guest, #43424) [Link]

> Another Hard real time Linux
> [Kernel] Posted Feb 19, 2009 10:12 UTC (Thu) by i3839
>
>> correct. but it is limited only to:
>> 1. accessing ***vmalloc**** space ***directly*** . You can access any
>> kmalloc'ed address directly , and access vmalloc'ed space by walking
>> on the pages. what I mean is that you can access everything.
>> 2. unable to kmalloc
>> 3. unable to free memory. ( For example : kfree ).
>
> What's dangerous about accessing vmalloced space directly if it's
> pinned? Or did I misunderstand?
vmalloc pages are updated into the kernel master page table in the VMALLOC area. When a processor's MMU tries to access these pages, it faults. But, hey, offsched cannot fault.
kmalloc pages are static and do not require faults.
>> You can access any facility in the kernel. you can send or receive
>> packets. and I do it on AMD-Intel machines successfully.
>
> Though those facilities may not access vmalloc space directly, nor
> allocate/free memory? Seems very fragile, because you can't know if they
> will in the future (assuming you audited all the code that may be
> executed by those facilities, which is a lot of tricky work).
vmalloc memory is rarely used. It is used in audio drivers, and for loading modules, which is no more than an annoying problem.

> How can you send and receive packets if you can't allocate the space
> needed for them? Not with the standard networking stack, can you?
Recv: offsched is used for mere packet parsing. Once the parsing is done, the packet is moved to the kernel or dropped.
Send: pre-allocate all you need.
I am using a private UDP stack; UDP is not a big deal.

>> gettimeofday is not a timer, it is a clock. try and schedule a task to
>> be run T microseconds from now, you will skew, and the more tasks, it
>> will skew more.
>
> Right, totally different, sorry. But you only run one task, so the timer
> is just a more efficient way of not doing anything in the meantime?
Only one task? Why not have both receive and transmit? Why do you think an OS processor is fully utilized?
Benchmarks show a speed-up of 2.8 on an 8-core machine.
>> even with NAPI you may get your system to be jammed, and worst of all
>> even with unrelated traffic, offsched suggests another approach of
>> containing incoming traffic to a single or more cores. This way cpu0,
>> the main operating system processor, will not be at risk.
>
> This is a generic problem: Any (user or kernel) process can use too many
> resources, slowing down the machine as a whole. Offsched doesn't solve
With NAPI we consume the entire system's computation power; with offsched we don't. I decided to call it the offsched containment concept.
> that at all, except for some explicit kernel cases which are 'ported' to
> offsched, which is a lot of work.
Yes, it is a lot of work, unfortunately. Currently I do not know how much work it is to climb up a TCP stack in offsched context. Do you know of a good RT TCP stack?
Also, the 80-20 rule suggests that 20% of the code can handle 80% of the cases, so I may find myself fixing only 20% of the TCP code. Much depends on whether offsched will ever reach mainline.
> realtime preemption, on the other hand, tries to solve this problem in a
> more generic way.

> And moving networking to offsched may contain the damage to one core,
> but it doesn't solve the real problem, e.g. sshing into the box doesn't
> work quicker or better in any way. If the NIC generates more packets
> than can be handled, the right solution is to drop some early. Basically
> what you always do in an overload situation: Don't try to do everything,
> drop some stuff.
Why a single NIC? Many appliances, if not most, are shipped with an administration interface and a public interface.
The public interface is the exposed one. If it is under attack, the entire system is under attack, especially in a world of 10G interfaces.
With offsched, we assign OFFSCHED-NAPI to the 10G interface....
> Now the nasty thing is that it's hard to see the difference between a
> DoS and just a very high load.
>
> Besides, handling the network packets with all cores instead of one may
> be the difference between being DoSed and just slowed down.
Who says a single OFFSCHED core is used?
>> you cannot run user space with interrupts disabled. So you probably
>> meant kernel space, and it will look something like this:
>
> Bad wording on my part, sorry. No, I meant that all interrupt handlers
> are executed on other cores than the "special" one, and the few that
This is soft real time. User space cannot do hard real time; you can never guarantee meeting deadlines because you are in ring 3. If you want to use a high-priority kernel thread, you probably pre-allocate memory (...well... I do...). So? Better to use offsched.
> would happen anyway are disabled semi-permanently. (The scheduling clock
> can be disabled because a rt task is running and no involuntary
> scheduling should happen. Easier now with dynticks though.)
It is a good idea - why not wrap the offsched timer with clockevents? Thanks.
> Basically moving the special kernel task running on that core to a
> special user space task running on that core. Or at least add it as an
> option. Add some special syscalls or character drivers to do the more
> esoteric stuff and voila, all done.
>> but you will fail.
>> a processor must walk trough a quiescent state ; if you try it, you
> will
>> have RCU starvation, and I have been there... :) . one of my papers
>> explains that.
>
> This problem is still there though. But it seems like a minor adjustment
> to RCU to teach it that some cores should be ignored, or to keep track
> if some cores did any RCU stuff at all (perhaps it already does that
> now, didn't check).
>
> All in all what you more or less have is standard Linux kernel besides a
> special mini-RT-OS, running on a separate core. Only, you extend the
> current kernel to include the functionality of that RT-OS, and use other
> bits and pieces of the kernel when convenient. This is better than a
> totally separate RT-OS, but still comes with the disadvantages of one:
> Very limited and communication with the rest of the system is tricky. If
> done well it's a small step forwards, but why not think bigger and try
> to solve the tougher problems?
Correct. I decided to call it a "hybrid system", because you enjoy the stability of a Linux server plus OFFSCHED. If A is the size of your software, and B is the size of the real-time code, B/A is likely to be small. Why mess with a big RT system for such a small fraction?
You are more than welcome to suggest other strategies.

Another Hard real time Linux

Posted Feb 20, 2009 13:25 UTC (Fri) by saffroy (guest, #43999) [Link]

Another approach is to use a real-time hypervisor: you can have real-time scheduling, (almost) full access to the bare metal, and even (more or less) friendly APIs to communicate with the other OS. You can even have a full-featured RTOS running there.

BTW, is it reasonable to imagine the RT-preempt tree running kvm running an RTOS?

Another Hard real time Linux

Posted Feb 20, 2009 22:26 UTC (Fri) by razb (guest, #43424) [Link]

> Another Hard real time Linux
> [Kernel] Posted Feb 20, 2009 13:25 UTC (Fri) by saffroy
>
> Another approach is to use a real-time hypervisor: you can have
> real-time scheduling, (almost) full access to the bare-metal, and even
> (more or less) friendly APIs to communicate with the other OS. You can
> even have a full-featured RTOS running there.
Funny you should mention it. I actually thought of using this technology to have a solution for single-CPU machines. But it turned out that hyper-threading is good enough for offsched, so I did not try it. But I very much agree, we do not utilize the machines enough.
> BTW, is it reasonable to imagine the RT-preempt tree running kvm running
> a RTOS ?
don't know.

Interview: the return of the realtime preemption tree

Posted Feb 19, 2009 14:59 UTC (Thu) by Lovechild (guest, #3592) [Link]

Back in the day there was a very handy yum repo available. It made it trivially easy for users to test, and was a good way to detect problem scenarios. I am hoping to see something like that return with this reinvigorated -rt effort.

yum repo?

Posted Feb 19, 2009 19:41 UTC (Thu) by bkoz (guest, #4027) [Link]

Looking for kernel-rt as well, but I don't see details on a new yum repo for the renewed realtime work.

