MCS Extensions

Prerequisites

  1. Set up your machine.
  2. You should be familiar with the seL4 Api. Otherwise do the Mechanisms tutorials

Initialising

Then initialise the tutorial:

# For instructions about obtaining the tutorial sources see https://docs.sel4.systems/Tutorials/#get-the-code
#
# Follow these instructions to initialise the tutorial
# initialising the build directory with a tutorial exercise
./init --tut mcs
# building the tutorial exercise
cd mcs_build
ninja

Outcomes

This tutorial presents the features in the upcoming MCS extensions for seL4, which are currently undergoing verification. For further context on the new features, please see the paper or phd which provides a comprehensive background on the changes.

  1. Learn about the MCS new kernel API.
  2. Be able to create and configure scheduling contexts.
  3. Learn the jargon passive server.
  4. Spawn round-robin and periodic threads.

Background

The MCS extensions provide capability-based access to CPU time, and provide mechanisms to limit the upper bound of execution of a thread.

Scheduling Contexts

Scheduling contexts are a new object type in the kernel, which contain scheduling parameters amoung other accounting details. Most importantly, scheduling contexts contain a budget and a period, which represent an upper bound on execution time allocated: the kernel will enforce that threads cannot execute for more than budget microseconds out of period microseconds.

SchedControl

Parameters for scheduling contexts are configured by invoking seL4_SchedControl capabilities, one of which is provided per CPU. The invoked seL4_SchedControl determines which processing core that specific scheduling context provides access to.

Scheduling contexts can be configured as full or partial. Full scheduling contexts have budget == period and grant access to 100% of CPU time. Partial scheduling contexts grant access to an upper bound of budget/period CPU time.

The code example below configures a scheduling context with a budget and period both equal to 1000us. Because the budget and period are equal, the scheduling context is treated as round-robin

    error = seL4_SchedControl_Configure(sched_control, sched_context, US_IN_S, US_IN_S, 0, 0);
    ZF_LOGF_IF(error != seL4_NoError, "Failed to configure schedcontext");

SchedContext Binding

Thread control blocks (TCBs) must have a scheduling context configured with non-zero budget and period in order to be picked by the scheduler. This can by invoking the scheduling context capability with the seL4_SchedContext_Bind invocation, or by using seL4_TCB_SetSchedParams, which takes a scheduling context capability. Below is example code for binding a TCB and a scheduling context.

    error = seL4_SchedContext_Bind(sched_context, spinner_tcb);
    ZF_LOGF_IF(error != seL4_NoError, "Failed to bind sched_context to round_robin_tcb");

TCB’s can only be bound to one scheduling context at a time, and vice versa. If a scheduling context has not been configured with any time, then although the TCB has a scheduling context it will remain ineligible for scheduling.

Bounding execution

For partial scheduling contexts, an upper bound on execution is enforced by seL4 using the sporadic server algorithm, which work by guaranteeing the sliding window constrain, meaning that during any period, the budget cannot be exceeded. This is achieved by tracking the eligible budget in chunks called replenishments (abbreviated to refills in the API for brevity). A replenishment is simply an amount of time, and a timestamp from which that time can be consumed. We explain this now through an example:

Consider a scheduling context with period T and budget C. Initially, the scheduling context has a single replenishment of size C which is eligible to be used from time t.

The scheduling context is scheduled at time t and blocks at time t + n. A new replenishment is then scheduled for time t+T for n. The existing replenishment is updated to C - n, subtracting the amount consumed. For further details on the sporadic server algorithm, see the original paper.

If the replenishment data structure is full, replenishments are merged and the upper bound on execution is reduced. For this reason, the bound on execution is configurable with the extra_refills parameter on scheduling contexts. By default, scheduling contexts contain only two parameters, meaning if a scheduling context is blocked, switched or preempted more than twice, the rest of the budget is forfeit until the next period. extra_refills provides more replenishment data structures in a scheduling context. Note that the higher the number of replenishments the more fragmentation of budget can occur, which will increase scheduling overhead.

extra_refills itself is bounded by the size of a scheduling context, which is itself configurable. On scheduling context creation a size can be specified, and must be > seL4_MinSchedContextBits. The maximum number of extra refills that can fit into a specific scheduling context size can be calculated with the function seL4_MaxExtraRefills() provided in libsel4.

Threads bound to scheduling contexts that do not have an available replenishment are placed into an ordered queue of threads, and woken once their next replenishment is ready.

Scheduler

The seL4 scheduler is largely unchanged: the highest priority, non-blocked thread with a configured scheduling context that has available budget is chosen by the scheduler to run.

Passive servers

The MCS extensions allow for RPC style servers to run on client TCBs’ scheduling contexts. This is achived by unbinding the scheduling context once a server is blocked on an endpoint, rendering the server passive. Caller scheduling contexts are donated to the server on seL4_Call and returned on seL4_ReplyRecv.

Passive servers can also receive scheduling contexts from their bound notification object, which is achieved by binding a notification object using seL4_SchedContext_Bind.

Timeout faults

Threads can register a timeout fault handler using seL4_TCB_SetTimeoutEndpoint. Timeout fault handlers are optional and are raised when a thread’s replenishment expires and they have a valid handler registered. The timeout fault message from the kernel contains the data word which can be used to identify the scheduling context that the thread was using when the timeout fault occurred, and the amount of time consumed by the thread since the last fault or seL4_SchedContext_Consumed.

New invocations

  • seL4_SchedContext_Bind - bind a TCB or Notification capability to the invoked scheduling context.
  • seL4_SchedContext_Unbind - unbind any objects from the invoked scheduling context.
  • seL4_SchedContext_UnbindObject- unbind a specific object from the invoked scheduling context.
  • seL4_SchedContext_YieldTo - if the thread running on the invoked scheduling context is schedulable, place it at the head of the scheduling queue for its priority. For same priority threads, this will result in the target thread being scheduled. Return the amount of time consumed by this scheduling context since the last timeout fault, YieldTo or Consumed invocation.
  • seL4_SchedContext_Consumed - Return the amount of time consumed by this scheduling context since the last timeout fault, YieldTo or Consumed invocation.
  • seL4_TCB_SetTimeoutEndpoint - Set the timeout fault endpoint for a TCB.

Reply objects

The MCS API also makes the reply channel explicit: reply capabilities are now fully fledged objects which must be provided by a thread on seL4_Recv operations. They are used to track the scheduling context donation chain and return donated scheduling contexts to callers.

Please see the release notes and manual for further details on the API changes.

Exercises

In the initial state of the tutorial, the main function in mcs.c is running in one process, and the following loop from spinner.c is running in another process:

    int i = 0;
    while (1) {
        printf("Yield\n");
        seL4_Yield();
    }

Both processes share the same priority. The code in mcs.c binds a scheduling context (sched_context) to the TCB of the spinner process (spinner_tcb) with round-robin scheduling parameters. As a result, you should see something like the following output, which continues uninterrupted:

Yield
Yield
Yield

Periodic threads

Exercise Reconfigure sched_context with the following periodic scheduling parameters, (budget = 0.9 * US_IN_S, period = 1 * US_IN_S).

    //TODO reconfigure sched_context to be periodic

By completing this task successfully, the output will not change, but the rate that the output is printed will slow: each subsequent line should be output once the period has elapsed. You should now be able to see the loop where the mcs.c process and spinner.c process alternate, until the mcs.c process blocks, at which point "Yield" is emitted by spinner.c every second, as shown below:

Yield
Tick 0
Yield
Tick 1
Yield
Tick 2
Yield
Tick 3
Yield
Tick 4
Yield
Tick 5
Yield
Tick 6
Yield
Tick 7
Yield
Tick 8
Yield
Yield
Yield

Before you completed this task, the scheduling context was round-robin, and so was schedulable immediately after the call to seL4_Yield. By changing the scheduling parameters of sched_context to periodic parameters (budget < period), each time seL4_Yield() is called the available budget in the scheduling context is abandoned, causing the thread to sleep until the next replenishment, determined by the period.

Unbinding scheduling contexts

You can cease a threads execution by unbinding the scheduling context. Unlike suspending a thread via seL4_TCB_Suspend, unbinding will not change the thread state. Using suspend cancels any system calls in process (e.g IPC) and renders the thread unschedulable by changing the thread state. Unbinding a scheduling context does not alter the thread state, but merely removes the thread from the scheduler queues.

Exercise Unbind sched_context to stop the spinner process from running.

    //TODO unbind sched_context to stop yielding thread 

On success, you should see the output from the yielding thread stop.

Sporadic threads

Your next task is to use a different process, sender to experiment with sporadic tasks. The sender process is ready to run, and just needs a scheduling context in order to do so.

Exercise First, bind sched_context to sender_tcb.

    //TODO bind sched_context to sender_tcb

The output should look like the following:

...
Tock 3
Tock 4
Tock 5
Tock 6

Note the rate of the output: currently, you should see 2 lines come out at a time, with roughly
a second break between (the period of the scheduling context you set earlier). This is because scheduling context only has the minimum sporadic refills (see background), and each time a context switch occurs a refill is used up to schedule another.

Exercise Reconfigure the sched_context to an extra 6 refills, such that all of the Tock output occurs in one go.

    //TODO reconfigure sched_context to be periodic with 6 extra refills

Passive servers

Now look to the third process, server.c, which is a very basic echo server. It currently does not have a scheduling context, and needs one to initialise.

Exercise Bind sched_context to server_tcb.

    //TODO bind sched_context to server_tcb

Now you should see the server initialise and echo the messages sent. Note the initialisation protocol: first, you bound sched_context to the server. At this point, in server.c, the server calls seL4_NBSendRecv which sends an IPC message on endpoint, indicating that the server is now initialised. The output should be as follows

Tock 8
Starting server
Wait for server
Server initialising
running
passive
echo server
Yield

The following code then converts the server to passive:

    // convert to passive
    error = seL4_SchedContext_Unbind(sched_context);

From this point, the server runs on the mcs process’s scheduling context.

Timeout Faults

But before we discuss timeout faults, we must first discuss the differences between configuring a fault endpoint on the master vs the MCS kernel. There is a minor difference in the way that the kernel is informed of the cap to a fault endpoint, between the master and MCS kernels.

Regardless though, on both versions of the kernel, to inform the kernel of the fault endpoint for a thread, call the usual seL4_TCB_SetSpace().

Configuring a fault endpoint on the MCS kernel:

On the MCS kernel the cap given to the kernel must be a cap to an object in the CSpace of the thread which is calling the syscall (seL4_TCB_Configure()) to give the cap to the kernel.

This calling thread may be the handler thread, or some other thread which manages both the handler thread and the faulting thread. Or in an oddly designed system, it might be the faulting thread itself if the faulting thread is allowed to configure its own fault endpoint.

The reason for this difference is merely that it is faster to lookup the fault endpoint this way since it is looked up only once at the time it is configured.

Configuring a fault endpoint on the Master kernel:

On the Master kernel the cap given to the kernel must be a cap to an object in the CSpace of the faulting thread.

On the Master kernel, the fault endpoint cap is looked up from within the CSpace of the faulting thread everytime a fault occurs.

Exercise:

Exercise Set the data field of sched_context using seL4_SchedControl_Configure and set a 10s period, 1ms budget and 0 extra refills.

    //TODO reconfigure sched_context with 10s period, 1ms budget, 0 extra refills and data of 5.

The code then binds the scheduling context back to spinner_tcb, which starts yielding again.

Exercise set the timeout fault endpoint for spinner_tcb.

    //TODO set endpoint as the timeout fault handler for spinner_tcb

When the spinner faults, you should see the following output:

Received timeout fault
Success!

Further exercises

That’s all for the detailed content of this tutorial. Below we list other ideas for exercises you can try, to become more familiar with the MCS extensions.

  • Set up a passive server with a timeout fault handlers, with policies for clients that exhaust their budget.
  • Experiment with notification binding on a passive server, by binding both a notification object to the server TCB and an SC to the notification object.

Getting help

Stuck? See the resources below.


Tutorial included from github repo edit