
Single-Threaded Concurrency: An Oxymoron?

First, a seemingly unrelated story.

"Why do we need CPS?"

Not child-protection services, but continuation-passing style. Once upon a freshman spring, I took a functional programming class, where I learned continuation-passing style, among other whimsical things.

First, what is continuation-passing style? Consider the vanilla Fibonacci function:

fun fib n =
    if n <= 1 then n
    else fib (n-1) + fib (n-2)

What if we want to do some post-processing on the result of fib n? Say we want to compute k(fib n). The idea behind continuation-passing style is that we pass k as an argument, aka a "continuation", to fib, so that fib can call it when it's done.

Written in CPS, fib would look a little something like this:

fun fib n k =
    if n <= 1 then k n
    else fib (n-1) (fn r1 =>
                        fib (n-2)
                            (fn r2 => k (r1 + r2)))

Once upon a sophomore fall, I was asked about continuation-passing style, as well as how to write Fibonacci in continuation-passing style. So, I wrote up my Fibonacci code in CPS like above and talked through how it worked, and all was fine and dandy. Then the question came,

"So, why do we need CPS?"

What a good question.

Wouldn't it be cleaner to write fib and k as separate functions, then call k(fib n)?

Why, yes. I thought so too, that it would be cleaner. If you want to modify your program logic, you can imagine that you'd do more surgery on the CPS-style implementation than on the direct-style one. And, given how involved our simple Fibonacci example became, you can imagine how much more surgery you'd do on a more sophisticated CPS program. In more sophisticated CPS programs, you can pass in multiple continuations to handle different outcomes (usually one for success and one for failure), or chain together multiple CPS-style functions, and the surgery you'd have to wrestle through for logical changes would be even heavier.

As somebody who agreed with the dissenting view, I found it a bit difficult to explain why you'd be in favor of the CPS approach. When I learned it as a student, I accepted CPS as just, "It's good to learn another style of thinking." Being asked this "aside from style" question made me wonder whether CPS could have a meaningful advantage, beyond simply being a different style.

After putting myself in CPS's shoes, I came up with the answer,

We need CPS when the function can't return to its caller. For one reason or another, the caller's state may not have been saved, but the bottom line is that we can't return to our caller.

In that case, whatever it is that we wanted to do with our function result, we need to write it as a callback and pass it as an argument to the function, so that the function knows who to hand its result to.
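
To make that shape concrete outside of ML, here's a tiny Python sketch (add_one is a made-up function, purely for illustration): instead of returning, the function hands its result to whatever continuation it was given.

# Sketch: add_one never returns its result to the caller;
# it hands the result to the continuation k instead.
def add_one(n, k):
    k(n + 1)

# Direct style would be: result = add_one(41); print(result)
# In CPS, the print rides along as the continuation:
add_one(41, lambda r: print(r))   # prints 42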

The askers were happy with this answer, but I continued to wonder: in what plausible situations would we want this? More specifically, what were plausible situations where the caller's state wouldn't be saved, and where we'd be willing to accept the CPS approach?

A Plausible Situation

Before we discuss the plausible situation

When we conceptualize a program, there are two types of operations it can perform: operations that use the CPU (for calculations such as addition, bit shifts, etc.), and operations that do not use the CPU (such as reading/writing from disk, or sending/receiving bytes over the network). The second type of operation we call I/O operations.

Traditionally, we think of I/O operations as blocking. For example, when we read from a file, we wait for the read to finish so that we can proceed to do things with its contents.
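
In code, a blocking read looks completely unremarkable, which is exactly the point. A Python sketch (the filename is just a placeholder):

# A blocking read: the call doesn't return until the bytes are in memory,
# so nothing after it runs until the disk is done.
with open("data.txt", "rb") as f:    # "data.txt" is a placeholder
    contents = f.read()              # we sit here until the read finishes
print(len(contents))                 # only now can we do things with it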

The motivation behind everything that follows is that we want to make as much use of the CPU as possible. While we're blocked and waiting for our read to finish, the CPU is not in use! So, while we're waiting, the CPU can (and should) do work, possibly for other threads in our program, or even for other processes in the operating system.

Okay, now the plausible situation

Now, suppose we've written a thread-based server. The server spins up a new thread for each connection and destroys the thread once the connection is closed.
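
A rough sketch of such a server in Python, assuming (just for illustration) that each connection only wants its bytes echoed back:

import socket
import threading

def handle(conn):
    # Hypothetical per-connection work: read a request, echo it back.
    data = conn.recv(1024)    # blocks this thread, but not the others
    conn.sendall(data)
    conn.close()

server = socket.socket()
server.bind(("0.0.0.0", 8080))    # the port is arbitrary
server.listen()
while True:
    conn, _ = server.accept()
    # One fresh thread per connection; it dies when handle() returns.
    threading.Thread(target=handle, args=(conn,)).start()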

What if we have 10k connections per second? Remember that every time we spin up a thread, we need to allocate memory for its stack and context, and that context-switching between threads also introduces costs.

Moreover, what if each thread spends more time on blocking operations than on CPU operations? Then we could have done the work of multiple threads in one, if we interleaved the work of each thread onto a single thread.

These two questions bring us to an optimized approach: instead of having one thread per connection, we have each thread handle the work of multiple connections (when one connection is waiting on an I/O operation, we use the CPU to work on other connections). And, when a thread is done doing work for a connection, instead of destroying it, we keep it alive and let it wait for us to assign it more work. This way, we avoid the costs of setting up a fresh, new thread every time a connection comes rolling in.
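
Here's a sketch of the second half of that idea, thread reuse: a fixed pool of threads waits on a queue of connections, and a thread that finishes one connection goes back to waiting for the next instead of being destroyed. (The first half, interleaving multiple connections within one thread, is what the rest of this post is about.) The pool size and echo workload are arbitrary choices for illustration.

import queue
import socket
import threading

NUM_WORKERS = 4
tasks = queue.Queue()

def worker():
    while True:
        conn = tasks.get()        # wait to be assigned more work
        data = conn.recv(1024)
        conn.sendall(data)
        conn.close()              # the connection dies, the thread lives on

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

server = socket.socket()
server.bind(("0.0.0.0", 8080))
server.listen()
while True:
    conn, _ = server.accept()
    tasks.put(conn)               # hand the connection to an existing thread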

Event-Based (aka "Single-Threaded") Concurrency

What's with the name "single-threaded" concurrency?

Event-based concurrency--with its even more curious nickname "single-threaded concurrency"--puzzled me for quite some time.

Isn't concurrency about doing multiple things at once? And...wouldn't single-threaded mean that we do one thing at a time? This is where I pulled out the dictionary.

Concurrency is a type of problem. Parallelism is a type of solution. Basically, you can solve the problem of dealing with lots of things at once (concurrency) by doing lots of things at once (parallelism), but you don't necessarily have to.

Oh, okay, great to clear that up! The more we know.

"Single-threaded" isn't literally single-threaded

Another clarification: "single-threaded" doesn't actually mean that we don't have multiple threads. It usually means single-threaded within the context of some problem.

In the case of the Plausible Situation described above, we apply this single-threaded concurrency approach within each thread. But within the context of the whole system, we still have multiple threads handling the connections.

When you hear people say that Python is "single-threaded," what they really mean is that only one thread can execute Python code at a time. You can spin up multiple threads, but only one can run due to the GIL (Global Interpreter Lock), which allows only one thread to hold control of the interpreter at a time. There are apparently ways around the GIL, but those are beyond the scope of this discussion.

Likewise, when people say that JavaScript is "single-threaded," they mean that your JavaScript code runs on a single main thread with a single event loop; the runtime underneath still uses other threads, and you can use Web Workers to run scripts in the background.

Enough semantics, how does it work?

In short, event-driven programming, grossly simplified, consists of an event loop like the one below, plus some way of keeping track of the things to do, usually in the form of a queue. On each iteration, the event loop takes items off of the queue and processes them, one at a time.

while true
    remove task from queue
    process task
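
A minimal, runnable version of that loop in Python might look like this, where the "tasks" are just plain functions:

from collections import deque

tasks = deque()                   # the things to do

def loop():
    while tasks:
        task = tasks.popleft()    # remove task from queue
        task()                    # process task

tasks.append(lambda: print("hello"))
tasks.append(lambda: print("world"))
loop()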

The natural question you might have (well, I had) was, what if we block for a long time while processing an event?

while true
    remove task from queue
    process task <- what if this blocks?
while true
    remove task from queue
    process task <- this would be bad if it blocked, right?
while true
    remove task from queue
    process task <- what's stopping me from doing something
                    that blocks for a million years?

Good question! There's nothing stopping us, but also, we don't want to do that.

We don't want any single task to take too much time in the loop, at least not contiguously. If a task is rather large, we'd split it into smaller tasks to put on the queue, and we'd also check for timeouts. It's a similar idea to how, when we run multiple processes, we regularly switch between them so that each process can make progress.
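
For instance, summing a huge list could be split up like this (a sketch using the same kind of task queue as above): each task handles one chunk and queues another task for the rest, so other queued work still gets its turn between chunks.

from collections import deque

tasks = deque()

def sum_in_chunks(numbers, total=0, chunk=10_000):
    total += sum(numbers[:chunk])    # do a bounded amount of work...
    rest = numbers[chunk:]
    if rest:
        # ...then get back in line instead of hogging the loop.
        tasks.append(lambda: sum_in_chunks(rest, total, chunk))
    else:
        print("done:", total)

tasks.append(lambda: sum_in_chunks(list(range(1_000_000))))
while tasks:
    tasks.popleft()()                # the same loop as before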

Now, another problem: recall that while we're waiting for one connection's I/O operation to finish, we don't want the CPU to be idle. We want the CPU to do work for other connections.

With multiple threads, this is taken care of by our operating system's scheduler; it switches to a different thread that handles a different connection, and we're happy that the CPU's not idle.

However, now that we're single-threaded, we can't rely on the context-switching mechanism that comes for free. How do we do this if I/O operations are blocking? Turns out, there's something called non-blocking I/O, where, as the name suggests, we dispatch the I/O operation and move on to the next operation without waiting for its result.
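
Concretely, in Python, flipping a socket into non-blocking mode means a read returns immediately whether or not any bytes have arrived yet (the host and request below are just placeholders):

import socket

sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

sock.setblocking(False)       # from here on, reads won't wait
try:
    data = sock.recv(4096)    # returns immediately...
except BlockingIOError:
    data = None               # ...possibly before anything has arrived
# The thread is free to go work on other connections and check back later.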

Non-blocking I/O is a little hairier to manage. We eventually have to get the result, and since we didn't originally wait for it,

  • we, the receiver, keep checking for the results of pending operations at the beginning of the event loop (select/poll/epoll), as in the sketch after this list, or
  • whoever finished our operation lets us know, in the form of some sort of interrupt
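
The first option is roughly what Python's selectors module (a wrapper over select/poll/epoll) gives you. Here's a sketch of an echo server whose event loop asks, on each iteration, which pending operations have something ready, and then runs the callback registered for each one:

import selectors
import socket

sel = selectors.DefaultSelector()    # epoll/kqueue/select under the hood

def on_readable(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)           # echo, standing in for real work
    else:
        sel.unregister(conn)
        conn.close()

def on_accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)

server = socket.socket()
server.bind(("0.0.0.0", 8080))       # arbitrary port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, on_accept)

while True:                          # the event loop
    for key, _ in sel.select():      # which operations are ready?
        callback = key.data
        callback(key.fileobj)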

After the I/O operation finishes, what if we, the caller, want to do something with the result? Well, recall that unlike the call stack, the event loop doesn't store the state that we were in when we originally invoked the operation. This is where callbacks come in handy, and so, I finally understand a use case for continuation-passing style :)
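
To tie it back to fib: here's a sketch where a made-up fetch_then stands in for dispatching a non-blocking operation. Whatever we wanted to do next can't wait around on a call stack, so it rides along as the continuation.

from collections import deque

tasks = deque()

def fetch_then(name, k):
    # Made-up stand-in for a non-blocking read: the "result" arrives as a
    # task on a later loop iteration, and the continuation k travels with it.
    tasks.append(lambda: k(f"contents of {name}"))

# Direct style would be print(len(read("a.txt"))), but we can't block,
# so the "what to do with the result" is passed in as the continuation.
fetch_then("a.txt", lambda data: print(len(data)))

while tasks:
    tasks.popleft()()    # the event loop hands the result to the continuation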

And to go back to the original question, in what situations would we want to adopt this approach? I think generally, unless your tools constrain you to work with a single-threaded runtime, or you're trying to squeeze out performance at a large scale, it's not worth the added complexity, as CPS is harder to trace and harder to maintain. However, it's good to know that we have this as an option in case we do need it.