Apologies if self-promo is not allowed.
This is a project I’ve been hacking on, on and off, for some years. The main point of intrigue is that the kernel doesn’t use traditional message passing for inter-process communication (IPC); instead, it allows threads to be moved between processes. Compared to a typical synchronous message passing implementation, thread migration can be more parallel (several threads can enter the same process at the same time, versus messages having to wait in a mailbox to be read) while being roughly as fast in the single-threaded case. I’ve benchmarked my kernel on real RISC-V hardware to be roughly on par with seL4. Thread migration could potentially also be sped up significantly with hardware support: https://dl.acm.org/doi/10.1145/3064176.3064197
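To make that concrete, here’s roughly what a call looks like from userspace in a thread migration design. This is an illustrative sketch with made-up names, not my kernel’s actual API:

    /* Illustrative sketch, not the real API: a capability names an
     * entry point in another process. */
    typedef long cap_t;

    /* The calling thread migrates into the target process, runs the
     * handler there, and migrates back. Arguments and return values
     * travel in registers, so no message is ever queued in a mailbox,
     * and any number of callers can be inside the server at once. */
    extern long ipc_call(cap_t target, long a0, long a1, long a2);

    long get_time(cap_t time_server)
    {
        return ipc_call(time_server, /* hypothetical opcode */ 0, 0, 0);
    }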
Personally, I also find that thread migration has some soft benefits that are more difficult to demonstrate but probably worth bringing up: userspace scheduling becomes easy, the mechanism is easy to implement, and it arguably allows better resource distribution. The major downside is that some of the flexibility of message passing is lost: message passing between networked computers is straightforward, and asynchronous messages allow the sender thread to do other things while it waits. Both require extra effort with thread migration (but are at least doable).
The idea itself is not exactly new. The first implementation of thread migration I could find is from 1994, in an experimental PA-RISC port of the Mach kernel, where the migrating thread model was retrofitted onto a remote procedure call (RPC) mechanism that had previously been built on message passing: https://www.usenix.org/conference/usenix-winter-1994-technical-conference/evolving-mach-30-migrating-thread-model
According to the article, thread migration made the RPC mechanism ~5x faster while requiring only about half the source lines of code. Unfortunately, the port never really went anywhere and Mach as a whole kind of disappeared. A form of it still lives on in GNU Hurd, but only the message passing parts.
Beyond that, I could really only find one other semi-active project using migrating threads, the COMPOSITE project: https://www2.seas.gwu.edu/~gparmer/publications/ospert10.pdf COMPOSITE was designed around thread migration, but targets the more niche use case of predictable systems, which it argues thread migration is better suited for than message passing. Some design decisions place limitations on how the kernel can be used: for example, each process is required to allocate execution stacks up front for however many threads it is willing to host at the same time. That’s not really an issue if you have full control over the system, but it can cause slowdowns and extra resource usage if you’re trying to use it as a general-purpose operating system.
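To illustrate that last point, the static provisioning implies something like the sketch below. This is just my reading of the paper, with invented names, not COMPOSITE’s actual code: each server reserves its stacks up front, and an arriving thread has to claim one before it can run.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_CONCURRENT 8           /* threads this process will host */
    #define STACK_SIZE     (16 * 1024)

    static char stacks[MAX_CONCURRENT][STACK_SIZE];
    static atomic_bool in_use[MAX_CONCURRENT]; /* zero-init = all free */

    /* Called when a thread migrates in: claim a free stack, or fail
     * if all MAX_CONCURRENT stacks are already taken. */
    static void *claim_stack(void)
    {
        for (size_t i = 0; i < MAX_CONCURRENT; i++)
            if (!atomic_exchange(&in_use[i], true))
                return stacks[i] + STACK_SIZE; /* stacks grow downward */
        return NULL; /* caller must block or reject the request */
    }

The stacks (and the memory behind them) are tied up whether or not that many threads ever show up, which is the extra resource usage I mean.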
I have a blog post which is something of a technical overview if anyone is curious: https://metanimi.dy.fi/blog/kmi/
The kernel itself is in fairly good shape, but I don’t really have a proper userspace for it quite yet, sorry. That’s something I’m semi-actively working on.



I’m not really an OS guy, so forgive me if this question has an obvious answer. When a thread migrates, it keeps its stack and registers, so any data contained within them can be used in the destination process (correct me if I’m wrong). Sending a message could thus be as simple as migrating a thread and having that thread copy data from its registers or stack memory into the current process’s memory space. However, how does the thread find process-specific addresses and handles (e.g. a mutex)? For example, I’m picturing a scenario where you are implementing an MPI library and want to use thread migration to send (small) messages from one local process to another. The thread orchestrating the send simply loads the data from memory and migrates, but how will it know where to store the data? Would there need to be a data structure stored at a fixed offset in memory that contains the destination address in the receiving process?
Only a slight correction: at least my implementation makes the stack used by previous processes completely inaccessible. If the stack from previous processes were writable, then a bug in the current process could overwrite important data in the previous process, and the bug would spread across processes, which is not what we want. Theoretically, I guess you could map the stack pages read-only instead of completely inaccessible, but then you might be sharing security-critical data, so at least for now I require that all data be passed via registers. For larger amounts of data, shared memory can be used, but that of course increases the complexity of a given implementation somewhat. This is a limitation that falls under the ‘loss of flexibility’ I mentioned in the post.
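In practice, reusing the hypothetical ipc_call sketch from my post above, a send might look something like this (on RISC-V the argument registers survive the migration, so an ordinary function signature is all you need; shared_buf is assumed to have been mapped into both processes beforehand):

    #include <string.h>

    typedef long cap_t;
    extern long ipc_call(cap_t target, long a0, long a1, long a2);

    /* Small message: the payload fits in argument registers, so
     * nothing is copied or mapped at all. */
    long send_small(cap_t dest, long word0, long word1)
    {
        return ipc_call(dest, word0, word1, 0);
    }

    /* Large message: write into a pre-established shared buffer and
     * pass only the offset and length in registers. */
    extern char *shared_buf; /* mapped in both processes beforehand */

    long send_large(cap_t dest, const char *data, size_t len)
    {
        memcpy(shared_buf, data, len);
        return ipc_call(dest, /* offset in shared_buf */ 0, (long)len, 0);
    }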
You’d use global variables of some kind, yeah. Since each thread enters at _start, their addresses are known to the compiler, so you don’t really need anything special; it’s largely just regular multithreaded programming. For instance, in the filesystem server that I’m writing, I just have a big global array, as large as the maximum number of concurrent processes (which is known statically), that maps the process ID to a list of open file handles. Since _start knows which process the migration came from (the process ID and thread ID are set by the kernel), the lookup to file handles is, practically speaking, just one atomic load of extra overhead.
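Concretely, the relevant bit looks roughly like this. It’s a simplified sketch with made-up names, not the server’s literal code:

    #include <stdatomic.h>

    #define MAX_PROCS 64        /* statically known maximum */

    struct handle_list;         /* one process’s open file handles */

    /* Indexed by process ID: written when a process attaches,
     * read on every migrating call. */
    static struct handle_list *_Atomic handles[MAX_PROCS];

    /* Entry point for migrating threads. The kernel has already put
     * the caller’s process ID and thread ID in registers; assume the
     * runtime hands them to us as arguments. */
    void _start(long pid, long tid)
    {
        /* The one atomic load of extra overhead. */
        struct handle_list *h =
            atomic_load_explicit(&handles[pid], memory_order_acquire);
        (void)tid;
        /* ... dispatch on the request encoded in the other registers,
         * using h to resolve the caller's file handles ... */
        (void)h;
    }

Thank you for the clarification. Very cool project!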