Barrel Threaded M+1
Barrel threading is a well-known technique in the industry for “hiding pipeline latency”. As I understand it, the full mechanism involves quite a few components, but the effect is to dilate the time between a thread’s adjacent instructions by interleaving the instructions of multiple threads. The wider gap between each thread’s instructions hides the latencies that would be incurred if a single thread ran at full speed.
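To put rough numbers on the dilation idea, here is a minimal sketch of a toy model. It assumes strict round robin (each thread issues once every N cycles) and a single fixed pipeline latency; the function name and the latency of 6 cycles are my own placeholders, not figures from the patent.

```python
def stall_cycles_per_instruction(pipeline_latency: int, num_threads: int) -> int:
    """Cycles a thread must wait before its next instruction can issue.

    Toy model (assumption): threads are interleaved in strict round robin,
    so a given thread issues once every `num_threads` cycles, and a result
    is ready `pipeline_latency` cycles after its instruction issued.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread")
    gap_between_issues = num_threads
    return max(0, pipeline_latency - gap_between_issues)


# Hypothetical 6-cycle pipeline: more interleaved threads means fewer stalls.
for n in (1, 2, 4, 6):
    print(n, "threads ->", stall_cycles_per_instruction(6, n), "stall cycles")
```

In this model the stalls disappear once the number of interleaved threads reaches the pipeline latency, and adding threads beyond that point buys nothing further.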
Mechanically, barrel threading runs multiple threads under a regular or weighted round-robin schedule. The patent notes that each thread gets its own context: a set of registers holding information such as where the thread is in the program and the values of the registers it is working with.
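The scheduling scheme can be sketched in a few lines. This is only an illustration: the `Context` class, the `barrel_schedule` function, and the way weights are expanded into consecutive slots are my own assumptions, not the patent's design.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Context:
    """Per-thread state: a program counter plus a private register file."""
    name: str
    pc: int = 0
    registers: dict = field(default_factory=dict)


def barrel_schedule(contexts, weights=None, cycles=8):
    """Return the order in which threads issue, one instruction per cycle.

    weights[i] is the number of slots thread i gets per round
    (all ones gives plain round robin).
    """
    if weights is None:
        weights = [1] * len(contexts)
    # Expand each thread into as many slots per round as its weight.
    round_slots = [ctx for ctx, w in zip(contexts, weights) for _ in range(w)]
    trace = []
    for _, ctx in zip(range(cycles), cycle(round_slots)):
        ctx.pc += 1  # "execute" one instruction from this thread
        trace.append(ctx.name)
    return trace


threads = [Context(f"T{i}") for i in range(4)]
print(barrel_schedule(threads, cycles=8))
print(barrel_schedule(threads, weights=[2, 1, 1, 1], cycles=5))
```

Each `Context` keeps its own program counter, so switching threads every cycle needs no save or restore step; the scheduler only picks which context issues next.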
I’m unsure how unique the supervisor/worker split and the extra context are, but the patent points out the need to always add one extra register context for the supervisor (the “+1” in M+1). The supervisor handles the instructions that deal with outside memory or network cards (para [0043]). The number of threads in a single execution unit is around four to six. This is probably in line with the amount of dilation needed to hide the latencies without overcomplicating the architecture. Note that dilation as the rationale is my own framing, not the patent’s wording.
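One way to picture the M+1 arrangement is M time slots backed by M worker contexts plus one supervisor context. The sketch below is my own illustration: the function names are hypothetical, and the rule that a slot with no worker running falls back to the supervisor is an assumption I make for the example, not something taken from the paragraph cited above.

```python
def build_contexts(num_worker_slots):
    """Return M worker context names plus one supervisor context (M + 1 total)."""
    workers = [f"worker{i}" for i in range(num_worker_slots)]
    return workers + ["supervisor"]


def issue_order(num_worker_slots, launched, cycles):
    """List which context issues in each cycle.

    `launched` is the set of slot indices currently running a worker.
    ASSUMPTION: a slot with no worker running is used by the supervisor.
    """
    contexts = build_contexts(num_worker_slots)
    order = []
    for t in range(cycles):
        slot = t % num_worker_slots  # slots rotate in round-robin order
        order.append(contexts[slot] if slot in launched else "supervisor")
    return order


print(len(build_contexts(4)))      # M = 4 worker slots, so 5 contexts
print(issue_order(4, {0, 2}, 4))   # workers in slots 0 and 2, supervisor elsewhere
```

With M set to four to six, the supervisor's register state never has to be swapped out to make room for a worker, which is the point of carrying the extra context.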