Context and Concurrency
The abstract begins by defining a processor as one comprising an execution unit and a set of context registers.
Each set of context registers is dedicated to retaining a worker's program counter (PC), optionally its operands, and its status registers. This multiplicity of context register sets allows the same execution unit to spend its time concurrently running different tasks. The abstract also alludes to a supervisor thread, which is able to schedule each of the worker threads.
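To make that arrangement concrete, here is a minimal sketch assuming simple round-robin interleaving; the class and function names (ContextRegisters, Worker, supervisor) are hypothetical stand-ins for the patent's description, not its actual design:

```python
from dataclasses import dataclass, field
from typing import Callable, List

Instruction = Callable[["ContextRegisters"], None]

@dataclass
class ContextRegisters:
    # One set per worker: program counter, operand registers, status register.
    pc: int = 0
    operands: List[int] = field(default_factory=lambda: [0, 0, 0, 0])
    status: int = 0

@dataclass
class Worker:
    name: str
    program: List[Instruction]
    ctx: ContextRegisters = field(default_factory=ContextRegisters)

def supervisor(workers: List[Worker], time_slots: int) -> None:
    """Interleave worker contexts on a single shared execution unit.

    Each time slot, pick the next context round-robin and issue one
    instruction from it; only that worker's PC advances, so every other
    worker resumes exactly where it left off.
    """
    for slot in range(time_slots):
        w = workers[slot % len(workers)]
        if w.ctx.pc < len(w.program):
            w.program[w.ctx.pc](w.ctx)  # execute on the shared unit
            w.ctx.pc += 1

def add(reg: int, value: int) -> Instruction:
    def instr(ctx: ContextRegisters) -> None:
        ctx.operands[reg] += value
    return instr

workers = [Worker(f"worker{i}", [add(0, 1), add(0, 2)]) for i in range(3)]
supervisor(workers, time_slots=6)
print([w.ctx.operands[0] for w in workers])  # [3, 3, 3]: all made progress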
A key source of the leverage gained on machine learning workloads is the ability to run concurrently. Paragraph [006] notes the advantages of concurrency even when it falls short of full “parallelism”. Given that current machine learning workloads largely share the same execution pipeline, we can see why this statement holds:
Performance of a multi-threaded processor may still be improved compared to no concurrency or parallelism, thanks to increased opportunities for hiding pipeline latency.
With the large-batch processing that ML workloads demand, the potential for exploiting those “increased opportunities for hiding pipeline latency” is great.
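As a back-of-the-envelope illustration of that claim, consider instructions that each depend on their thread's previous result: a single thread stalls waiting on the pipeline, while round-robin issue across several contexts fills those stall cycles with other threads' work. The 4-cycle latency, thread count, and cycles_needed helper below are assumptions for the sketch, not figures from the patent:

```python
def cycles_needed(num_threads: int, instrs_per_thread: int, latency: int) -> int:
    """Cycles to finish when every instruction depends on the previous one in
    its own thread (result ready `latency` cycles after issue) and one
    instruction issues per cycle, rotating across thread contexts."""
    # Consecutive issues from the same thread are spaced by the larger of the
    # pipeline latency (dependence stall) and the round-robin period.
    gap = max(latency, num_threads)
    # The last thread first issues at cycle num_threads - 1, then issues its
    # remaining instructions every `gap` cycles, completing `latency` later.
    return (num_threads - 1) + (instrs_per_thread - 1) * gap + latency

LATENCY, WORK = 4, 100
single = cycles_needed(1, WORK, LATENCY)  # stalls on every instruction
barrel = cycles_needed(4, WORK, LATENCY)  # 4 contexts hide the latency
print(f"1 thread : {single} cycles for {WORK} instructions")       # 400
print(f"4 threads: {barrel} cycles for {4 * WORK} instructions")   # 403
```

Roughly the same cycle count retires four times the work, which is the appeal for large-batch ML, where many independent instruction streams are always available to fill the slots.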