Friday, July 5, 2024

SERV 1.3


SERV the 1.3th. CPUs will never be the same again

"A new SERV version! How much smaller than the last one?"

Hate to disappoint you all, but we have now reached the point where the award-winning SERV, the world's smallest RISC-V CPU, won't get much smaller. I still sometimes get ideas for how to make it smaller and then spend a week implementing something, just to discover that nine times out of ten it actually made it bigger instead. But it's ok. Size matters, but the important thing is what you do with it. To quote one of the great 20th century thinkers:

"A CPU is only as good as its software and ecosystem"

 And this release brings plenty of software and ecosystem improvements. 


Seymour Cray once said "Anyone can build a fast CPU, the trick is to build a fast system". I would like to think the same goes for small CPUs and systems. With SERV being very small, care must be taken to avoid the rest of the system eating a lot of gates. One such system-level improvement was made by slightly rescheduling the register file (RF) accesses, which makes it possible to use one combined single-port SRAM for RF, code and data. Looking at some SRAM macros I have at hand, the single-port versions take around 45% less area than the dual-port ones. Since the area of most SERV systems is typically dominated by memory, this means a vastly lower total system cost from just a small change inside SERV.

The aforementioned memory improvement might be missed by people implementing SERV since the required changes are outside of the SERV core itself. Having now done a bunch of systems and subsystems around SERV, I have come to notice some typical optimizations, a few recurring patterns and things that get implemented over and over again. To make everyone's life a little easier I created Servile, a convenience building block containing SERV and some surrounding logic that is used by most systems built around SERV.

Once that was in place, I restructured the Servant, Serving and Subservient components around Servile and could both remove a lot of custom logic and get some features for free like debug printouts and optional extensions. With a growing number of systems, the documentation has been overhauled to collect them all in the Reservoir to make it easy to pick and choose the right foundation for building your own SERV-based system.


The reservoir has a SERV-based subsystem for your every need

As usual, a new release sees support for a range of new FPGA boards. This time we have Arty S7-50, PolarFire Splash Kit, Machdyne Kolibri, GMM-7550, Alchitry Au, ECP5 Evaluation board and Terasic DE1 SoC - all provided by kind contributors. Big up mi bredren!


I have claimed many times that SERV is so simple that it is possible to simulate it at almost the same speed as we can run it on a chip. And now it is much easier to check if this is really the case. A new parameter in the Verilator testbench called cps (cycles per second) keeps track of how many simulated cycles we can run each real second. This information is printed to a file to be viewed. If you are running the Servant SoC through FuseSoC, you can enable it by passing --cps as an extra argument.

SERV runs at 2.8 MHz in simulation on my 7-year-old laptop. Would love to see how fast someone can make it go on a beefier machine.
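The idea behind a cycles-per-second figure is simple enough to sketch in a few lines of Python. This is not the code in the Verilator testbench (which I have not reproduced here); the class name `CpsMeter` and its methods are made up for illustration. It just divides the number of simulated cycles by the elapsed wall-clock time.

```python
import time


class CpsMeter:
    """Hypothetical helper: tracks simulated cycles against wall-clock time."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the meter can be tested deterministically.
        self._clock = clock
        self._start = clock()
        self._cycles = 0

    def tick(self, cycles=1):
        """Record that `cycles` simulated clock cycles have been executed."""
        self._cycles += cycles

    def cycles_per_second(self):
        """Average simulated cycles per real second since the meter was created."""
        elapsed = self._clock() - self._start
        return self._cycles / elapsed if elapsed > 0 else 0.0
```

In a real simulation loop you would call `tick()` once per simulated clock cycle and read `cycles_per_second()` every now and then.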

Another convenient simulation feature is the addition of a PC tracing parameter which just dumps the PC to a binary file after each instruction. In order to make use of this data I have put together some custom tooling for profiling and tracing. For example, I can feed the trace data and the ELF file that was executed during the simulation into a Python script and get a list of how much time was spent in each C function. This in turn can be used to either optimize the software or potentially add some accelerator to SERV.

SERV running a Zephyr demo application. Apparently it spends almost 25% of its time printing to the UART
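To give a feeling for what such a profiling script can look like, here is a minimal sketch. It is not my actual tooling, and it rests on two loudly stated assumptions: that the trace file contains one little-endian 32-bit PC value per executed instruction, and that the function start addresses have already been pulled out of the ELF file (for example with a tool like nm) into a list of `(address, name)` pairs. It also counts instructions rather than cycles, which is only a proxy for time.

```python
import bisect
import struct
from collections import Counter


def read_pc_trace(data):
    """Decode raw trace bytes into a list of PCs.

    ASSUMPTION: one little-endian 32-bit word per executed instruction.
    A trailing partial word is ignored.
    """
    usable = len(data) - len(data) % 4
    return [pc for (pc,) in struct.iter_unpack("<I", data[:usable])]


def profile(pcs, symbols):
    """Count executed instructions per function.

    `symbols` is a list of (start_address, name) pairs. Each PC is
    attributed to the function with the closest start address at or below it.
    """
    symbols = sorted(symbols)
    starts = [addr for addr, _ in symbols]
    counts = Counter()
    for pc in pcs:
        i = bisect.bisect_right(starts, pc) - 1
        counts[symbols[i][1] if i >= 0 else "<unknown>"] += 1
    return counts


def report(counts):
    """Return (name, instructions, percent) tuples, busiest function first."""
    total = sum(counts.values())
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]
```

Usage would be along the lines of `report(profile(read_pc_trace(open("trace.bin", "rb").read()), symbols))`, where `trace.bin` is a placeholder file name.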

While this custom tooling is handy, I would ideally not have to create any of it myself. It would be much better to export the PC trace to some already existing format which already has reasonable tools for profiling and tracing. I'm sure something like this must exist, but I haven't been able to find anything. Tips are most welcome!


As mentioned in the introduction, we need software to actually have any use of our CPU. When it comes to software on SERV, users tend to run bare-metal or use Zephyr. For users of the latter, I can happily report that the Servant Zephyr BSP has been upgraded to support Zephyr 3.5, which is a big step up from the previous version 2.4.


Despite years of verification, a bug was spotted in SERV recently. The trap signal, indicating an exception, was released too early, before the MSBs of the PC had been written. In practice this meant that jumps in and out of the exception routines could end up in the wrong place if the code was placed near the end of the 32-bit address space. Hopefully this hasn't affected any users, since it's unlikely that software running on SERV would fill an address space large enough to trigger this. Still, it was a bug and it has been fixed now.

Looking forward

SERV now has two larger siblings called QERV and HERV, as has been hinted before, which implement 4- and 8-bit wide datapaths instead of SERV's 1-bit. They currently reside in a different repository, but the idea is to ultimately integrate them fully in the SERV codebase. As a first step, each module inside SERV will receive a parameter to set the width of the datapath. Most of the work is done, but there are some minor things remaining.

You might wonder why there is no 2-bit wide version, also known as DSRV, for Double SERV. The reason is that we started out with QERV and discovered that the added overhead was so small that it didn't seem to make any sense doing a version that would probably be roughly the same size as QERV but half as fast.

If you can afford a few more gates you can get more bang for the buck
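The "half as fast" reasoning is easy to check with some back-of-the-envelope arithmetic. The sketch below is an idealization: it only counts how many cycles it takes to push one 32-bit word through a datapath of a given width and ignores everything else (instruction fetch, memory latency and so on). The function name is made up for illustration.

```python
XLEN = 32  # RV32 register width in bits


def cycles_per_word(width):
    """Cycles needed to process one 32-bit word through a `width`-bit datapath."""
    if width <= 0 or XLEN % width:
        raise ValueError("datapath width must evenly divide 32")
    return XLEN // width


# SERV, the hypothetical 2-bit DSRV, QERV and HERV
for name, width in [("SERV", 1), ("DSRV", 2), ("QERV", 4), ("HERV", 8)]:
    print(f"{name}: {width}-bit datapath, {cycles_per_word(width)} cycles per word")
```

Under this simplified model a 2-bit version needs 16 cycles per word against 8 for QERV, which is where the factor of two comes from.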

Another thing that I hope to write more about in the future is the version of SERV that was submitted to the 7th shuttle run of the Tiny Tapeout project. Even though I know about some really interesting implementations of SERV, I'm typically not allowed to say that much about them, and as with all open source projects there are likely many more uses that I'm completely unaware of. This one however, I can show to everyone.

Area-wise, it was a bit tight but using two of the allocated slots, it was possible to fit SERV, an XIP SPI Flash controller and 5 GPRs. The project was dubbed Underserved, which I thought was fitting as Tiny Tapeout caters to the underserved hobbyist market.

There are already plans for more SERV features and I have some really, really interesting news I'm bursting to share, but that will unfortunately have to wait a little bit longer. Stay tuned!