Tales from Beyond the Register Map: 2022

Sunday, October 30, 2022

Don't copy that IP

Making copies of other people's IP is a terrible thing! And it's a really big problem in the EDA world.

Oh, I'm not talking about the legal aspects. That's something I happily leave to the lawyers. No, I'm talking about making copies of source code instead of referencing the original code. It's a big problem because suddenly you have two different versions, and the chances that a fix that was done in one version will reach the other version are slim to none. When I was entering the world of open source silicon around 2010, this was how things were normally done. And my first reaction was to try and upstream all fixes to IP cores that I found in various projects that were using those cores. My second reaction was to try and find a more sustainable way to avoid this problem in the first place. And solving this problem was one of the initial driving forces behind FuseSoC, and it still is. So it makes me very very sad when, twelve years later, I still find random copies of IP cores, all with various subsets of fixes applied to them.

When we started working on the OpenLANE Edalize backend, one important thing was to have as many example designs as possible, both to make sure the backend was flexible enough to cover all use-cases and also to have a good set of examples for anyone interested in adding FuseSoC support for their own cores. We found a number of example designs in the OpenLANE repository which were used to test OpenLANE itself. Great! ...except... it turned out most of the examples were cores or parts of cores copied from various places. So we did what every sensible person (not really) would do. We decided to upstream all those example designs, so that the OpenLANE support and any other fixes would benefit all users of that IP core.

Bringing it all back home

At the time we started looking at this, there were 32 different example designs. Our first job was to find out where on earth all these came from. That took a fair amount of detective work, but in the end we found what we believe is the proper upstream for all, except perhaps for one where we are a bit unsure. In that case we chose to file a PR against OpenLANE since that was the closest we could get to an upstream. A few designs were dropped from the OpenLANE examples while we were working on this, in which case we also chose to ignore them.

Once we had identified the origin, we tried to figure out how they worked and then set off to add FuseSoC support for each and every core. At the minimum we added a target for linting and a target for building with OpenLANE through the Edalize backend. In the cases where we found testbenches, we also added support for running those, with a few exceptions where that required tools we didn't have access to.

In addition to adding FuseSoC support, we also added support for running the targets as GitHub CI actions, so that every new commit to the project would automatically lint, build a GDS with OpenLANE and potentially run some testbenches.

And finally we packaged it all up and sent a humongous number of pull requests to different projects with detailed instructions on how they could use this. Many of these pull requests have been accepted, but not all. There's not much more we can do about that, however. If you are curious, it's possible to check the progress here

So, what does this really mean? Did we make the world a better place for you and for me and the entire human race? I like to think so. At least we didn't make it worse. There are a couple of very real benefits to this work.

Adding FuseSoC support makes it easier for other users to use these cores in their own projects
Adding testbench targets makes it easy for other people to check the cores work as expected
Adding a lint target makes it easy to check code quality. Far from all of the designs we encountered pass the lint check
On a few occasions, fixes had been made in the copies. These were fed upstream to benefit all users of those cores
The CI actions make it easy to check nothing breaks on future updates of the cores
We could test the Edalize OpenLANE backend on a number of different designs to ensure it was flexible enough to handle them all
We now have a large pool of example designs for anyone interested in doing the same for their cores
And finally, we have seen that some of the maintainers whose cores we added support for have started doing the same on their other cores, which is fantastic to see.

Now, our hope is of course that you too will be bitten by the FuseSoC fever and add support for your cores too so that we can keep growing the ecosystem of FuseSoC-compatible cores, which in turn will help the EDA tool developers improve their tools.

This work was sponsored by a grant from NLNet Foundation

From simulation to SoC with FuseSoC and Edalize

As you probably also know by now there is a fully open source ASIC toolchain called OpenLANE + a 130nm PDK from SkyWater Foundries together with a program called OpenMPW which allows anyone to produce ASICs from their open source RTL designs completely for free.

And regular readers probably also know that NLNet Foundation has kindly sponsored an Edalize backend for this toolchain, so that users can easily run their designs through this toolchain just as they would do with a simulation environment or an FPGA implementation.

But wouldn't it be great if there also was a good example design to show how this is actually accomplished? And wouldn't it be good if this example design was small enough to quickly run through the tools but still complex enough to showcase several of the features that users might want to use in their own designs?

Well, what better design could exist than a small SoC based on the award-winning SERV, the world's smallest RISC-V CPU? This tutorial will show how we can take an existing building block, in this case SERV, turn it into an ASIC-friendly SoC, run it in simulation, make an FPGA prototype and finally have it manufactured as an ASIC. All using FuseSoC and Edalize to hide most of the differences between these vastly different tool environments. Afterwards, you should be able to use the same process and thinking to turn your own designs into FuseSoC packages that can be used with different tools and easily reused in your own and other people's designs.

From design to simulation to FPGA to ASIC. FuseSoC+Edalize helps you all the way

Let's start by looking at what we can do to make an ASIC-friendly SoC out of our good friend SERV.

Creating an ASIC-friendly SoC

For FPGA implementations, there is a reference design for SERV called the Servant SoC. That one is unfortunately not so well suited for ASIC implementation, for two reasons. The main one is that it relies on the data/instruction memory being preinitialized with an application that we can run, which is not something we can easily support in an ASIC. The other reason is also memory-related. The Servant SoC uses separate memories for the RF and for instruction+data, but SERV supports using a single shared memory for both, which allows for an even smaller footprint.

Servant SoC - the reference platform for running SERV on FPGA

So with these things taken into consideration, we look at how to design our SoC called Subservient. An obvious inspiration for this is the Serving SoClet which uses this aforementioned shared memory setup. For the Subservient SoC, however, we need to move the actual RAM macro out of the innards of the design so that we can instead connect the RAM and the Subservient SoC as hard macros. Related to this, we also introduce a debug interface that we can use to write to the memory while SERV is held in reset, since we can't rely on preinitialized memory content.

The Serving SoClet. A building block for making tiny SoCs. Just add peripherals.

We can reuse the arbiter and mux from the serving SoClet as is (reuse is good!), but use a slightly modified version of the RAM IF that basically just introduces a Read Enable for the RAM. The debug interface is just a dumb mux that assumes there aren't any memory accesses in flight when the switch is toggled. The RAM interface module that turns the internal 32-bit accesses to 8-bit accesses towards the RAM is the only moderately complex addition, and this is a great thing! It means that we have been able to reuse most of the code and have less untested code to deal with. The resulting architecture looks like this.

The core of the Subservient SoC. Made for portable ASIC implementation. Needs RAM, peripherals and a way to initialize the instruction/data memory
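
To make the job of the RAM interface a bit more concrete, here is a minimal behavioral sketch in Python of turning one 32-bit access into byte-wide RAM accesses. It is not the actual RTL. The function names, the little-endian byte order and the Wishbone-style byte-select mask are assumptions made for illustration.

```python
def split_word_write(addr, data, sel=0b1111):
    """Break one 32-bit write into up to four byte-wide RAM writes.

    sel is an assumed Wishbone-style byte-select mask (bit 0 = lowest byte).
    Returns a list of (byte_address, byte_value) tuples, lowest byte first.
    """
    writes = []
    for lane in range(4):
        if sel & (1 << lane):
            byte = (data >> (8 * lane)) & 0xFF
            writes.append((addr + lane, byte))
    return writes


def join_bytes_read(ram, addr):
    """Assemble a 32-bit word from four consecutive byte-wide RAM reads."""
    word = 0
    for lane in range(4):
        word |= ram[addr + lane] << (8 * lane)
    return word


# Usage: write a word through the byte-wide interface and read it back
ram = {}
for byte_addr, value in split_word_write(0x10, 0xDEADBEEF):
    ram[byte_addr] = value
readback = join_bytes_read(ram, 0x10)
```

The point of the sketch is only that each 32-bit access costs four accesses on the narrow RAM port, which is the price for being able to use a small 8-bit memory macro.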

The subservient_core exposes a Wishbone interface where we can hook up peripheral devices. Since we want the simplest thing possible, we just terminate the peripheral bus in a 1-bit GPIO controller and make a wrapper. The greyed out blocks are potential future additions.

The simplest I/O configuration for Subservient. Just a single output pin

And for someone looking at it from the outside, it looks like this.

Now that we have a design we want to do some testing. For this, we create a testbench that contains a SoC hooked up to a model of the OpenRAM macro that we intend to use in the SkyWater 130nm OpenMPW tapeout and a UART decoder so that we can use our GPIO as a UART. We also add some hastily written lines of code to read a Verilog hex file and write the contents to memory through the debug interface before releasing the SoC reset. This task would be handled by the Caravel harness in the real OpenMPW setup.

Subservient testbench. Starts by loading a program to the simulated RAM through the debug interface and then hands over to SERV to run.
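
As a rough illustration of what the loader part of the testbench does, the following Python sketch parses a simple Verilog hex file and pushes its contents through a write callback standing in for the debug interface. The function names and the callback are hypothetical, and the parser only covers the plain subset of the format (hex entries, `@address` markers and `//` comments), not block comments or other corner cases.

```python
def parse_verilog_hex(text):
    """Return {address: value} for a simple $readmemh-style text.

    Each entry fills one memory element. An @<hex> token sets the
    address of the next entry. // comments are ignored.
    """
    mem = {}
    addr = 0
    for line in text.splitlines():
        line = line.split("//")[0]  # strip line comments
        for token in line.split():
            if token.startswith("@"):
                addr = int(token[1:], 16)
            else:
                mem[addr] = int(token, 16)
                addr += 1
    return mem


def load_through_debug_if(image, write):
    """Write every entry of the image using the supplied write(addr, value)."""
    for addr in sorted(image):
        write(addr, image[addr])


# Usage: collect the writes a loader would issue while the CPU is in reset
image = parse_verilog_hex("@00000000\n93 00 00 00 // first entries\n@10\nff\n")
issued = []
load_through_debug_if(image, lambda addr, value: issued.append((addr, value)))
```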

Adding FuseSoC support

We are almost ready to run some simulations with FuseSoC. The last thing remaining is just to write the core description file so that FuseSoC knows how to use the core. Once you have a core description file, you will be able to easily use it with almost any EDA tool as we will soon see. Having a core description file is also an excellent way to make it easier for others to use your core, or conversely, pull in other people's cores in your design.

We begin with the CAPI2 boilerplate.

CAPI=2:

name : ::subservient:0.1.0

Next up we create filesets for the RTL. Filesets are logical groups of the files that build up your design. You can have a single fileset for your whole design or a fileset each for your files. The most practical way is often to have a fileset for the core RTL, one for each testbench, and separate ones for files that are specific for implementation on a certain FPGA board etc. This is also what we will do here.

Filesets are also where you specify dependencies on other cores. In our case, subservient_core instantiates components from the serving core (or package, to use the software equivalent of a core) so we add a dependency on serving here. The serving core in turn depends on the serv core. This means we don't have to care about the internals of either serving or SERV. Their respective core description files take care of the details for us. And if some larger project would want to depend on the Subservient SoC, the core description file we are about to write will take care of that complexity for them.
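
The idea of transitive dependencies can be sketched in a few lines of Python. This is only an illustration of the principle, not how FuseSoC resolves dependencies internally. The `resolve` function and the dictionary are made up for the example, version constraints are ignored, and the dependency graph is assumed to be free of cycles.

```python
def resolve(core, depends, order=None):
    """Return the cores needed to build `core`, dependencies first.

    `depends` maps a core name to the list of cores it directly depends on.
    Assumes there are no circular dependencies.
    """
    order = [] if order is None else order
    if core in order:
        return order
    for dep in depends.get(core, []):
        resolve(dep, depends, order)
    order.append(core)
    return order


# Direct dependencies as described in the text
depends = {
    "subservient": ["serving"],
    "serving": ["serv"],
    "serv": [],
}
build_order = resolve("subservient", depends)
```

Each core only lists its direct dependencies, and everything further down the chain follows from the other cores' own descriptions.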

The testbench fileset uses an SRAM model available from a core called sky130_sram_macros and also our trusty testbench utility core, vlog_tb_utils. Finally we add a couple of test programs in Verilog hex format.

filesets:
  core:
    files:
      - rtl/subservient_rf_ram_if.v
      - rtl/subservient_ram.v
      - rtl/subservient_debug_switch.v
      - rtl/subservient_core.v
    file_type : verilogSource
    depend : [serving]

  mem_files:
    files:
      - sw/blinky.hex : {copyto : blinky.hex}
      - sw/hello.hex  : {copyto : hello.hex}
    file_type : user

  tb:
    files:
      - tb/uart_decoder.v
      - tb/subservient_tb.v
    file_type : verilogSource
    depend : [sky130_sram_macros, vlog_tb_utils]

  soc:
    files:
      - rtl/subservient_gpio.v
      - rtl/subservient.v
    file_type : verilogSource

We then define the user-settable parameters to allow us to easily change test program from the command-line, experiment with memory sizes and decide whether the GPIO pin should be treated as a UART or a regular I/O pin.

parameters:
  firmware:
    datatype : file
    description : Preload RAM with a hex file at runtime
    paramtype : plusarg

  memsize:
    datatype    : int
    default     : 1024
    description : Memory size in bytes for RAM (default 1kiB)
    paramtype   : vlogparam

  uart_baudrate:
    datatype : int
    description : Treat gpio output as an UART with the specified baudrate (0 or omitted parameter disables UART decoding)
    paramtype : plusarg

Finally we bind it all together by creating a simulation target. Targets in the core description files are end products or use-cases of the design. In this case, we define a target so that we can run the design within a testbench in a simulator. The targets section is also where we reference the filesets and parameters that were defined earlier. This allows us to use different subsets of the core for different targets. We also throw in a derivative sim_hello target as a shortcut to run the other test program, a default target and a lint target so that we can get quick feedback on any potential design mistakes.

targets:
  default:
    filesets : [soc, core]

  lint:
    default_tool : verilator
    filesets : [core, soc]
    tools:
      verilator:
        mode : lint-only
    toplevel : subservient

  sim: &sim
    default_tool: icarus
    filesets : [mem_files, core, soc, tb]
    parameters :
      - firmware
      - memsize
      - uart_baudrate
    toplevel : subservient_tb

  sim_hello:
    <<: *sim
    parameters :
      - firmware=hello.hex
      - memsize=1024
      - uart_baudrate=115200
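
The `&sim` anchor and the `<<: *sim` merge key in the targets above are plain YAML features. Their effect can be sketched with Python dictionaries: sim_hello starts out as a copy of sim, and any key it sets itself replaces the inherited one. The dictionaries below just mirror the targets above for illustration.

```python
# The sim target as defined above
sim = {
    "default_tool": "icarus",
    "filesets": ["mem_files", "core", "soc", "tb"],
    "parameters": ["firmware", "memsize", "uart_baudrate"],
    "toplevel": "subservient_tb",
}

# sim_hello inherits every key from sim and overrides only `parameters`.
# The merge is shallow: the parameter list is replaced as a whole, not appended to.
sim_hello = {
    **sim,
    "parameters": ["firmware=hello.hex", "memsize=1024", "uart_baudrate=115200"],
}
```

This is why sim_hello has to repeat memsize in its own list: overriding `parameters` discards the inherited entries.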

With this in place we can now run

$ fusesoc run --target=sim_hello subservient

which will run the testbench with the hello.hex program loaded and the GPIO output interpreted as a UART with 115200 baud rate. The output should eventually look something like this.

Running our first FuseSoC target on Subservient

We can run any other program like this, for example the blinky example which toggles the GPIO pin on and off, by supplying the path to a Verilog hex file containing a binary

$ fusesoc run --target=sim subservient --firmware=path/to/subservient/sw/blinky.hex

We won't go into detail on how to prepare a Verilog hex file, but there's a Makefile in the subservient sw directory with some rules to convert an ELF file to a hex file.

And as usual, you can list all targets with

$ fusesoc core show subservient

and get help about all available options for a specific target by running

$ fusesoc run --target=<target> subservient --help

Prototyping on FPGA

All right then. Simulations are nice and all, but wouldn't it also be good to have this thing running on a real FPGA as well? Yes, but let's save ourselves a bit of work. In the simulation we could load a file through the debug interface from our testbench. In the ASIC version, that task will be handled by someone else. But to avoid having to implement an external firmware loader in Verilog for the FPGA case, we use the FPGA's capability of initializing the memories during synthesis instead. Remember, always cheat if you have the option!

What board we want to use does not really matter. We can probably just take any random board we have at hand and add the proper pinout and clocking. I happened to have a Nexys A7 within arm's reach so let's go with that one.

We put the FPGA-specific clocking in a separate file so that we can easily switch it out if we want to run on a different board. Next up we add an FPGA-compatible SRAM implementation that supports preloading. We can steal most of the logic for the memory and clocking as well as a constraint file from the Servant SoC (Remember, always steal if you have the option!). Finally we add the subservient SoC itself, connect things together and put them in a new FPGA toplevel like this.

FPGA-friendly version for quick prototyping of our ASIC-friendly SoC

We're now ready to build an FPGA image but it's probably a good idea to run some quick simulations first to check we didn't do anything obviously stupid. All we need for that is a testbench, and since the Subservient FPGA SoC is very similar to the Servant SoC from the outside, we just take the Servant testbench and modify it slightly. We also put in a clock generation module that just forwards the clock and reset signals. With the code in place we add the necessary filesets

  fpga:
    files:
      - rtl/subservient_generic_sram.v : {file_type : verilogSource}
      - rtl/subservient_fpga.v : {file_type : verilogSource}
    
  fpga_tb:
    files:
      - tb/subservient_fpga_clock_gen_sim.v : {file_type : verilogSource}
      - tb/subservient_fpga_tb.cpp : {file_type : cppSource}

and the target for the fpga testbench

  fpga_tb:
    default_tool : verilator
    filesets : [core, soc, mem_files, fpga, fpga_tb]
    parameters: [firmware, uart_baudrate=46080]
    tools:
      verilator:
        verilator_options : [-trace]
    toplevel: subservient_fpga

to the core description file. Note that we're using a baud rate of 46080. That's because we define the testbench to run at 40MHz instead of 100MHz (for reasons that will become clear later) and then we must scale down that baud rate to 40% of 115200. Let's give it a shot by running

$ fusesoc run --target=fpga_tb subservient

Works in simulation! This increases our confidence in the FPGA implementation

Works like a charm. Now we have more confidence when going to FPGA.
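
The 46080 baud figure used in the fpga_tb target is simple proportional scaling. As a small worked example in Python, assuming the software's bit timing is a fixed number of clock cycles tuned for 100MHz (the helper name is made up for illustration):

```python
def scaled_baudrate(nominal_baud, nominal_clk_hz, actual_clk_hz):
    """Baud rate seen on the wire when code timed for one clock runs on another."""
    return nominal_baud * actual_clk_hz // nominal_clk_hz


# 115200 baud at 100MHz becomes 40% of that at 40MHz
baud_at_40mhz = scaled_baudrate(115200, 100_000_000, 40_000_000)
```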

There are a couple of things that differ between our simulation and an actual FPGA target. Instead of a testbench we need to add an FPGA-specific clocking module and a pin constraint file for our board. So let's put them in a new fileset and then create a target referencing these files and telling the EDA tool (Vivado) what FPGA we're targeting.

filesets:
  ...
  nexys_a7:
    files:
      - data/nexys_a7.xdc : {file_type : xdc}
      - rtl/subservient_nexys_a7_clock_gen.v : {file_type : verilogSource}

targets:
  ...
  nexys_a7:
    default_tool: vivado
    filesets : [core, soc, mem_files, fpga, nexys_a7]
    parameters: [memfile]
    tools:
      vivado: {part : xc7a100tcsg324-1}
    toplevel: subservient_fpga

Voilà! Now we can run our FPGA build with

$ fusesoc run --target=nexys_a7 subservient

If everything goes according to plan and we have the board connected, it will be automatically programmed. Using our favorite terminal emulator and setting the correct baud rate should then give us the following output.

Wow! It's just like in the simulation...which is kind of the idea

Alright then, simulation and FPGA is all good, but our original idea was to put this in an ASIC. Sooo....how do we do that?

Making an ASIC target

The good news is that we have actually done most of the work already, and this is very much the point of FuseSoC and Edalize. It allows you to quickly retarget your designs for different tools and technologies without having to do a lot of tool-specific setup every time. Now, OpenLANE is a bit special compared to other EDA tool flows so there will be a couple of extra bumps in the road, but hopefully these will be smoothed out over time.

Since we have an Edalize backend for the OpenLANE toolchain already, all we need to do is to add any technology- and tool-specific files and invoke the right backend. OpenLANE can be operated in several different ways, but the way that Edalize integration currently works is by adding TCL files with OpenLANE configuration parameters that will be picked up by OpenLANE. Edalize then assumes that it will find an executable called flow.tcl and that a usable PDK is installed and can be found by OpenLANE.

So on the tool configuration side, all we need to do is to add a TCL file containing the parameters we want to set. And the only things we are strictly required to put into this file are the default clock signal and the target frequency.

set ::env(CLOCK_PERIOD) "25"
set ::env(CLOCK_PORT) "i_clk"
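
The clock period above corresponds to the 40MHz clock used in the FPGA testbench earlier. As a quick sanity check in Python, assuming CLOCK_PERIOD is given in nanoseconds (the helper name is made up for illustration):

```python
def clock_period_ns(freq_hz):
    """Convert a target clock frequency in Hz to a period in nanoseconds."""
    return 1e9 / freq_hz


# A 40MHz target frequency gives a 25 ns period
period = clock_period_ns(40_000_000)
```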

There are a million other parameters that can be set as well to control size, density and different routing strategies so I encourage everyone to read the OpenLANE docs and experiment a bit, but for this time we just add the aforementioned settings to a tcl file and add a fileset and target.

filesets:
  ...
  openlane:
    files:
      - data/sky130.tcl : {file_type : tclSource}

targets:
  ...
  sky130:
    default_tool: openlane
    filesets : [core, soc, openlane]
    parameters :
      - memsize
    toplevel : subservient

Seriously, it's not harder than that. We're now ready to run OpenLANE and have our GDS file. The thing is, though, that it can be a bit finicky to install the toolchain and the PDK. Building the PDK from sources using the official instructions requires downloading gigabytes of Conda packages, keeping track of a number of git repositories and a somewhat convoluted build process. There are several disjointed attempts at providing a pre-built PDK but at the time of writing there didn't seem to be an agreement on how to do that. Also, the OpenLANE toolchain itself is a bit special in that the recommended way of running it is from a Docker image rather than installing it directly. So, with these two facts at hand we decided to simply prepackage a Docker image with OpenLANE and a PDK included. This image gets updated from time to time, but in general it's a bit behind the upstream version. But that's totally fine. There's seldom any need for running the absolutely latest versions of everything.

Launcher scripts

But how then do we run OpenLANE from the Docker image? For that we use another one of Edalize's nifty features, launcher scripts! Normally, Edalize calls the EDA tools it wants to run directly but we can also tell Edalize to use a launcher script. A launcher script is a script (or any kind of program, really) that gets called instead of the EDA tools. The launcher script is also passed the original command-line as parameters so that it can make decisions based upon what Edalize originally intended to run.

In this case, Edalize wants to run flow.tcl -tag subservient -save -save_path . -design . when we invoke the OpenLANE backend. But if we tell Edalize to use a custom launcher that we choose to call el_docker, the command-line instead becomes el_docker flow.tcl -tag subservient -save -save_path . -design .

If we just want the launcher script to do something special when OpenLANE is launched, then we simply check if the first argument is flow.tcl. In that case we do something special, or otherwise call the original command-line as usual. Simple as that.

So what special magic do we want to do for OpenLANE? We want to run flow.tcl from our OpenLANE+PDK Docker image and at the same time make our source and build tree available within the image. The whole command in its simplest form looks something like this when invoked from the Edalize work root

$ docker run -v $(pwd)/..:/src -w /src/$(basename $(pwd)) edalize/openlane-sky130:v0.12 flow.tcl -tag subservient -save -save_path . -design .
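
Put together, the decision logic of such a launcher could be sketched like this in Python. This is a hypothetical minimal launcher for illustration, not the implementation of any existing script. The function name is made up, the image name is taken from the command above, and the sketch only builds the command-line. A real launcher would go on to execute it, for example with subprocess.

```python
import posixpath

# Image name taken from the docker command shown above
IMAGE = "edalize/openlane-sky130:v0.12"


def wrap_command(argv, work_root):
    """Return the command-line a launcher would actually execute.

    argv is the original command-line from Edalize and work_root is the
    absolute path of the Edalize work root (no trailing slash).
    """
    if argv and argv[0] == "flow.tcl":
        # Mount the parent of the work root and run from inside the work root
        parent = posixpath.dirname(work_root)
        name = posixpath.basename(work_root)
        return ["docker", "run", "-v", f"{parent}:/src", "-w", f"/src/{name}", IMAGE] + list(argv)
    # Anything else is passed through untouched
    return list(argv)


# Usage: OpenLANE gets wrapped, other tools are left alone
wrapped = wrap_command(["flow.tcl", "-tag", "subservient"], "/home/user/build/sky130-openlane")
untouched = wrap_command(["vivado", "-mode", "batch"], "/home/user/build/nexys_a7-vivado")
```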

We could make this script a bit nicer if we want so that we run as the ordinary user instead of as root, and so on, but this has in fact already been taken care of. The aforementioned el_docker launcher script already exists and is installed together with Edalize. And not only does it support running OpenLANE through Docker but also a whole bunch of other tools like verilator, icarus, yosys, nextpnr and so on. So you can just as well use this script for simulation and FPGA purposes if you for some reason don't want to natively install all these EDA tools. The proprietary tools are for obvious reasons not runnable this way since the EDA vendors would probably get very, very angry if we put their precious tools in containers and published them for everyone to use. Hopefully we can completely avoid proprietary tools some day, but not yet. Anyway, so how do we tell Edalize to use a launcher script? Currently, this is done by setting the EDALIZE_LAUNCHER environment variable before launching FuseSoC (which launches Edalize).

So, our final command will be:

$ EDALIZE_LAUNCHER=el_docker fusesoc run --target=sky130 subservient

And with that, my friends, we have built a GDS file for the Subservient SoC that we can send to the fab and get real chips back. And this we did, but that's for another day. So let's just lean back and take in the beauty of the world's smallest RISC-V SoC, created by open source tools, and think a bit about how incredibly easy it was thanks to FuseSoC and Edalize (and of course NLNet who funded the Subservient SoC and integration of OpenLANE and Edalize).

And now, it's your turn to do the same with your own designs. Good luck!

Friday, October 14, 2022

SERV: The Little CPU That Could

Big things sometimes come in small packages. Version 1.2.0 of the award-winning SERV, the world's smallest RISC-V CPU, has been released and it's filled to the brim with new features.

Historically, focus has always been on size reduction, making it ever smaller, and that has paid off. It's now about half the size of when it was first introduced. But at this point we're not really getting much smaller, and frankly, that's fine. It still is the world's smallest RISC-V CPU by a good margin.

Resource usage over time for the SERV default configuration

Minimum SERV resource usage for some popular FPGA families

So this time we focus on features instead. Most notably we have support for two major ISA extensions, both often requested by users, but there are also a number of other new features as well. Let's take a look at all of them, shall we?

M extension

Multiple SERV cores can share one MDU (or perhaps other accelerators)

As part of Google Summer of Code 2021, Zeeshan Rafique implemented support for the M ISA extension. This was done through an extension interface that allows the MDU (Multiplication and Division Unit) to reside outside of the core and potentially be shared by several SERV cores in the same SoC, or integrated into other RISC-V cores for maximum reusability. We hope to also see other accelerators use the extension interface soon. Zeeshan's report about the project to add the M extension can be read here

C extension

Two extra blocks to save a lot of memory

As part of the Linux Foundation Mentorship program Spring 2022, Abdul Wadood has implemented support for the C ISA extension. The C extension has been the most requested feature of SERV. Since SERV is so small, the memory typically dominates the area and the C extension has the potential to allow for smaller memories and by extension a smaller system. Abdul's report about the project to add the C extension can be read here

Documentation

Pictures and words. The SERV documentation has it all

Documentation continues to improve with more gate-level schematics, written documentation, source code comments and timing diagrams towards the goal of becoming the best documented RISC-V CPU. There is always room for improvement, but now all modules as well as the external interfaces are at least documented.

Bug fixes

A bug that caused immediates to occasionally get the wrong sign (depending on which instruction was executed prior to the failing one) was found and fixed.
ModelSim/QuestaSim compatibility has been restored after accidentally being broken after the 1.1.0 release, and a few more resets have been added after doing extensive x-propagation analyses.

Compliance tests

Version 2.7.4 of the RISC-V compliance test suite is now supported instead of the older 1.0 release. A GitHub CI action has also been created to run the compliance test suite with all valid combinations of ISA extensions for improved test coverage.

Servant

Moving outside of the SERV core itself we have Servant, the simple SERV reference SoC. Servant isn't the most feature-rich SoC, instead focusing on simplicity, but it has at least the bare minimum to run the popular Zephyr Real Time Operating System. During this development cycle, Servant has gained support for five new FPGA boards: EBAZ4205, Chameleon96, Nexys2 500, Nexys2 1200 and Alinx AX309.
With this, the total number of officially supported boards is 26.

ViDBo support

Got no FPGA board? Don't worry. ViDBo has you covered.

Support for the Virtual Development Board protocol has been added, making it possible to interact with a simulation of an FPGA board running a SERV SoC, just as if it were a real board. This allows anyone to build software for SERV and try it in simulation without access to a real board.

OpenLANE support

A few thousand transistors needed to build the world's smallest RISC-V CPU

Thanks to the FOSSi OpenLANE toolchain, SERV can be implemented as an ASIC with the SkyWater 130nm library. It has also been taped out as part of the Subservient SoC but at the time of this release the chips have not yet returned from the fab. Thanks to the combination of FuseSoC, a FOSSi ASIC toolchain and publicly available CI resources, however, a GDS file of SERV is now created on every commit to the repository, making the ASIC process about as agile as it can get.

The future

So what lies ahead for our favorite CPU? Well, there were a number of features that I decided to postpone in order to get a new version released. There were plenty of big features anyway.

DSRV & QERV

Smaller or faster. The choice is yours.

SERV is very small. That's kind of the point. However, many times it's preferable to use slightly more resources if it also means faster. Changing to a 2-bit datapath would make SERV twice as fast, while likely using far less than twice as much logic. Moving to a 4-bit datapath would halve the CPI once again. The 2- and 4-bit versions are tentatively called DSRV, for Double SERV, and QERV, for Quad SERV. I think you get the point, but here's a video explaining the idea in slightly more detail if anyone is interested in taking up this as a project.
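
The speedup argument is just arithmetic on the datapath width. A minimal sketch in Python, assuming an idealized model where only the number of cycles needed to stream a 32-bit operand through the datapath matters and everything else, such as memory latency, is ignored (the helper name is made up for illustration):

```python
def cycles_per_operand(width_bits, xlen=32):
    """Cycles needed to pass one xlen-bit operand through a width_bits-wide datapath."""
    if xlen % width_bits != 0:
        raise ValueError("datapath width must divide the operand width")
    return xlen // width_bits


# SERV (1 bit), DSRV (2 bits) and QERV (4 bits)
cycles = {width: cycles_per_operand(width) for width in (1, 2, 4)}
```

Each doubling of the datapath width halves the cycle count in this model, which is where the "twice as fast" figure comes from.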

While we could theoretically go for 8-, 16- and 32-bit versions as well, there are some internal design choices that would make this more complicated and it would probably be a better idea to design a new implementation from scratch for 8+ bit datapaths.

Decoder generator

Another big thing that I had hoped to finish but decided to push to a future version is autogenerated decoder modules. I have spent a lot of work on it and I think it's a really interesting idea that might even end up as a paper at some point. Unfortunately I don't have the time to finish it up right now. So, what's it all about? Well, it's a bit complicated so I think it's best to leave the details to a separate article (which I hope to find time to write soon), but the overall idea is to take advantage of the fact that many internal control signals are irrelevant when executing some class of instructions, so that we can combine them with other control signals that are irrelevant for other classes of instructions.

Servant

I was also hoping to add some improvements to Servant, the SERV reference platform, but couldn't find time for that either. I think that's ok though. Anyone who's looking for a more advanced platform to run SERV should go for Litex instead, which supports SERV already. Fellow RISC-V ambassador Bruno Levy even demonstrated that it was possible to run DooM on SERV through Litex. I might be a bit partial, but it's a beautiful thing to see this tiny thing take on such a big task, and as a proud parent I look at it and wonder: is there anything this little CPU can't do with a little encouragement?

Saturday, October 1, 2022

It's time to thank UVM and say goodbye

UVM has been a massive success. There's no doubt about that. For the first time it showed the chip industry the benefits of having a common framework. You can hire directly for UVM skills. Vendors provide UVM models for interfacing their IP. There are tools for generating UVM registers and other boilerplate code. There is training available and forums for asking UVM-related questions.

But it wasn't always like that. I remember when UVM was still not widely adopted. A lot of companies said "Weeeell, I'm sure this UVM thing is very good for other companies, but you know, our needs are a bit special". It's funny how all those special needs just suddenly disappeared when the economic benefits of not having to deal with your own framework and being able to easily hire people and get VIP from vendors became apparent.

So UVM has been a massive success. It has become so ubiquitous that many people in the industry seem to believe it has some magical properties and that it's the only way to verify chips. But, frankly speaking, it's not really that good of a framework. It's clunky and suffers from a lot of legacy. Many companies I'm talking with don't actually use it as is but have written some custom framework on top, and you can find plenty of tools to generate UVM, which in the end means we end up with a boatload of incompatible framework generators instead. But the biggest issue is that it's written in SystemVerilog.

Oh no! Will this turn into one of those language wars again? Maybe, but we can't ignore the fact that there are probably 1000 Javascript, Python or Java developers for each Verilog coder. (System)Verilog (or VHDL for that matter) barely scrapes the bottom of the top 50 most popular language lists. "Nonsense!", I hear my fellow chip design engineers mumble, "Everyone knows Verilog". Well, there's a word for that. Survivor bias. Everyone in the semiconductor industry knows Verilog because those who couldn't stand the language just went elsewhere. And this is a huge problem for the industry. On top of an aging demographic we have issues keeping the youngsters interested when there are other fancier languages and environments out there.

Github Language Stats (https://madnight.github.io/githut)

"But! But!...", you argue, "...you need Verilog to work with chip design". I won't argue that in practice this is true to some degree because there's a whole lot of verilog out there, but in theory Verilog doesn't say absolutely anything about how chips work. It's just a programming language, which original intended purpose was to describe chip behavior. Remember, Verilog wasn't ever meant to be used to implement chips, which is a fact that tends to get forgotten many times. As another example of this, look at Erlang, which was created to program telephone switches. This means neither that Erlang can't be used for other things, nor that Erlang is the only way to program telephone switches.

"Still.. ", I hear from the back of the room, "..can't they just learn SystemVerilog? It's like C++, sort of". That misses a large part of what makes a language successful. True, it's a C-like syntax to some extent (mixed up with Java and a hodge-podge of 90's language ideas), but you don't have access to your toolbox of C++ tools like linters, debugger, syntax highlighters, IDEs, sanitizers and everything else that makes you productive. While the chip designers might think SystemVerilog is the best option because it has the largest ecosystem in this domain, this ecosystem is a drop in the ocean compared to popular languages.

And I'm 100% certain that in many cases, although it's beneficial, you don't need to know a single thing about how chips work and you can still do a great job of verifying the functionality of some IP core. Let's turn things around for a while. I have spent the past ten years developing military radar, software defined radios, automotive radar, digital cameras and weather radar, to name a few things. And I have absolutely no clue about how microwaves work and I'm a lousy photographer, but I still can do a good job because at the abstraction where I work I don't need to understand all these things. But if I also would be required to learn, let's say Fortran, because that's what was traditionally used for math heavy applications, well, then I would probably start looking for jobs elsewhere. And the same goes for verification engineers. Give them a spec and a familiar programming language, tell them what to do, and they'll probably do just fine. I definitely think it's preferable to have someone who does know how chips work on the team, but it doesn't have to be all of them.

So let's assume then that we can have verification engineers who don't have to know Verilog. What does that mean? Well, it means that our pool of potential candidates has grown by 1000 times. I can tell you for sure that it will be a lot easier to find good verification engineers than to find good verification engineers who also happen to know Verilog. And if you have ever experienced how hard it is to get good verification engineers, then this is something you will greatly appreciate. And it's not just the number of developers that's growing. The whole flourishing ecosystem of libraries, forums and examples around popular languages like Python makes the Verilog ecosystem look like a wasteland, and this means you can reuse much more existing code and involve your software friends in better ways.

So what's the solution then? We need to enable software developers to create and verify chips. In this article we will be looking at the verification side, but I suggest looking at companies like ChipFlow to see what's happening on the design side as well. And Python is a good bet right now. It might be Go or Rust or something completely different in a couple of years, but right now Python is widely used already as a glue language in chip development environments, like perl was used 10-20 years ago. And we see more and more EDA tools growing Python bindings. I'm not sure the latter is a purely positive thing, but we'll see.

And when it comes to Python and verification I have said many times by now that I believe cocotb will be one of the important technologies in the coming years. It is a mature technology that is already adopted by large and small companies and you can find ads for companies looking to hire for this skill. And just like we saw with UVM, being able to hire for a certain skill without having to train them for your home-built verification framework is a time and money saver. Another thing that speaks for cocotb is that it uses your regular RTL simulators. This means it poses no threat to the EDA vendors. It just enhances their offering and they can continue to sell licenses for their tools. And with cocotb being a project governed by FOSSi Foundation, we clearly see how much more interested the EDA vendors are in collaborating on cocotb compared to many other free and open source silicon projects. Of course, it also works with your favorite open source simulators like Icarus, ghdl or Verilator and this means the proprietary EDA vendors need to compete with the open source tooling on equal terms, where they need to flex the strengths of their tooling rather than the artificial lock-in created because none of the open source tools have any UVM support to speak of.

So, to sum things up. UVM has been a massive success and has seen industry-wide adoption over the past ten years. But the most important thing UVM did was probably to show the industry the benefit of having a common framework, not being the best framework in itself. My prediction is that UVM will see a slow (everything in EDA is slow) decline in the coming years. It will be relevant for a long time, but will gradually be replaced by frameworks written in more common languages, and it's a good bet to get to know cocotb specifically a little better. So I think it's time to consider whether UVM sparks joy. Otherwise, it's time to thank it and say goodbye.

Now, if the industry could just agree on a common format for describing IP cores and interfacing EDA tools. Oh well. That's another battle for another day.

Friday, February 11, 2022

FOSSi Explosion 2021

Do you know what just happened? 2021 just happened. Most years have their ups and downs, but when it comes to 2021 it seems like the prevalent feeling was that everyone just wanted it to be over. And now it is over, except for all those damn retrospectives. So, at the risk of opening up some old wounds, I would like to take a look at what happened last year in my corner of the free and open source silicon world.

In the 2020 retrospective I wrote about a couple of big milestones, like the first vendor-supplied FOSSi FPGA toolchain and the first fully FOSSi ASICs. 2021 was... more of the same I guess. And personally I think this is the interesting part. Everyone is now working hard to actually do stuff with these new opportunities that arose in 2020. Finding new possibilities, hitting limitations and working around them. Solving problems, being creative and coming up with new ideas. Less headline-friendly, but it will have more impact longer term. And 2021 was by no means void of interesting news. Just look at the FOSSi Foundation newsletter El Correo Libre that was packed to the brim with interesting projects and announcements each month. It's more that I couldn't think of anything that particularly stood out so I'm moving directly to the more personal events instead. Is that ok? Of course it's ok. I'm writing now!

FuseSoC

Let's start by looking at my oldest active open source project, FuseSoC, that turned ten years old in 2021. For those who don't already know, FuseSoC is an award-winning package manager for HDL code. Package manager is a concept that is well-known for software developers, from system-level package managers like apt and rpm to language-specific ones like npm, pypi, maven and cargo.

For chip designers, the idea of a ubiquitous package manager has not taken hold and most companies invent and maintain their own incompatible system. Kind of like how we did software up to the mid nineties. Although not as ubiquitous as I had hoped by now, FuseSoC has over its ten-year life span still grown to be the most widely used package manager for Verilog/VHDL code and is used internally in large and small companies as well as powering many of the most popular open source silicon projects.

So how was 2021 for FuseSoC? Frankly, not all that exciting. There was the FuseSoC 1.12 release early 2021 that you can read about here. We got to see some new features, like fellow RISC-V Ambassador Carlos Eduardo de Paula showing how to use FuseSoC with Chisel-based designs, but most of the work done on FuseSoC over the year was to prepare for a big, exciting 2.0 release that will happen some time in 2022. Still, it makes sense to mention FuseSoC first because it is used by every single other project written about here, and the number of projects using FuseSoC steadily rises regardless of the activities on FuseSoC itself.

And just how to get started with FuseSoC for a new design is a question that pops up from time to time. So when Alibaba Group's T-Head Semi released their OpenC910 I did a spur-of-the-moment live coding session, adding FuseSoC support for the core, documenting it through a Twitter thread for everyone to see the process as it unfolded. Check it out if you want to see the unfiltered process of taking a previously unseen core and adding FuseSoC support for it, as well as showing some of the benefits in doing so.

Edalize

Edalize started out as a part of FuseSoC but was split out into its own project in 2018. That turned out to be the right decision because it is now used in several different projects other than FuseSoC. And in 2021, Edalize saw far more activity than FuseSoC. In case someone is wondering what Edalize is, it's an abstraction library for EDA tools. Basically, it provides a common API for different EDA tools such as simulators, formal tools, linters, synthesis tools and FPGA toolchains. So instead of writing Makefiles, TCL scripts and other configuration files for 30 different EDA tools manually, you just need to describe your design once in the EDAM (EDA Metadata) format and Edalize will generate the correct setup files for your tool of choice. Very handy. The award-winning five minute Edalize introduction video provides more detail about this.

As mentioned, Edalize saw a lot of activity during 2021. First of all it gained support for three new FPGA toolchains; oxide for Lattice Nexus chips, libero for MicroSemi devices and apicula for Gowin FPGAs. The biggest news in terms of tool support was however the openlane backend which provided the first ASIC flow for Edalize. I wrote about that in A first look at Edalize for ASIC flows last year if you want to learn more.

From cheese to chips with the Flow API

A large chunk of the work done in 2021 was not immediately visible to users but was done to lay the foundation of the new Flow API, which will add a great deal of more features and flexibility to Edalize in the future. This has been in the works for quite some time and for those wanting to get a rough idea about it, I recommend taking a look at the article accompanying the Edalize 0.3.0 release which contains a brief introduction to the flow API.

SERV

Moving on to another of my more well-known projects, the award-winning SERV, the world's smallest RISC-V CPU, there's plenty of news to report. The question everyone seems to ask first is if it got any smaller, and yes, it did. I was able to optimize away around 20% of the remaining FFs during 2021, although the combinatorial parts of SERV remained more or less the same size.

But size isn't all that matters. The documentation was massively improved with most of the internal modules now having schematics which are accurate down to the gate-level. And to prove that this is actually the case, I redid the ALU in Digital (a Logisim clone) from the schematics and used that as a drop-in replacement of the original ALU. If anyone has the time and interest it would be really cool to see all of SERV implemented in a Logisim-like program and even use that as an interactive documentation somehow.

Working, runnable implementation of the SERV ALU

In addition to schematics, I also added descriptions and timing diagrams for most of the important signal transitions. The ambition is to not only be the world's smallest RISC-V CPU, but also the most well documented. Still got some ways to go but it's already really good. As for new features, SERV got support for the M extension thanks to Zeeshan Rafique who added that as part of Google Summer of Code.

Another big milestone was that SERV was taped out at least four times during 2021 as part of the OpenMPW programme. Two of those tapeouts, both of them the SERV-based Subservient SoC, were done by my colleague Klas Nordmark and me (mostly Klas) as part of a grant by NLNet Foundation to add the Edalize OpenLANE backend and an accompanying reference project. There will hopefully be more things written about this particular project in 2022, especially when we receive the actual chips.

I also found some more time in 2021 to talk about SERV and presented at four conferences, with a brand new SERV video premiering at the embedded RISC-V Forum and being subsequently updated with some additional project ideas for the following events. This new video has shown to be quite popular and goes into more detail on more things that happened during 2021. It's still not as popular as the SERV talk from WOSH 2019 though which apparently has been seen almost 30000 times(!?!?!?)

CoreScore

CoreScore is one of the more niche uses of SERV... ok, most uses of SERV are pretty niche come to think of it. Anyway, CoreScore is a project that tries to answer the question How big is my FPGA? by simply seeing how many SERV cores we can fit into the FPGA on different development boards. Pretty straightforward but also very useful for comparing both FPGAs and the efficiency of different toolchains. Going into 2021 the record was 5087 cores in a single FPGA. That number was topped twice in 2021 with 6000 cores being the new world record thanks to Sylvain Lefevbre and his Xilinx VCU128 board. Apparently this made some numbers in the tech media and among other things ended up on Tom's Hardware. That was particularly fun as I have fond memories of my 15 year old self spending hours and hours on Tom's Hardware trying to find which motherboard was best for overclocking my Celeron 300A Mendocino. But it was not just in the top where things happened in CoreScore land. In total there were 19 new scores submitted by different users, almost doubling the number of known CoreScores. And best of all, there's now a beautiful highscore table at corescore.store to keep track of all the numbers.

If you're missing your favorite board in the list, don't hesitate to find out the CoreScore and submit a number for it. Love to see more!

LED to Believe

Another less known but somewhat similar project to CoreScore is project LED to Believe. The goal is simple. If you have an FPGA board, LED to Believe will be able to generate an FPGA image that blinks an LED on your board. While being a very simple project it does serve two purposes. The first is to act as a pipe cleaner for your toolchain. FPGA toolchains are complex and there's a surprising amount of things that can go wrong. Having the simplest project possible helps verify that the tools are properly installed and can generate an image before you move on to other projects. The second purpose is to be an entrypoint into using FuseSoC and demonstrate how well-suited FuseSoC is for porting a design to different hardware targets. And I would like to claim that it has been very successful in this regard. Already when the year started we could blink LEDs on 44 different FPGA boards and as the year ended this number had risen to 77 thanks to all the fantastic contributions from users all over the world. And again, see a board that's missing? Roll up your sleeves and send me a pull request.

SweRVolf

The final big open source project I took into 2021 is SweRVolf, a reference platform for the Western Digital SweRV family of RISC-V cores. More recently, SweRVolf is also the foundation of the RVFPGA Computer Architecture Course from Imagination University Programme. RVFPGA is rapidly gaining popularity and I'm both excited and a bit scared now that thousands of university students will get their first contact with computer architecture, RISC-V, Zephyr, open source silicon and FuseSoC through a SoC I designed. And with that in mind, there has been some work to make SweRVolf even more robust and accessible.

The year started with landing support for simulating using Vivado XSim in addition to the already supported Verilator and QuestaSim. Software support was improved as well thanks to a port of SweRVolf for the Tock OS. Increased availability could also be seen on the hardware side where it is now possible to use the smaller SweRV EL2 CPU as an alternative to its larger sibling SweRV EH1. The EL2 support was also a prerequisite to run SweRVolf on the Digilent Basys3 board, which carries a smaller FPGA than the Nexys A7 and thus can only fit the EL2 CPU. Many of the latest features can be read about in the SweRVolf 0.7.4 announcement.

ViDBo

The last piece of functionality added to SweRVolf is technically a separate project, but it was born out of a need in RVFPGA and is where it was first used as well. As RVFPGA will become available as an online course, there were some concerns about hardware costs. The online education platform used wasn't totally happy about requiring students to buy an FPGA board. This could be solved by running the course entirely using an RTL simulator but it's really not the same thing as interacting with a board, running your own code and seeing how it reacts to moving switches and watching LEDs light up from your memory writes. There have been plenty of efforts to visualize simulations by providing some kind of GUI, either in terminal or through some graphics. All of those however seem to be one-off efforts and not easily portable. I wasn't really keen on either repurposing an existing solution or writing a new single-use system. But then I got an idea. Instead of a tight coupling between simulator and GUI I decided to define a protocol that communicates I/O state over websockets. Websockets are readily available in almost any programming language and most importantly can be used directly in browsers without any complications.

ViDBo : AKA, SoC over sockets

This allows for adding a small component into the simulation model that sends simulation model outputs and receives inputs over websockets. On the other side of the websockets connection sits a browser with an interactive picture of the board, a Virtual Development Board, or ViDBo. This gets us as close as we reasonably can to a no-cost FPGA board experience without any simulator- or OS-specific building blocks. And while this first implementation uses an RTL simulator as the backend and a web browser as its frontend, there is nothing that stops us from having a pure software model as the backend or a headless CI system that acts as a frontend, injecting I/O state and observing outputs. ViDBo only defines the protocol sent over websockets, not what sits on either side of the protocol. The only drawback of ViDBo is that I now have yet another open source project to maintain, which probably was the last thing I needed. Oh well, so far it hasn't been all that bad and I have had some very welcome contributions.
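As a rough illustration of what communicating I/O state as messages can look like, here is a sketch in Python that leaves out the websocket transport and only shows the message handling. The JSON layout, the field names (gpio, SW0, LD0 and so on) and the BoardModel class are assumptions made up for this example and should not be read as the actual ViDBo protocol definition.

```python
import json

class BoardModel:
    """Stand-in for the simulation side: switches in, LEDs out."""

    def __init__(self):
        self.switches = {"SW0": 0, "SW1": 0}
        self.leds = {"LD0": 0, "LD1": 0}

    def handle_message(self, text):
        """Apply input state received from the frontend (hypothetical format)."""
        msg = json.loads(text)
        for name, value in msg.get("gpio", {}).items():
            if name in self.switches:
                self.switches[name] = int(bool(value))

    def step(self):
        """Toy 'design': each LED simply mirrors the switch with the same index."""
        self.leds["LD0"] = self.switches["SW0"]
        self.leds["LD1"] = self.switches["SW1"]

    def output_message(self):
        """Serialize the output state to send back to the frontend."""
        return json.dumps({"gpio": self.leds}, sort_keys=True)

board = BoardModel()
board.handle_message('{"gpio": {"SW1": 1}}')  # frontend flips a switch
board.step()
reply = board.output_message()
print(reply)
```

Because both sides only see messages like these, the browser frontend could just as well be swapped for a CI script that sends the same kind of JSON and checks the replies.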

Other stuff

Wow! This turned out far longer than I had anticipated. Sorry about that. But if you got this far and for some reason still haven't had enough of SERV and FuseSoC or want to learn a bit more about the history of open source silicon, I also did a series of video interviews during 2021, covering different topics. First out were two episodes on open source silicon and RISC-V for the FOSS North pod, followed by Matt Venn interviewing me for the YosysHQ channel. Highly recommend checking out both those channels even if you don't want to hear more of me. They both have many high quality interviews with a wide range of topics and guests.

I think this covers most of my open source silicon activities over the past year. Wait! One more thing. I made a UART that's small enough to fit in a tweet in case, you know, someone needs a UART that's small enough to fit in a tweet.

Finally, I would also like to mention and extend my thanks to Qamcom and NLNet Foundation for funding work on Edalize and Subservient as well as Imagination Technologies and Western Digital who have been funding most of the SweRVolf and ViDBo work during 2021. Take note, all you freeloading companies out there. This is what real support of open source looks like and how you help build a healthy ecosystem.

And with those well chosen words we can leave 2021 without any regrets and sail into the bright future of 2022. Bon voyage!

Friday, January 7, 2022

Edalize 0.3.0

Looks like it's time for a new Edalize release. During this development cycle, most of the work has been done under the hood with creating a new internal architecture and refactoring many of the backends. Most of those efforts will bear fruit longer term, but we can already today see the initial work on the flow API, that has been planned for at least two years. We also welcome a new backend for Lattice Nexus devices and some miscellaneous feature additions and bug fixes. Read on for the full story on what makes Edalize 0.3.0 the best Edalize (and likely the best EDA tool interfacing framework) ever!

Flow API

Edalize today has almost 30 backends. That's a lot of backends! Each of these backends maps to a primary tool. The icarus backend runs Icarus Verilog. The quartus backend runs Quartus, the Vivado backend runs Vivado and so on. Ok, that's not strictly true. The Vivado backend can also optionally use Yosys for synthesis. But still, Vivado is the primary tool here. But then we have the Icestorm, Trellis and Apicula backends that first run Yosys, then NextPNR (or in the case of Icestorm optionally Arachne-PnR) and then finally run some target-specific bitstream generation tool, which is what has provided the name for the backend. Even though much of the heavy lifting is done by yosys and nextpnr, it's still reasonable to name it after the distinguishing part.

Current tool-centric Edalize backend with the configure, build and run stages

But what if we want to do a timing simulation of a routed design for a Xilinx device using QuestaSim? In that case we want to run most of the Vivado toolchain before switching over to the simulator. We see both that the naming scheme is starting to fall apart and that the current architecture isn't capable of doing this. And apart from the use cases that do work really well, we can also find plenty that don't. For example, look at the VUnit and (the currently proposed) Cocotb backends. Both these projects could be far better integrated with Edalize if the backends weren't considered a monolithic thing. The solution to this is the new flow API which allows arbitrary tools to hook up in a flow graph, using EDAM structures to pass information between them.

Example of what a timing simulation flow could look like in the new flow API. The flow graph is first set up according to the backend configuration. EDAM structures then carry information between all nodes

Separating the execution of the individual EDA tools and the execution of the flow graph into two distinct problems also allows future improvements such as using cloud orchestration tools or workload managers to direct the tool execution rather than a local Makefile, which is the case today. The new flow API also comes with a new abstraction layer for executing EDA tools, which also allows us to more consistently add custom launchers for our tools. This has already been used to great effect for seamlessly running a combination of dockerized and local tools.
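To illustrate the principle of a flow graph where EDAM structures carry information between the nodes, here is a small self-contained Python sketch. It does not use the real Edalize classes. The node functions, the graph, and the file names and file types they put into the EDAM are all hypothetical, and a simple linear chain is used to keep it short.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def synth(edam):
    """Pretend synthesis: replace the RTL sources with a netlist."""
    netlist = {"name": edam["name"] + ".json", "file_type": "jsonNetlist"}
    return {**edam, "files": [netlist]}

def pnr(edam):
    """Pretend place and route: emit a routed netlist and timing data."""
    new_files = [
        {"name": edam["name"] + "_routed.v", "file_type": "verilogSource"},
        {"name": edam["name"] + ".sdf", "file_type": "SDF"},
    ]
    return {**edam, "files": new_files}

def sim(edam):
    """Pretend simulator: just record which files it would compile."""
    return {**edam, "simulated": [f["name"] for f in edam["files"]]}

NODES = {"synth": synth, "pnr": pnr, "sim": sim}
# Each node lists the nodes it depends on
GRAPH = {"synth": set(), "pnr": {"synth"}, "sim": {"pnr"}}

def run_flow(edam):
    """Run every node in dependency order, passing the EDAM along."""
    for name in TopologicalSorter(GRAPH).static_order():
        edam = NODES[name](edam)
    return edam

start = {
    "name": "blinky",
    "files": [{"name": "blinky.v", "file_type": "verilogSource"}],
}
result = run_flow(start)
print(result["simulated"])
```

The point of the sketch is that no node needs to know which tool ran before it. Each one only looks at the EDAM it receives, which is what makes it possible to swap the simulator or insert an extra step without touching the other nodes.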

There are still a lot of things that need to be properly documented, features to add and many of the existing backends still need to be ported over to the new flow API, but the good news is that an initial version of the flow API is shipping with Edalize 0.3.0, so you can try it out right away. And it has already brought some new features that weren't available before like using Surelog or sv2v as frontends to bring SystemVerilog support to tool flows that don't natively support that. Here's a quick example for how to build a blinky for the icestorm flow with the new API

from edalize.flows.icestorm import Icestorm

edam = {}

print("Adding files")
files = [{'name' : 'blinky.v', 'file_type' : 'verilogSource'}]

print("Setting parameters")
parameters = {
    'clk_freq_hz' : {'datatype'  : 'int',
                     'default'   : 1000000,
                     'paramtype' : 'vlogparam'}}

print("Setting flow options")
flow_options = {
    'nextpnr_options' : ['--lp8k', '--package', 'cm81', '--freq', '32']}

print("Creating EDAM structure")
edam = {
    'name'         : 'blinky',
    'files'        : files,
    'toplevel'     : 'blinky',
    'parameters'   : parameters,
    'flow_options' : flow_options,
}
print("\nInstantiating the Icestorm class with current dir as build directory")
icestorm = Icestorm(edam, '.')

print("\nconfigure writes the Icestorm configuration files but doesn't run any of the tools")
icestorm.configure()

print("Now we run the EDA tools")
#This needs the actual EDA tools and the blinky verilog file (https://github.com/fusesoc/blinky/blob/master/blinky.v)
icestorm.build()

Or just inspect which options are available for a specific flow

from edalize.flows.icestorm import Icestorm

print("\nAvailable options for the icestorm flow")
print("=======================================\n")
for k,v in Icestorm.get_flow_options().items():
    print(f"{k} : {v['desc']}")

A slightly more involved example can also be found here

As for the next step, FuseSoC will be updated to take advantage of the new flow API.

prjoxide backend

A new family of Lattice devices called Nexus has been documented under the name Project Oxide. This has resulted in a new yosys->nextpnr->bitstream generation flow, which now has a corresponding Edalize backend.

Docker launcher script

The Edalize backends are rapidly gaining support for using a custom launcher by setting the EDALIZE_LAUNCHER environment variable before running Edalize. This has been used to great effect already and serves many use-cases and will be the topic of a separate blog post some time in the future. But for now we will look at the major change for Edalize. As of this version, Edalize now ships with an extra script called el_docker, el meaning Edalize Launcher. Since running containerized versions of the open source EDA tools has been a frequent use case of the Edalize launcher mechanism, but required an external launcher script, I decided to ship this along with Edalize for now. So to use this, just set the environment variable EDALIZE_LAUNCHER=el_docker before Edalize gets called. Below is an example of first linting and then creating a GDSII file for SERV without a single local EDA tool. Pretty sweet, ain't it?

#Add SERV to the current workspace
fusesoc library add serv https://github.com/olofk/serv

#Set the launcher command
export EDALIZE_LAUNCHER=el_docker

#Run linting with Verilator
fusesoc run --target=lint serv

#Make GDSII file of SERV with OpenLANE and sky130 PDK
fusesoc run --target=sky130 serv
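The launcher mechanism itself boils down to putting another program in front of each tool invocation. The Python sketch below shows that principle only. The helper name build_command and the example yosys arguments are made up for illustration, and this is not a description of how Edalize implements it internally.

```python
import os

def build_command(tool, args, env=None):
    """Return the command line to run, prefixed with the launcher if one is set."""
    env = os.environ if env is None else env
    launcher = env.get("EDALIZE_LAUNCHER")
    cmd = [tool, *args]
    return [launcher, *cmd] if launcher else cmd

# Without a launcher the tool is called directly
local = build_command("yosys", ["-p", "synth_ice40", "blinky.v"], env={})

# With a launcher set, the same call is handed to the launcher instead
wrapped = build_command(
    "yosys", ["-p", "synth_ice40", "blinky.v"],
    env={"EDALIZE_LAUNCHER": "el_docker"},
)
print(local)
print(wrapped)
```

The launcher then gets to decide what to do with the original command, for example running it inside a container or passing it through unchanged.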

Other things

Verilator typically generates an executable simulation that is subsequently run, but there are use cases where the model instead should be integrated in a larger system. For this reason, the verilator backend now has an exe option which can be set to false to stop before the final linking.

This development cycle has also seen a lot of improvements on the CI side, reformatting of the source code for consistency with a tool called black, support for newer Libero versions and, as usual, a bunch of bug fixes.

I hope you all enjoy this new version of Edalize. As always, there's plenty of things going on and we would love some help, so if you want to get involved you are most welcome to join the chat at gitter.im/librecores/edalize or look through the code, issues or PRs at https://github.com/olofk/edalize