Tales from Beyond the Register Map: RISC-V

Showing posts with label RISC-V. Show all posts

Friday, July 5, 2024

SERV 1.3

SERV the 1.3th. CPUs will never be the same again

"A new SERV version! How much smaller than the last one?"

Hate to disappoint you all, but we have now reached the point where the award-winning SERV, the world's smallest RISC-V CPU won't get much smaller. I still sometimes get ideas for how to make it smaller and then spend a week implementing something just to discover that nine out of ten times it actually made it bigger instead. But it's ok. Size matters, but the important thing is what you do with it. To quote one of the great 20th century thinkers:

"A CPU is only as good as its software and ecosystem"

And this release brings plenty of software and ecosystem improvements.

Systems

Seymour Cray once said "Anyone can build a fast CPU, the trick is to build a fast system". I would like to think the same goes for small CPUs and systems. With SERV being very small, care must be taken to avoid the rest of the system eating a lot of gates. One such system-level improvement was made by slightly rescheduling RF accesses which makes it possible to use one combined single-port SRAM for RF, code and data. Looking at some SRAM macros I have at hand the single-port versions are around 45% smaller area than the dual-port ones. Since the area for most SERV systems is typically dominated by memory size, this means a vastly lower total system cost from just a small change inside SERV.

The aforementioned memory improvement might be missed by people implementing SERV since the required changes are outside of the SERV core itself. Having now done a bunch of systems and subsystems around SERV, I have come to notice some typical optimizations, a few recurring patterns and things that get implemented over and over again. To make everyone's life a little easier I created Servile, a convenience building block containing SERV and some surrounding logic that is used by most systems built around SERV.

Once that was in place, I restructured the Servant, Serving and Subservient components around Servile and could both remove a lot of custom logic and get some features for free like debug printouts and optional extensions. With a growing number of systems, the documentation has been overhauled to collect them all in the Reservoir to make it easy to pick and choose the right foundation for building your own SERV-based system.

The reservoir has a SERV-based subsystem for your every need

As usual, a new release sees support for a range of new FPGA boards. This time we have Arty S7-50, PolarFire Splash Kit, Machdyne Kolibri, GMM-7550, Alchistry AU, ECP5 Evaluation board and Terasic DE1 SoC - all provided by kind contributors. Big up mi bredren!

Simulation

I have claimed many times that SERV is so simple that it is possible to simulate it at almost the same speed as we can run it on a chip. And now it is much easier to check if this is really the case. A new parameter in the verilator testbench called cps (cycles per second) keeps track of how many simulated cycles we can run each real second. This information is printed to a file to be viewed. If you are running the Servant SoC through FuseSoC, you can enable it by passing --cps as an extra argument.

SERV runs at 2.8MHz in simulation on my 7-year old laptop. Love to see how fast someone can make it go on a beefier machine

Another convenient simulation feature is the addition of a PC tracing parameter which just dumps the PC to a binary file after each instruction. In order to make use of this data I have put together some custom tooling for profiling and tracing. For example, I can feed the trace data and the ELF file that was executed during the simulation into a Python script and get a list of how much time was spent in each C function. This in turn can be used to either optimize the software or potentially add some accelerator to SERV.

SERV running a Zephyr demo application. Apparently it spends almost 25% of its time printing to the UART

While this custom tooling is handy, I would ideally not have to create any of it myself. It would be much better to export the PC trace to some already existing format which already has reasonable tools for profiling and tracing. I'm sure something like this must exist, but I haven't been able to find anything. Tips are most welcome!

Software

As mentioned in the introduction, we need software to actually have any use of our CPU. When it comes to software on SERV, users tend to run bare-metal or use Zephyr. For users of the latter, I can happily report that the Servant Zephyr BSP is upgraded to support Zephyr 3.5 which is a big step up from the previous version 2.4.

Bugs

Despite years of verification, a bug was spotted in SERV recently. The trap signal, indicating an exception, was released too early, before the MSBs of the PC had been written. In practice meant that jumps in and out of the exception routines could end up in the wrong place if the code was place near the end of the 32-bit address space. Hopefully this hasn't affected any users, since it's unlikely that a software running on SERV would fill an address space large to trigger this. Still, it was a bug and it has been fixed now.

Looking forward

SERV now has two larger siblings called QERV and HERV, as has been hinted before, which implements 4- and 8-bit wide datapaths instead of SERV's 1-bit. They currently reside in a different repository, but the idea is to ultimately integrate them fully in the SERV codebase. As a first step, each module inside SERV will receive a parameter to set the width of the datapath. Most of the work is done, but there are some minor things remaining.

You might wonder why there is no 2-bit wide version, also known as DSRV, for Double SERV. The reason for that is that we started out with QERV and discovered that the added overhead was so small, so it didn't seem to make any sense doing a version that would probably be roughly the same size as QERV but half as fast.

If you can afford a few more gates you can get more bang for the bucks

Another thing that I hope to write more about in the future is the version of SERV that was submitted to the 7th shuttle run of the Tiny Tapeout project. Even though I know about some really interesting implementations of SERV, I'm typically not allowed to say that much about them, and as with all open source projects there are likely many more uses that I'm completely unaware of. This one however, I can show to everyone.

Area-wise, it was a bit tight but using two of the allocated slots, it was possible to fit SERV, an XIP SPI Flash controller and 5 GPRs. The project was dubbed Underserved, which I thought was fitting as Tiny Tapeout caters to the underserved hobbyist market.

There are already plans for more SERV features and I have some really, really interesting news I'm bursting to share, but that will unfortunately have to wait a little bit longer. Stay tuned!

Saturday, June 15, 2024

FOSSi Freakout 2023

So, this was supposed to be one of those new year retrospectives. It's just that I didn't really find time to write this until now. Still closer to last new year than the next one, so I think it's ok. As usual, this is a round-up of all free and open source silicon, or FOSSi, things I have been involved in over the past year.

FOSSi Foundation

Beatiful evening in Santa Barbara. What is not seen in the picture is that everyone spent the rest of the evening trying to get rid of tar from their feet

This was the year when we resumed our on-site conferences after doing our virtual FOSSi Dial-Up series for a few years. We did Latch-Up in Santa Barbara and ORconf in Munich, and both events were great successes. Hope to see you all at our future events. Fellow FOSSi Foundation director Philipp Wagner also did a well-received open source chip design talk at the FOSDEM main track.

SERV

The award-winning SERV, the world's smallest RISC-V CPU turned five years old, which I wrote about in a retrospective called Five years of SERVing for its fifth birthday. It was also the year when SERV got its big sister, QERV, which provides a 3x speed-up for a marginal extra cost in area. Most of the work was done by a colleague at Qamcom and we did a press release called Qamcom boosts RISC-V which has more details. QERV currently lives in a separate repository, but the ultimate goal is to integrate it into SERV with a switch to select width.

There's also an ever bigger version called HERV on its way. A lot more things has happened with SERV but I'm saving that for the next SERV release announcement. Some of the news were also revealed in the talks I did about SERV at FPL (not recorded), ORConf and the Göteborg RISC-V Meetup (not recorded).

FuseSoC

A tour through FuseSoC and Edalize

FuseSoC saw a lot of activity in 2023. We finally got version 2.0 out of the door and with that we could remove a lot of old code and focus on new features such as core file validation, supporting the use of FuseSoC as a library instead of a stand-alone application, cached generators, file tags, minimizing rebuilding and other things that you can read about in the FuseSoC documentation. I also managed to three FuseSoC presentations; at FPL (not recorded), ORConf and the CHIPS Alliance Technology Update.

VeeRWolf

During 2023 I found some time to work on VeeRwolf...wait, did you say VeeRwolf? I thought it was called SweRVolf. Yes, one of the bigger changes was a complete renaming from SweRVolf to VeeRwolf, since the SweRV cores were renamed VeeR. Anyway, apart from the renaming, the Zephyr BSP for VeeRwolf was upgraded to support Zephyr 3.5 instead of the old 2.7 thanks to one of my Qamcom colleagues. As regular readers probably know already, VeeRwolf is the base for the RVFPGA computer architecture programme, and over the year I had the pleasure of participating in two different RVFPGA workshops and meet the other RVFPGA team members.

Edalize

The big new things for Edalize the past year was the introduction of the Flow API. I have written about this specifically a few times before, but to sum it up, it's a complete revamp of how the Edalize backends work that enables new workflows, avoids code duplication, allows for external plugins, avoid unnecessary rebuilds and a lot more good things. As with all new feature introductions, some effort was spent hunting down newly introduced bugs but the code is now in a good shape. I'm using the flow API with Cocotb-enabled simulations as my daily driver now and it works great. It would still be great to get some help porting over all the old backends to the new API.

Speaking of backends, Edalize got several new ones in 2023, namely sandpiper-saas, openroad, design compiler, genus and efinity

Also the documentation saw a lot of improvements.

CoreScore

The CoreScore project did not see any new records. 10000 cores in an FPGA is still at the number one. We did howver see support for some new boards with Intel snatching three of the top five spots with their new Agilex devices and the introductions of new FPGA vendor Efinix and their Xyloni board.

LED to Believe

I tried to push for project LED to Believe to support 100 different FPGA boards by the end of the year. We got reeeeally close, but the big celebration came after the year had ended. Still there was a healthy number of newly supported boards over the year.

Other

In an attempt to collect all videos of my FOSSi projects, I made a video gallery. After that experience I decided not to add web design to my CV.

As I have started using Cocotb more and more, I thought it would be a good idea to also have a quick example of how to use FuseSoC and Cocotb together, so I made an example design called fusesocotb to serve as a reference design.

This very blog also celebrated since 2023 was the year that saw the 100000th visitor to the site, much thanks to a UVM vs Cocotb post that went viral in late 2022.

And that pretty much sums up my 2023 FOSSi activities. Well, not quite. I won an award also. At the RISC-V Summit I was awarded a Community Contributor award. I was really happy to receive that and to hear that the open source contributions I do are actually acknowledged and appreciated. So, big thanks for that. And thanks also to the RISC-V summit organizers for making it easy to find my seat.

Monday, October 23, 2023

Five years of SERVing

Making your own RISC-V CPU is a terrible idea. I have said that many times before. There are already a million RISC-V cores out there to choose from, so making another one makes no sense at all. It's kind of the same thing with UARTs. There are probably as many UART cores written as there are people using one. In my opinion, it's much more important to learn how to reuse and contribute to existing cores than making your own. I remember musing at ORConf one year that I was probably one of the only people in the room who hadn't built their own UART or RISC-V core. Although... I actually made a UART a couple of years ago. But only because I had to see if I could make a UART that fit inside a tweet. And you know what's worse? I actually made a RISC-V CPU as well. In fact, the world's smallest RISC-V CPU. But I never intended to. So how did it all happen? Well, on this day five years ago, I made the first commit to the CPU that would eventually be SERV. But to see how it really started we need to go back in time one more month to September 21 2018, just before teatime.

In Gdansk, Poland, we were organizing ORConf. As usual there were a lot of great presentations covering a wide array of topics. One of those was a lightning talk by Michael Gielda of Antmicro. RISC-V Foundation, as it was known by then before the formation of RISC-V International, together with Google, Antmicro and MicroSemi had decided to organize a contest. The idea was to let contestants build either the fastest or the smallest RISC-V CPU for a given FPGA board. It had to pass a suite of compliance tests and it had to correctly execute some example programs, including running the Zephyr real-time operating system. The deadline was November 26, meaning just over two months to put together a core together with at least a minimal set of peripheral controllers to create a small SoC, figure out how to run the compliance test suite and port Zephyr to the SoC.

I remember thinking that this sounded like a lot of fun, but quickly realized there was no way I would ever find the time to do such a thing, so I basically forgot about it for my own part.

About a month after that, I was at home doing the dishes when I started idly thinking about that processor contest. In particular, I started thinking, what if you do an operation on two 32-bit vectors, but only process one bit each cycle? Then you could theoretically reuse the same circuitry 32 times instead of having 32 separate copies of basically the same thing.

At this point I had no idea that was called a bit-serial algorithm and was an idea people had already come up with and that was quite widely used in CPUs in the 70s.

So after finishing up the dishes I had to try out this idea on a piece of paper. It obviously worked for boolean operations such as and, or, xor, but it turned out also to work for additions (and by extension subtractions, which can be described as additions). To convince myself that it actually did work, I made some simple Verilog and ran it through a simulator. And when that worked I got curious about what operations that a RISC-V CPU had to implement so I opened the base specification for the first time and started reading. I was so amazed by the simplicity and how well thought-out the whole thing seemed to be. Having read at least a few similar documents before, this really stood out as a piece of art. And also, it got me thinking that maybe, just maybe, it wouldn't be impossible to actually do a CPU. I could at least give it a try and see. Just for fun, you know.

So on October 23 2018 I pushed a first commit to the newly created SERV repo and suddenly I found myself making a CPU. By this time, half the contest time had already passed which was not an ideal situation, but I did it mostly for fun so it didn't matter too much. The evenings and weekends during the following month was spent doing Karnaugh maps and schematics on paper while putting the kids to bed and then turning it into verilog.

And finally, a few hours before the deadline on November 26, I got the Dining Philosophers Zephyr application running, which was one of the requirements, and that was that. Most of the people involved in the contest were exchanging tips, experiences and metrics on a forum. From the discussions there I knew that SERV had no chance of winning since it wasn't the smallest and definitely not fastest, but it had been a fun experience and I got a new appreciation of the RISC-V ISA, so I was happy regardless.

About a week later, I woke up to a lot of congratulations on social media. The first RISC-V Summit was ongoing on the other side of the planet and it turned out I had won an award for SERV after all. Not for being the smallest, and definitely not for being the fastest, but for being the most creative solution:)

SERV becomes The award-winning SERV

As the code had been written to meet a one-month deadline, the implementation was rushed, to say the least. But during the development I had gotten plenty of ideas for further improvements. So free from any time pressure, I continued working at my own pace. During the kids ballet lessons I drew schematics and instead of bedtime reading I made more Karnaugh maps and SERV steadily became smaller and smaller.

In March 2019 I submitted an proposal to speak at the RISC-V workshop during the Week of Open Source Hardware (WOSH) in Zürich that summer with the first ever presentation of SERV called Bit by bit: How to fit 8 RISC-V cores in a $38 FPGA board. The board in question was the TinyFPGA BX, in case someone is curious.

The presentation was accepted, but it turned out to be a terrible name for a talk. You see, before the end of March I had made some changes to fit 10 cores inside that FPGA. After arriving in Zürich I had that number further increased to 14, and on the day of the presentation I had managed to fit 16 cores into that FPGA, so I had to make some last minute adjustments to the slides. The talk went fine but not great. However, for some unknown reason it has managed to become the second most viewed video on the RISC-V YouTube channel. I guess this is the closest I'll ever be to become a YouTuber.

Note to self - never put performance numbers in a presentation title

Another important milestone that happened just before heading to Zürich was that SERV got its logo.

Designed using WaveDrom, the amazing timing diagram (and logo designer) tool. Here's the source for the logo https://wavedrom.com/editor.html?%7Bsignal%3A%5B%7Bwave%3A%270.P...%27%7D%2C%7Bwave%3A%27023450%27%2Cdata%3A%27S%20E%20R%20V%27%7D%5D%7D

In September the same year I learned about another contest, the European FPGA Developer Contest, by Arrow Electronics. In a fit of hubris, I entered that contest too, thinking I had such a great idea that there was a serious chance of winning. The idea was a programmable heterogeneous sensor fusion platform in which you had an FPGA that was connected to a bunch of sensors. Data was collected from each of the sensors and combined into a message stream. The novelty here was to dedicate a SERV core to each sensor. That meant that each CPU could handle the parts of sensor communication that was easiest to do in software, like configuring and handling spi/i2c transactions, while using FPGA fabric to do custom protocols and heavy processing. I called the project Observer and released it on Github

I felt very clever about the naming scheme of Observer. The sensor data came in through a "collector" and then the "base" controlled the data flow. Together those formed a "junction". All data streams then merged and was then sent off-chip through a "common emitter".

I still think it's not a half-bad idea, but at the same time there isn't a super clear use-case for it and it would require a lot more work than I was willing to put in to get it really usable. Speaking of work, I wanted to have a nice logo for the project, so I asked my very talented fiancée to draw one for me. She didn't have time so instead I made the crappiest logo I could come up with in ten minutes, proudly showed it to her saying that it had been released on the internet. The plan worked perfectly. Upon seeing the logo, she felt so embarrassed for me that I soon had a stunningly beautiful logo replacing the old one. The moral of the story: If you do something bad enough, someone will eventually be unable to resist improving it.

I have done many ugly things in my life, but the original Observer logo still hurts to look at

In the end I didn't win anything, although I did get to keep the very nice cyc1000 FPGA board. I believe some AI something something won, but I'm not sure actually. The other projects were never released anywhere to my knowledge and the whole thing felt slightly shady. And after that, Observer sank down on my list of priorities and has been laying dormant, but it did have one everlasting legacy that became its own, much more well-known project.

While working on Observer, I used the on-board sensors to have some data sources to combine. There was a button, an accelerometer and...well, that was it. It felt a bit silly to have a sensor fusion system with only two sensors though. The proper thing to do here would had been to solder on a couple of more sensors, but I chose a less work-intensive way, and instead added some SERV cores that just wrote random strings without ever being connected to any sensors. (Remember, kids! Always cheat if you have the option). After adding a couple of more SERV cores generating fake sensor data, I realized that there was still a lot of room left in the FPGA. So I added a couple of more cores and there was still room left. By now I had completely abandoned all pretentions of creating a useful system and just wanted to see how many SERV cores I could stuff into the FPGA. Then, of course, I got curious to see how many cores I could stuff into other FPGAs that I had at hand. Somewhere around this point I realized that this had now become its own project, with the sole intention of finding a metric for how large an FPGA is, by measuring the number of SERV cores they could fit.

For the first three minutes of existence, the project was called ServMark until I settled in the much catchier CoreScore. I then went to my fiancée again, got an amazing logo and started checking the scores for other boards at my disposal.A couple of days later I registered corescore.store, got some JavaScript help to create a dynamic high score list and started inviting other people to submit their CoreScores. At the time of writing, there are now more than 40 different FPGA boards on the list, with the top entry hitting 10000 cores.

CoreScore with its beautiful logo and the top five entries at the time of writing. Check out the site for the latest results.

I believe 10000 cores is the highest number of RISC-V cores anyone has put in a chip, at least at the time of writing. Not that they are doing anything useful, but still, I sincerely believe it has a useful purpose. First of all, it gives a rough metric of the relative size between different FPGAs, which normally tends to be an apples-to-oranges comparison thanks to different LUT sizes and available memory blocks. It's not perfect, but provides some guidance, especially to those new to FPGAs. It's also a way to compare the efficiency of FPGA toolchains. With open source alternatives popping up for more devices, it can be interesting to see how well they compare to their proprietary alternatives. For my own part, the prospect of getting higher scores has been a good motivation for further optimizations of SERV, so it has been helpful in that regard too. The rest of 2019 I was working on and off continuing to bring down the size of the core, and by the end of the year there was no doubt that it had surpassed all other RISC-V cores in terms of size, making it The award-winning SERV, the world's smallest RISC-V CPU.

As we rolled into 2020, there was something in the air that had a big impact on most people's lives, and gathering in large groups at conferences was just out of option. What happened instead was that people took their slideshows into video calls instead at online conferences. I went to one of those in early 2020 and I absolutely hated it. I could see no reason for watching decapitated heads - most of them struggling to handle the lack of audience feedback - reciting slides decks aloud, when I could just as well just read the slides myself or watch a recording where I could skip the boring parts or listen again if something wasn't clear.

Then, in the spring, I was asked if I could present at one of these virtual conferences. With all the latest advances of SERV I really wanted to present that, so I accepted, but also started thinking if there was a better way to use the new medium than the horrible slide decks I had been subjected to. I decided early to prerecord something. It didn't make sense for the audience to hear me mumbling and losing the thread, when I could just as well edit it to something more coherent beforehand and save everyone's time. I also realized that since everyone would watch it on a computer screen, and not on a projected screen in a badly lit room, I could afford having much finer details in the pictures. And most of all, I could have animations. And sound! So, stylistically inspired by a 1981 video on sorting algorithms we had been forced to watch in school and educational animations from my childhood, I equpped myself with a Korg Monotron for sound effects and a Python animation library to work on creating a fully immersive multimedia edutainment experience about SERV.

Creating a video like this also meant that it could not only be used for that particular online event, but could also be viewed anytime by people who wanted to get an introduction to SERV.

On April 29, the resulting video was aired during the 1st Virtual Munich RISC-V Meetup, about 24 minutes in. And for those who prefer just to watch the SERV video without the rest of the talks, it is also available here.

Doing a video was a lot of fun, but there is another common way to make people aware of your work - writing papers. It so happened that my friends at ETH Zürich had just released a paper called Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing. Now, 4096 cores with area-efficient floating point support is of course impressive, but I thought I could do better in at least one metric, the sheer number of cores. So over 2 days in May 2020 (~15 hours of struggling with Latex and maybe an hour actually writing the thing) I put together the first paper on SERV, Plenticore: A 4097-core RISC-V SoClet Architecture for Ultra-inefficient Floating-point Computing. For some reason it was never accepted into any conferences and has been cited approximately zero times, so I guess I will have to wait a bit longer for my academic career to take off.

At this point, SERV had been implemented in plenty of FPGA projects by myself and other people, but to my knowledge hadn't been taped out. But 2020 also brought the OpenMPW program with free tapeouts for open source projects. Wanting to both add support for the OpenLANE open source ASIC toolchain in Edalize and getting some real SERV chips, I applied for funding through NLNet Foundation to add Edalize support for OpenLANE and use that to make a small SERV-based SoC. Both objectives were successfully accomplished and thus the Subservient SoC was born, which is a minimal SoC, just like Servant, but more suitable for ASIC implementations.

Over the year SERV continued to see more code commits titled optimize this or simplify that, and in addition to working on the core itself, I added documentation and other people got it running on more FPGA boards

Having so much fun making the first SERV video, as well as one on Edalize (that actually won an award!), I decided to make another one for the 2021 virtual event RISC-V Forum : Embedded Technologies. In that case it was even better to have it prerecorded since it was broadcast around 4 am my time and I don't think I would have given a very good talk at that time of the day. This time I went more for day time TV aestethics and I think it turned out pretty nice.

This video was also the debut of the first major external contribution, with the (optional!) support for the M extension contributed as a Google Summer of Code project.

At this time, I was still trying out some further optimizations, but it started to get really, really hard to make further improvements to the size. I could spend a week working on an idea for an optimization only to find out it didn't improve anything at all, or even made SERV slightly larger. Generally, only 1 out 10 ideas for optimizations yielded any positive results. There are still some ideas left to try out but they require a bit more time and effort than I have been able to spend. I will mention a few of these later on in the future outlook.

The amount of effort trying to further minimize SERV is pretty much the inverse of the resource usage chart

So the last years have focused more on features. Early in 2022, I added support for ViDBo, the Virtual Development Board protocol that allows interacting with a simulation of the Servant SoC through an interactive picture of an FPGA board presented in a web browser. It's a pretty fun way to try out SERV if you don't have a real FPGA board. The spring of 2022 also brought (optional!) support for the C extension thanks to another student that I mentored through the LFX Mentorship Program.

But no matter how small or featureful SERV is, what matters in the end is where it is actually used. As with most of my (and others) open source projects, you typically have no idea about most of the places where it is implemented. People only tend to reach out when they need support (which I'm happy to offer for SERV and my other open source projects, and already do for several clients) but I have learned about some use-cases over the years, such as USB-to-serial converters, DDR2 initialization, GPS synchronization for RADAR and power management. Two of my favorite applications is a cranial implant ASIC and a research project that has taped it out on the PragmatIC FlexIC process, likely making SERV the first RISC-V CPU to be taped out on plastic film. In general, the small size, and hence the simplicity has made it a an attractive choice for taping out on novel processes where yield is not yet maximized.

And that pretty much sums up the first five years of SERV. What about the next five years? There are a number of half-finished ideas that I would like to finish up as well a couple of things that I haven't even started. Looking outside the core, I have more or less finished a new framework for CoreScore that will both allow more cores to fit in the same area as well as making routing easier and thereby faster for the P&R tools. As a point of reference, it apparently took around 48 hours to do P&R on the 10000 cores that occupies the number one spot on the list right now. And I would of course like to see both CoreScore and Servant running on more FPGA boards, and that's where you, my dear readers, can help out by making sure it runs on your board and submit a patch to get it added.

For the core itself, more ISA extensions would be nice. Most of them, like floating point support, doesn't make very much sense, but why not. It would be very cool to run Linux on SERV, and for that we probably need to at least implement the atomic instructions and some more pieces of the privilege spec. More interrupt sources and a debug interface has been requested as well, which makes sense. I also have an almost completed project that uses some nifty math to calculate the smallest decode unit, given some constraints. The plan is to do a proper write-up about that and perhaps even put out another paper.

But the thing that most people seem excited about is having 2-, 4- and maybe 8-bit versions of SERV. Those would be slightly larger, but also almost 2, 4 or 8 times faster than SERV is today. Preliminary results have shown that the size of wider versions grow less than one might think. There is actually a big reveal coming up in this department soon. If you read this a couple of weeks after I write this, you will already know what it is, but I'm keeping it a secret for now :)

So, all in all, I'm pretty damn proud of SERV and it has been a lot of fun working on it. It has also managed to attract commercial interest from some of the larger players in the RISC-V space while still remaining very much a passion project for which I can curl up on the couch with a cup of coffee on a Sunday afternoon and think about some optimizations, draw some schematics or Karnaugh maps just like normal people do crosswords.

Happy fifth Birthday, SERV. As a proud parent, I'm amazed to see how small you have become since you were a baby.

Thursday, April 27, 2023

FOSSi Fantasies 2022

2022 is behind us and as usual I'm wrapping up my open source silicon efforts of the past year in a blog post.

Hang on...a 2022 retrospective...? In... the end of April??

Yes, I am fully aware that a third of 2023 has already passed, thank you very much, and that it's way too late to write a new year's retrospective. I have just been extremely busy, but I still wanted to get it out of the system. It's only been a month since the Persian new year, so in the light of that I'm not that late. Anyway...

This year, the list is shorter than usual simply because I haven't done as much FOSSi work as previous years. ~~The upside is that I managed to complete this article close to new year, rather than much later as in previous years.~~

But why have I done less work? The main reason is that I have been terribly busy with my day job which is mostly of proprietary nature. While I do a lot of interesting stuff in this capacity, it's unfortunately not much I can talk about publicly. This is another reason why I greatly prefer open source work, so that I can show, share ideas and collaborate with other people.

There is one thing however from my day job that I can and want to talk about. This summer we launched a fully remote office that we call Qamcom Anywhere. Having worked remote myself for the past four years I have been pushing to make it possible for more people in the company to do the same, and this year we got it going for real. Qamcom Anywhere has been a massive success and we have found amazing new colleagues in various parts of Sweden where we previously haven't been looking before. Given my previous experience with working remote as well as working on open source projects, which by nature tend to be highly distributed, I was tasked to run this new office. As part of the campaign we also recorded a commercial, so I can now also add movie star to my CV ;)

As for now, we have launched Qamcom Anywhere in Sweden, but hope to spread to more countries in the future. Stay tuned if you want to be colleagues!

But even outside of this I have found some time to work on my long list of open source silicon projects.

Let's start by looking at what has happened to FuseSoC over the past year. Most of the effort has been spent on getting FuseSoC in shape for a long overdue 2.0 release. A couple of major features and changes were identified as being important to complete before this release. Most notably is the support for the new flow API in Edalize, but a number of critical bug fixes and backwards-incompatible changes were put in place. Unfortunately, we never manages to get the 2.0 release out of the door, but we got close and at least released a first release candidate in late December, while the final release saw the light in the beginning of this year.

Most of my other FOSSi projects made good progress. There were a few new Edalize releases, SweRVolf got some new board support, a maintenance release to an old i2c component that I maintain (hey! it's important to put some effort into cleaning up old code, not just rewrite new code all the time) and even good old ipyxact saw a new release, which now contains ipxact2v, a very handy tool to automatically convert IP-XACT designs to Verilog top-levels. It's not fully complete, but the functionality that exists is already coming to good use in various projects.

The project that probably saw the most interesting news in 2022 was SERV. SERV itself gained support for compressed instructions, thanks to Abdul Wadood who I had the great pleasure of mentor through Linux Foundation's LFX Mentorship Program. And aside from improvements to the core itself, in 2022, fellow FOSSi superstar developer Florent Kermarrec, who might be most known for Litex, managed to run 10000 SERV cores in a Xilinx FPGA. This seems to be the current world record right now for the most RISC-V cores in a single device, but I'm very curious how the competitors will react (looking at you, Intel!).

This year was the first in three years where I didn't create any new videos (or Fully Immersive Multimedia Edutainment Experiences, as I prefer to call them). Instead I did a number of presentations using old-fashioned slides for a live audience. The first of them being the RISC-V Week in Paris in early May where I did both a presentation on FuseSoC as well as one on SERV. The SERV video was unfortunately never published, but the slides for the presentation called How much score could a CoreScore score if a CoreScore could score cores? can be found here. I did another FuseSoC talk at FPGA World in Stockholm, Sweden which was also not recorded, but the final one, from the RISC-V Summit in San Jose was. This one was title SERV: 32-bit is the new 8-bit and aims to look at how RISC-V can be competitive in the traditional 8-bit market thanks to SERV.

Both at the RISC-V week as well as the RISC-V Summit I also got the chance to meet with the people behind RVFPGA, a project I have been involved with almost since the start. For those unaware, RVFPGA is a free computer architecture course by Imagination University Programme that runs on a slightly modified version of SweRVolf that I built a couple of years ago. Right after the RISC-V Summit I also got the chance to watch a RVFPGA workshop in action, and it was super fun to see all these students working their way through the labs.

Let's see.... what else then...hmm... you know what? I'm sure other things in 2022 as well, but my memory is fading and May is just outside the door waiting to come in, so let's just cut it here before the rest of the year passes too. Here's to 2023. Happy new year!

Friday, February 11, 2022

FOSSi Explosion 2021

Do you know what just happened? 2021 just happened. Most years has its ups and downs, but when it comes to 2021 it seems like the prevalent feeling was that everyone just wanted it to be over. And now it is over, except for all those damn retrospectives. So, with the risk of opening up some old wounds I would like to take a look at what happened last year in my corner of the free and open source silicon world.

In the 2020 retrospective I wrote about a couple of big milestones, like the first vendor-supplied FOSSi FPGA toolchain and the first fully FOSSi ASICs. 2021 was... more of the same I guess. And personally I think this is the interesting part. Everyone is now working hard to actually do stuff with these new opportunities that arose in 2020. Finding new possibilities, hitting limitations and working around them. Solving problems, being creative and coming up with new ideas. Less headline-friendly but will have more impact longer term. And 2021 was by no means void of interesting news. Just look at the FOSSi Foundation newsletter El Correo Libre that was packed to the brim with interesting projects and announcements each month.. It's more that I couldn't think of anything that particularly stood out so I'm moving directly to the more personal events instead. Is that ok? Of course it's ok. I'm writing now!

FuseSoC

Let's start by looking at my oldest active open source project, FuseSoC, that turned ten years old in 2021. For those who don't already know, FuseSoC is an award-winning package manager for HDL code. Package manager is a concept that is well-known for software developers, from system-level package managers like apt and rpm to language-specific ones like npm, pypi maven and cargo.

For chip designers, the idea of a ubiquitous package manager has not taken hold and most companies invent and maintain their own incompatible system. Kind of like how we did software up to the mid nineties. Although not as ubiquitous as I had hoped by now, FuseSoC has over its ten years life span still grown to be the most widely used package manger for Verilog/VHDL code and is used internally in large and small companies as well as powering many of the most popular open source silicon projects.

So how was 2021 for FuseSoC? Frankly, not all that exciting. There was the FuseSoC 1.12 release early 2021 that you can read about here. We got to see some new features, like fellow RISC-V Ambassador Carlos Eduardo de Paula showing how to use FuseSoC with Chisel-based designs, but most of the work done on FuseSoC over the year was to prepare for a big, exciting 2.0 release that will happen some time in 2022. Still, it makes sense to mention FuseSoC first because it is used by every single other project written about here, and the number of projects using FuseSoC steadily rises regardless of the activities on FuseSoC itself.

And just how to get started with FuseSoC for a new design is a question that pops up from time to time. So when Alibaba Group's T-Head Semi released their OpenC910 I did a spur-of-the-moment live coding session, adding FuseSoC support for the core, documenting it through a Twitter thread for everyone to see the process as it unfolded. Check it out if you want to see the unfiltered process of taking a previously unseen core and adding FuseSoC support for it, as well as showing some of the benefits in doing so.

Edalize

Edalize started out as a part of FuseSoC but was split out into its own project in 2018. That turned out to be the right decision because it is now used in several different projects other than FuseSoC. And in 2021, Edalize saw far more activity than FuseSoC. In case someone is wondering what Edalize is, it's an abstraction library for EDA tools. Basically, it provides a common API for different EDA tools such as simulators, formal tools, linters, synthesis tools and FPGA toolchains. So instead of writing Makefiles, TCL scripts and other configuration files for 30 different EDA tools manually, you just need to describe it once it the EDAM (EDA Metadata) format and Edalize will generate the correct setup files for your tool of choice. Very handy. The award-winning five minute Edalize introduction video provides more detail about this.

As mentioned, Edalize saw a lot of activity during 2021. First of all it gained support for three new FPGA toolchains; oxide for Lattice Nexus chips, libero for MicroSemi devices and apicula for Gowin FPGAs. The biggest news in terms of tool support was however the openlane backend which provided the first ASIC flow for Edalize. I wrote about that in A first look at Edalize for ASIC flows last year if you want to learn more.

From cheese to chips with the Flow API

A large chunk of the work done in 2021 was not immediately visible to users but was done to lay the foundation of the new Flow API, which will add a great deal of more features and flexibility to Edalize in the future. This has been in the works for quite some time and for those wanting to get a rough idea about it, I recommend taking a look at the article accompanying the Edalize 0.3.0 release which contains a brief introduction to the flow API.

SERV

Moving on to another of my more well-known projects, the award-winning SERV, the world's smallest RISC-V CPU, there's plenty of news to report. The question everyone seems to ask first is if it got any smaller, and yes, it did. I was able to optimize away around 20% of the remaining FFs during 2021, although the combinatorial parts of SERV remained more or less the same size.

But size isn't all that matters. The documentation was massively improved with most of the internal modules now having schematics which are accurate down to the gate-level. And to prove this is actually the fact, I redid the ALU in Digital (a logisim clone) from the schematics and used that as a drop-in replacement of the original ALU. If anyone has the time and interest it would be really cool to see all of SERV implemented in a Logisim-like program and even use that as an interactive documentation somehow.

Working, runnable implementation of the SERV ALU

In addition to schematics, I also added descriptions and timing diagrams for most of the important signal transitions. The ambition is to not only be the world's smallest RISC-V CPU, but also the most well documented. Still got some ways to go but it's already really good. As for new features, SERV got support for the M extension thanks to Zeeshan Rafique who added that as part of Google Summer of Code.

Another big milestone was that SERV was taped out at least four times during 2021 as part of the OpenMPW programme. Two of those tapeouts, both of them the SERV-based Subservient SoC, were done by my colleague Klas Nordmark and I (mostly Klas) as part of a grant by NLNet Foundation to add the Edalize OpenLANE backend and an accompanying reference project. There will hopefully be more things written about this particular project in 2022, especially when we receive the actual chips.

I also found some more time in 2021 to talk about SERV and presented at four conferences, with a brand new SERV video premiering at the embedded RISC-V Forum and being subsequently updated with some additional project ideas for the following events. This new video has shown to be quite popular and goes into more detail on more things that happened during 2021. It's still not as popular as the SERV talk from WOSH 2019 though which apparently has been seen almost 30000 times(!?!?!?)

CoreScore

CoreScore is one of the more niche uses of SERV... ok, most uses of SERV are pretty niche come to think of it. Anyway, CoreScore is a project that tries to answer the question How big is my FPGA? by simply seeing how many SERV cores we can fit into the FPGA
on different development boards. Pretty straight-forward but also very useful for comparing both FPGAs, and also the efficiency of different toolchains. Going into 2021 the record was 5087 cores in a single FPGA. That number was topped twice in 2021 with 6000 cores being the new world record thanks to Sylvain Lefevbre and his Xilinx VCU128 board. Apparently this made some numbers in the tech media and among other things ended up on Tom's Hardware. That was particularly fun as I have fond memories of my 15 year old self spending hours and hours on Tom's Hardware trying to find which motherboard was best for overclocking my Celeron 300A Mendocino. But it was not just in the top where things happened in CoreScore land. In total there were 19 new scores submitted by different users, almost doubling the number of known CoreScores. And best of all, there's now a beautiful highscore table at corescore.store to keep track of all the numbers.

If you're missing your favorite board in the list, don't hesitate to find out the CoreScore and submit a number for it. Love to see more!

LED to Believe

Another less known but somewhat similar project to CoreScore is project LED to Believe. The goal is simple. If you have an FPGA board, LED to Believe will be able to generate and FPGA image that blinks a LED on your board. While being a very simple project it does serve two purposes. The first is to act as a pipe cleaner for your toolchain. FPGA toolchains are complex and there's a suprising amount of things that can go wrong. Having the most simple project possible helps verifying that the tools are properly installed and can generate an image before you move on to other projects. The second purpose is to be an entrypoint into using FuseSoC and demonstrate how well-suited FuseSoC is for porting a design to different hardware targets. And I would like to claim that it has been very successful in this regard. Already when the year started we could blink LEDs on 44 different FPGA boards and as the year ended this number had risen to 77 thanks to all fantastic contributions from users all over the world. And again, see a board that's missing? Roll up your sleeves and send me a pull request.

SweRVolf

The final big open source project I took into 2021 is SweRVolf, a reference platform for the Western Digital SweRV family of RISC-V cores. More recently, SweRVolf is also the foundation of the RVFPGA Computer Architecture Course from Imagination University Programme. RVFPGA is rapidly gaining popularity and I'm both excited and a bit scared now that thousands of university students will get their first contact with computer architecture, RISC-V, Zephyr, open source silicon and FuseSoC through a SoC I designed. And with that in mind, there has been some work to make SweRVolf even more robust and accessible.

The year started with landing support for simulating using Vivado XSim in addition to the already supported Verilator and QuestaSim. Software support was improved as well thanks to a port of SweRVolf for the Tock OS. Increased availability could also be seen on the hardware side where it is now possible to use the smaller SweRV EL2 CPU as an alternative its larger sibling SweRV EH1. The EL2 support was also a prerequisite to run SweRVolf on the Digilent Basys3 board, which carries a smaller FPGA than the Nexys A7 and thus can only fit the EL2 CPU. Many of the latest features can be read about in the SweRVolf 0.7.4 announcement.

ViDBo

The last piece of functionality added to SweRVolf is technically a separate project, but it was born out of a need in RVFPGA and is where it was first used as well. As RVFPGA will become available as an online course, there were some concerns about hardware costs. The online education platform used wasn't totally happy about requiring students to buy an FPGA board. This could be solved by running the course entirely using an RTL simulator but it's really not the same thing as interacting with a board, running your own code and see how it reacts to moving switches and watching LEDs light up from your memory writes. There have been plenty of efforts to visualize simulations by providing some kind of GUI, either in terminal or through some graphics. All of those however seems to be one-off efforts and not easily portable. I wasn't really keen on either repurposing an existing solution nor writing a new single-use system. But then I got an idea. Instead of a tight coupling between simulator and GUI I decided to define a protocol that communicates I/O state over websockets. Websockets are readily available in almost any programming language and most importantly can be used directly in browsers without any complications.

ViDBo : AKA, SoC over sockets

This allows for adding a small component into the simulation model that sends simulation model outputs and receives inputs over websockets. On the other side of the websockets connection sits a browser with an interactive picture of the board, a Virtual Development Board, or ViDBo. This gets us as close as we reasonably can to a no-cost FPGA board experience without any simulator- or OS-specific building blocks. And while this first implementation uses an RTL simulator as the backend and a web browser as its frontend, there is nothing that stops us from having a pure software model as the backend or a headless CI system that acts as a frontend, injecting I/O state and observing outputs. VidBo only defines the protocol sent over websockets, not what sits on either side of the protocol. The only drawback of VidBo is that I now have yet another open source project to maintain which probably was the last thing I needed. Oh well, so far it hasn't been all that bad and I have had some very welcome contributions.

Other stuff

Wow! This turned out far longer than I had anticipated. Sorry about that. But if you got this far and for some reason still haven't had enough of SERV and FuseSoC or want to learn a bit more about the history of open source silicon, I also did a series of video interviews during 2021, covering different topics. First out was two episodes on open source silicon and RISC-V for the FOSS North pod followed by Matt Venn interviewing me for the YosysHQ. Highly recommend checking out both those channels even if you don't want to hear more of me. They both have many high quality interviews with a wide range of topics and guests.

I think this covers most of my open source silicon activities over the past year. Wait! One more thing. I made a UART that's small enough to fit in a tweet in case, you know, someone needs a UART that's small enough to fit in a tweet.

Finally, I would also like to mention and extend my thanks to Qamcom and NLNet Foundation for funding work on Edalize and Subservient as well as Imagination Technologies and Western Digital who have been funding most of the SweRVolf and ViDBo work during 2021. Take note, all you freeloading companies out there. This is what real support of open source looks like and how you help build a healthy ecosystem.

And with those well chosen words we can leave 2021 without any regrets and sail into the bright future of 2022. Bon voyage!

Monday, September 13, 2021

...and out come SweRVolf

One of the main FOSSi projects I've been running the last couple of years is SweRVolf, a FuseSoC-based reference platform for Western Digital's family of RISC-V cores collectively called SweRV. I have written about SweRVolf before (e.g. here and here) so I won't go too much into details other than noting that it was created to provide an easy way to get started with the SweRV cores for both software and hardware developers and be simple enough to grasp for aspiring engineers while still providing enough flexibility to be expanded for advanced use-cases. And it seems to have hit these goals well enough, because Imagination chose to use SweRVolf as a base for their RISC-V-based computer architecture course called RVFPGA. This means that tens of thousands of students will soon get their first contact with RISC-V, open source silicon and computer architecture through SweRVolf. And this in turn means that SweRVolf needs to be rock solid and provide the necessary features to be used in this context. Fortunately, both Imagination and Western Digital are putting their money where their mouths are and have sponsored the development of several features that have been deemed important for using SweRVolf in an educational context.

Most of these extra features as well as some other work have now been implemented which means it's time to do another release of SweRVolf, this time called 0.7.4. If you're impatient you can find pre-built FPGA bitstreams here and try it out right away, but for those of you with a little more time, let's dig into the changes since the last version.

Xilinx XSim support

In addition to official support for ModelSim/QuestaSim and Verilator, SweRVolf can now also be simulated with Vivado XSim (versions 2020.1 and later). While the former are still recommended for performance reasons, having XSim as an option is useful, especially if Vivado is already used for building an FPGA version of SwerVolf. Please note that these are only the officially supported simulators, but there has been successful reports of using other simulators such as xcelium or Riviera Pro

SweRV EL2 support

When SweRV was announced in 2018 the family consisted of SweRV EH1, with other versions coming later. And in early 2020 the EL2 and EH2 variants were released, providing smaller and higher performance options. Of these two versions, adding support for EL2 has been the most requested feature, not least from RVFPGA to optionally allow using lower-cost hardware by selecting a simpler CPU. And in this version the support for EL2 has been finalized. Most of the work has been to provide infrastructure and abstractions both in software and hardware to easily switch between EH1 and EL2 in simulation and FPGA builds. One notable difference is that EH1 can run at 50MHz on the supported FPGA targets while EL2 only runs at 25. By adding hardware support for reporting such differences and making the software aware, most of them can be made transparent for users.

For targets that support both EH1 and EL2, EH1 will remain the default but EL2 can be used by providing the option --flag=cpu_el2 on the FuseSoC command-line, such as fusesoc run --target=nexys_a7 --flag=cpu_el2 swervolf

Demo application

The Zephyr-based demo application is intended both as a quick self-test to show that things are working as intended as well as providing an example of how to use some of the functionality found in SweRVolf. This application has been updated to showcase some of the new features of the latest SweRVolf release such as printing out the CPU type and the runtime-detected clock frequency of the system

Zephyr support

Zephyr RTOS is the recommended realtime operating system for SwerVolf and the SweRVolf-specific support functionality (BSP) is occasionally updated to catch up with the latest Zephyr version. This version will not support a newer Zephyr release but instead stays at 2.4. There is however added functionality to handle runtime-detection of system clock frequency, made necessary by the introduction of EL2 mentioned above. This functionality allows having the same UART baud rate and timer frequency across different system clock frequencies

GPIO remapping

The 32 upper GPIO have been moved on the memory map from 0x80001014 to 0x80001018 to allow space for a GPIO direction word adjacent to each 32-bit GPIO bank. This is one of the rare cases of changing the hardware to simplify software. More specifically, creating a 64-bit GPIO driver in Zephyr turned out to be far more complicated than expected. And since the upper 32 GPIO have not been used in any known SweRVolf implementation it was considered safe to move them, reserving space for a pin direction word to be implemented later.

Basys3 support

SweRVolf was originally created for the Digilent Nexys A7 board as that was identified as a popular board for a reasonable price with enough I/O and a large enough FPGA for SweRVolf. But it is by no means the cheapest option so SweRVolf has now also been ported to the Digilent Basys3 board which is about half the price. This is also a board that is very popular, not least in universities which hopefully means that a lot more people can get access to SweRVolf this way. And SweRVolf was created to be portable so hopefully the new board support can serve as a template to add support for even more boards in the future.The smaller size of the FPGA on the Basys3 board however only allows for SweRV EL2, not SweRV EH1

Documentation

Apart from adding a chapter describing the Basys3 support, an error in the boot switch arrangements was found and corrected for the Nexys A7 target. A link to a port of the Tock OS has also been added together with a link the the project page on LibreCores. And if you are using SweRVolf in any way and would like the world to know, please let me know and I'll add it to the documentation.

As for the future there is at least one interesting feature in the works that I hope to share soon. Until then, have fun with SweRVolf 0.7.4. Sounds pretty fun, doesn't it?

Tuesday, February 23, 2021

FOSSi Fever 2020

2020 was a year with a lot of bad news and so it feels slightly strange to cheerfully write about a very specific topic in the light of this. But there will always be good and bad things happening in the world. So let's keep fighting the bad things and for now take look at what happened last year within the amazing world of open source silicon. I will start by mentioning the most significant, but by no means the only, milestones for the FOSSi movement as a whole and then take a more personal look at the work where I have been directly involved.

OpenMPW

The biggest story within free and open source silicon this year has undoubtly been the openMPW project involving Google, SkyWater Foundries and eFabless together with a number of other collaborators.

Ever since I got involved in open source silicon ten years ago, building a fully open source ASIC has been one of those big milestones. While we have had FOSSi IP cores taped out on chips for at least 20 years and parts of the flow being managed by open source tools, it has always seemed to be too much work and requiring filling in too many gaps to have a fully open source end-to-end flow to produce ASICs. But, over the years people all over the world have filled in the gaps and done the work bit by bit. Sometimes in the context of overarching programmes to advance open source silicon, sometimes in academic settings, sometimes coming from the industry and sometimes as completely unpaid hobby projects. And this year all these efforts came together, helped by funding, to produce four shuttle runs, each loaded with 40 different completely open source designs. The first of these runs are currently being fabricated, and it will be extremely interesting to see them coming back.

One of the final pieces in this puzzle was the PDK. And while SkyWater should be rightfully lauded for their decicion to open up their 130nm PDK, it begs to ask the question: Why on Earth did it take this long? What could possibly the fabs have to lose by doing this? What they gain is easy to answer, a completely new market of users who can create chips at their fab. According to people within the project, it's estimated that 75% of the people on the first shuttle define themselves as software engineers. It's very likely that none of these people would ever dream of making an ASIC without this possibility. So I kind of feel that making the EDA industry open up their formats is a bit like trying to get your kids to eat vegetables. It's a lot of groaning and complaining, but was it really all that bad in the end to get some nutritions. Or in the case of ASIC fabs, was it really all that horrible to release your PDK to get some more customers? Let's just hope this opens up the eyes of more fabs. My, and many others, dream is to eventually see the same thing happening to ASIC fabs as has been happening with cheap PCB services over the past ten years. And using that analogy, I'm quite sure it pays off to be early in the race. So, get started folks!

QuickLogic and SymbiFlow

The other big thing happening this year is that we finally have an FPGA-vendor shipping an open source toolchain for their devices. The company that will go down in the annals of history for being the first to do this is QuickLogic and their EOS S3 FPGA. This is by no means the first FPGA with an open toolchain, and the QuickLogic-flavored version of SymbiFlow developed by FOSSi veterans Antmicro is based on all this prior work. But it is the first time we see a toolchain being created from the FPGA manufacturer's specifications rather than being figured out from compiled FPGA binaries and it's the first toolchain that is supported and funded by the vendor rather than being at best tolerated by them. And again I must ask, why did it take so long for this to happen? If I was running a small FPGA startup with limited resources, I can't for my life understand why I would want to spend a lot of time and money to build and continouosly maintain a big unwieldy toolchain all by myself instead of adding the required device-specific bits to a known good open source toolchain and share the maintenance burden. If nothing else it would free up resources to build other value-add products on top of the tools. It's like if every vendor of computer systems would first build their own operating system and compiler before shipping their products. This is what we had in the 80s and abandonded it for very good reasons. Because it made absolutely no one happy. And you know what? I think the users of FPGAs should put more effort into pushing their vendors to support open source toolchains, because it will save everyone a heap of time and money.

Let me illustrate that last point with an example that actually happened when I was porting SERV to the QuickLogic devices. After synthesis I noticed that it used far more resources than expected. Looking at the synthesis logs I realized the memories in the design weren't mapped to on-chip SRAM. So I asked the toolchain developers about this. They pointed me to the file in the toolchain that contained the rules for mapping to SRAM. I quickly found a badly tuned parameter, changed it to a more sensible value and ten minutes later it was working fine. An hour later I had submitted a patch back to the toolchain that fixed the problem for everyone else who would encounter it.

let's break this down into numbers. Finding the cause of the bug took about 15 minutes. Fixing it, another five. At that point I could use it myself, but after spending another 15 minutes or so, it was also fixed for everyone else.

Now let's do the same exercise for a proprietary closed source toolchain. Finding the cause of the bug takes... well...it depends... Let me explain.

I started my professional career at a company which at that point was the world's largest FPGA buyer. Whenever we had problems, they flew in two FAEs to sit in our lap, they could provide us with custom internal builds of their tools and generally tried to make sure the problem quickly went away so that we would continue to buy FPGAs from them. However, most companies are not the world's largest FPGA buyer and does not get this treatment. Instead you will have to wade through layers of support people until you reach someone who is actually qualified enough to acknowledge the issue. I have been in this situation numerous times and would estimate this process usually takes around 2-3 months. Actually fixing the bug probably takes five minutes or so in this case too, but here comes the fun part. In most cases you will now have to wait, I don't know, a year or so until the fixed bug ends up in a released product that you can download. What happens in practice is that the user tends to find a workaround instead. In the example above, the likely solution would be to instantiate a RAM macro instead of relying on inference. This however doesn't come for free as it requires finding all the instances where this is a problem, add special handling for this case which results in a larger code base with more options to verify and maintain. This costs time and this costs money. So the moral of this story is that closed source tools are more expensive for everyone involved and users of FPGAs should get better at telling the FPGA vendors that they are done with this closed source nonsense.

The QuickFeather. First FPGA board to ship with a FOSSi toolchain

There are numerous other news and projects that are well worth mentioning, but the above two are milestones that we have been waiting for a long time, so they deserved special attention. And if you want to keep up with the latest happenings in open source silicon, I highly recommend subscribing to the El Correo Libre newsletter which does a fantastic job of providing an overview of what goes on in all corners of the world. So let's move on to some of my more personal victories that aren't necessarily mentioned in other people's year in review.

When I am working on and talking about open source silicon I am often wearing many hats because I'm associated with several different organizations. Luckily they are all pretty much aligned on this topic which makes things far easier. But these organizations also have different motives and goals so I would like to mention a few words about them here.

Qamcom

My dayjob is working for Qamcom Research & Technology and also there it has been more FOSSi work than usual, which I think is a good indication that open source silicon is becoming increasingly more common in chip design in general. The year started off by finishing up some work on SweRVolf together with a couple of my Qamcom colleagues. SweRVolf is a project under CHIPS Alliance, an organization that Qamcom has been part of since 2019 to help improve the state for open source and custom silicon. After that I was pulled into a project for doing climate research with a huge radar system. My task here was to handle sub-nanosecond time synchronization between systems located hundreds of miles from each other, using the White Rabbit system developed at CERN. I was pretty excited about getting to know White Rabbit. The timing section at CERN responsible for the White Rabbit project and associated technologies are household names within open source silicon and have a long history here. I know many of the people working there personally and have great respect for their work. But so far I haven't had the chance until now to actually get down and dirty with the technology. Once my job was done there I moved on to another proprietary project that I can't discuss here. I do get to use FuseSoC and Verilator though, so it's fine :)

FOSSi Foundation

The hat I tend to wear most when topic revolves around open source silicon is my FOSSi Foundation director hat. And despite doing a lot less of the things we normally spend time on, it turns out we did a lot of other great things instead. I will however not go into more detail here, but instead point to the excellent summary done by my FOSSi Foundation colleague Philipp Wagner.

RISC-V

Arguably the most well-known project nowadays with ties to open source silicon is RISC-V. My RISC-V ties deepened in the beginning of the year when I was asked to become a RISC-V ambassador. Part of being an ambassador is to create awareness of RISC-V in the fields where I'm active. For well-known reasons the number and nature of events were a bit different from previous years which meant fewer opportunities to weild this new-found power, but I did participate in an ask-the-expert session during the RISC-V global forum and a couple of other events that will be described later. And I also got to be interviewed about RISC-V and open source silicon by Sweden's largest electronics and tech news outlets as well as the Architecnologia blog.

Current crop of RISC-V Ambassadors; AKA the Twantasic 12

In addition to my day job and participating in different organizations I also run a bunch of open source projects, so let's take a look at the progress of the most important ones during 2020.

SweRVolf

SweRVolf is an extendable and portable reference SoC platform for the Western Digital / CHIPS Alliance SweRV cores. SweRVolf is designed for software engineers who wants a turn-key system to evaluate SweRV performance and features, for system designers who want a base platform to build upon and for learners of SoC design, computer architecture, embedded systems or open source silicon methodology. To easily achieve the goals of portability and extendability it is powered by FuseSoC, which just happens to be one of my other open source silicon projects mentioned later on.

During 2020 SweRVolf gained support for booting from SPI Flash but most effort was spent on usability, by making it rock-solid, improve documentation, increase compatibility with more EDA tools, keep the underlying cores up-to-date and follow along with changes in the Zephyr operating system, which is the officially supported software platform for SweRVolf.

But the biggest thing to happen SweRVolf this year is that SweRVolf will be used as the base of a new university course from Imagination University Programme called RVfpga: Understanding Computer Architecture. I'm very excited (and slightly scared) about soon having thousands of students getting familiar with computer architecture, RISC-V and open source silicon through a SoC that I have designed. And I would like to mention a few things about how SwerRVolf is built because I think it's a great example of how to create chip designs. When I say that I have designed SweRVolf, most of the work has consisted of putting together various pieces and make sure it works well as a whole. Most of the underlying code has been written by other people, and from my perspective, that is really the most successful aspect of SweRVolf because it highlights the rich open source silicon ecosystem. The main CPU core is from Western Digital and governed by CHIPS Alliance. Most of the AXI infrastructure was developed through the PULP project at ETH Zürich and University of Bologna. The UART and SPI controllers were developed for the OpenRISC project during the first wave of open source silicon almost 20 years ago. The Wishbone infrastructure was developed by me when I started out with open source silicon ten years ago and the memory controller was created by Enjoy Digital and is written in Migen as part of the Litex ecosystem. And to go full circle, the memory controller uses a tiny RISC-V CPU called SERV internally to aid with calibration. SERV, the world's smallest RISC-V CPU is written by me. Small world. And of course the whole project is packaged with FuseSoC and uses Verilator by default for simulations, so it's FOSSi all the way. As I hope you understand by now it's not about some lone hero churning out code, but instead all this has been made possible by a huge amount of work by a ton of people over many years, and I'm proud to be one of them.

SERV

Probably the hobby (read unpaid) project I spent most time on during 2020 was SERV, the world's smallest RISC-V CPU, which turned from small to even smaller during the year. SERV is very much a project driven by numbers, so lets look at some of these numbers.

In February I got hold of an ZCU106 development board with a huge Xilinx Ultrascale+ FPGA for a project I was assigned to. As this was the largest FPGA I had ever had in my home I got curious to see how many SERV cores I could squeeze into it. The year before, at the 2019 RISC-V workshop in Zürich, I had done a presentation on how to fit 8 RISC-V cores in a small Lattice iCE40 FPGA (spoiler: it ended up being slightly more than 8 eventually), giving each of them a single I/O to communicate with the outside world. Problem this time was that after stuffing in 360 cores I run out of I/O pins. It would also had been practically impossible to verify that all these external pins actually did what they were supposed to do, so I needed some way of using less than one I/O pin per core. Then it struck me that just a few months earlier I had created a heterogenous sensor aggregation platform based on SERV cores called Observer. The idea with Observer was to connect a lot of sensors to an FPGA, each serviced by its own SERV core and then merge the data to an output stream. I gave up on the platform when I realized that while I could fit a lot of SERV cores into the devices, I just had a few sensors so there wasn't much data to to aggregate. But this platform was a very good starting point.

Block diagram of the Observer platform

By removing all sensor interfaces and just have each core print out an identification message instead I had a system that I could instantiate with any number of cores. Trying this on the ZCU106 I could now run over 600 cores on the FPGA. The next problem then was that I was running out of on-chip RAM way before any of the other FPGA resources. In case you don't know, most FPGAs contain a number of fixed-size SRAM spread out over the devices, each typically being 1-8kB large. For SERV, each core used one for the RF (register file), and another one for the program/data memory. With RISC-V using 32 32-bit registers, this means that only 128 bytes of the RF RAM is used but since the fixed-size SRAM on FPGAs are typically being far larger than that most of the RAM ends up unused. That's bad, but I had a plan. With a bit of work I managed to share RF, program and data in the same RAM so that the RF is allocated to the top 128 bytes of the RAM. This freed up half of the on-chip RAM blocks and I could eventually hit 1000 SERV cores on the ZCU106 board. Of course, at this point I was curious to see what the situation was for other boards after all these optimizations. Taking it one step further I figured I should turn this into a real thing by creating a benchmark so that people can have a quick way to see roughly how large the FPGAs are on different boards. And with that ServMark was born.

ServMark lasted for about three minutes until I realized CoreScore was a much catchier name, so that's we have now.

I had originally planned to do a presentation about SERV at Latch-Up in Boston. But for well-known reasons we cancelled all FOSSi Foundation physical events. Instead I accepted an invitation to speak at the First virtual RISC-V Munich meetup. By this point I had attended a couple of virtual meetups and I hated it. It was in most cases awful to watch a narrated slide deck without a stage and a speaker to bring it to life at least a bit. So, I decided to take a fresh look and look at the possibilities instead of being limited by the medium. I decided to make videos instead. First of all, since people would watch on a computer screen with proper resolution instead of a washed-out image projected on canvas. This meant I could have much more detailed pictures and smaller font sizes. I could also freely mix pictures and animations, fine-tune timings, do several takes of the audio and add sound effects. And despite being done by someone who has pretty much zero experience in these sort of things, I think it turned out pretty well. So on the day of the event, I just introduced myself and let the audience indulge in my fully immersive multimedia edutainment experience about SERV. Not sure why the Oscars committee hasn't got in touch yet.

Is CoreScore the only attempt to put a mind-boggling number of RISC-V cores inside chips? Absolutely not, and during the summer I learned of the Manticore project from Florian Zaruba and Fabian Schuiki (jointly known as Flobian Schuba) from ETH Zürich, both well-known names in the FOSSisphere from their work in the PULP ecosystem. Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing had been accepted into the prestigious Hot chips conference. Manticore is an impressive project, but I still thought it was a bit unfair that I wasn't invited as well. So I reached out for the biggest FPGA board I could find and then wrote my first academic paper of the year called Plenticore: A 4097-core RISC-V SoClet Architecture for Ultra-inefficient Floating-point Computing. Unfortunately I did not recieve an invitation to Hot chips despite this. I assume it must have gotten lost in the mail somewhere.

Oh well, let's look at some more numbers instead.

A number of optimizations was found over the year which, depending on the measure, further shrank the core 5-10%.
The number of supported FPGA boards for the servant SoC grew from 4 to 17, mostly thanks to other contributors (thanks everyone, love you all!)
The SERV support for the Zephyr operating system was rewritten and upgraded from the aging Zephyr 1.13 to 2.4, the latest version at the time of writing.

SERV resource usage over time on a Lattice iCE40 FPGA

Coming back to CoreScore, the results right now range from 10 cores on a smaller Lattice iCE40 device up to 5087 cores on a large Xilinx device and the high score table can even be viewed interactively online! If you can't find your favorite board there, just send a PR and we'll get it added. And if any people with access to crazy large FPGAs happen to read this, please get in touch with me. I'm very curious to see who will be first to get above 10000 cores.

SERV also saw another great improvement in the form of documentation. While not completely there yet, the functionality of most SERV modules have been documented together with detailed schematics showing the implementation down the individual gates, muxes and flip-flops. And most changes to the source are now accompanied by a code comment to clarify what is going on. Since SERV is optimized for size rather than readability, many parts of the core are difficult to figure out by just looking at the source code. Hopefully this will make it easier for other who want to understand or work on the core, but frankly it has also been very useful for me since I tend to forget why I did things in a certain way so I have had to spend a lot of time following my own tracks.

Schematic of the SERV control unit from the SERV documentation

FuseSoC

The oldest of my open source silicon project still going strong is FuseSoC. It is now about to turn ten years old and keeps growing in features and users for each year. Looking back at the changes through 2020 I can see some new trends in the development. The most important one is that for the first year ever, most of the work was not done by me. During 2020, my fellow FOSSi Foundation director and LowRISC employee Philipp Wagner has been pulling the heaviest load of FuseSoC development. And with Philipp came quality. Dr. Wagner has improved FuseSoC in pretty much every aspect. Bugs have been fixed and features has been added. The development experience has been improved by CI testing, automatic code formatting checks and improved testing coverage. And what makes me happiest is that the user experience has been improved not least by a total rewrite of the documentation into something that is actually useful and can be proudly shown to the world. All this is very much needed as FuseSoC is becoming increasingly popular. It has already been picked up by many of the flagship open source silicon projects like OpenTitan, SweRVolf, OpenPiton and with the RVFPGA university programme there will soon be a whole new generation who will get familiar with it as well.

Edalize

In 2018, the part of FuseSoC that interacted with the EDA tools was spun off into Edalize. The reason was that it was believed this part could be useful for others who weren't interested in the whole FuseSoC package. This prediction seems to have been correct and Edalize has very much started a life on its own by now. In addition to FuseSoC, Edalize is now used by several other projects such as Silice, Clash and fpga-perf-tool and over the year Edalize has gained support for 7 new EDA tool flows, bringing the total number up to 25.

2020 was also the year when Edalize had it's first taste of being in the spotlight on its own merits. For the Workshop on Open-Source EDA Technology (WOSET) 2020 I decided to submit a presentation about Edalize. Being an academic conference this also prompted me to write an accompanying paper as is the common courtesy for these kind of events. The paper received a lukewarm response but was accepted anyway. Once again I did not feel like reciting slides to a camera so I turned back to my new-found interest in advanced multimedia productions. And it paid off. The Edalize video won an award for best video at WOSET 2020. Well done Edalize!

LED to Believe

All of the above projects use FuseSoC and Edalize because - well it's kind of why I created FuseSoC in the first place - to easily reuse components and retarget to different devices. But I also realized there was a need for a dead simple project to help people getting started with FuseSoC - the Hello world of silicon, so to speak. And the Hello World of silicon is of course the blinking LED. So in 2018 I created project LED to Believe with the ambitious goal to create FuseSoC-powered LED blinkers for every FPGA board ever made. The project has several aspects that are useful in different ways. It serves as a very simple introduction to FuseSoC and how to make a design that targets multiple hardware. It is also an excellent pipe cleaner for when you receive a new board. If you can run the project successfully and get the LED to blink, it likely means you have managed to install all the EDA tools correctly which is no small feat, and you also have a template to take on bigger projects. And it's also fun to see what boards are available out there. While I have submitted a bunch of the board ports myself, the vast majority have come from all the fantastic contributors out there. And during 2020 the number of supported boards grew from 16 to 44. Perhaps not all the FPGA boards ever made, but a considerable chunk of them. And already in the short amount of 2021 that has passed, there have been numerous more contributions so we're getting closer all the time.

In closing, 2020 was a busy year FOSSi-wise. And this has just touched upon the surface of all things that have been happening during the year. And just as we were about to close the books on 2020, I was informed that Lattice had incorporated one of my FOSSi projects into their shiny new award-winning Propel design suite. Which project, you might ask? Was it the similarly award-winning FuseSoC, to give Lattice users immediate access to a rich ecosystem of Open IP cores? Or was it the Rosetta stone of Edalize, with its award-winning video, that would easily provide a coherent interface for a dozen simulators and make it easy to switch between Lattice's multitude of FPGA tools such as Diamond, icecube2 and Radiant? Or was it SERV itself, the award-winning CPU capable of offering a RISC-V experience for all but their absolutely smallest offerings? Well, actually, none of the above. It turns out that Propel now contains ipyxact, my somewhat feature-limited Python library for working with IP-XACT files. Not my first choice, but fair enough. I wonder if they have read my somewhat complicated relationship with IP-XACT.

Finally my work is recognized by big EDA vendors (picture by Gatecat)