Friday, July 5, 2024

SERV 1.3


SERV the 1.3th. CPUs will never be the same again

"A new SERV version! How much smaller than the last one?"

Hate to disappoint you all, but we have now reached the point where the award-winning SERV, the world's smallest RISC-V CPU, won't get much smaller. I still sometimes get ideas for how to make it smaller and then spend a week implementing something, only to discover that nine times out of ten it actually made it bigger instead. But it's ok. Size matters, but the important thing is what you do with it. To quote one of the great 20th century thinkers:

"A CPU is only as good as its software and ecosystem"

And this release brings plenty of software and ecosystem improvements.


Seymour Cray once said "Anyone can build a fast CPU, the trick is to build a fast system". I would like to think the same goes for small CPUs and systems. With SERV being very small, care must be taken to avoid the rest of the system eating a lot of gates. One such system-level improvement was made by slightly rescheduling RF accesses, which makes it possible to use one combined single-port SRAM for RF, code and data. Looking at some SRAM macros I have at hand, the single-port versions are around 45% smaller in area than the dual-port ones. Since the area of most SERV systems is typically dominated by memory, this means a vastly lower total system cost from just a small change inside SERV.

The aforementioned memory improvement might be missed by people implementing SERV since the required changes are outside of the SERV core itself. Having now done a bunch of systems and subsystems around SERV, I have come to notice some typical optimizations, a few recurring patterns and things that get implemented over and over again. To make everyone's life a little easier I created Servile, a convenience building block containing SERV and some surrounding logic that is used by most systems built around SERV.

Once that was in place, I restructured the Servant, Serving and Subservient components around Servile and could both remove a lot of custom logic and get some features for free like debug printouts and optional extensions. With a growing number of systems, the documentation has been overhauled to collect them all in the Reservoir to make it easy to pick and choose the right foundation for building your own SERV-based system.


The reservoir has a SERV-based subsystem for your every need

As usual, a new release sees support for a range of new FPGA boards. This time we have Arty S7-50, PolarFire Splash Kit, Machdyne Kolibri, GMM-7550, Alchitry Au, ECP5 Evaluation Board and Terasic DE1 SoC - all provided by kind contributors. Big up mi bredren!


I have claimed many times that SERV is so simple that it is possible to simulate it at almost the same speed as we can run it on a chip. And now it is much easier to check if this is really the case. A new parameter in the Verilator testbench called cps (cycles per second) keeps track of how many simulated cycles we can run each real second. This information is printed to a file to be viewed. If you are running the Servant SoC through FuseSoC, you can enable it by passing --cps as an extra argument.

SERV runs at 2.8MHz in simulation on my 7-year-old laptop. I'd love to see how fast someone can make it go on a beefier machine.

Another convenient simulation feature is the addition of a PC tracing parameter which just dumps the PC to a binary file after each instruction. In order to make use of this data I have put together some custom tooling for profiling and tracing. For example, I can feed the trace data and the ELF file that was executed during the simulation into a Python script and get a list of how much time was spent in each C function. This in turn can be used to either optimize the software or potentially add some accelerator to SERV.
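To give an idea of what such a profiling script can look like, here is a minimal sketch. It is not the actual tooling: the trace layout (one 32-bit little-endian PC per executed instruction), the helper names read_pc_trace and profile, and the hand-written symbol table are all assumptions made for illustration. A real script would pull the function start addresses from the ELF symbol table instead. Note that it counts executed instructions per function, which is only a proxy for time spent.

```python
import bisect
import struct
from collections import Counter


def read_pc_trace(raw: bytes) -> list[int]:
    """Decode a trace, ASSUMING one 32-bit little-endian PC per instruction."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[: count * 4]))


def profile(pcs, symbols):
    """Count executed instructions per function.

    symbols maps a function name to its start address. Each PC is attributed
    to the function with the closest start address at or below it.
    """
    starts = sorted((addr, name) for name, addr in symbols.items())
    addrs = [addr for addr, _ in starts]
    hits = Counter()
    for pc in pcs:
        idx = bisect.bisect_right(addrs, pc) - 1
        hits[starts[idx][1] if idx >= 0 else "<unknown>"] += 1
    return hits


# Hypothetical symbol table and trace, made up for this example.
symbols = {"main": 0x100, "uart_putc": 0x200}
raw = struct.pack("<5I", 0x100, 0x104, 0x200, 0x204, 0x208)

hits = profile(read_pc_trace(raw), symbols)
total = sum(hits.values())
for name, n in hits.most_common():
    print(f"{name}: {100 * n / total:.0f}%")
```

Sorting the start addresses once and using a binary search per PC keeps the lookup cheap even for traces with millions of entries.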

SERV running a Zephyr demo application. Apparently it spends almost 25% of its time printing to the UART

While this custom tooling is handy, I would ideally not have to create any of it myself. It would be much better to export the PC trace to some existing format that already has reasonable tools for profiling and tracing. I'm sure something like this must exist, but I haven't been able to find anything. Tips are most welcome!


As mentioned in the introduction, we need software to actually make any use of our CPU. When it comes to software on SERV, users tend to run bare-metal or use Zephyr. For users of the latter, I can happily report that the Servant Zephyr BSP has been upgraded to support Zephyr 3.5, which is a big step up from the previous version 2.4.


Despite years of verification, a bug was spotted in SERV recently. The trap signal, indicating an exception, was released too early, before the MSBs of the PC had been written. In practice, this meant that jumps in and out of the exception routines could end up in the wrong place if the code was placed near the end of the 32-bit address space. Hopefully this hasn't affected any users, since it's unlikely that software running on SERV would fill an address space large enough to trigger this. Still, it was a bug and it has been fixed now.

Looking forward

SERV now has two larger siblings called QERV and HERV, as has been hinted before, which implement 4- and 8-bit wide datapaths instead of SERV's 1-bit. They currently reside in a different repository, but the idea is to ultimately integrate them fully in the SERV codebase. As a first step, each module inside SERV will receive a parameter to set the width of the datapath. Most of the work is done, but there are some minor things remaining.

You might wonder why there is no 2-bit wide version, also known as DSRV, for Double SERV. The reason is that we started out with QERV and discovered that the added overhead was so small that it didn't seem to make any sense to do a version that would probably be roughly the same size as QERV but half as fast.

If you can afford a few more gates you can get more bang for the buck

Another thing that I hope to write more about in the future is the version of SERV that was submitted to the 7th shuttle run of the Tiny Tapeout project. Even though I know about some really interesting implementations of SERV, I'm typically not allowed to say that much about them, and as with all open source projects there are likely many more uses that I'm completely unaware of. This one, however, I can show to everyone.

Area-wise, it was a bit tight but using two of the allocated slots, it was possible to fit SERV, an XIP SPI Flash controller and 5 GPRs. The project was dubbed Underserved, which I thought was fitting as Tiny Tapeout caters to the underserved hobbyist market.

There are already plans for more SERV features and I have some really, really interesting news I'm bursting to share, but that will unfortunately have to wait a little bit longer. Stay tuned!

Saturday, June 15, 2024

FOSSi Freakout 2023

So, this was supposed to be one of those new year retrospectives. It's just that I didn't really find time to write this until now. Still closer to last new year than the next one, so I think it's ok. As usual, this is a round-up of all free and open source silicon, or FOSSi, things I have been involved in over the past year.

FOSSi Foundation

Beautiful evening in Santa Barbara. What is not seen in the picture is that everyone spent the rest of the evening trying to get rid of tar from their feet.


This was the year when we resumed our on-site conferences after doing our virtual FOSSi Dial-Up series for a few years. We did Latch-Up in Santa Barbara and ORconf in Munich, and both events were great successes. Hope to see you all at our future events. Fellow FOSSi Foundation director Philipp Wagner also did a well-received open source chip design talk at the FOSDEM main track.


The award-winning SERV, the world's smallest RISC-V CPU, turned five years old, which I wrote about in a retrospective called Five years of SERVing. It was also the year when SERV got its big sister, QERV, which provides a 3x speed-up for a marginal extra cost in area. Most of the work was done by a colleague at Qamcom and we did a press release called Qamcom boosts RISC-V which has more details. QERV currently lives in a separate repository, but the ultimate goal is to integrate it into SERV with a switch to select width.

There's also an even bigger version called HERV on its way. A lot more things have happened with SERV but I'm saving that for the next SERV release announcement. Some of the news was also revealed in the talks I did about SERV at FPL (not recorded), ORConf and the Göteborg RISC-V Meetup (not recorded).


A tour through FuseSoC and Edalize

FuseSoC saw a lot of activity in 2023. We finally got version 2.0 out the door and with that we could remove a lot of old code and focus on new features such as core file validation, supporting the use of FuseSoC as a library instead of a stand-alone application, cached generators, file tags, minimizing rebuilding and other things that you can read about in the FuseSoC documentation. I also managed to give three FuseSoC presentations: at FPL (not recorded), ORConf and the CHIPS Alliance Technology Update.


During 2023 I found some time to work on VeeRwolf...wait, did you say VeeRwolf? I thought it was called SweRVolf. Yes, one of the bigger changes was a complete renaming from SweRVolf to VeeRwolf, since the SweRV cores were renamed VeeR. Anyway, apart from the renaming, the Zephyr BSP for VeeRwolf was upgraded to support Zephyr 3.5 instead of the old 2.7 thanks to one of my Qamcom colleagues. As regular readers probably know already, VeeRwolf is the base for the RVFPGA computer architecture programme, and over the year I had the pleasure of participating in two different RVFPGA workshops and meeting the other RVFPGA team members.


The big new thing for Edalize in the past year was the introduction of the Flow API. I have written about this specifically a few times before, but to sum it up, it's a complete revamp of how the Edalize backends work that enables new workflows, avoids code duplication, allows for external plugins, avoids unnecessary rebuilds and a lot more good things. As with all new feature introductions, some effort was spent hunting down newly introduced bugs but the code is now in good shape. I'm using the Flow API with Cocotb-enabled simulations as my daily driver now and it works great. It would still be great to get some help porting over all the old backends to the new API.

Speaking of backends, Edalize got several new ones in 2023, namely sandpiper-saas, openroad, design compiler, genus and efinity.

The documentation also saw a lot of improvements.


The CoreScore project did not see any new records. 10000 cores in an FPGA is still number one. We did however see support for some new boards, with Intel snatching three of the top five spots with their new Agilex devices, and the introduction of new FPGA vendor Efinix and their Xyloni board.

LED to Believe

I tried to push for project LED to Believe to support 100 different FPGA boards by the end of the year. We got reeeeally close, but the big celebration came after the year had ended. Still, there was a healthy number of newly supported boards over the year.


In an attempt to collect all videos of my FOSSi projects, I made a video gallery. After that experience I decided not to add web design to my CV.

As I have started using Cocotb more and more, I thought it would be a good idea to also have a quick example of how to use FuseSoC and Cocotb together, so I made an example design called fusesocotb to serve as a reference design.

This very blog also celebrated, since 2023 was the year that saw the 100000th visitor to the site, largely thanks to a UVM vs Cocotb post that went viral in late 2022.

And that pretty much sums up my 2023 FOSSi activities. Well, not quite. I also won an award. At the RISC-V Summit I was awarded a Community Contributor award. I was really happy to receive that and to hear that the open source contributions I do are actually acknowledged and appreciated. So, big thanks for that. And thanks also to the RISC-V Summit organizers for making it easy to find my seat.

Wednesday, January 10, 2024

How to get more value from open source projects

What's the missing puzzle piece that improves the open source projects used in the services or products you deliver to customers to bring you profit? Read on to find out

Over the years I have created some open source projects. In fact, I have created a lot of open source projects. Some of them are seeing very little real-world use, but many of them are used by hobbyists and academic institutions and power companies of all sizes.

Across all these projects I get a lot of bug reports, suggestions, feature requests, questions and so on. In fact, I get a lot more than I could possibly handle. This naturally means that some of them receive less attention, which can of course be very frustrating. But there is actually one simple thing you could do, and I would like to offer this one solid piece of advice on how to get my attention for your particular concern. Are you ready? It's actually quite easy. Ok, here it comes. Pay for it!


Now I can hear annoyed mumbling from some of you (*mumble* *mumble* paying?!? *mumble* *open source* *mumble* *mumble* should be free! *mumble* *mumble*) and I also think there's a lot of you who think this is pretty obvious.

The reason why I'm being very explicit about this is not that I'm angry or disappointed with someone or desperate for money, but over the years I have come to understand that a surprisingly large number of people and companies have just never thought about this possibility. So I think it's good to make a very simple distinction here. The code is free. My time certainly is not.


That's right. The missing puzzle piece is money. You probably shouldn't pay me to do graphical design though.

Now don't get me wrong. I'm not threatening to kill an open source project every hour until I get a million in unmarked bills and a fueled airplane waiting for me. In fact, I will work on these projects regardless, because I want them to be useful for myself and others, and I really appreciate all bug reports and questions being sent, but if the choice is between getting paid to work on some feature or doing it in my spare time, then it's a pretty easy choice to make. And this is happening already today. I have happy customers who are paying me and my colleagues to work on some of these projects.

And conversely, when I'm working on some project that could benefit from support or a feature addition in an external open source project, I try to enlist the leader of that project when I have the chance. I find it tends to be great value for the money being able to get support from someone who knows every nook and cranny of a particular piece of software and can implement or suggest any desired changes. In fact, it's not just me saying this. A large-scale EU study released in 2021 concluded that the benefit-cost ratio of investing in open source software was above 4, so I say that's a pretty good way to spend your money.

So how does this work in practice? Well, just like any other contract work really. We define the scope, probably sign some NDAs, agree on a fixed price or a T&M setup, shake hands and get to work. Does the finished work product need to be open sourced as well? That's up to you as the customer. For any code that is generally useful, I would typically recommend integrating it back to the open source project where it came from because that saves you from being the only maintainer of that piece of code. But there might also be code that is closely tied to your proprietary work and in that case it's likely best that you keep it to yourself. Most often it's a mix of both and relatively straight-forward to decide what goes where.

And it really doesn't just have to be programming. Most of my projects work perfectly fine as they are, but perhaps you need some training to learn how to use them most efficiently, and I'm happy to supply that as well.

So, this one was a bit shorter than my typically long-winded posts, but I wanted to keep it short and snazzy because time is money, you know.

Monday, October 23, 2023

Five years of SERVing

Making your own RISC-V CPU is a terrible idea. I have said that many times before. There are already a million RISC-V cores out there to choose from, so making another one makes no sense at all. It's kind of the same thing with UARTs. There are probably as many UART cores written as there are people using one. In my opinion, it's much more important to learn how to reuse and contribute to existing cores than making your own. I remember musing at ORConf one year that I was probably one of the only people in the room who hadn't built their own UART or RISC-V core. Although... I actually made a UART a couple of years ago. But only because I had to see if I could make a UART that fit inside a tweet. And you know what's worse? I actually made a RISC-V CPU as well. In fact, the world's smallest RISC-V CPU. But I never intended to. So how did it all happen? Well, on this day five years ago, I made the first commit to the CPU that would eventually be SERV. But to see how it really started we need to go back in time one more month to September 21 2018, just before teatime.

In Gdansk, Poland, we were organizing ORConf. As usual there were a lot of great presentations covering a wide array of topics. One of those was a lightning talk by Michael Gielda of Antmicro. The RISC-V Foundation, as it was known then before the formation of RISC-V International, together with Google, Antmicro and Microsemi had decided to organize a contest. The idea was to let contestants build either the fastest or the smallest RISC-V CPU for a given FPGA board. It had to pass a suite of compliance tests and it had to correctly execute some example programs, including running the Zephyr real-time operating system. The deadline was November 26, meaning just over two months to put together a core along with at least a minimal set of peripheral controllers to create a small SoC, figure out how to run the compliance test suite and port Zephyr to the SoC.

I remember thinking that this sounded like a lot of fun, but quickly realized there was no way I would ever find the time to do such a thing, so I basically forgot about it for my own part.

About a month after that, I was at home doing the dishes when I started idly thinking about that processor contest. In particular, I started thinking, what if you do an operation on two 32-bit vectors, but only process one bit each cycle? Then you could theoretically reuse the same circuitry 32 times instead of having 32 separate copies of basically the same thing.

At this point I had no idea that was called a bit-serial algorithm and was an idea people had already come up with and that was quite widely used in CPUs in the 70s.

So after finishing up the dishes I had to try out this idea on a piece of paper. It obviously worked for boolean operations such as and, or, xor, but it turned out also to work for additions (and by extension subtractions, which can be described as additions). To convince myself that it actually did work, I made some simple Verilog and ran it through a simulator. And when that worked I got curious about what operations a RISC-V CPU had to implement, so I opened the base specification for the first time and started reading. I was so amazed by the simplicity and how well thought-out the whole thing seemed to be. Having read at least a few similar documents before, this really stood out as a piece of art. And also, it got me thinking that maybe, just maybe, it wouldn't be impossible to actually make a CPU. I could at least give it a try and see. Just for fun, you know.
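For anyone who wants to redo the piece-of-paper experiment, here is a small Python model of the idea. It is only a sketch and not how SERV itself is written: the function names serial_op and serial_sub are made up for this example, and each loop iteration stands in for one clock cycle. The same one-bit logic is reused 32 times, with a single carry bit remembered between cycles. Subtraction is done as an addition of the inverted operand with the carry preset to 1 (two's complement).

```python
def serial_op(a, b, op, carry_in=0, width=32):
    """Combine two width-bit values one bit per 'cycle', LSB first."""
    result = 0
    carry = carry_in  # models the single flip-flop that holds the carry
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if op == "and":
            bit = x & y
        elif op == "or":
            bit = x | y
        elif op == "xor":
            bit = x ^ y
        elif op == "add":
            bit = x ^ y ^ carry                  # one-bit full adder: sum
            carry = (x & y) | (carry & (x ^ y))  # carry for the next cycle
        else:
            raise ValueError(f"unknown operation: {op}")
        result |= bit << i
    return result


def serial_sub(a, b, width=32):
    """a - b as a + ~b + 1, reusing the same one-bit adder."""
    mask = (1 << width) - 1
    return serial_op(a, ~b & mask, "add", carry_in=1, width=width)


print(serial_op(5, 7, "add"))   # 12
print(hex(serial_sub(5, 7)))    # 0xfffffffe, i.e. -2 in 32-bit two's complement
```

The price for reusing the logic is time: every 32-bit operation takes 32 iterations instead of one.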

So on October 23 2018 I pushed a first commit to the newly created SERV repo and suddenly I found myself making a CPU. By this time, half the contest time had already passed, which was not an ideal situation, but I did it mostly for fun so it didn't matter too much. The evenings and weekends during the following month were spent doing Karnaugh maps and schematics on paper while putting the kids to bed and then turning them into Verilog.

And finally, a few hours before the deadline on November 26, I got the Dining Philosophers Zephyr application running, which was one of the requirements, and that was that. Most of the people involved in the contest were exchanging tips, experiences and metrics on a forum. From the discussions there I knew that SERV had no chance of winning since it wasn't the smallest and definitely not the fastest, but it had been a fun experience and I got a new appreciation of the RISC-V ISA, so I was happy regardless.

About a week later, I woke up to a lot of congratulations on social media. The first RISC-V Summit was ongoing on the other side of the planet and it turned out I had won an award for SERV after all. Not for being the smallest, and definitely not for being the fastest, but for being the most creative solution:)

SERV becomes The award-winning SERV

As the code had been written to meet a one-month deadline, the implementation was rushed, to say the least. But during the development I had gotten plenty of ideas for further improvements. So free from any time pressure, I continued working at my own pace. During the kids' ballet lessons I drew schematics, and instead of bedtime reading I made more Karnaugh maps, and SERV steadily became smaller and smaller.

In March 2019 I submitted a proposal to speak at the RISC-V workshop during the Week of Open Source Hardware (WOSH) in Zürich that summer with the first ever presentation of SERV, called Bit by bit: How to fit 8 RISC-V cores in a $38 FPGA board. The board in question was the TinyFPGA BX, in case someone is curious.

The presentation was accepted, but it turned out to be a terrible name for a talk. You see, before the end of March I had made some changes to fit 10 cores inside that FPGA. After arriving in Zürich I had that number further increased to 14, and on the day of the presentation I had managed to fit 16 cores into that FPGA, so I had to make some last minute adjustments to the slides. The talk went fine but not great. However, for some unknown reason it has managed to become the second most viewed video on the RISC-V YouTube channel. I guess this is the closest I'll ever come to becoming a YouTuber.

Note to self - never put performance numbers in a presentation title

Another important milestone that happened just before heading to Zürich was that SERV got its logo.

Designed using WaveDrom, the amazing timing diagram (and logo designer) tool. Here's the source for the logo

In September the same year I learned about another contest, the European FPGA Developer Contest, by Arrow Electronics. In a fit of hubris, I entered that contest too, thinking I had such a great idea that there was a serious chance of winning. The idea was a programmable heterogeneous sensor fusion platform in which you had an FPGA that was connected to a bunch of sensors. Data was collected from each of the sensors and combined into a message stream. The novelty here was to dedicate a SERV core to each sensor. That meant that each CPU could handle the parts of sensor communication that were easiest to do in software, like configuring and handling SPI/I2C transactions, while using FPGA fabric to do custom protocols and heavy processing. I called the project Observer and released it on GitHub.


I felt very clever about the naming scheme of Observer. The sensor data came in through a "collector" and then the "base" controlled the data flow. Together those formed a "junction". All data streams were then merged and sent off-chip through a "common emitter".

I still think it's not a half-bad idea, but at the same time there isn't a super clear use-case for it and it would require a lot more work than I was willing to put in to get it really usable. Speaking of work, I wanted to have a nice logo for the project, so I asked my very talented fiancée to draw one for me. She didn't have time so instead I made the crappiest logo I could come up with in ten minutes, proudly showed it to her saying that it had been released on the internet. The plan worked perfectly. Upon seeing the logo, she felt so embarrassed for me that I soon had a stunningly beautiful logo replacing the old one. The moral of the story: If you do something bad enough, someone will eventually be unable to resist improving it.

I have done many ugly things in my life, but the original Observer logo still hurts to look at

In the end I didn't win anything, although I did get to keep the very nice CYC1000 FPGA board. I believe some AI something something won, but I'm not sure actually. The other projects were never released anywhere to my knowledge and the whole thing felt slightly shady. And after that, Observer sank down on my list of priorities and has been lying dormant, but it did have one everlasting legacy that became its own, much more well-known project.

While working on Observer, I used the on-board sensors to have some data sources to combine. There was a button, an accelerometer and...well, that was it. It felt a bit silly to have a sensor fusion system with only two sensors though. The proper thing to do here would have been to solder on a couple more sensors, but I chose a less work-intensive way, and instead added some SERV cores that just wrote random strings without ever being connected to any sensors. (Remember, kids! Always cheat if you have the option). After adding a couple more SERV cores generating fake sensor data, I realized that there was still a lot of room left in the FPGA. So I added a couple more cores and there was still room left. By now I had completely abandoned all pretensions of creating a useful system and just wanted to see how many SERV cores I could stuff into the FPGA. Then, of course, I got curious to see how many cores I could stuff into other FPGAs that I had at hand. Somewhere around this point I realized that this had now become its own project, with the sole intention of finding a metric for how large an FPGA is, by measuring the number of SERV cores it could fit.

For the first three minutes of existence, the project was called ServMark until I settled on the much catchier CoreScore. I then went to my fiancée again, got an amazing logo and started checking the scores for other boards at my disposal. A couple of days later I registered, got some JavaScript help to create a dynamic high score list and started inviting other people to submit their CoreScores. At the time of writing, there are now more than 40 different FPGA boards on the list, with the top entry hitting 10000 cores.

CoreScore with its beautiful logo and the top five entries at the time of writing. Check out the site for the latest results.

I believe 10000 cores is the highest number of RISC-V cores anyone has put in a chip, at least at the time of writing. Not that they are doing anything useful, but still, I sincerely believe it has a useful purpose. First of all, it gives a rough metric of the relative size between different FPGAs, which normally tends to be an apples-to-oranges comparison thanks to different LUT sizes and available memory blocks. It's not perfect, but provides some guidance, especially to those new to FPGAs. It's also a way to compare the efficiency of FPGA toolchains. With open source alternatives popping up for more devices, it can be interesting to see how well they compare to their proprietary alternatives. For my own part, the prospect of getting higher scores has been a good motivation for further optimizations of SERV, so it has been helpful in that regard too. The rest of 2019 I was working on and off continuing to bring down the size of the core, and by the end of the year there was no doubt that it had surpassed all other RISC-V cores in terms of size, making it The award-winning SERV, the world's smallest RISC-V CPU.

As we rolled into 2020, there was something in the air that had a big impact on most people's lives, and gathering in large groups at conferences was simply out of the question. What happened instead was that people took their slideshows into video calls at online conferences. I went to one of those in early 2020 and I absolutely hated it. I could see no reason for watching decapitated heads - most of them struggling to handle the lack of audience feedback - reciting slide decks aloud, when I could just as well read the slides myself or watch a recording where I could skip the boring parts or listen again if something wasn't clear.

Then, in the spring, I was asked if I could present at one of these virtual conferences. With all the latest advances of SERV I really wanted to present that, so I accepted, but also started thinking if there was a better way to use the new medium than the horrible slide decks I had been subjected to. I decided early to prerecord something. It didn't make sense for the audience to hear me mumbling and losing the thread, when I could just as well edit it to something more coherent beforehand and save everyone's time. I also realized that since everyone would watch it on a computer screen, and not on a projected screen in a badly lit room, I could afford having much finer details in the pictures. And most of all, I could have animations. And sound! So, stylistically inspired by a 1981 video on sorting algorithms we had been forced to watch in school and educational animations from my childhood, I equipped myself with a Korg Monotron for sound effects and a Python animation library to work on creating a fully immersive multimedia edutainment experience about SERV.

Creating a video like this also meant that it could not only be used for that particular online event, but could also be viewed anytime by people who wanted to get an introduction to SERV.

On April 29, the resulting video was aired during the 1st Virtual Munich RISC-V Meetup, about 24 minutes in. And for those who prefer just to watch the SERV video without the rest of the talks, it is also available here.

Doing a video was a lot of fun, but there is another common way to make people aware of your work - writing papers. It so happened that my friends at ETH Zürich had just released a paper called Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing. Now, 4096 cores with area-efficient floating point support is of course impressive, but I thought I could do better in at least one metric, the sheer number of cores. So over 2 days in May 2020 (~15 hours of struggling with LaTeX and maybe an hour actually writing the thing) I put together the first paper on SERV, Plenticore: A 4097-core RISC-V SoClet Architecture for Ultra-inefficient Floating-point Computing. For some reason it was never accepted into any conferences and has been cited approximately zero times, so I guess I will have to wait a bit longer for my academic career to take off.

At this point, SERV had been implemented in plenty of FPGA projects by myself and other people, but to my knowledge hadn't been taped out. But 2020 also brought the OpenMPW program with free tapeouts for open source projects. Wanting to both add support for the OpenLANE open source ASIC toolchain in Edalize and get some real SERV chips, I applied for funding through the NLnet Foundation to add Edalize support for OpenLANE and use that to make a small SERV-based SoC. Both objectives were successfully accomplished and thus the Subservient SoC was born, which is a minimal SoC, just like Servant, but more suitable for ASIC implementations.

Over the year SERV continued to see more code commits titled optimize this or simplify that, and in addition to working on the core itself, I added documentation and other people got it running on more FPGA boards.

Having had so much fun making the first SERV video, as well as one on Edalize (that actually won an award!), I decided to make another one for the 2021 virtual event RISC-V Forum : Embedded Technologies. In that case it was even better to have it prerecorded since it was broadcast around 4 am my time and I don't think I would have given a very good talk at that time of the day. This time I went more for daytime TV aesthetics and I think it turned out pretty nice.

This video was also the debut of the first major external contribution, with the (optional!) support for the M extension contributed as a Google Summer of Code project.

At this time, I was still trying out some further optimizations, but it started to get really, really hard to make further improvements to the size. I could spend a week working on an idea for an optimization only to find out it didn't improve anything at all, or even made SERV slightly larger. Generally, only 1 out of 10 ideas for optimizations yielded any positive results. There are still some ideas left to try out but they require a bit more time and effort than I have been able to spend. I will mention a few of these later on in the future outlook.

The amount of effort trying to further minimize SERV is pretty much the inverse of the resource usage chart


So the last years have focused more on features. Early in 2022, I added support for ViDBo, the Virtual Development Board protocol that allows interacting with a simulation of the Servant SoC through an interactive picture of an FPGA board presented in a web browser. It's a pretty fun way to try out SERV if you don't have a real FPGA board. The spring of 2022 also brought (optional!) support for the C extension thanks to another student that I mentored through the LFX Mentorship Program.

But no matter how small or featureful SERV is, what matters in the end is where it is actually used. As with most of my (and others') open source projects, you typically have no idea about most of the places where it is implemented. People only tend to reach out when they need support (which I'm happy to offer for SERV and my other open source projects, and already do for several clients) but I have learned about some use-cases over the years, such as USB-to-serial converters, DDR2 initialization, GPS synchronization for RADAR and power management. Two of my favorite applications are a cranial implant ASIC and a research project that has taped it out on the PragmatIC FlexIC process, likely making SERV the first RISC-V CPU to be taped out on plastic film. In general, the small size, and hence the simplicity, has made it an attractive choice for taping out on novel processes where yield is not yet maximized.

And that pretty much sums up the first five years of SERV. What about the next five years? There are a number of half-finished ideas that I would like to finish up as well as a couple of things that I haven't even started. Looking outside the core, I have more or less finished a new framework for CoreScore that will both allow more cores to fit in the same area and make routing easier and thereby faster for the P&R tools. As a point of reference, it apparently took around 48 hours to do P&R on the 10000 cores that occupy the number one spot on the list right now. And I would of course like to see both CoreScore and Servant running on more FPGA boards, and that's where you, my dear readers, can help out by making sure it runs on your board and submitting a patch to get it added.

For the core itself, more ISA extensions would be nice. Most of them, like floating point support, don't make very much sense, but why not. It would be very cool to run Linux on SERV, and for that we probably need to at least implement the atomic instructions and some more pieces of the privilege spec. More interrupt sources and a debug interface have been requested as well, which makes sense. I also have an almost completed project that uses some nifty math to calculate the smallest decode unit, given some constraints. The plan is to do a proper write-up about that and perhaps even put out another paper.

But the thing that most people seem excited about is having 2-, 4- and maybe 8-bit versions of SERV. Those would be slightly larger, but also almost 2, 4 or 8 times faster than SERV is today. Preliminary results have shown that the size of wider versions grows less than one might think. There is actually a big reveal coming up in this department soon. If you read this a couple of weeks after I write this, you will already know what it is, but I'm keeping it a secret for now :)

So, all in all, I'm pretty damn proud of SERV and it has been a lot of fun working on it. It has also managed to attract commercial interest from some of the larger players in the RISC-V space while still remaining very much a passion project for which I can curl up on the couch with a cup of coffee on a Sunday afternoon and think about some optimizations, draw some schematics or Karnaugh maps just like normal people do crosswords.

Happy fifth Birthday, SERV. As a proud parent, I'm amazed to see how small you have become since you were a baby.

Friday, July 14, 2023

Happy 100k!

Sure, it's not a million, but then again I'm not Arianna Huffington. Given how extremely niche these topics are and that blogging has been declared dead for about a decade now (about as long as I have been blogging, come to think of it), I think that a hundred thousand visitors is actually really really good. It sure is a lot more than I would ever have imagined when I started doing this ten years ago.

The proof! Also notice the massive amount of followers this blog has gathered over the years

I don't remember anymore why I started blogging but I guess I felt that the world deserved to hear my opinions on random things. Also, there wasn't really anyone else writing about the things we were doing with OpenRISC and open source silicon in general at that time. Nowadays, any industry conference and news outlet worth mentioning will have FOSSi content but back then, most people in the industry just didn't get it. (To be fair, most industry people still don't get it given the number of "open source" panels I have seen at conferences consisting of people with zero insights who just sit there making up stuff because their companies have paid to have them there. Thank god for events like ORConf and Latch-Up!).

Enough about why. I thought it could be fun to look at how instead. There are a number of events that have led to the magic six figure number. So, let's go back to the beginning. What was supposed to be the first article actually became the second article because when I wrote it I went off on a tangent and ended up with a whole article on scope creep instead. And I think that pretty much sums up my writing and how I (fail to) get things done in general: starting something, getting stuck in some detail and ending up doing something completely different instead. Not that it really mattered though. The readership consisted mostly of friends who I forced to read the blog and questioned afterwards to make sure they had read the whole thing.

But I kept writing every now and then about different things around the OpenRISC architecture. And then one day the people in my regular IRC chat channel notified me that my latest blog post was on the frontpage of Slashdot! For all you youngsters out there, Slashdot used to be the place where all nerds got their news and if you were featured on the Slashdot frontpage during its heyday you'd better have a server that could handle a massive influx of traffic. There were stories about servers burning up after being "slashdotted" because they were humble machines that never thought they would see so many visitors in their life. Luckily, 2014 was a bit after the heyday of Slashdot and Google hosted the blog so I don't think anyone got particularly worried about overloaded servers. But it did make some difference in numbers for the statistics for this humble blog.

The Slashdot effect in action


A couple of hours later, the Slashdot crowd moved their attention elsewhere and never came back, as is painfully clear from the statistics. I kept writing now and then however, mostly about FuseSoC, FuseSoC and FuseSoC which apparently didn't lead to any new front page news. A couple of years later though, I wrote about my complicated love-hate relationship with IP-XACT and that seemed to have struck a chord with some people. This one was more of a sleeper hit that didn't cause any immediate sensation but eventually became the most read, and definitely most cited, article I had written.

IP-Xact - an evergreen of confusing and questionable EDA standards

Getting noticed in premier geek media and cited in academic circles have both been happy surprises. There's no getting around that it does feel a little more fun writing when you get this kind of validation. So I kept writing to a tiny uptick in readership after the IP-XACT article, again mostly about FuseSoC, FuseSoC and FuseSoC, but after starting way too many FOSSi projects over the years, like SERV, SweRVolf and Edalize, I also started doing yearly round-ups partly to remind myself what I did over the past year. And then one evening, after spending a day in a workshop with a clueless EDA vendor trying to shove their awful tooling down our throats, I wrote about one of the FOSSi projects that I believe will have a large impact in the coming years, namely cocotb and how it will save us from SystemVerilog for verification. And that, my friends, turned out to be of more interest than anything I had previously written about.

Cocotb and Python coming to steal the show, as always

Suddenly, the hordes of geeks coming via Slashdot were just a blip on the radar compared to the angry, happy, confused and relieved verification engineers who showed up en masse to state their opinions, tell their stories, show their support or ask what this was all about.

Not everyone agreed, but it was clear that it was a hot topic. And that's the third thing I'm really happy about, to have ignited discussions around the status quo of chip design and get people to bring their ideas and opinions to the table.

I really hope to continue writing about stuff when I'm in the mood and can find the time. And hopefully some people find it interesting, at least occasionally. Given the sporadic output of the past, we will never know the next time that happens, so I'll just say until then!

Thursday, April 27, 2023

FOSSi Fantasies 2022


2022 is behind us and as usual I'm wrapping up my open source silicon efforts of the past year in a blog post.

Hang on...a 2022 retrospective...? In... the end of April??

Yes, I am fully aware that a third of 2023 has already passed, thank you very much, and that it's way too late to write a new year's retrospective. I have just been extremely busy, but I still wanted to get it out of the system. It's only been a month since the Persian new year, so in the light of that I'm not that late. Anyway...

This year, the list is shorter than usual simply because I haven't done as much FOSSi work as previous years. The upside is that I managed to complete this article close to new year, rather than much later as in previous years.

But why have I done less work? The main reason is that I have been terribly busy with my day job which is mostly of proprietary nature. While I do a lot of interesting stuff in this capacity, it's unfortunately not much I can talk about publicly. This is another reason why I greatly prefer open source work, so that I can show, share ideas and collaborate with other people.

There is one thing however from my day job that I can and want to talk about. This summer we launched a fully remote office that we call Qamcom Anywhere. Having worked remotely myself for the past four years, I have been pushing to make it possible for more people in the company to do the same, and this year we got it going for real. Qamcom Anywhere has been a massive success and we have found amazing new colleagues in various parts of Sweden where we haven't been looking before. Given my previous experience with working remotely as well as working on open source projects, which by nature tend to be highly distributed, I was tasked to run this new office. As part of the campaign we also recorded a commercial, so I can now also add movie star to my CV ;)

As for now, we have launched Qamcom Anywhere in Sweden, but hope to spread to more countries in the future. Stay tuned if you want to be colleagues!

But even outside of this I have found some time to work on my long list of open source silicon projects.

Let's start by looking at what has happened to FuseSoC over the past year. Most of the effort has been spent on getting FuseSoC in shape for a long overdue 2.0 release. A couple of major features and changes were identified as being important to complete before this release. Most notable is the support for the new flow API in Edalize, but a number of critical bug fixes and backwards-incompatible changes were also put in place. Unfortunately, we never managed to get the 2.0 release out of the door in 2022, but we got close and at least released a first release candidate in late December, while the final release saw the light in the beginning of this year.

Most of my other FOSSi projects made good progress. There were a few new Edalize releases, SweRVolf got some new board support, there was a maintenance release of an old i2c component that I maintain (hey! it's important to put some effort into cleaning up old code, not just write new code all the time) and even good old ipyxact saw a new release, which now contains ipxact2v, a very handy tool to automatically convert IP-XACT designs to Verilog top-levels. It's not fully complete, but the functionality that exists is already coming to good use in various projects.

The project that probably saw the most interesting news in 2022 was SERV. SERV itself gained support for compressed instructions, thanks to Abdul Wadood who I had the great pleasure of mentoring through the Linux Foundation's LFX Mentorship Program. And aside from improvements to the core itself, in 2022, fellow FOSSi superstar developer Florent Kermarrec, who might be most known for LiteX, managed to run 10000 SERV cores in a Xilinx FPGA. This seems to be the current world record for the most RISC-V cores in a single device, but I'm very curious how the competitors will react (looking at you, Intel!).

This year was the first in three years where I didn't create any new videos (or Fully Immersive Multimedia Edutainment Experiences, as I prefer to call them). Instead I did a number of presentations using old-fashioned slides for a live audience. The first of them was at the RISC-V Week in Paris in early May, where I did both a presentation on FuseSoC and one on SERV. The SERV video was unfortunately never published, but the slides for the presentation called How much score could a CoreScore score if a CoreScore could score cores? can be found here. I did another FuseSoC talk at FPGA World in Stockholm, Sweden which was also not recorded, but the final one, from the RISC-V Summit in San Jose was. This one was titled SERV: 32-bit is the new 8-bit and aims to look at how RISC-V can be competitive in the traditional 8-bit market thanks to SERV.

Both at the RISC-V Week and at the RISC-V Summit I also got the chance to meet with the people behind RVFPGA, a project I have been involved with almost since the start. For those unaware, RVFPGA is a free computer architecture course by Imagination University Programme that runs on a slightly modified version of SweRVolf that I built a couple of years ago. Right after the RISC-V Summit I also got the chance to watch an RVFPGA workshop in action, and it was super fun to see all these students working their way through the labs.

Let's see.... what else then...hmm... you know what? I'm sure other things happened in 2022 as well, but my memory is fading and May is just outside the door waiting to come in, so let's just cut it here before the rest of the year passes too. Here's to 2023. Happy new year!

Wednesday, April 19, 2023

FuseSoC 2.2

Do you know the best way to find out who is using your open source software? Introduce bugs! You will suddenly come in touch with a lot of users you didn't know existed. And let's just say I found out about a lot of new users after the release of FuseSoC 2.1. And with FuseSoC 2.1 having a lot of new features, it's perhaps not too surprising that the odd bug crept in.

But enough about that, because FuseSoC 2.2, the topic for today, has hopefully fixed what was broken. And of course we have a couple of new features as well, even though the list is somewhat shorter than usual. But let's see what the new version has to offer.

JSON Schema

Generally, I'm pretty happy about the code quality of FuseSoC. It has proven to be relatively friendly to new contributors and has gone through a couple of major refactorings without too many problems over its almost 12 years of existence. But there is one part of the code base that I usually try to stay clear of.

Deep inside of the FuseSoC code base there is a yaml structure encoded inside a Python string that is parsed when the module is imported to dynamically create a tree of Python classes which are then used to recursively read and validate core description files. Pretty clever, right? This is a fantastic example of the sort of thing that seems great because it's possible and not too hard to actually do with Python. Now, the thing is, because of the cleverness of the code, it is pretty much unreadable even for me who wrote it. Every time I need to fix some bug in this area of the code I end up spending hours trying to figure out how it all works, all the time crying and asking why oh why I built it like this in the first place.

So what little time was saved on writing some more verbose code, we pay for over and over again in maintenance. Not to mention all the bizarre corner cases that arise because the code is trying to outsmart itself. Things that required us to create classes like this:

The time was ripe now to rework this whole thing into something more sensible. So what we do instead now is to have a JSON Schema definition of the CAPI2 format...encoded as a string in a Python module deep inside the FuseSoC code base. I understand this doesn't look all that much like an improvement, but it's the first, and most important, step of a journey.

Short-term this leads to a more maintainable parser and validator because we only need to care about the definition. There is battle-proven Python code already that does the actual validation and is better at pointing out where in a core description file there is an error. There are also other utilities for generating documentation to offload this from FuseSoC itself. The parsing is also a bit more consistent now and supports use flag expansion in more places.

But long-term, this paves the road for actually splitting out the CAPI2 definition to its own project that can be readily reused by other tools without having to use FuseSoC. Having the format defined in JSON Schema allows for much easier reimplementations and utilities written in languages other than Python. The first case that comes to mind is JavaScript for having web-based utilities around CAPI2 or built-in validation of core description files in e.g. VS Code. But it also makes it easier to implement support for CAPI2 files directly in EDA tools written in Java, C++ or why not Rust.

It should be noted that the new parser is a bit more strict than the old one, so it might complain about files that were previously deemed ok. Hopefully there shouldn't be too many of those. There's also a new command-line switch --allow-additional-properties that can be turned on to make the parser more relaxed towards elements in the core description files that it doesn't know about.
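To make the strict versus relaxed behaviour a bit more concrete, here is a minimal sketch. Everything in it is made up for illustration: the toy schema is not the real CAPI2 schema, and the tiny validate function only understands the handful of JSON Schema keywords used here (a real setup would hand the schema to an existing validator library instead). It also assumes that relaxing the parser roughly corresponds to the additionalProperties keyword from JSON Schema, which is a guess based on the name of the switch.

```python
# Toy schema for a made-up core description. NOT the real CAPI2 schema.
STRICT_SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
    },
    # Reject any key that is not listed under "properties"
    "additionalProperties": False,
}

# Only the two JSON Schema types needed for this sketch
_TYPES = {"object": dict, "string": str}


def validate(data, schema, path="$"):
    """Return a list of error strings for the small keyword subset used here."""
    if not isinstance(data, _TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema["type"] != "object":
        return errors
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"{path}: missing required property '{key}'")
    for key, value in data.items():
        if key in props:
            # Known property: validate its value recursively
            errors.extend(validate(value, props[key], f"{path}.{key}"))
        elif schema.get("additionalProperties", True) is False:
            errors.append(f"{path}: unknown property '{key}'")
    return errors


def relaxed(schema):
    """Copy of the top-level schema that tolerates unknown properties."""
    return {**schema, "additionalProperties": True}


# A core description with a misspelled key
core = {"name": "::blinky:1.0", "descripton": "typo in the key"}

strict_errors = validate(core, STRICT_SCHEMA)
relaxed_errors = validate(core, relaxed(STRICT_SCHEMA))
print(strict_errors)
print(relaxed_errors)
```

The strict variant flags the misspelled key, which is exactly the kind of file that used to pass silently, while the relaxed variant lets it through.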


The other thing I want to mention in this release is a small code change that I think will open up more use cases. It's now possible to set tags for files or filesets, very much like we can set file_type or logical_name today. FuseSoC itself doesn't care about the tags, but they are passed on to Edalize through the EDAM file. In Edalize, since version 0.5.0 we have begun to look at tags in some of the flows and make decisions based on them. The only tag that is recognized today is the "simulation" tag, which can be set on HDL files to indicate they are intended for simulation and not for synthesis. This change opens up use-cases such as gate-level simulation where we first send our code through a synthesis tool and then the created netlist is simulated together with a testbench. By marking the testbench files with simulation, we tell the synthesis tools to not try to synthesize them into the netlist but instead pass them on to the simulator untouched. Another future use-case is for TCL files. There might be many tools in a tool flow that parse TCL files and so far, there hasn't been a way to tell the backend for which tool a particular TCL file is intended. I suspect we will see a whole bunch of more use-cases in the future.
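As a rough illustration of the gate-level simulation case, the sketch below splits a list of EDAM-like file entries into what goes to synthesis and what is passed straight on to the simulator. The file names, the split_for_gate_level_sim helper and the exact shape of the tags field (a list of strings per file) are assumptions made for this example, not a description of how Edalize implements it.

```python
# EDAM-like file entries. The "tags" key as a list of strings is an
# assumption for this sketch; file names are made up.
files = [
    {"name": "rtl/core.v", "file_type": "verilogSource"},
    {"name": "rtl/alu.v", "file_type": "verilogSource"},
    {"name": "tb/core_tb.v", "file_type": "verilogSource", "tags": ["simulation"]},
]


def split_for_gate_level_sim(files):
    """Separate files for synthesis from files that only the simulator sees."""
    synth, sim_only = [], []
    for f in files:
        if "simulation" in f.get("tags", []):
            # Tagged as simulation: skip synthesis, hand over untouched
            sim_only.append(f)
        else:
            synth.append(f)
    return synth, sim_only


synth_files, sim_files = split_for_gate_level_sim(files)
print([f["name"] for f in synth_files])
print([f["name"] for f in sim_files])
```

In a gate-level flow, the first list would be synthesized into a netlist and the second list would then be simulated together with that netlist.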

Other things

I mentioned some bugs, right? A big one was that users of the old tool API (which I believe is still most users) noticed that the FPGA image or simulation model was not rebuilt when source files were changed. The new flow API has some properties that allow us to track changes in a much better way and avoid unnecessary rebuilds in many cases. Unfortunately, when these changes were made we didn't properly test how that affected the tool API.

Another issue was reported by users who use --no-export together with generators. The recently introduced caching mechanism forced us to rewrite much of the code around generators and unfortunately we ended up missing this case, where the generated code got removed before it was used. Whoops. Also fixed now.

All in all, I hope you enjoy the new features and the new release. Happy FuseSoCing!