Tales from Beyond the Register Map: 2013

Wednesday, October 9, 2013

orconf2013; Reports from the yearly OpenRISC conference

Dear diary,

I just got back from the yearly OpenRISC conference, orconf, which this year was held at the University of Cambridge, UK.

TL;DR:
The conference was a great success! The flight there was not.

After forcing myself out of bed early on Saturday morning, I spent half the day in a barn on the outskirts of Göteborg, since a broken radio transmitter delayed the flight by over three hours. That was enough to make me miss the introduction talk as well as the presentations from David Greaves on a SystemC model of a multicore OpenRISC with power annotation, Stefan Wallentowitz on OpTiMSoC and Jonathan Woodruff on BERI: A Bluespec Extensible RISC Implementation. Thankfully, all talks were recorded and will be available online when they have been transcoded.

The delayed flight also made me miss the first minutes of my own presentation on ORPSoCv3, which served as an introduction to the workshop later in the afternoon. Fortunately, I had plenty of time allocated to my short introduction on the subject. The presentation was actually a bit shorter than what I had originally planned since there have been way too many other things to do lately, and I hope to have time to make a more complete introduction for next time.

After the presentations, the conference went on with a workshop based around getting OpenRISC running Linux on a DE0 Nano board. Embecosm kindly provided enough hardware to let all participants team up in groups and have their own board to play with. Slightly different incarnations of this workshop have been presented by Embecosm and Julius Baxter at other events such as OSHUG, but for me it was a milestone since we decided quite recently to base the hardware workflow on ORPSoCv3, which is what I have been working on for the last few years.

It all went surprisingly well. Except for a participant who had problems with his Python version, I didn't hear many complaints. Most of the credit will have to go to Julius Baxter for preparing and writing down precise instructions on how to get started, Stefan Kristiansson, who prepared a precompiled OpenRISC toolchain and did an ORPSoCv3 port for the DE0 Nano (in 30 minutes!), and Franck Jullien, whose work on OpenOCD has made debugging easier and who provided the first ORPSoCv3 board port, which was proof that it actually worked. It made me very happy as well, since it shows that the time I have spent on ORPSoCv3 has paid off. It's now a little easier to build and simulate an OpenRISC-based SoC.

Since I don't have a DE0 Nano board myself, I spent the time trying to make a port for my trusty ordb2a board based on the DE0 Nano port instead. It was actually my first own board port, and the process went well, but when the time came to program the board and connect to it, it turned out my development environment was not up to the task, and I had to spend most of the time compiling debug software and hunting down patches. The outcome is that I now have a patch for OpenOCD to make it work properly with the ordb2a boards that I should send upstream soon.

The first day at the conference ended after the workshop, but as usual with these events, the fun doesn't stop when we leave the conference building. We all went for a fantastic dinner at St John's Chophouse where we continued to talk about everything from switching characteristics of crypto algorithms to high-level HDL languages. A few of us went on to try out some of the fine establishments in Cambridge afterwards and stopped by a pub where you could pay for your drinks with bitcoins. After the initial amazement of being able to pay for things in real life with bitcoins, this started another discussion on critical paths and process nodes for Bitcoin mining ASICs. It's funny how all conversations seem to end up in that direction.

After some well-deserved sleep, the Sunday started off with Stefan Kristiansson and Julius Baxter presenting the latest improvements on mor1kx. Stefan continued the great tradition from last year of being the one who provides us with eye candy. This year he showed us Day of the Tentacle running under ScummVM via libSDL and glxgears running under X. These are two things that wouldn't have been possible if not for the great work on the toolchain during the last year by Sebastian Macke, Stefan Kristiansson and probably others whom I should mention but can't remember right now. mor1kx itself has grown into a quite mature CPU now with three different pipeline implementations to cover the range from running a full-blown Linux system down to deeply embedded bare-metal applications. It has also found its use in OpTiMSoC and will probably be the default or1k implementation in ORPSoCv3 in the near future.

Next in line was Martin Schulze, who grew frustrated with the tedious work of setting up an ORPSoCv2 port for a new board and wrote a configuration system that's hooked up to Eclipse. While it might sound like it conflicts with the work on ORPSoCv3, it will actually be a great combination instead, and I look forward to integrating the two efforts to make new board ports even more painless.

Last year we had a presentation where the first OpenRISC in space was unveiled, and the space theme continued in Guillaume Rembert's presentation OpenRISC for space applications and EurySpace SoC. Guillaume showed clearly the advantages of using OpenRISC in an environment where failure is not an option, as it's flexible, battle-proven and not tied to a single vendor.

After a well-deserved coffee break, Franck Jullien took the stage and showed the new and improved GDB. Together with his work on OpenOCD, which was accepted upstream about a week ago, the debugging support for OpenRISC is in better shape than ever.

Last of the major presentations was Davide Rossi, who talked about his group's research on extremely low power ASIC SoCs, where they had used multiple or1200 cores in a 28nm ASIC that will likely tape out later this year. During their work with or1200 they made several improvements to the critical path that might find their way into the upstream or1200 repo. As there was a room full of OpenRISC experts, we shared ideas on how to verify the ASIC once it was completed.

The day went on with a discussion on the feasibility of an OpenRISC 1000 successor. The idea started out a few years ago as a way to rectify some of the deficiencies of the or1k ISA, and some of the thoughts that have come up over the years are summarized on the OpenCores or2k page. The general opinion this year, however, was that with the current number of contributors to the OpenRISC project, it would be too much work to start a new implementation. If this was taken up as a student project or as a research project we would embrace it with open arms, so all academics reading this, please come forward.

One of the interesting things when working on a smaller architecture is that you sometimes realize how much the design choices of the large players are taken for granted. Sebastian Macke discovered that the way our ABI is defined differs in some regards from how ARM and Intel do things. This is usually not a problem, but combined with how some programs break the C standard for varargs in a subtle way, we have problems that only manifest themselves on some architectures. Tricky! The discussion was meant to decide if we should change our ABI to avoid this problem, but we decided against it for now to see how widespread the problem is.

Following the ABI discussion were two short lightning talks. One was by Julius Baxter to show jor1k, the JavaScript OpenRISC emulator by Sebastian Macke, as it had been referred to multiple times during the conference, and also to mention that most of us could be found on the #openrisc channel on irc.freenode.net. The other talk was by Jeremy Bennett on the latest improvements for Verilator. Apparently Verilator now supports most of the synthesizable subset of SystemVerilog, which is getting more and more important as the rest of the world is moving in that direction. This means that we can continue to use this fantastic tool in modern development environments, and a few high-profile projects where Verilator is used were also mentioned.

To finish it all off, we had the yearly bug squashing session. Out of 71 open bugs, we were able to mark nine bugs as invalid or already fixed. Eight other bugs were reprioritized or reassigned to the right person, and I managed to fix one RTL bug in or1200 on the flight back home. In total that means we could close around 12% of our open bugs with little effort. Responding in a timely manner to bugs that have been reported is extremely important. We have previously seen potential contributors who have lost interest after seeing that their reports go unnoticed, so in addition to the yearly session, I hope that everyone goes through Bugzilla from time to time to find things that can be fixed.

The flight back home was delayed once again, but we managed to catch up enough so that Ryanair could play their we're-on-time fanfare, and once home I slept like a baby. Unfortunately, my baby didn't.

In retrospect, a few interesting observations could be made from the topics of the talks. It seems that multicore OpenRISC was a large part of the presentations this year for different reasons. Stefan Wallentowitz's focus was on many-core implementations, while Guillaume Rembert needed it for redundancy in fail-safe applications and Davide Rossi needed it to try out different levels of power optimizations. Hopefully, the combined work will move things forward, and having a conference like this is a great way to make people aware of each other's work. The other thing, which has also been my pet peeve and the reason for starting work on ORPSoCv3, is the need for easier system generation and interconnections. This is being addressed now, and there seems to be a lot of interest in helping out in this area.

I enjoyed the little I had time to see of Cambridge, and it's a fantastic feeling to be in the company of so many talented people, both from academia and industry. We were approximately twice as many as last year, which also goes to show that the project has a healthy growth. This is also seen in the influx of contributors, many of whom unfortunately weren't able to join us. Hopefully, we will meet next year, or on some other occasion. I hope that everyone views the recorded presentations when they become available on the conference page, as this short summary doesn't do justice to all the great talks.

Finally, special thanks should go to Julius Baxter for once again organizing the conference and making sure it got off the ground, to the whole of Embecosm for sponsoring the event and specifically Jeremy Bennett for rounding up a few of the speakers that we otherwise would have missed out on. Also thank you to David Greaves and the University of Cambridge for hosting us and taking care of thirty meek geeks during the weekend.

Hope to see you all again!

Tuesday, June 4, 2013

initial begin

Software is built up from layers and layers of abstractions, which has the pleasant effect of hiding all the underlying madness that computer software is built upon. If we were still fearless enough to take the long and winding road down from a user-facing application, through the libraries, operating system and drivers, we would eventually end up at the register map. This is an impenetrable wall even for most of the hardcore close-to-the-metal low-level-driver gurus. This is the boundary between the chips and the instructions that are meant to make the chips do something useful. This is however not the final frontier. Some fearless souls, calling themselves FPGA/ASIC designers, Digital Design Engineers or Hardware Developers, have decided to take the spirit of Open Source and Free Software beyond the wall and into the realms of the silicon. In the Open Source world, hardware is generally considered Open Source friendly when there exists a documented register map and perhaps some use cases, so why do more? Well, for us on the hardware side, we want a little more than that. Hardware can be just as buggy as software, and working with closed-source IP cores is generally a pain. The problems range from the obvious one that you can't debug your code properly to the insane license agreements and in many cases strange restrictions that the license holder can impose on the development environment. While these are all very practical reasons for doing all this, I think that the driving force for most of us is that it is pretty damn fun!

I, myself, come from a background of having worked professionally with FPGA and ASIC as a consultant for five years now, and have been involved with software and hardware since my first stumbling peeks and pokes in QuickBasic in the mid-nineties. Over the years, I have enjoyed the great success of Open Source Software and contributed back to a bunch of different projects. In 2010, I got involved in the OpenRISC project through my employer at that time. These last three years working on the OpenRISC project in my spare time, I have found myself in the company of extremely talented people who are not afraid of learning new things or taking on overwhelming challenges. Combined with a large ecosystem of IP cores, toolchains, operating systems, test suites, documentation, development tools and basically everything else you expect from a processor architecture, this has come to be a huge and very interesting project.

The main reason for starting this blog is to shed some light on all the things we are working on in the digital domain of the Open Source ecosystem, and in particular the OpenRISC project. Searching around the internet, it turns out that the amount of written information is sparse. Apart from our official OpenRISC page that holds our ever-growing wiki and links to most things related, Sven Andersson has written a series of articles about his experience with the OpenRISC as a newcomer to the scene. This has provided us with great feedback for how we can improve and streamline things, as well as getting many people interested in trying out this mythical CPU creature. Franck Jullien has also written some articles on his blog about his work with some cool debugging features for OpenRISC. Another resource that has served well over the years is a series of application notes from Embecosm on SystemC modelling, GDB protocol implementation and C libraries, written by long-time contributor Jeremy Bennett. Many of those articles have found use also outside of the OpenRISC world. There are probably many more articles that I have forgotten about, and it would be great if everyone who feels left out can come forward so that we can collect a definitive list of all related articles. In addition to written articles, many of us like to show up at conferences to talk about what we do in the project, hang out and hack on things. Julius Baxter and I have held presentations (here and here) and a workshop at FSCONS and visited FOSDEM last year. 2012 was also the year that we held the first (hopefully annual) OpenRISC Conference, which was a great success. Sitting through the presentations, it was clear that the project is growing and that an impressive amount of work is being done by extremely talented hackers.

While all of these articles and presentations have given us a chance to show what we are doing, there is still a lot of really cool development that goes unnoticed by the uninitiated. I hope to get some time to write about all the fun stuff that is going on, and most of all, write about all the fun stuff that I am doing.

Most of my own work isn't directly related to the OpenRISC CPU itself, but to the infrastructure that surrounds it. I'm maintaining an Ethernet MAC, have started an extremely low-volume mailing list on the Wishbone bus (which is the bus that OpenRISC as well as a few other Open Source CPUs use to connect to other IP cores), done some C library hacking, fixed bugs, reminded other people about forgotten patches and fixed bugs, and been around to companies, conferences and schools to talk about using Open Source IP cores in their workflow. For about two years my main focus has been on creating a platform for simulating individual cores as well as simulating and building systems based on the many fine IP cores that are available at OpenCores and other places. The project is called orpsocv3, or the OpenRISC Reference Platform System on Chip version 3, and is planned to replace our current platform orpsocv2. orpsocv3 is packed with functionality that will hopefully make development easier compared to the existing system, and will hopefully be prominently featured in coming articles. The code is still in an early development stage, but you can check it out here if you want to see for yourself what all the cool kids are talking about. To avoid the infamous scope creep, I'm resisting the urge to write more about orpsocv3 now and instead conclude that proper introductions have been made and we can go on to the technical stuff for the next post.

end

Monday, May 27, 2013

Scope Creep

During my work on ORPSoCv3, I realized that there were some problems with the SPI Flash memory model that I planned to reuse from ORPSoCv2. To use the file (and more importantly to make it publicly available) it turned out I needed a written license agreement. I now had three choices:

1. Pretend that I didn't see the license. Not gonna happen. I'm building a product intended for both hobbyist and commercial use, so I want these things to be done correctly, or I will risk that they come back and haunt me.

2. Try to contact the right people to ask for a relicensed version of the file. It could be worth trying, as there is at least a company name and the author's name in the file header that I could use to track down the owner. On the other hand, the last update was in 2007, and even if I found someone who could claim ownership, it's far from certain that they would allow relicensing the file.

3. Rewrite the code. Even though I have tried hard to not reinvent too many wheels in ORPSoCv3, this is sometimes the easiest way forward.

The third option, which is what I chose, has some added benefits too. The SPI flash model in ORPSoCv2 is targeted for a specific flash that we used on an old board that I don't intend to support in v3, so the new code could probably be made a bit more generic instead. Instead of a monolithic BFM for a chip, I could create a generic SPI BFM (which I'm sure would come in handy in other cases) and try out an idea I've been having about a memory that uses VPI callbacks to store data in dynamically created C structures instead.

This is where the scope creep started...

First of all, the idea of having a dynamically created memory instead of a huge two-dimensional array comes from the assumption that in some cases, we need a large memory map, but we only read and write to small parts of that memory. Having a huge static Verilog array means we have to allocate a lot of memory that will never be used, and the simulator might need to check for events on every bit, which could slow things down considerably. (Note here that I'm using might and could, since I haven't done any benchmarks to prove anything yet. It could also be very simulator dependent). So the idea here is to make an array of pointers to memory segments and only malloc a page when it is written to for the first time. The Verilog code would have $read and $write functions that can read from and write to memory and some init functions to preload images (elf, exo, bin, vmem, srec and whatever could be useful).
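A minimal sketch of the allocate-on-first-write idea could look like the following. It is written in Python for brevity rather than as the C and VPI code described above, and the class name, the 4 KiB page size and the choice to return zero for pages that were never written are all assumptions made for illustration (a real simulator model might return x instead).

```python
PAGE_SIZE = 4096  # bytes per page; an arbitrary choice for this sketch


class SparseMemory:
    """Byte-addressable memory that allocates a page on its first write."""

    def __init__(self, size):
        self.size = size
        num_pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        # One slot per page; None plays the role of a NULL pointer in C.
        self.pages = [None] * num_pages

    def _check(self, addr):
        if not 0 <= addr < self.size:
            raise IndexError(f"address {addr:#x} outside memory")

    def write(self, addr, value):
        self._check(addr)
        index, offset = divmod(addr, PAGE_SIZE)
        if self.pages[index] is None:
            # First write to this page: allocate it now (the "malloc").
            self.pages[index] = bytearray(PAGE_SIZE)
        self.pages[index][offset] = value & 0xFF

    def read(self, addr):
        self._check(addr)
        index, offset = divmod(addr, PAGE_SIZE)
        page = self.pages[index]
        # Assumption: untouched pages read back as zero.
        return 0 if page is None else page[offset]

    def allocated_bytes(self):
        """Total size of the pages that have actually been allocated."""
        return sum(PAGE_SIZE for page in self.pages if page is not None)
```

With a 64 MiB address space and a single byte written, only one 4 KiB page is backed by real storage; the page table itself is the only cost that grows with the size of the memory map.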

I started hacking on the C code, and it seemed reasonably simple to do this. To try it out, I wanted some real world testing, but unfortunately I didn't have any nice SPI test cases (nor an SPI BFM). The VPI memory backend however would be a good match for a simple model of a Wishbone SDRAM controller. If I could replace the existing controller (Scope Creep), I would be able to run all the existing ORPSoCv2 test cases to get some verification. Looking at the code for the existing model, it turned out to be a little awkward to fit the $read and $write commands into the existing Wishbone memory model, so I decided to find a proper Wishbone BFM that I could use to interface with the memory backend instead (Scope Creep).

Searching around the internet I found a few commercial ones, a nice looking one in SystemVerilog, a VHDL version and an implementation in the ethmac testbench (an IP that I also happen to be co-maintainer of).

None of the existing models were up to the task however, as I need it to be pure Verilog and support Wishbone B3 burst transactions. The first requirement is because neither Icarus Verilog nor the free Altera-provided ModelSim supports mixed language simulation, and everything I have is in Verilog at the moment. Only the BFM from ethmac fulfilled the first requirement, but unfortunately it didn't seem to support burst transactions. The only thing left to do was to write a brand new Verilog-based Wishbone slave BFM myself (Scope Creep). I figured that would come in handy anyway, so it wasn't that much of a problem.
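To give an idea of what burst support asks of a slave model, here is a small sketch of how the address advances from beat to beat in a Wishbone B3 incrementing burst. The BTE (burst type extension) encodings are the ones defined in the B3 specification; the function names are hypothetical, the addresses are assumed to be word addresses, and this is an illustration in Python rather than the Verilog BFM discussed here.

```python
# Burst type extension (BTE) encodings from the Wishbone B3 specification.
BTE_LINEAR = 0b00
BTE_WRAP4 = 0b01
BTE_WRAP8 = 0b10
BTE_WRAP16 = 0b11

# Number of beats after which each wrapping burst type wraps around.
_WRAP_LENGTH = {BTE_WRAP4: 4, BTE_WRAP8: 8, BTE_WRAP16: 16}


def next_burst_address(addr, bte):
    """Return the word address of the next beat in an incrementing burst."""
    if bte == BTE_LINEAR:
        return addr + 1
    length = _WRAP_LENGTH[bte]
    # Wrapping bursts stay inside an aligned block of `length` words.
    base = addr - (addr % length)
    return base + (addr + 1) % length


def burst_addresses(start, bte, beats):
    """List the word addresses touched by a burst of `beats` (>= 1) beats."""
    addrs = [start]
    for _ in range(beats - 1):
        addrs.append(next_burst_address(addrs[-1], bte))
    return addrs
```

For example, `burst_addresses(6, BTE_WRAP4, 4)` gives `[6, 7, 4, 5]`, while the same start address with `BTE_LINEAR` gives `[6, 7, 8, 9]`. A slave BFM has to track this sequence itself, since the master is not required to be trusted to present the right address on every beat.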

Implementing a simple BFM wasn't too much work, but it turns out that having a three-week-old baby and full-time employment makes it a bit harder to find spare time to work on side projects like this. It also took a while longer since I at some point decided to refactor some things in ORPSoCv3 (Scope Creep), make a Wishbone interconnect generator to make it easier to hook up the memory model to a CPU (Scope Creep), and to test the interconnect I would need to build a Wishbone Master BFM too (Scope Creep). The interconnect part was eventually dropped, but I did implement a basic Wishbone Master BFM. I also decided to start with a static memory before I tried out the VPI backed memory model.

Having come this far I thought it would be fun to tell the world about my new BFM once it was done, show off the capabilities of ORPSoCv3 and provide some cool benchmarks, and since I'm about ten years behind the rest of the world, I decided it was now time to start my first blog (Major Scope Creep!). Starting a new blog leads to some big decisions. I would either need to use an available web service to host my blog or do it myself. I decided to host it on my NAS since there is already a WordPress package available for it. After installing the package and fiddling with MySQL permissions (Scope Creep) I realized that I needed a name for the blog, so I took a few days to come up with a number of witty names (Scope Creep), abandoning them one by one as it turned out that all witty names for an FPGA/Hardware/Open Source/electronics blog are already taken (stupid internet!).

About this time I also realized that I didn't really want to host my own blog since it probably meant that I had to tighten up the security of my home network and think a bit about how to provide decent availability, so the focus shifted to finding a good platform to use instead. This left me again with several choices, and the only thing to do was to read up on all the pros and cons of available services (Scope Creep). I started registering an account on WordPress, but had to stop when I realized that I still didn't have a name for the blog.

By now I was starting to get really tired of my own inability to finalize things, so once I came up with a name for the blog, I registered a blogspot account instead and started writing. The basic functionality for the BFM is mostly finished now, but to avoid further delays in putting this first post out, I'll finish it and push it to the ORPSoCv3 repo after I'm done writing. The slave seems to have identical timing to the model I'm replacing, so I'll consider that good enough for now.

I'm fully aware that this first post needs a lot of background information to make sense for people outside of the innermost circle in the OpenRISC project, and that I should have written introductions about myself, the OpenRISC project, ORPSoC, Wishbone, BFMs, Open Source Hardware and whatever else I've been talking about. Well, to relate to the title, I had to stop this scope creep before I totally lost control. So to finish things up and provide pointers about what I might write about in the coming articles, here's a summary of the work that led up to this article.

Initial target: Add an SPI memory model to ORPSoCv3
Work started: VPI backend for a dynamic memory model, Test bench for Wishbone interconnect, BFM-based static memory model
Work done: Basic Wishbone Master/Slave BFM, Choose a blog platform, Write a first post
Work planned: SPI master/slave BFM, Wishbone Interconnect Generator, Finalize, document and release Wishbone BFM, continue working on rough edges in ORPSoCv3
Work dropped: Hosting a blog, secure the home network, integrate existing Wishbone BFM

As you can see, I'm nowhere near starting on the SPI flash memory model, but at least I have a blog and a partially working Wishbone BFM! :)