How R4Pi gets built

When I started the R4Pi project four and a half years ago to deliver up-to-date versions of R and a number of popular packages for the Raspberry Pi family of computers, it was just a collection of R and BASH scripts. Over the years it’s evolved and grown, and whilst the individual software components do a decent enough job of documenting themselves for me, it’s not easy for anyone else to peek behind the curtain and see how it all fits together. This is my attempt at “documenting” the pipeline, in the hope that it might be interesting or useful to someone.

Build systems

We currently have four build servers, one for each of the following:

Ubuntu Noble Numbat - Raspberry Pi 5
Raspberry Pi OS Bookworm - Raspberry Pi 4
Raspberry Pi OS Bullseye - Raspberry Pi 4
Raspberry Pi OS Bullseye 32-bit - Raspberry Pi 4
- This is the only 32-bit build system we currently operate

These hosts live in my home-office and are used more-or-less exclusively for R4Pi builds. They’re all running 24/7.

Two of these hosts run from SSDs over USB rather than micro-SD cards, as the package builds have destroyed a few SD cards over the years with frequent, repeated write operations. Since the Pi 5 has a PCIe interface, it could use a nice fast NVMe drive, but then it wouldn’t fit in the case.

To be honest, that’s the biggest problem with this set-up: the servers are a real mess in terms of their physical presence in the office. Someday I’d love to have them all in a single, nice case with a single power supply instead of the mess of wires and hardware we have now. That’s a project for another day, though.

Server management

The four servers are maintained, updated and modified using Ansible. I managed them manually for a good while before long-time R4Pi collaborator Andrés suggested Ansible, and I used his existing Raspberry Pi Server Project as a template for my own implementation. The switch was in no small part precipitated by a catastrophic failure caused by a dying SD card in one of the servers. There were backups and everything was safe, but I’d been hand-rolling some janky maintenance scripts that only got the server rebuild half-way.

Anyway, we’ve had a few server failures since then (a couple more SD card corruptions, and we even had a Pi die on us), but Ansible lets us declare the entire configuration for all the servers up front, so a rebuild, or a new build when adding support for a new OS, is pretty straightforward these days.

Naturally, there’s always room for improvement. For example, I didn’t understand Ansible tags when I first implemented this, so I’m not really making use of them, despite the obvious benefit of being able to apply portions of the Ansible config instead of all of it every time.

When we added support for Ubuntu earlier this year, it was mostly a case of modifying the Ansible tasks to support the new OS. Ubuntu, like Raspberry Pi OS, is derived from Debian, so we already had 90% of what we needed; the remaining 10% was tweaking the Ubuntu-specific parts to get full Ubuntu support up and running.

R builds

Building R itself is fairly straightforward on a Pi. You first make sure you have all the build dependencies installed, which is handled by Ansible as mentioned above. All the dependencies for the core language are available on the machines as soon as each server build process is complete.

The builds are then a three-stage process, with specific tooling for each stage:

Build and install R on the build server
Add any OS-specific patches
- Currently this is only adding a default CRAN-like repository specific to the OS (more on this later)
Package up R into a standard “.deb” file for easy installation.

This last step is needed to make the installation work properly on everyone else’s Pis: the .deb file also lists all the run-time dependencies R needs on another system, so apt installs them automatically when our .deb file is installed.
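
For anyone curious what that dependency list looks like, it lives in the Depends field of the package’s DEBIAN/control file. Here’s a minimal Python sketch of writing one. The package name, version, maintainer and dependency list are hypothetical placeholders, not what the R4Pi build actually declares.

```python
# Minimal sketch: write the DEBIAN/control file for a binary .deb.
# All field values used below are hypothetical placeholders.
from pathlib import Path


def control_text(package, version, arch, maintainer, depends, description):
    """Return control-file text; Depends is a comma-separated list."""
    fields = [
        ("Package", package),
        ("Version", version),
        ("Architecture", arch),
        ("Maintainer", maintainer),
        # apt installs everything listed here when our .deb is installed.
        ("Depends", ", ".join(depends)),
        ("Description", description),
    ]
    return "".join(f"{key}: {value}\n" for key, value in fields)


def write_control(staging_dir, **fields):
    """Write <staging_dir>/DEBIAN/control, where dpkg-deb expects it."""
    debian_dir = Path(staging_dir) / "DEBIAN"
    debian_dir.mkdir(parents=True, exist_ok=True)
    path = debian_dir / "control"
    path.write_text(control_text(**fields))
    return path


if __name__ == "__main__":
    print(control_text(
        package="r4pi-example",  # hypothetical name
        version="4.4.1-1",  # hypothetical version
        arch="arm64",
        maintainer="Example Maintainer <maintainer@example.com>",
        depends=["libc6", "libreadline8", "libcurl4"],  # illustrative only
        description="Example R build packaged as a .deb",
    ))
```

With the control file written into a staging directory that mirrors the install layout, dpkg-deb --build <staging_dir> turns that directory into a .deb.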

While each step is scripted, running those scripts is still a manual process. We only have to do it 8 times a year (2 R version updates a year on 4 different OSes), so it’s not too bad, and it ensures we keep an eye on the process for problems.

As part of this process, we also have a Python script that checks the entire R build for common issues just before packaging. We needed this to avoid a repeat of the time I published a build to the entire world with a bunch of file permission errors that broke everyone’s installations. The issue was quickly identified and a fixed build was pushed out, but it’s the sort of thing that’s easily avoided with a bit of system testing, so the check was implemented as soon as the fix had shipped. Errors like this are, in a sense, quite trivial, but they’re also the sort of thing that utterly breaks everything for your users, so it’s important to handle updates with care. Certainly more care than I had been!
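
The real script isn’t reproduced here, but a minimal Python sketch of one such check might look like this. It assumes the goal is to flag files other users can’t read, and directories they can’t enter, under the staged R tree; the function name and the exact rules are illustrative assumptions.

```python
# Minimal sketch of a pre-packaging permission check (hypothetical rules).
import os
import stat


def permission_problems(root):
    """Return sorted paths under root that non-root users could not use."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            # "Other" users need read + execute to list and enter a directory.
            if not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are ignored on Linux
            # Every installed file must at least be readable by everyone.
            if not mode & stat.S_IROTH:
                problems.append(path)
    return sorted(problems)
```

Calling permission_problems() on the staged install directory and refusing to package if it returns anything would catch that kind of mistake before it reaches anyone. It inspects mode bits rather than trying to open each file, so it gives the same answer whether or not it runs as root.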

R distribution

Once the build is complete, the .deb file is transferred to one of the servers that handles building and publishing the deb repository. A deb repository (or “apt repository”) is where a Debian-based OS gets its software and software updates.

The R4Pi project has had one from the start, and we now run three to cover the OS versions we support. That’s one per OS release rather than one per build server, because a single repo can hold both 32-bit and 64-bit packages, as our older Bullseye repo does.
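
How can one repository hold both? Each architecture gets its own package index inside the repository, and apt downloads the index for each architecture the system is configured to use. A small sketch of that layout, assuming a single main component and the Debian architecture names armhf (32-bit ARM) and arm64 (64-bit ARM); the function name is hypothetical:

```python
# Minimal sketch of the per-architecture index layout in one apt repository.
# Assumptions: a single "main" component; suite and arch names are examples.


def index_paths(suite, component, architectures):
    """Return the Packages index path apt reads for each architecture."""
    return [
        f"dists/{suite}/{component}/binary-{arch}/Packages"
        for arch in architectures
    ]


for path in index_paths("bullseye", "main", ["armhf", "arm64"]):
    print(path)
```

A 32-bit Raspberry Pi OS install reads the binary-armhf index and a 64-bit one reads binary-arm64, so each machine only sees packages it can install.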

The R .deb file is added to the local repository, and then we sync the entire repository to Amazon Web Services’ S3 storage.
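
A sync only transfers what has changed. In practice a tool such as the AWS CLI’s aws s3 sync command takes care of this; the sketch below only illustrates the planning step, assuming both sides are summarised as {path: content hash} manifests. The function names are hypothetical and nothing here talks to S3.

```python
# Minimal sketch of planning a sync; it does not talk to S3.
# Assumption: both sides are summarised as {relative_path: sha256} manifests.
import hashlib
from pathlib import Path


def local_manifest(root):
    """Map each file under root (POSIX-style relative path) to its SHA-256."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def plan_sync(local, remote, delete=True):
    """Return (paths to upload, paths to delete) to make remote match local."""
    # New or changed files need uploading.
    uploads = sorted(path for path, digest in local.items() if remote.get(path) != digest)
    # Files that no longer exist locally (e.g. superseded versions) can go.
    deletes = sorted(set(remote) - set(local)) if delete else []
    return uploads, deletes
```

Whatever does the copying, it’s generally worth uploading new package files before the updated index files, so a client never downloads an index that points at files that aren’t there yet.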

S3 lets us host static websites cheaply, which is ideal for a public deb package repository like this. Once the files are transferred to S3, they’re available to users who’ve configured their OS to use our repo, so they can run sudo apt install r4pi to install the R4Pi R build, or sudo apt update && sudo apt upgrade -y to update it.

Now that we can install R, we’re probably going to need some packages.

Package builds

Package builds are an order of magnitude more complex than building R.

For those not aware, when you install a package from CRAN with install.packages(), Windows and Mac users are given pre-built binaries, but Linux users get the package’s source code. This makes sense, since providing binaries for all the many different flavours of Linux would be a substantial undertaking. Fortunately, R knows how to build a package from source during installation, and generally the package will be built and installed on the target system with no issues.

However, source-based package installation has a few pitfalls:

Packages can require system-dependencies that are not necessarily installed on the target system. Missing system-dependencies will cause the package installation to fail. For example, if you’re installing a package that depends on your system having a specific XML library that’s not installed, package installation will fail until this dependency problem is identified and fixed.
Packages sometimes require substantial resources to build, so they might be difficult or impossible to build on lower-spec devices. While you can buy Raspberry Pi computers with 8GB of RAM these days, that’s the top-of-the-line model and therefore also the most expensive, so not everyone will have one.
Building packages from source can be extremely time-consuming and break the user’s flow. If someone fires up R to write some code and is immediately hit with a 5 minute wait while a package they want to use is compiled and installed, they could be completely out of the right mindset by the time it’s done. On a fresh installation, installing just a few common packages can take tens of minutes, sometimes more, as each package’s dependencies, and their dependencies in turn, are also installed.

Our solution to these problems, and our way of steering users towards the pit of success, is to build packages ourselves and publish the binary versions in a custom CRAN-like repository for each OS that we support. This gives R4Pi users an easy on-ramp into running all the most common types of packages on the Raspberry Pi without having to worry too much about build dependencies or resource constraints at build time.

We currently build just over 1000 packages for each OS version as well as tracking updates to those packages and publishing those as they’re released.

Our most complex tooling is in this area. We have a series of tools that monitor CRAN for updates and when an update is found for a package we’re tracking, we download the update and any new dependencies, build these packages and then publish them to a CRAN-like repository for the relevant OS.

This is all triggered on a timer with cron. Four times a day, a script that checks CRAN’s PACKAGES file is launched. The PACKAGES file is a single-file “index” of all the packages on CRAN; it’s what R uses to report which packages or updates are available to the user.
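
The PACKAGES file uses R’s DCF (Debian Control File) format: one record per package, “Field: value” lines, records separated by blank lines, and long values continued on indented lines. On any CRAN mirror it lives at src/contrib/PACKAGES. Here’s a minimal Python sketch of reading it into {package: version}, assuming the cloud.r-project.org mirror; the function names are hypothetical and the real R4Pi tooling may do this differently.

```python
# Minimal sketch: read CRAN's PACKAGES index (DCF format) into Python.
# Assumption: the cloud.r-project.org mirror; any CRAN mirror uses the same path.
from urllib.request import urlopen

PACKAGES_URL = "https://cloud.r-project.org/src/contrib/PACKAGES"


def fetch_packages(url=PACKAGES_URL):
    """Download the raw PACKAGES file as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_dcf(text):
    """Parse DCF text into a list of {field: value} records."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key:
            # Indented lines continue the previous field's value.
            current[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def available_versions(text):
    """Map package name -> version for every record in a PACKAGES file."""
    return {
        rec["Package"]: rec["Version"]
        for rec in parse_dcf(text)
        if "Package" in rec and "Version" in rec
    }
```

A cron run would then do something like versions = available_versions(fetch_packages()) before comparing against the local state.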

We check this list against the list of packages we’re tracking as well as the packages that we already have installed. When an updated package version is detected, it’s built and published. This process is a combination of BASH, Python and R scripts. Most of the configuration code is Python, the package building and manipulation is R and there’s BASH to glue it all together.
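
The comparison itself is simple once versions are compared numerically rather than as strings (as plain strings, “1.0-10” sorts before “1.0-9”). A minimal sketch, assuming R’s convention that package versions are integers separated by “.” or “-”; the function names and the tracked-list input are hypothetical:

```python
# Minimal sketch: decide which tracked packages need (re)building.
import re


def version_key(version):
    """Split an R package version such as "1.0-2" into comparable integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def packages_to_build(cran, ours, tracked):
    """Tracked packages that are missing locally or older than on CRAN.

    cran and ours map package name -> version; tracked is the set of names
    on this OS's build list (a hypothetical input for this sketch).
    """
    todo = []
    for name in sorted(tracked):
        if name not in cran:
            continue  # e.g. archived on CRAN, so nothing to download
        if name not in ours or version_key(cran[name]) > version_key(ours[name]):
            todo.append(name)
    return todo
```

Tuples compare element by element, so 1.0-10 correctly counts as newer than 1.0-9, and 1.0.1 as newer than 1.0.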

Finally, all packages are rebuilt whenever we move to a new version of R, so that every available package has been built with the current version. A change in R may change the way a package is built, so it’s more of a safety measure than anything, but we believe CRAN do this, so we do it too. Naturally, this means building over 4000 packages (just over 1000 per OS) on 4 tiny little Raspberry Pi computers twice a year when R is updated. This process alone takes well over 24 hours.
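
Installed R packages record the R version they were built with in the Built field of their DESCRIPTION file, in a line beginning “Built: R 4.4.1;” for example. A minimal sketch of detecting a stale build, assuming library_dir points at the R library the builds install into and current_r comes from somewhere like Rscript -e 'cat(as.character(getRversion()))'; the function names are hypothetical:

```python
# Minimal sketch: find installed packages built under a different R version.
from pathlib import Path


def built_r_version(description_text):
    """Return the R version from a DESCRIPTION "Built:" field, or None."""
    for line in description_text.splitlines():
        if line.startswith("Built:"):
            # e.g. "Built: R 4.4.1; aarch64-unknown-linux-gnu; <date>; unix"
            first = line[len("Built:"):].split(";")[0].strip()
            return first.removeprefix("R ").strip()
    return None


def stale_packages(library_dir, current_r):
    """Names of packages in library_dir not built with current_r."""
    stale = []
    for desc in sorted(Path(library_dir).glob("*/DESCRIPTION")):
        if built_r_version(desc.read_text(errors="replace")) != current_r:
            stale.append(desc.parent.name)
    return stale


# If anything is stale, everything gets rebuilt, e.g.:
# rebuild_all = bool(stale_packages("/path/to/library", "4.4.1"))
```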

Package distribution

Once packages are built, they’re moved into our local package repository. This is essentially the master version of what eventually gets published as our CRAN-like repository.

Luckily, R includes everything you need to create a CRAN-like repository out-of-the-box. So, once the built package files are moved into the appropriate location, we use R to generate or update our own PACKAGES file. At this point, we also use R to generate a web page listing all of the packages we have available for that OS, along with some useful links and so on.
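
R’s tools::write_PACKAGES() is the standard way to write the PACKAGES index for a directory of package files. The HTML listing is a separate job, which R4Pi does in R; purely to show the shape of it, here’s a Python sketch that renders {package: version} pairs as a table linking to each package’s CRAN page. The function name and page layout are assumptions and won’t match the real page.

```python
# Minimal sketch of an HTML package listing (hypothetical layout).
from html import escape
from urllib.parse import quote

CRAN_PAGE = "https://CRAN.R-project.org/package="


def package_index_html(packages, title):
    """Render {name: version} as a simple, case-insensitively sorted table."""
    rows = "\n".join(
        f'<tr><td><a href="{CRAN_PAGE}{quote(name)}">{escape(name)}</a></td>'
        f"<td>{escape(version)}</td></tr>"
        for name, version in sorted(packages.items(), key=lambda item: item[0].lower())
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n"
        f"<p>{len(packages)} packages</p>\n"
        "<table>\n<tr><th>Package</th><th>Version</th></tr>\n"
        f"{rows}\n"
        "</table>\n</body></html>\n"
    )
```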

Now that we have a complete CRAN-like repository locally, we again sync it with another AWS S3-backed website so that it’s publicly available to our users. You can see it here. This site is distributed globally using Amazon’s CloudFront Content Delivery Network (CDN), so users get decent download speeds wherever they are in the world.

Remember, all this is done 4 times a day by each of the 4 build servers. Automation is good like that: we’ve been able to take a 13-step process and run it reliably and repeatably over 5000 times a year (4 runs a day × 4 servers × 365 days is 5,840 runs).

For reference, our 13 steps are:

Check that we’re not already running a build, as we don’t want two running at once
Generate the list of packages to be installed for the current OS
- We have to maintain different lists because some packages have issues on certain OSes or system architectures.
Check that everything is set up for the package build to proceed - input and output directories exist etc.
Check whether the packages were built with an older version of R, and rebuild them all if they were
Update installed packages and install any new packages that have been added to the list
- This ensures we have all the required package dependencies when we come to build the packages for distribution later
Generate a list of all the packages and their dependencies
Download any new source packages required from CRAN
Build the package binaries and move them to the proper output directory
Write the updated PACKAGES file
Write the HTML we need for the list of packages in our repository
Sync the updated repository with S3 so that it’s available publicly
Invalidate the CloudFront cache so that the newly updated files are served instead of stale cached copies
Send a status update using Pushover, so that I get an alert on my phone when the build is complete
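
Steps 6 and 8 both hinge on dependency resolution: to build a package, everything in its Depends, Imports and LinkingTo fields has to be built first, recursively. Here’s a minimal sketch using Python’s standard graphlib, assuming an index of parsed PACKAGES records ({name: {field: value}}) and treating R itself and the base packages that ship with R as already available. The function names are hypothetical.

```python
# Minimal sketch: work out everything needed to build a set of packages,
# then order the builds so dependencies come first.
import re
from graphlib import TopologicalSorter

# Packages that ship with R itself, so never need building from CRAN.
BASE_PACKAGES = {
    "base", "compiler", "datasets", "graphics", "grDevices", "grid",
    "methods", "parallel", "splines", "stats", "stats4", "tcltk",
    "tools", "utils",
}
BUILD_FIELDS = ("Depends", "Imports", "LinkingTo")


def parse_deps(field):
    """Package names from a field like "R (>= 3.5.0), rlang (>= 1.0), utils"."""
    names = []
    for entry in field.split(","):
        name = re.sub(r"\([^)]*\)", "", entry).strip()  # drop version limits
        if name and name != "R" and name not in BASE_PACKAGES:
            names.append(name)
    return names


def dependency_closure(wanted, index):
    """Map each needed package -> set of its build dependencies.

    index maps package name -> parsed PACKAGES record ({field: value}).
    """
    graph, stack = {}, list(wanted)
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        record = index.get(name, {})  # unknown packages get no dependencies
        deps = {dep for f in BUILD_FIELDS for dep in parse_deps(record.get(f, ""))}
        graph[name] = deps
        stack.extend(deps)
    return graph


def build_order(graph):
    """Dependencies first; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

static_order() raising CycleError doubles as a sanity check, and any package missing from the index (for example, one not hosted on CRAN) simply shows up with no dependencies of its own.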

Summary

As you can see, even a small project like R4Pi can be quite complex to maintain behind the scenes, and we’ve not even talked about the website (which is available in English and Spanish). What we’ve discussed above is more or less the current state of play; we’re constantly tweaking things and making improvements, so it’s changing all the time.

We know there are lots of improvements left to make, and hopefully one day we’ll even manage to speed things up enough to get full CRAN coverage. Remember, though, that the project is entirely self-funded and development is crammed into the space around day jobs, family commitments and so on, so progress is not always as quick as we’d like. In the meantime, if you’d like to try R4Pi on your Raspberry Pi, head on over to the website, and if you’d like to see any of the pipeline tooling, you’ll find it all on GitHub.

Monday, August 12, 2024