Getting stringi to work with RStudio Connect

If you’re working with R in a Red Hat Linux-based environment with no access to the internet and need to install the stringi package, you’ll no doubt run into issues with the ICU library that it relies on.

At the time of writing, stringi (1.2.4) relies on libicu (packaged as ‘libicu-devel’ on Red Hat based systems). Red Hat 7.5 only provides libicu-devel version 50, and stringi needs 52 or higher. During installation, if it can’t find the version it needs, it tries to download a version it can compile against from the internet, moving on to further addresses if the first one fails.

Whilst I fully understand this behaviour – it’s great in a low security research environment, or on a desktop system for example – I’m not overly keen on it in production environments. The vast majority of production R environments are not permitted to access the internet for security reasons, so a standard installation of stringi will fail.

Fortunately the author provides many options for installation that get round this. These are all good options and in situations where you have control over the installation process it’s entirely possible to use one of these.

When working with RStudio Connect, however, you have no control over the installation process. That’s handled by packrat and Connect, so you can’t directly tweak the way the installation is performed. The good news is that there is a way to get it working, and once you figure it out it’s actually reasonably simple (though figuring it out often takes me a while, then I forget, then I have to do it again: rinse, repeat).

It’s also worth remembering that Connect doesn’t rely on a central package library location like most things in the R world; packages are installed on a per-app basis. Your Connect system may therefore end up installing stringi many times as it deploys the different apps you’re hosting, so we need a stable solution that we can set up once and then forget about.

So how do you get it to work?

Since downloading the required files on the fly is not possible in our internet-less environments, we must do it manually ourselves. The required download for Intel-based systems is here. Download this file (or check the install.R file in the package source if you’re reluctant to trust some random person on the internet) and transfer it to the server you’re working on. I like to keep it in /opt/icu, just so I know where it is.
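
If it helps, the staging step can be sketched as a small shell function. Treat it as a sketch only: stage_icu_archive is a name I’ve made up, the archive name in the example is a placeholder for whatever file you actually downloaded, and the permissions assume that the account running R under Connect is not root and so needs read access to the file.

```shell
#!/bin/sh
# Sketch: copy a manually downloaded ICU archive into a known directory.
# stage_icu_archive is a hypothetical helper, not part of stringi or Connect.

stage_icu_archive() {
    archive="$1"    # path to the archive you transferred to the server
    dest_dir="$2"   # target directory, e.g. /opt/icu as used in this post

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    cp "$archive" "$dest_dir/" || return 1

    # Assumption: R runs as an unprivileged account, so make the directory
    # traversable and the archive readable by everyone.
    chmod 755 "$dest_dir" || return 1
    chmod 644 "$dest_dir/$(basename "$archive")" || return 1
}

# Example (run as root; ICU_ARCHIVE.zip is a placeholder file name):
#   stage_icu_archive /tmp/ICU_ARCHIVE.zip /opt/icu
```

Keeping the copy and the permission changes together means a file dropped in by root doesn’t end up unreadable to the account that later runs the installation.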

Next, we need a way to tell the stringi package that we already have this file, so that it doesn’t try to download it during installation. Fortunately stringi has a simple mechanism for this: we can set an environment variable that gives the path to the directory containing the file.

The best place to do this is your Renviron file. If you’re using the EPEL version of R you can find this at /usr/lib64/R/etc/Renviron.

You’ll need to edit your Renviron file and add the following line:

ICUDT_DIR=/opt/icu
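
If you’d rather script the edit than open the file by hand, something along these lines should do it. It’s a minimal sketch: set_renviron_var is a hypothetical helper of my own, it assumes the variable is written as a plain NAME=value line, and it deliberately leaves any existing setting alone rather than overwriting it.

```shell
#!/bin/sh
# Sketch: add NAME=value to an Renviron file only if NAME is not already set.
# set_renviron_var is a hypothetical helper, not an R or Connect tool.

set_renviron_var() {
    file="$1"; name="$2"; value="$3"

    # Leave an existing assignment untouched so reruns are harmless.
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "$name is already set in $file; leaving it unchanged" >&2
        return 0
    fi

    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new entry starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi

    printf '%s=%s\n' "$name" "$value" >> "$file"
}

# Example (run as root, using the EPEL path mentioned above):
#   set_renviron_var /usr/lib64/R/etc/Renviron ICUDT_DIR /opt/icu
# On other installs the file should sit under the directory that
# `R RHOME` prints, in etc/Renviron.
```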

Once that’s done, the next time Connect tries to install stringi, it will use the environment variable to find the ICU zip file that we downloaded earlier instead of trying to download the version of ICU that it needs.
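
Before deploying anything, a quick sanity check can save a failed build. The sketch below is my own hypothetical helper (check_icudt_dir), not something stringi or Connect provides. It assumes a single plain ICUDT_DIR=value line with no quoting, and it only confirms that the directory exists and holds a readable .zip file, not that the file is the right one.

```shell
#!/bin/sh
# Sketch: confirm that the ICUDT_DIR entry in an Renviron file points at a
# directory containing at least one readable .zip file.
# check_icudt_dir is a hypothetical helper.

check_icudt_dir() {
    renviron="$1"

    # Read the first plain ICUDT_DIR=... line (assumes no quotes or expansion).
    dir=$(sed -n 's/^ICUDT_DIR=//p' "$renviron" | head -n 1)

    if [ -z "$dir" ]; then
        echo "ICUDT_DIR is not set in $renviron" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "ICUDT_DIR points at a missing directory: $dir" >&2
        return 1
    fi

    # An unmatched glob stays literal and fails the -r test, which is fine.
    for f in "$dir"/*.zip; do
        if [ -r "$f" ]; then
            echo "found readable archive: $f"
            return 0
        fi
    done

    echo "no readable .zip file in $dir" >&2
    return 1
}

# Example (ideally run as the account that performs the installation):
#   check_icudt_dir /usr/lib64/R/etc/Renviron
```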

Installation should now proceed as expected and hopefully you won’t have to revisit this configuration until stringi depends on a newer version of the library.