Intro to APIs For Data Science

Back in 2023, I was honoured to be invited to give one of the keynote addresses at the Insurance Data Science conference. It’s a great conference, and I thoroughly recommend it if you’re in the field. My keynote has only ever been shared through the conference website, so I thought I’d post it here too, in case it’s useful to anyone. I always try to make my slides as visual as possible, with as little text on them as I can.

Designing 3D prints with R

I’m a big fan of the open-source 3D modelling software OpenSCAD. In case you’re not familiar with it, it’s a tool for creating 3D CAD models. Unlike most tools in this space, though, it’s script-based: an interpreter translates your model definition script into a 3D model file. For example, to create a small sphere you could write:

sphere(20);

which generates a sphere with a radius of 20. Anyway, I like OpenSCAD a lot, though it’s not without its frustrations.

How R4Pi gets built

When I started the R4Pi project four and a half years ago to deliver up-to-date versions of R and a number of popular packages for the Raspberry Pi family of computers, it was just a collection of R and Bash scripts. Over the years it’s evolved and grown. Whilst the individual software components do a decent enough job of documenting themselves for me, it’s not easy for anyone else to peek behind the curtain and see how it all fits together. So this is my attempt at “documenting” the pipeline, in the hope that it might be interesting or useful to someone.

The "nhsnumber" package and the joy of sharing your niche

This post originally appeared on the NHS-R blog. Being the author of a package with tens of thousands of users must be incredibly rewarding. All those people getting value from your work and using it to do incredible things. Few of us will ever write a package that has that kind of reach though. Most of us must be content to give back to our communities in smaller ways. In 2019 I was working for a company building software for Genomics England and the NHS.

What are we optimising for?

Optimisation, choice and the art of data science

We data folk are always optimising for something: our code, the return on an investment, warehouse stock levels, drug dosages, whatever. In many cases, though, optimisation isn’t a single thing that can be easily arrived at. Take Google Maps, for instance. Maps will give me directions to a specific destination using the optimal (fastest) route. But is the fastest route always what we want?

5 Tips for Using pins with R

It’s no secret that I’m a big fan of the pins package for R (and now there’s Python pins too!). In this post, we’ll take a look at my top 5 best-practice tips for using pins effectively. Finally, there’s a bonus tip on debugging problems using pins with RStudio Connect.

1. Use a good title

A good title is an essential part of discovery for your pins. It should be short, but informative.

Getting started with logging in R

TL;DR: Logging is an extremely useful tool for understanding a running (and potentially failing!) application, and an essential element of running any code in production. Adding logging to your long-running script, shiny app or plumber API is simple and can pay off enormously when things go wrong or when you want someone else to look after your code for you.

Log…
Rolls down stairs,
Alone or in pairs,
Rolls over your neighbour’s dog.

Running a shiny app in a docker container

I recently went looking for a tutorial on hosting a shiny app inside a docker container for a friend. There are loads of tutorials available, but this one from Juan Orduz is my favourite. It’s short, to the point, and covers exactly what you need to get started at the perfect level of detail. It’s a couple of years old now, though, and there are a few ways we can tweak it to make it a little more robust.

Thinking about your career

The three key facets of any career are hopefully pretty self-explanatory: things you’re good at, things you like doing, and things you can get paid for. If you can find a career that sits at the intersection of all three, I think you’ll be pretty happy. Life’s rarely that simple, though, so let’s take a look at the two-set intersections to see where a career can get more nuanced.

Product maturity curve in R

I’ve been thinking a lot lately about product maturity and how it applies to open-source projects. The product maturity curve shown above is often used in commercial product discussions to help people think about the product lifecycle. To be honest, I have some pretty strong misgivings about it, but it can be helpful sometimes. I was thinking about recreating it in R (as is my way), so here’s the code to create the plot above.