I am not a Data Scientist - My R journey

Today is the fifth anniversary of my joining the data science consultancy Mango Solutions. That also means it’s my fifth anniversary of using and working with R.

My background when I joined Mango was in Unix system administration and database administration, but I’d also worked on some ETL projects and had experience of running large compute clusters. I joined Mango as part of the IT team and was quickly set to work on my first big project: integrating RStudio Server into a pharmacokinetics/pharmacodynamics (PK/PD) platform for a large pharmaceutical company. This was also the project where I shipped my first R code, a little if statement that wrapped some shell functionality.

Since that first project I’ve worked on numerous others, from consulting on how best to integrate R into large businesses and government organisations, to building large scale API deployments with R, to installing and configuring RStudio Connect. I’ve also spoken about running R in production and helped run workshops and training sessions in Europe and North America.

For the last 5 years I’ve built my career largely on R, but I’m definitely not a Data Scientist. I don’t even have any real statistical knowledge and rarely make a chart. So what on earth am I doing? Even the R-Project website says R is for ‘statistical computing’, which is obviously not what I do, so what have I been doing all this time?

The very first time I heard about R was about six months before I started working at Mango. It sounded interesting, so I did what I often do when I hear about interesting tools: I immediately installed it. I remember starting it up for the first time and being confronted with the prompt.

>

With an unfamiliar language, the prompt just sits there, daring you to type something. But after staring at it for a moment, I realised I had nothing and crashed my way out.

Fast forward six months and I’ve joined Mango, but I’m an infrastructure person, not a developer, and certainly not an R developer. I just like to make myself useful, and help people out as best I can.

Unsurprisingly for someone with my background, my work with R started with the infrastructure: projects like getting RStudio Server integrated with that PK/PD platform and setting up HPC grids for R and other tools. Through these projects and others like them, I learned about running R in corporate environments and at scale.

No programming language exists in a vacuum and R is no exception. So I began to learn about the ecosystem around R and ways in which it could be integrated with other things. The more I worked on these projects, and the more I worked alongside Mango’s exceptionally talented Data Science consulting team, the more I picked up. After a while I came to really enjoy working with the language. Through my work and the conferences I’ve attended, I’ve met lots of amazing, smart R users from around the world. Their backgrounds, challenges, and perspectives are extremely diverse, but they’re all working on interesting problems using some of the best, most advanced tools available today.

With R, more than most other languages, there is a true sense of community, and despite my not having a statistical background, the people I meet are always enthusiastic, welcoming and happy to share. In many ways, I’m an outsider looking in, unable to participate in the conversations on statistical topics, but it’s never felt like too much of a hindrance. Over the years, I’ve learnt much from these wonderful people and I’ve found ways to use what I’ve learnt to contribute a little back.

There are people in the R community who are very public, and those who are not. There are people doing pure stats work, and those who are not. Despite its statistical roots, R is increasingly general purpose, and it’s easier than ever to make yourself at home in a community like no other.

If you’re on the periphery of the R community, or you use R in uncommon ways – fear not – you’re not alone! My stats knowledge is worse than terrible and I sometimes (but not often!) find R to be more maddening than magical. But I’m here, I’ve spent 5 years sinking deeper and deeper into the wonderful quicksand that is the world of R, and I have no plans to dig myself out any time soon.