Learn to Write Command Line Utilities in R - part 6

Check out the first post in this series for an index of all the other posts.

In the first parts of this series we explored how to create simple command line utilities in R. In part 5 we talked about how difficult it is to go beyond a couple of parameters. This part will remove that limitation and allow us to take our command line apps to a whole new level, delivering the look and feel of existing command line utilities.

Command line tools have been around since the earliest computer operating systems, and a lot of the common tools in use today have their origins in tools developed for Unix systems 40 years ago. As such, several conventions have emerged over the years. One such convention is that arguments to command line tools (often called ‘options’) are most often specified in either a long or short form. Consider, for example, the humble ls command; this command has a great many options, but one of the most commonly used is -a. This option specifies that you’d like to see all output, including hidden files. In GNU ls, this option is actually the short form of the --all option. The convention here is that options usually have a long (more verbose) form, as well as a short form. Further, the long form is generally a full word and is preceded by two dashes, --, whereas short form options are usually a single letter preceded by a single dash, -.

These rules generally only apply to Unix-derived operating systems like MacOS and Linux. Windows has no real conventions, with Microsoft-developed tools using a / as the option prefix, and third-party tools either following the Unix convention above, or just using a single - for both long and short form options. Even on the Unix-derived OSes the convention is sometimes ignored; for example, the find and openssl command line utilities use a single dash for both long and short form options, but this is the exception rather than the rule.

It often also makes no difference in which order these options are specified. For example, ls -a -R is exactly the same as ls -R -a. If we were to attempt to implement this using the commandArgs() approach that we’ve used thus far, we’d have to write a lot of boilerplate code for option parsing that could then provide us the options in a usable format. Fortunately, we’re not the first to tread this path and there are already several great option parsing packages we can use.
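To give a sense of the boilerplate involved, here is a minimal sketch of hand-rolled parsing in base R. The parse_manually() helper and its option names (-d/--debug, -s/--short and a single name) are hypothetical and chosen purely for illustration; a real tool would also need help output, options that take values, and better error messages.

```r
# Hypothetical helper: hand-rolled parsing for one name and two on/off switches.
parse_manually <- function(args) {
  opts <- list(debug = FALSE, short = FALSE, name = NULL)
  for (arg in args) {
    if (arg %in% c("-d", "--debug")) {
      opts$debug <- TRUE
    } else if (arg %in% c("-s", "--short")) {
      opts$short <- TRUE
    } else if (startsWith(arg, "-")) {
      stop("Unknown option: ", arg)
    } else if (is.null(opts$name)) {
      # The first non-option value is treated as the name
      opts$name <- arg
    } else {
      stop("Unexpected extra argument: ", arg)
    }
  }
  if (is.null(opts$name)) {
    stop("Missing required argument: name")
  }
  opts
}

# In a real script the input would be commandArgs(trailingOnly = TRUE)
str(parse_manually(c("--debug", "sellorm")))
```

Even this small version needs a branch per option plus its own error handling, and every new option means more of the same.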

Numerous package options exist in this space: argparse, optparse, docopt, argparser and probably many more. Naturally, each has its own approach to solving the problems outlined above and in the previous article and the reader is encouraged to check them out and find one that suits them best. My preference is for argparser, as I like the way the API is implemented, and it has no external dependencies, which in turn simplifies deployment. If I’m using R-based command line utilities to bootstrap a compute environment, the fact that argparser is written purely in R and has no external dependencies really makes a difference. We’ll therefore be using argparser for the rest of this series.

Let’s step through the use of argparser in our script. Compared to the previous version, there’s slightly more work up-front, but the pay-offs are numerous.

Firstly we set up a new argument parser, and give it a name:

p <- arg_parser("The Sorting Hat")

Then we can add our name argument. This argument is mandatory, so we haven’t specified a default. Because it is written without a leading dash and is identified by its position on the command line rather than by an option name, it is referred to as a ‘positional’ argument. We can also specify help text here.

p <- add_argument(p, "name", help="Name of the person to sort")

Our next argument is a flag. This is an optional argument that can be used to turn functionality within our script on and off. In this example, we’ll use it to switch our debug mode on and off. If this option is not supplied, the value of debug will be FALSE.

p <- add_argument(p, "--debug", help="enable debug mode", flag=TRUE)

We’ll add another flag argument at this point, which we’ll use later.

p <- add_argument(p, "--short", help="output only the house", flag=TRUE)

Finally, we parse the arguments:

argv <- parse_args(p)

The parse_args() function returns a list containing all of the arguments that we defined in the section above, and we can then use those to control the behaviour of our script. For example, argv$debug will contain either TRUE or FALSE, depending on whether the option is specified or not.
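One way to explore the returned list without running the script from a shell is to hand parse_args() a character vector through its argv argument instead of letting it read the real command line. The following is a small sketch for experimenting in an interactive session, assuming the argparser package is installed; the argument values are only examples.

```r
library(argparser)

p <- arg_parser("The Sorting Hat")
p <- add_argument(p, "name", help = "Name of the person to sort")
p <- add_argument(p, "--debug", help = "enable debug mode", flag = TRUE)
p <- add_argument(p, "--short", help = "output only the house", flag = TRUE)

# Supply the arguments explicitly rather than reading the command line
argv <- parse_args(p, argv = c("--debug", "sellorm"))

print(argv$name)   # the positional argument
print(argv$debug)  # TRUE, because --debug was supplied
print(argv$short)  # FALSE, because --short was not supplied
```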

We’ll incorporate these changes in a moment, but first we need to handle that --short option that we set up earlier. We’re going to use that to change the output. If it’s FALSE (the default) we want to output the full message that we’ve seen previously. If the option is set, and therefore TRUE, we’re going to just output the house name with no other text.

if ( isTRUE(argv$short)){
  cat(house,"\n")
} else {
  cat(paste0("Hello ", argv$name, ", you can join ", house, "\n"))
}

Now that we’re handling all our different potential inputs, let’s update our script to look like the following and then we can start to check out the new behaviour. Note that this version of the script also removes the check for the first argument that we were forced to implement in an earlier version.

#!/usr/bin/env Rscript --vanilla
library(methods)
library(argparser)

p <- arg_parser("The Sorting Hat")
# Add a positional argument
p <- add_argument(p, "name", help="Name of the person to sort")
# Add a flag
p <- add_argument(p, "--debug", help="enable debug mode", flag=TRUE)
# Add another flag
p <- add_argument(p, "--short", help="output only the house", flag=TRUE)
argv <- parse_args(p)

# function to display debug output
debug_msg <- function(...){
  if (isTRUE(argv$debug)){
    cat(paste0("DEBUG: ",...,"\n"))
  }
}

debug_msg("Debug option is set")

debug_msg("Your name is - ", argv$name)

houses <- c("0" = "Hufflepuff",
            "1" = "Gryffindor",
            "2" = "Ravenclaw",
            "3" = "Slytherin",
            "4" = "Hufflepuff",
            "5" = "Gryffindor",
            "6" = "Ravenclaw",
            "7" = "Slytherin",
            "8" = "Hufflepuff",
            "9" = "Gryffindor",
            "a" = "Ravenclaw",
            "b" = "Slytherin",
            "c" = "Hufflepuff",
            "d" = "Gryffindor",
            "e" = "Ravenclaw",
            "f" = "Slytherin"
            )

name_hash <- digest::sha1(tolower(argv$name))

debug_msg("The name_hash is - ", name_hash)

house_index <- substr(name_hash, 1, 1)

debug_msg("The house_index is - ", house_index)

house <- houses[house_index]

if ( isTRUE(argv$short)){
  cat(house,"\n")
} else {
  cat(paste0("Hello ", argv$name, ", you can join ", house, "\n"))
}

Running our command line utility

Windows

Don’t forget, if you’re using git-bash (see the first article for more info), you need to follow the instructions for Linux/MacOS.

To avoid duplication, please follow the same steps as for MacOS/Linux, but remember to replace ./sortinghat.R with sortinghat.

MacOS/Linux/git-bash

Remember to type everything after the ‘$’ symbol and feel free to replace ‘sellorm’ with a name of your choosing. You should see output similar to that displayed below.

First with no arguments:

$ ./sortinghat.R
usage: ./sortinghat.R [--] [--help] [--debug] [--short] [--opts OPTS] name

The Sorting Hat

positional arguments:
  name                  Name of the person to sort

flags:
  -h, --help                    show this help message and exit
  -d, --debug                   enable debug mode
  -s, --short                   output only the house

optional arguments:
  -x, --opts OPTS                       RDS file containing argument values
Error in parse_args(p) :
  Missing required arguments: expecting 1 values but got ().
Execution halted

Wow, look at that! That looks exactly like the sort of output you’d expect from a ‘proper’ command line utility. If we break this down a little, we can see that the argparser library has taken the arguments we told it about and constructed this handy help information. Also notice that even though we only specified --debug, argparser automatically provides a ‘short’ version, -d, and it’s done the same with our --short flag too. It’s even added --help and -h, which also display this same information. It’s provided a ‘usage’ section at the top and used the name we provided as a description too.

The -x or --opts option is an interesting default, as it allows us to specify an RDS file containing the argument values instead of supplying them on the command line. This is useful if the script will be run many times, or during testing.

Finally, we see the error message, which helpfully tells us that it was expecting 1 value, but we didn’t specify it. Our command line tool is now starting to behave much more like a properly implemented tool. Let’s take a look at how it works when we specify options.

First with just a name:

$ ./sortinghat.R sellorm
Hello sellorm, you can join Ravenclaw

As we’d expect, this works exactly as it did before we introduced argparser.

Next we can try with the debug option set:

$ ./sortinghat.R -d sellorm
DEBUG: Debug option is set
DEBUG: Your name is - sellorm
DEBUG: The name_hash is - e9d883753fd4742672f4e7df6b93d367640e33bf
DEBUG: The house_index is - e
Hello sellorm, you can join Ravenclaw

The output here is unchanged from the previous version, but the way that we obtain it, using either -d or --debug, makes our utility feel much more like a first-class command line citizen.

Let’s try the short option next:

$ ./sortinghat.R --short sellorm
Ravenclaw

This provides the expected shorter output.

Now let’s try combining the two.

$ ./sortinghat.R --debug -s sellorm
DEBUG: Debug option is set
DEBUG: Your name is - sellorm
DEBUG: The name_hash is - e9d883753fd4742672f4e7df6b93d367640e33bf
DEBUG: The house_index is - e
Ravenclaw

And what about the order of the arguments?

$ ./sortinghat.R sellorm -s -d
DEBUG: Debug option is set
DEBUG: Your name is - sellorm
DEBUG: The name_hash is - e9d883753fd4742672f4e7df6b93d367640e33bf
DEBUG: The house_index is - e
Ravenclaw

Well, the good news is that argparser handles that for you too!
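The same check can be sketched from within R by parsing two differently ordered argument vectors and comparing the results. This assumes the argparser package is installed and relies on the argv argument of parse_args() to supply the arguments directly; the fields vector is just a convenience for this example.

```r
library(argparser)

p <- arg_parser("The Sorting Hat")
p <- add_argument(p, "name", help = "Name of the person to sort")
p <- add_argument(p, "--debug", help = "enable debug mode", flag = TRUE)
p <- add_argument(p, "--short", help = "output only the house", flag = TRUE)

# The two orderings used on the command line above
options_first <- parse_args(p, argv = c("--debug", "-s", "sellorm"))
name_first <- parse_args(p, argv = c("sellorm", "-s", "-d"))

# Compare only the arguments we defined ourselves
fields <- c("name", "debug", "short")
same_result <- identical(options_first[fields], name_first[fields])
print(same_result)
```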

Wrapping up

In this instalment we’ve introduced the argparser library as a simple way to quickly improve our script’s look and feel. This move has resulted in a script that feels like a much more natural fit alongside its command line counterparts. We’ve implemented long and short options, help output, and added an option for shortening the output.

Hopefully you’ll agree that not only does our utility feel more professional now, but it’s also easier to use.

I’d encourage you to check out the argparser documentation, and also have a look at the other libraries mentioned above.

In the next instalment we’ll add another flag and come up with some ways that we can improve our output further.

Saturday, December 30, 2017