Learn to Write Command Line Utilities in R - part 5

Check out the first post in this series for an index of all the other posts.

Last time, we changed the way our sorting hat command line utility did its sorting. We moved away from random assignment and implemented a simple binning system for our input names. While we fixed a major problem with our utility, we didn’t really learn much about writing command line tools, so in this post, we’ll look at implementing some debug logging to let us know what’s happening inside our app while it’s running.

To do this, we can use a second argument. We already have the name argument, args[1], so we could easily set something similar up for args[2].

If we’re going to use args[2] as a sort of switch to turn our debug output on or off, it will be easiest if we can do something that gives us a TRUE/FALSE logical to use.

The following code will set a variable, app_debug, to either TRUE or FALSE depending on the presence of the word ‘debug’ in the args[2] position.

if (identical(args[2], "debug")){
  app_debug <- TRUE
} else {
  app_debug <- FALSE
}
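As a quick aside on why identical() is a good fit here, the sketch below simulates the argument vector by hand. The names no_debug and with_debug are just illustrative stand-ins for what commandArgs() would return. When args[2] is missing, R gives us NA rather than an error, and a plain == comparison would pass that NA along to if(), which would then fail.

```r
# Stand-ins for the argument vector in two different invocations.
no_debug   <- c("sellorm")            # ./sortinghat.R sellorm
with_debug <- c("sellorm", "debug")   # ./sortinghat.R sellorm debug

# Indexing past the end of a character vector gives NA, not an error.
no_debug[2]

# `==` propagates the NA, and if (NA) stops with an error.
no_debug[2] == "debug"

# identical() always returns a single TRUE or FALSE.
identical(no_debug[2], "debug")     # FALSE
identical(with_debug[2], "debug")   # TRUE

# Because of that, the if/else could also be collapsed to one line.
app_debug <- identical(with_debug[2], "debug")
```

The longer if/else form does exactly the same job, so which one to use is mostly a matter of taste.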

This means that app_debug will default to FALSE, but will switch to TRUE if args[2] is set to ‘debug’. We then need to use this to print out our debug information. We could do logging with a full-blown logging package like futile.logger, but for this small application a quick custom implementation should be more than good enough.

debug_msg <- function(...){
  if (isTRUE(app_debug)){
    cat(paste0("DEBUG: ",...,"\n"))
  }
}

This function is a little wrapper for cat() that only prints anything if app_debug is TRUE. We can use this new function to put additional messages anywhere that makes sense for our specific application.

debug_msg("This is a debug message")
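To see the switch in action without running the whole script, here is a small sketch you could paste into an R session. It repeats the helper so it runs on its own, and sets app_debug by hand instead of reading it from the command line.

```r
# Same helper as above, repeated so this snippet is self-contained.
debug_msg <- function(...){
  if (isTRUE(app_debug)){
    cat(paste0("DEBUG: ", ..., "\n"))
  }
}

app_debug <- TRUE
# paste0() joins the pieces with no separator, so include your own spaces.
debug_msg("Your name is - ", "sellorm")
# prints: DEBUG: Your name is - sellorm

app_debug <- FALSE
# With the switch off, the same call prints nothing at all.
debug_msg("Your name is - ", "sellorm")
```

Because debug_msg() looks up app_debug each time it is called, flipping the variable is all it takes to silence every message at once.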

These messages provide further information about what’s going on inside our utility, which we can use either to fine-tune the script or to understand what’s happening when something goes wrong. This is great to have when you’re developing and testing utilities, and it’s also useful for debugging issues in production if problems arise down the road.

Let’s incorporate this into sortinghat.R, so that it looks something like the following:

#!/usr/bin/env Rscript --vanilla
args <- commandArgs(trailingOnly = TRUE)
# check if debug is set
if (identical(args[2], "debug")){
  app_debug <- TRUE
} else {
  app_debug <- FALSE
}
# function to display debug output
debug_msg <- function(...){
  if (isTRUE(app_debug)){
    cat(paste0("DEBUG: ",...,"\n"))
  }
}
debug_msg("Debug option is set")
if (length(args) < 1){
  stop("I think you forgot your name\n")
}
your_name <- args[1]
debug_msg("Your name is - ", your_name)
houses <- c("0" = "Hufflepuff",
            "1" = "Gryffindor",
            "2" = "Ravenclaw",
            "3" = "Slytherin",
            "4" = "Hufflepuff",
            "5" = "Gryffindor",
            "6" = "Ravenclaw",
            "7" = "Slytherin",
            "8" = "Hufflepuff",
            "9" = "Gryffindor",
            "a" = "Ravenclaw",
            "b" = "Slytherin",
            "c" = "Hufflepuff",
            "d" = "Gryffindor",
            "e" = "Ravenclaw",
            "f" = "Slytherin"
            )
name_hash <- digest::sha1(tolower(your_name))
debug_msg("The name_hash is - ", name_hash)
house_index <- substr(name_hash, 1, 1)
debug_msg("The house_index is - ", house_index)
house <- houses[house_index]
cat(paste0("Hello ", your_name, ", you can join ", house, "\n"))

Running our command line utility

MacOS/Linux/git-bash

First with no arguments:

$ ./sortinghat.R
Error: I think you forgot your name
Execution halted

Then with just a name:

$ ./sortinghat.R sellorm
Hello sellorm, you can join Ravenclaw

And finally, with the debug option set:

$ ./sortinghat.R sellorm debug
DEBUG: Debug option is set
DEBUG: Your name is - sellorm
DEBUG: The name_hash is - e9d883753fd4742672f4e7df6b93d367640e33bf
DEBUG: The house_index is - e
Hello sellorm, you can join Ravenclaw

Remember to type everything after the ‘$’ symbol and feel free to replace ‘sellorm’ with a name of your choosing. You should see output similar to that displayed above.

Windows

Don’t forget, if you’re using git-bash (see the first article for more info), you need to follow the instructions for Linux/MacOS.

First with no arguments:

sortinghat
Error: I think you forgot your name
Execution halted

Then with just a name:

sortinghat sellorm
Hello sellorm, you can join Ravenclaw

And finally, with the debug option set:

sortinghat sellorm debug
DEBUG: Debug option is set
DEBUG: Your name is - sellorm
DEBUG: The name_hash is - e9d883753fd4742672f4e7df6b93d367640e33bf
DEBUG: The house_index is - e
Hello sellorm, you can join Ravenclaw

Feel free to replace ‘sellorm’ with a name of your choosing. You should see output similar to that displayed above.

Wrapping up

Now that we have debug output, it’s much easier to see which house index you need to change to please a difficult user! In the example above, the name ‘sellorm’ gives us a house index of ‘e’, so if I weren’t happy with Ravenclaw, I’d know exactly which entry in houses to go in and change.

One interesting side effect of having an optional args[2] is the impact it might have on any future args[3] we might like to implement. Since args[2] is optional, we can’t reliably implement any further positional arguments using the approach we’ve taken.

Consider the following:

$ ./cliapp.R value1 value2 value3

This gives us the following argument values:

args[1] = value1
args[2] = value2
args[3] = value3

But since args[2] is optional, we could end up with this:

$ ./cliapp.R value1 value3

This gives us the following argument values:

args[1] = value1
args[2] = value3

Now ‘value3’ is in the wrong place. To work around this, we’d probably have to implement some code that could check the values in all positions and figure out what the intent of the input was, rather than just working directly off that input.
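To give a feel for what checking every position might involve, here is a rough sketch. The find_debug() helper is made up for this illustration and isn’t part of sortinghat.R. It looks for the word ‘debug’ anywhere in the arguments and treats everything else as ordinary values.

```r
# Hypothetical helper: pull the 'debug' switch out of the arguments,
# wherever it appears, and keep the remaining values in order.
find_debug <- function(args){
  is_debug <- args == "debug"
  list(debug = any(is_debug), values = args[!is_debug])
}

# The switch is found even when it sits between two values.
find_debug(c("value1", "debug", "value3"))

# Without the switch, the values are passed through untouched.
find_debug(c("value1", "value3"))
```

This works, but it has an obvious flaw: anyone whose name really is ‘debug’ can never be sorted, and every new option would need its own special case.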

Fortunately, there’s a much better way forward. Several R packages implement argument parsing logic in ways that make life easier both for us as writers of these sorts of tools and for our end users. If you’ve used command line tools before, you’ve probably noticed that many have options that look like ‘-h’ or ‘--help’, which are often used for getting help on a particular utility, and these packages can help with that too.
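Before reaching for a package, the basic idea behind dash-prefixed options can be sketched in a few lines of base R. The split_flags() helper below is invented for this example and is far cruder than what a real argument parsing package does, but it shows why the leading dashes are useful: options can appear in any position without being mistaken for ordinary values.

```r
# Hypothetical helper: anything starting with '-' is treated as an option,
# everything else as a positional value.
split_flags <- function(args){
  is_flag <- startsWith(args, "-")
  list(flags = args[is_flag], positional = args[!is_flag])
}

parsed <- split_flags(c("--debug", "sellorm"))

# The option is detected regardless of where it was typed.
"--debug" %in% parsed$flags   # TRUE

# The name is still the first positional value.
parsed$positional[1]          # "sellorm"
```

A proper parser also handles things like options that take values and automatically generated help text, which is where the packages come in.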

In the next post, we’ll look at reworking our script to use one of these packages and we’ll start to make our Sorting Hat utility look and feel more like a real command line tool.

Friday, December 22, 2017