wiki:CommandLineProcessing

The current convention for running the R scripts during a job run on a cluster is to invoke the script like this:

R CMD BATCH scriptname.R

Here R is the R interpreter and CMD indicates that an R tool will be used. The general syntax is R CMD command args. The BATCH command supports non-interactive execution of R scripts, so this command essentially says "invoke scriptname.R non-interactively".

The R command line syntax and options are documented in the Invoking R appendix of the manual An Introduction to R.

Reading Command Line Arguments

To explore the command line invocation of R, we can use a few simple R commands that just print the arguments passed to R during its invocation.

args = commandArgs()
print(args)
q()

The commandArgs() function returns an array (a character vector) that includes the positional arguments as they appeared on the command line, and print() prints all the elements of the array. The script ends with q(), which quits R.

You can start R from the command line by just entering R. This starts an interactive session of the R interpreter, where you can simply paste the commands above. Your output should look something like this:

> args = commandArgs()
> print(args)
[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
> q()

Depending on your configuration and PATH settings, the exact directory location for R may be different. The important part is that you see the output of the print command and that it shows both the array element number in brackets and the text string for that positional parameter of the command line. The output shows that the only command line parameter was the R command itself. Other parameters would have been the 2nd, 3rd, and so forth element of the array.

Note: R numbers its arrays from 1, whereas a traditional C argv starts at 0 (zero).
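
As a small illustration of the 1-based numbering, the following sketch indexes a made-up argument vector. The vector and its path are assumptions chosen for the example; in a real session the values would come from commandArgs().

```r
# A made-up argument vector standing in for the result of commandArgs()
# (the path and options are illustrative, not taken from a real run).
args <- c("/usr/bin/R", "--no-save", "--quiet")

first <- args[1]       # element 1 is the R executable itself
rest  <- args[-1]      # a negative index drops element 1 and keeps the others
n     <- length(args)  # number of elements in the array

print(first)
print(rest)
print(n)
```

Note that args[0] is not the first element in R; it returns an empty character vector.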

A Test Script

We can put these commands in a script file with the following command:

cat > printargs.R << EOF
args = commandArgs()
print(args)
q()
EOF

We can then invoke the script by passing it to R with input redirection, i.e., the < shell redirect. This replaces the input channel and effectively switches R to non-interactive mode: R simply reads the commands in the script and exits when done. Since we're not running interactively, we have to tell R what to do when it exits, because it would ordinarily prompt us asking whether to save state. We don't need to save state for these tests, so we'll use the --no-save option.

R --no-save < printargs.R

When you run this command, you'll now see the output:

R version 2.5.0 (2007-04-23)
...<welcome message omitted>...
> args = commandArgs()
> print(args)
[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "--no-save"
> q()

Notice that we now have two positional parameters on the command line: the R command, as before, and the --no-save argument we added. Note: the < printargs.R is not a positional parameter because it is processed by the shell (e.g., Bash) before R is run.

By default, R output includes a header containing the R version number and some welcome text. This can be suppressed with the --quiet argument. R ordinarily also prints a trace of the commands executed by the interpreter, prefixed by > , as if the commands had been entered interactively. This is fine for development but can be noisy in production, making it harder to see the output we are trying to generate. We can suppress that behavior with --slave.

Our new command now looks like this:

R --no-save --quiet --slave < printargs.R

And running it looks like this:

$ R  --no-save --quiet --slave < printargs.R
[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "--no-save"
[3] "--quiet"
[4] "--slave"

Now we only see the result of the print(args) command instead of all the extra verbose information. We also see that it now displays four command line parameters. Again, the R interpreter is the first one, followed by each of the additional arguments we supplied in the order they appeared on the command line.

The --slave argument actually implies --no-save and --quiet, so we could get away with using it alone:

$ R  --slave < printargs.R
[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "--slave"

This version drops us down to just two positional parameters.

This is all fairly basic and works as expected.

Using BATCH

We can also run this script using the R CMD BATCH method used by jobs running on the clusters.

R CMD BATCH printargs.R

As mentioned above, the BATCH command causes R to run in non-interactive mode, i.e., to assume there is no terminal (or user) for input or output. This means the output has to be put somewhere. By default, the BATCH command creates an output file with the same name as the script but with the extension '.Rout'. So running the above command produces a file printargs.Rout, which we can display with:

$ cat printargs.Rout

Which shows:

R version 2.5.0 (2007-04-23)
...<welcome message omitted>...
> args = commandArgs()
> print(args)
[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "-f"
[3] "printargs.R"
[4] "--restore"
[5] "--save"
[6] "--no-readline"
> q()
> proc.time()
   user  system elapsed
  0.861   0.052   0.910

A few things are worth noting:

  1. The output looks just like when R is run interactively, with the same intro header and printing of the script commands.
  2. The command line parameters show that we are effectively just invoking R with some extra parameters, which we see in the output of our print command.
  3. A performance summary of the time taken to run the script is appended to the end.

By using the --slave option we could get rid of the header text and command tracing in the usual way:

R CMD BATCH --slave printargs.R

The performance metrics will still be appended and would have to be suppressed with an explicit argument to the quit function, i.e., q(runLast=FALSE).

Passing Custom Arguments

The more interesting use for the commandArgs() function is, of course, to pass our own arguments to scripts. Using the argument --args allows us to pass additional arguments that are ignored by the R interpreter. For example:

R --slave --args test1 test2=no < printargs.R

Will output:

[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "--slave"
[3] "--args"
[4] "test1"
[5] "test2=no"

In addition to the arguments that influence R itself, including --args, we now have our own arguments as positional parameters 4 and 5. If we didn't use --args, R would attempt to interpret our arguments as its own and report errors, e.g.:

$ R --slave test1 test2=no < printargs.R
ARGUMENT 'test1' __ignored__

ARGUMENT 'test2=no' __ignored__

[1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
[2] "--slave"
[3] "test1"
[4] "test2=no"

Even though our arguments are still visible within the script, it's best to avoid the errors by using the --args syntax.
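
Since everything after --args belongs to the script, a script can pick out its own arguments by position. The following is a minimal sketch of that idea; custom_args is a hypothetical helper name introduced for this example, not part of R.

```r
# Hypothetical helper: return only the elements that follow "--args".
# Defaults to the full argument array of the running R process.
custom_args <- function(args = commandArgs()) {
  pos <- match("--args", args)   # index of the first "--args", or NA if absent
  if (is.na(pos)) {
    return(character(0))         # no "--args" given: no custom arguments
  }
  args[-seq_len(pos)]            # drop everything up to and including "--args"
}

# Illustrative input modelled on the output shown above.
example <- c("R", "--slave", "--args", "test1", "test2=no")
print(custom_args(example))
```

Called without an argument inside a script, custom_args() would inspect the arguments of the current R process.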

This approach can be used with the BATCH command as well but requires a slight change. The --args test1 test2=no sequence must be quoted in the invoking shell to avoid misparsing by the BATCH command. The correct syntax for use with BATCH is:

R CMD BATCH --slave "--args test1 test2=no" printargs.R 

Looking at the output file printargs.Rout shows:

$ cat printargs.Rout
 [1] "/share/apps/R/R-2.5.0/gnu/lib/R/bin/exec/R"
 [2] "-f"
 [3] "printargs.R"
 [4] "--restore"
 [5] "--save"
 [6] "--no-readline"
 [7] "--slave"
 [8] "--args"
 [9] "test1"
[10] "test2=no"

The "--args test1 test2=no" sequence is split into three distinct arguments, which is what we want. This is counter-intuitive compared with what ordinarily happens with quoted arguments in the shell, i.e., the string "--args test1 test2=no" would be treated as a single argument.

This behavior seems to stem from the parsing for the BATCH command itself and might be a bug. The BATCH documentation does not indicate this is required, and the correct syntax was discovered via James Forester's blog entry from August 2007.

A note of caution: the BATCH command has a fairly strict syntax, R CMD BATCH options infile [outfile]. If you mess up a parameter, you might end up overwriting your script with output. For example, following the documentation for BATCH, you might write R CMD BATCH --args test1 printargs.R to pass custom arguments to the script. But due to the parsing of BATCH, this will cause --args to be treated as an R option, test1 to be treated as your infile, and printargs.R to be treated as your outfile. The net result is that your printargs.R script file will be overwritten with the output of trying to run test1 as an R script via the BATCH command. Assuming test1 does not exist, this will cause the error output from this invocation to overwrite printargs.R:

Error: syntax error, unexpected SYMBOL, expecting '\n' or ';' in "R version"
Execution halted

It would be worth determining if this behavior of BATCH respecting --args is intentional or is a bug.

Processing Only Custom Arguments

The commandArgs() function accepts an argument trailingOnly. When it is TRUE, the function returns only the script-specific arguments that follow the --args argument. It is FALSE by default, so we slightly modify our script:

cat > printourargs.R << EOF
args = commandArgs(TRUE)
print(args)
q()
EOF

Running this script then produces:

$ R --slave --args test1 test2=no < printourargs.R
[1] "test1"    "test2=no"

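At first glance this seems to collapse the custom arguments into a single argument, but it is only a characteristic of print(): it fits as many elements on a line as the width allows, and the [1] is the index of the first element on that line. The result is still a two-element array, and each element can be referenced individually by its subscript. Note: the example needs updating to check whether a different print syntax gives output similar to before.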
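
One possible way to list each argument on its own line is to format the elements explicitly instead of relying on print(). This is only a sketch; format_args is a hypothetical helper name, and the input vector stands in for the result of commandArgs(TRUE).

```r
# Hypothetical helper: build one '[index] "value"' line per argument.
format_args <- function(args) {
  sprintf("[%d] \"%s\"", seq_along(args), args)
}

# Stand-in for commandArgs(TRUE) when run with: --args test1 test2=no
args <- c("test1", "test2=no")

# writeLines() emits each element of the character vector on its own line.
writeLines(format_args(args))
```

Each element is also available directly by subscript, for example args[2] for the second custom argument.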

R Shell Scripts

An interesting subsection of the Invoking R appendix discusses scripting with R and the Rscript command. This wrapper lets R behave much like traditional Unix scripting languages, so R scripts can be invoked in the familiar fashion:

scriptname arg1 arg2 ...
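
As a rough sketch of what such a script might contain, the following separates name=value arguments (like test2=no above) from plain positional ones. The function name parse_kv and the layout of its result are assumptions made for this example; the script would be saved to a file and made executable with chmod +x.

```r
#!/usr/bin/env Rscript
# Hypothetical script body: split arguments into name=value pairs and the rest.

parse_kv <- function(args) {
  has_eq <- grepl("=", args, fixed = TRUE)   # TRUE for arguments containing "="
  kv     <- args[has_eq]
  values <- sub("^[^=]*=", "", kv)           # text after the first "="
  names(values) <- sub("=.*$", "", kv)       # text before the first "="
  list(named = as.list(values), positional = args[!has_eq])
}

# trailingOnly = TRUE keeps only the arguments meant for the script.
parsed <- parse_kv(commandArgs(trailingOnly = TRUE))
str(parsed)
```

With arguments test1 test2=no, parsed$positional would hold "test1" and parsed$named$test2 would hold "no".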

A Document Viewer Shell Script

Good man-page-like documentation exists for most R packages. It's all accessible within an interactive session using the help() function, but it can be useful to have easy access from the command line during development.

The following script gives us a simple man-like command for the R documentation:

cat > manr << EOF
#!/usr/bin/env Rscript

args <- commandArgs(TRUE)
help(args[1])
EOF
chmod +x manr

With this in place we can read R documentation from the command line with a simple command call:

./manr commandArgs

This script can be moved to your $HOME/bin directory and, assuming that directory is already in your $PATH, you can then simply enter:

manr commandArgs

or

manr BATCH