This page describes the structure and function of the shell script that runs SSG's R-script.
The original and modified shell scripts are attached below.
The job submission command for the modified script should look like this:
qsub -t 1-100 qsub_sge_job.sh 10000 10:0.1:50
1. The '-t' option specifies the range of array task IDs to SGE.
In the above example the task IDs run from 1 to 100 in steps of 1, so the number of array jobs=100,
i.e., SGE_TASK_FIRST = 1, SGE_TASK_LAST = 100 and SGE_TASK_STEPSIZE = 1.
2. The first argument after the script file ('qsub_sge_job.sh' in the
above example) specifies the total number of iterations.
So, in the above example, total number of iterations=10000
3. The second argument after the script file specifies the seed value, in the form
number_of_imputations:proportion_of_missing_data:no_families, with the three fields
separated by colons.
So, in the above example, seed value=10:0.1:50
Passing the number of array jobs, the total number of iterations, and the seed value as command-line arguments to qsub
lets the user change them without editing the script.
For example, a 10,000 iteration job can easily be broken up into
100 jobs of 100 iterations each => qsub -t 1-100 qsub_sge_job.sh 10000 10:0.1:50 or
10 jobs of 1000 iterations each => qsub -t 1-10 qsub_sge_job.sh 10000 10:0.1:50 or
1000 jobs of 10 iterations each => qsub -t 1-1000 qsub_sge_job.sh 10000 10:0.1:50
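The split between array jobs and iterations per job can be checked before submitting. The following is a minimal sketch, not part of the attached script; the helper name iterations_per_task is hypothetical. It uses integer division, as the script does, so if the total is not an exact multiple of the number of array jobs the leftover iterations are not assigned to any job.

```shell
#!/bin/bash
# Hypothetical helper (not in the attached script): iterations each array
# job receives when `total` iterations are split over `tasks` array jobs.
iterations_per_task() {
    local total=$1 tasks=$2
    # Integer division, as in the script; any remainder is dropped.
    echo $(( total / tasks ))
}

# Print the three example splits of a 10,000 iteration job.
for tasks in 10 100 1000; do
    per_task=$(iterations_per_task 10000 "$tasks")
    echo "qsub -t 1-$tasks qsub_sge_job.sh 10000 10:0.1:50 => $per_task iterations per job"
done
```

Choosing a number of array jobs that divides the total evenly avoids losing iterations to the rounding.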
The latest shell script with modifications is as follows:
1 #!/bin/bash
2 #$ -S /bin/bash
3 #$ -V
4 #$ -N mig10
5 #$ -cwd
6 # -m beas
7 # -M ppreddy@uab.edu
8 #$ -l h_rt=1000:00:00
9 #$ -e ./
10 #$ -o ./
11
12
13 iterations=$1
14 seed=$2
15
16 function parse_seed() {
17
18 echo $seed
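19 # Split the colon-separated seed "n:m:f" into the global array c
20 IFS=':' read -r -a c <<< "$seed"
21 # A bash function can only return a numeric status, so c is shared as a global
22
23 }
24
25 function runR() {
26
27 parse_seed
28 # parse_seed has filled the global array c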
29 n=${c[0]}
30 m=${c[1]}
31 f=${c[2]}
32 amax=`expr $SGE_TASK_LAST - $SGE_TASK_FIRST + 1`
33 task_id=`expr $SGE_TASK_ID - 1`
34 range=`expr $iterations / $amax`
35 index=`expr $task_id \* $range`
36 s=`expr $index + 1`
37 e=`expr $index + $range`
38 echo "task_id=$task_id index=$index"
39 echo "n=$n m=$m f=$f st=$s end=$e"
40 R --silent --no-save --no-restore "--args n=$n m=$m f=$f st=$s end=$e" < MigAnalysis.R
41
42 }
43
44 runR
45
46
Lines 9 and 10
9 #$ -e ./
10 #$ -o ./
These options direct the standard error and standard output files to the current directory without giving them specific names.
As a result, both files take their names from the job name given with the '-N' option (mig10 here).
The output of the R-script is written to the same file as the standard output.
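As an illustration of the resulting file names, here is a small sketch that assumes Grid Engine's default naming scheme for array jobs (job name, then .o or .e with the job ID, then the task ID). The helper name sge_log_names and the job ID 4242 are made up for the example; SGE assigns the real job ID at submission.

```shell
#!/bin/bash
# Sketch: default log file names for one array task, assuming SGE's
# default <job_name>.o<job_id>.<task_id> naming. Not part of the script.
sge_log_names() {
    local job_name=$1 job_id=$2 task_id=$3
    echo "${job_name}.o${job_id}.${task_id}"   # standard output (includes R output)
    echo "${job_name}.e${job_id}.${task_id}"   # standard error
}

# 4242 is a hypothetical job ID.
sge_log_names mig10 4242 1
```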
Lines 13 and 14
13 iterations=$1
14 seed=$2
The above two shell variables are read from the command-line arguments given to the SGE command 'qsub'.
The total number of iterations is the first argument after the script name.
The seed value is the second argument after the script name.
Lines 16-23
These lines define the function parse_seed. It is the same as in the earlier script, except that the seed file is no longer hard-coded.
Instead, the user passes a particular seed as an argument on the command line (explained above).
The seed value is parsed and the individual parameters are saved in the array c. A bash function can only return a numeric status, so runR reads the parameters from this global array.
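The parsing step can be tried in isolation. The sketch below is an illustration rather than the attached script: the names parse_seed_sketch and params are hypothetical. It passes the fields through a global array instead of `return`, because a bash return status is limited to the range 0-255 and so cannot carry a larger number of imputations or a decimal value.

```shell
#!/bin/bash
# Sketch: split a seed of the form n:m:f into the global array `params`.
# parse_seed_sketch and params are illustrative names, not from the script.
parse_seed_sketch() {
    local seed=$1
    # IFS is changed only for this read; -a stores the fields in an array.
    IFS=':' read -r -a params <<< "$seed"
}

parse_seed_sketch "10:0.1:50"
echo "n=${params[0]} m=${params[1]} f=${params[2]}"
```

For the example seed this prints n=10 m=0.1 f=50.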
Lines 25-42
These lines define the function runR. It first calls the parse_seed function and extracts the individual parameters.
The parameters are:
n=number_of_imputations
m=proportion_of_missing_data
f=no_families
Lines 32-37 compute the start (s) and end (e) iteration numbers for each array task of the job.
The total number of iterations comes from the first command-line argument to the script.
The shell variable amax denotes the number of array jobs the run is split into.
It is derived from the range given with the qsub option -t (explained above).
Taking the above example, qsub -t 1-100 qsub_sge_job.sh 10000 10:0.1:50
When total number of iterations = 10000 and number of array jobs = 100, each array job covers 10000/100 = 100 iterations, and the start and end values are computed as follows:
| Array task | Start | End |
| 1 | 1 | 100 |
| 2 | 101 | 200 |
| 3 | 201 | 300 |
| 4 | 301 | 400 |
| ... | ... | ... |
| 100 | 9901 | 10000 |
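The values in the table can be reproduced outside SGE. The following sketch repeats the arithmetic of lines 32-37 with bash's $(( )) in place of expr; the function name task_range is hypothetical, and the SGE variables are passed in as plain arguments. Like the script, it assumes the first task ID is 1 and the step size is 1.

```shell
#!/bin/bash
# Sketch of the start/end arithmetic from lines 32-37 of the script.
# Arguments: total iterations, SGE_TASK_FIRST, SGE_TASK_LAST, SGE_TASK_ID.
task_range() {
    local iterations=$1 first=$2 last=$3 task=$4
    local amax=$(( last - first + 1 ))      # number of array jobs
    local range=$(( iterations / amax ))    # iterations per array job
    local index=$(( (task - 1) * range ))   # offset; assumes task IDs start at 1
    echo "$(( index + 1 )) $(( index + range ))"
}

task_range 10000 1 100 1      # prints "1 100"
task_range 10000 1 100 2      # prints "101 200"
task_range 10000 1 100 100    # prints "9901 10000"
```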
Line 40
40 R --silent --no-save --no-restore "--args n=$n m=$m f=$f st=$s end=$e" < MigAnalysis.R
This line is the actual R command. The arguments to the R-script are given by the "--args" option.
With this form of the command, the output of the R-script is written to the same file as the job's standard output stream.
Line 44
This line calls the function runR. SGE executes it once for each array task after the user submits the job with the qsub command.
The modifications to the shell script required changes in the R-script itself. These mainly concern parsing the command-line arguments inside the R-script and removing the hard-coded R variables mpr and noFamilies. The following lines show the parsing of the command-line arguments. The complete MigAnalysis.R script is attached below.
# Move toward a production version that will call the appropriate
# functions in a loop.

library(nlme)

## First read in the arguments listed at the command line

args=(commandArgs(TRUE))
print(args)

## args is now a character vector with one element per argument
## First check to see if arguments are passed.
## Then cycle through each element and evaluate it as an R expression.

if(length(args)==0){
    print("No arguments supplied.")
    ## supply default values
}else{
    for(i in 1:length(args)){
        eval(parse(text=args[[i]]))
    }
}

nim<-as.integer(n)
mpr<-m
noFamilies<-as.integer(f)
iterEnd <- as.integer(end)
iterStart <- as.integer(st)
print(nim)
print(mpr)
print(noFamilies)
print(iterEnd)
print(iterStart)
Sys.sleep(iterEnd/50)

source("SimBayes.R")
source("ExtractData.R")
source("EMFull.R")
source("MIFull.R")
Attachments
- arrajobsge (366 bytes) - Original SSG shell script for submitting R-script to SGE, added by ppreddy@uab.edu on 06/05/08 12:09:05.
- MigAnalysis.R (3.9 kB) - Original SSG R-script, added by ppreddy@uab.edu on 06/05/08 12:10:50.
- modified_arrayjobsge (0.7 kB) - Modified SSG shell script for submitting R-script to SGE, added by ppreddy@uab.edu on 06/05/08 12:11:26.
- Modified_MigAnalysis.R (6.0 kB) - Modified SSG R-script, added by ppreddy@uab.edu on 06/05/08 12:12:07.
