Last modified on 06/12/08 13:54:29

Mig Analysis Workflow

The Mig Analysis has a general workflow that is readily scalable. This page covers the generic workflow and how we plan to scale it to the grid.

Workflow in Pseudocode

Note: discussion of non-parallel computation needed for background.

The parallel workflow can be thought of as a double for-loop that processes an array of length iterations, split into chunks segments of equal size.

iterations = 10000
chunks = 10

// outer loop: one pass per chunk
for ( i=0; i < chunks; i++ ) {
   // inner loop: every iteration that belongs to chunk i
   for ( j=(i * iterations/chunks) + 1; j <= ((i+1) * iterations/chunks); j++ ) {
      compute.R();
   }
}

The above example leads to 10 executions of the inner for-loop, with each execution operating over a range of the array as follows:

Outer Loop   Inner Loop Start   Inner Loop End
0            1                  1000
1            1001               2000
2            2001               3000
3            3001               4000
...          ...                ...
9            9001               10000

With iterations = 10000 and chunks = 100, the same logic breaks the problem space down as follows:

Outer Loop   Inner Loop Start   Inner Loop End
0            1                  100
1            101                200
2            201                300
3            301                400
...          ...                ...
99           9901               10000
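The two tables above can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the Mig Analysis code itself; the function name chunk_bounds is hypothetical, and the sketch assumes that chunks divides iterations evenly, as it does in both examples.

```python
def chunk_bounds(iterations, chunks):
    """Return (outer index, inner start, inner end) for every chunk.

    Bounds are 1-based and inclusive, matching the tables above.
    Assumption: chunks divides iterations evenly.
    """
    if iterations % chunks != 0:
        raise ValueError("this sketch assumes chunks divides iterations evenly")
    size = iterations // chunks
    return [(i, i * size + 1, (i + 1) * size) for i in range(chunks)]


if __name__ == "__main__":
    # Print the chunks = 10 table.
    for i, start, end in chunk_bounds(10000, 10):
        print(i, start, end)
```

Calling chunk_bounds(10000, 100) gives the rows of the second table in the same way.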

In practice, the inner for-loop is implemented as an R script (i.e. a function call) that needs to accept either the iterations, chunks, and current chunk i, or alternatively the start, end, and increment values (i.e. the arguments of the for-loop). This leads to the logic:

iterations = 10000
chunks = 10

// one call to compute.R per chunk; the inner loop now lives inside the script
for ( i=0; i < chunks; i++ ) {
   compute.R( start=((i * iterations/chunks) + 1), end=((i+1) * iterations/chunks), increment=1 )
}
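As a rough illustration of this calling pattern, the Python sketch below drives a stand-in worker once per chunk. The compute function here is hypothetical and only records which iterations it was asked to process; the real work is done by the R script. run_serial is also an illustrative name.

```python
def compute(start, end, increment=1):
    """Hypothetical stand-in for compute.R.

    Returns the iteration numbers it would process (start and end inclusive).
    """
    return list(range(start, end + 1, increment))


def run_serial(iterations, chunks):
    """Call the worker once per chunk and collect everything it processed."""
    processed = []
    for i in range(chunks):
        start = i * iterations // chunks + 1
        end = (i + 1) * iterations // chunks
        processed.extend(compute(start=start, end=end, increment=1))
    return processed


if __name__ == "__main__":
    done = run_serial(10000, 10)
    print(len(done), done[0], done[-1])
```

Because each chunk's start is one past the previous chunk's end, the chunks together cover every iteration from 1 to iterations exactly once.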

When working with scheduling systems that support the concept of an "array job", the outer for-loop can be converted into the array construct of the scheduling system. In this scenario, the scheduler's array mechanism only needs to know the number of loops, i.e. the number of jobs the scheduler is required to spawn. This might lead to:

iterations = 10000
chunks = 10

array( chunks, compute.R( start=((i * iterations/chunks) + 1), end=((i+1) * iterations/chunks), increment=1 ) )

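Here i becomes the array job number supplied by the scheduling system. The formula assumes that this number runs from 0 to chunks - 1; if the scheduler starts numbering at 1, subtract 1 first.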
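To make the array job mapping concrete, here is a hedged Python sketch of how a single array task could work out its own start and end. This page does not name a scheduler, so the environment variable is an assumption: Grid Engine exposes the task number as SGE_TASK_ID and Slurm as SLURM_ARRAY_TASK_ID. The function names and the first_task_id parameter are hypothetical.

```python
import os


def bounds_for_task(task_id, iterations, chunks, first_task_id=1):
    """Map a scheduler task number to 1-based inclusive (start, end).

    first_task_id is the number the scheduler gives the first task;
    a default of 1 is an assumption and should match the job submission.
    """
    i = task_id - first_task_id  # zero-based chunk index
    if not 0 <= i < chunks:
        raise ValueError("task id %d is outside the %d chunks" % (task_id, chunks))
    start = i * iterations // chunks + 1
    end = (i + 1) * iterations // chunks
    return start, end


def bounds_from_env(iterations, chunks, var="SGE_TASK_ID", first_task_id=1):
    """Read the task number from an environment variable (assumed name)."""
    return bounds_for_task(int(os.environ[var]), iterations, chunks, first_task_id)
```

Each task would then pass its own start and end to the R script, so no task needs to know about any other.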