Mig Analysis Workflow
The Mig Analysis has a general workflow that is readily scalable. This page covers the generic workflow and how we plan to scale it to the grid.
Workflow in Pseudocode
Note: discussion of non-parallel computation needed for background.
The parallel workflow can be thought of as a double for-loop that processes an array of length iterations in segments, where chunks is the number of segments.
iterations = 10000
chunks = 10
for ( i=0; i < chunks; i++ ) {
    for ( j=(i * iterations/chunks) + 1; j <= ((i+1) * iterations/chunks); j++ ) {
        compute.R();
    }
}
The above example leads to 10 executions of the inner for-loop, with each execution operating over a range of the array as follows:
| Outer Loop | Inner Loop Start | Inner Loop End |
| 0 | 1 | 1000 |
| 1 | 1001 | 2000 |
| 2 | 2001 | 3000 |
| 3 | 3001 | 4000 |
| . | . | . |
| . | . | . |
| . | . | . |
| 9 | 9001 | 10000 |
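As a quick check of the table above, the chunk boundaries can be computed directly from the loop bounds in the pseudocode. The following is a minimal Python sketch; the function name chunk_bounds is illustrative and not part of the Mig Analysis code, and it assumes that iterations divides evenly by chunks, as in the examples on this page.

```python
def chunk_bounds(i, iterations, chunks):
    """Return the inclusive (start, end) range handled by outer-loop index i.

    Assumes iterations is an exact multiple of chunks, as in the examples.
    """
    if iterations % chunks != 0:
        raise ValueError("iterations must be divisible by chunks")
    size = iterations // chunks
    start = i * size + 1
    end = (i + 1) * size
    return start, end


if __name__ == "__main__":
    iterations, chunks = 10000, 10
    # Print one table row per outer-loop index.
    for i in range(chunks):
        start, end = chunk_bounds(i, iterations, chunks)
        print(f"| {i} | {start} | {end} |")
```

Calling chunk_bounds with chunks = 100 instead of 10 gives ranges of 100 elements each, using the same formula.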
For a situation where iterations = 10000 and chunks = 100, it would break the problem space down as follows:
| Outer Loop | Inner Loop Start | Inner Loop End |
| 0 | 1 | 100 |
| 1 | 101 | 200 |
| 2 | 201 | 300 |
| 3 | 301 | 400 |
| . | . | . |
| . | . | . |
| . | . | . |
| 99 | 9901 | 10000 |
In practice, the inner for-loop is implemented as an R script (i.e. a function call) that would need to accept the iterations, chunks, and current chunk i, or alternatively, the start, end, and increment step (i.e. the arguments of the for-loop). This leads to the logic:
iterations = 10000
chunks = 10
for ( i=0; i < chunks; i++ ) {
    compute.R( start=((i * iterations/chunks) + 1), end=((i+1) * iterations/chunks), increment=1 )
}
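The single loop over chunks can be sketched in Python as a driver that builds one invocation of the R script per chunk. This is only an illustration: the script name compute.R comes from the pseudocode, but passing start, end, and increment as positional command-line arguments to Rscript is an assumption about how the script would read its inputs, and build_commands is a hypothetical helper.

```python
def build_commands(iterations, chunks, script="compute.R"):
    """Build one Rscript command line per chunk (nothing is executed here).

    Assumption: the R script accepts start, end, and increment as
    positional command-line arguments; the real interface may differ.
    """
    size = iterations // chunks  # assumes iterations is a multiple of chunks
    commands = []
    for i in range(chunks):
        start = i * size + 1
        end = (i + 1) * size
        commands.append(["Rscript", script, str(start), str(end), "1"])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(10000, 10):
        print(" ".join(cmd))
```

Each command list could be passed to subprocess.run to execute the chunks one after another. That serial form is what an array job would replace.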
When working with scheduling systems that support the concept of an "array job", the outer for-loop can be converted into the array construct of the scheduling system. In this scenario, the scheduler's array mechanism just needs to know the number of loops, i.e. jobs, that the scheduler is required to spawn. This might lead to:
iterations = 10000
chunks = 10
array(10, compute.R( start=((i * iterations/chunks) + 1), end=((i+1) * iterations/chunks), increment=1 ))
Here i is the array job number assigned by the scheduling system.
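Inside each array task, the range to process can be derived from the task number alone. The sketch below assumes the scheduler exposes that number through an environment variable; the name ARRAY_TASK_ID is a placeholder, since the actual variable depends on the scheduler in use. The first_task_id parameter is also an assumption, added because some schedulers number array tasks starting at 1 rather than 0.

```python
import os


def task_range(task_id, iterations, chunks, first_task_id=0):
    """Map a scheduler array task number to an inclusive (start, end) range.

    first_task_id is the number the scheduler gives to the first task.
    Assumes iterations is an exact multiple of chunks.
    """
    i = task_id - first_task_id  # zero-based chunk index, as in the pseudocode
    if not 0 <= i < chunks:
        raise ValueError(f"task_id {task_id} is outside the array of {chunks} tasks")
    size = iterations // chunks
    return i * size + 1, (i + 1) * size


if __name__ == "__main__":
    # ARRAY_TASK_ID is a hypothetical variable name; substitute the one
    # provided by the scheduler. Defaults to 0 so the sketch runs standalone.
    task_id = int(os.environ.get("ARRAY_TASK_ID", "0"))
    start, end = task_range(task_id, iterations=10000, chunks=10)
    print(f"task {task_id}: start={start} end={end} increment=1")
```

With this arrangement every task runs the same script, and only the task number differs between them.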
