Welcome to the R Group

This project explores migrating R-based statistical analysis jobs to a grid-based workflow using the UABgrid meta-scheduler instance of GridWay. Details and documentation will be recorded here as the exploration progresses.

  • BasicTests - Run through these to ensure you can use the meta-scheduler environment
  • R Test Scripts - Use these scripts to verify that your cluster-specific accounts are configured to run R properly
  • RNotes - Notes on running R
  • GridWayNotes - Exploration notes on GridWay
  • GettingStarted - A brief run-through of executing an example batch script and R batch scripts on SGE
  • SSG R-Methodological Analysis Scripts - Notes on executing the R scripts provided by the SSG on cheaha
  • CommandLineProcessing - Exploring command line processing in R
  • Installing R - Exploring installation of R on a Linux (openSUSE 10.3) desktop machine
  • WorkflowLogic - An overview of the generic workflow structure and logic.
  • ModifiedScripts - Modified versions of the SSG's R-Methodological analysis R script (MigAnalysis.R) and SGE job-submission script (arrayjobsge)
  • ResourceSelection - Documenting the availability and selection of compute resources to power the workflow.
  • ContainerManagement - Building containers to house the jobs

Discussions

Status Reports

Presentations

Outcomes

March 2009 - We were able to take the campus workflow scaling and adapt it to the Open Science Grid via the EngageVO. It took about 2 weeks of casual work to get user credentials, accounts, and software deployed, one day to wrap the code in OSG submit wrappers, and one afternoon (4 hours) to harness 1000 CPU-hours of compute time. This represents a 4-hour, 250-CPU allocation for a production application, which would be difficult to acquire on campus.

Resources

References

R and the Grid

  • RWebServices - Related Work (pdf) - describes similar efforts at parallelizing R workflows using web services. Covers the shared library approach of RWebServices and the message stream approach of OSS and Rserve, which are akin to our current effort using GridWay as the distribution fabric.
  • RWebServices - Lessons Learned (pdf) - describes some of the insights gained from the RWebServices shared library implementation approach. Section 3 describes considerations for adapting to web services. The coarse-grained workflow considerations are valid for all large distributed configurations.
  • RWebServices - Connecting R to Java (pdf) - details considerations in linking loosely coupled and tightly coupled data systems. These concerns exist above the GridWay layer we are exploring but are valuable for understanding higher-level interface considerations.

Resource Configuration

Support

Mailing Lists
