Welcome to the R Group
This project is exploring the migration of R-based statistical analysis jobs to a grid-based workflow using the UABgrid meta-scheduler instance of GridWay. Details and documentation will be recorded here as we progress through our exploration.
- BasicTests - Run through these to ensure you can use the metascheduler environment
- R Test Scripts - Make sure your cluster-specific accounts are configured to run R properly. Use these scripts to test your configuration.
- RNotes - Notes on running R
- GridWayNotes - Exploration notes on GridWay
- GettingStarted - A brief run-through of executing an example batch script and R batch scripts on SGE
- SSG R-Methodological Analysis Scripts - Notes on executing R scripts provided by the SSG on Cheaha
- CommandLineProcessing - Exploring command line processing in R
- Installing R - Exploring installation of R on a Linux (openSUSE 10.3) desktop machine
- WorkflowLogic - An overview of the generic workflow structure and logic.
- ModifiedScripts - Modified versions of the SSG's R-Methodological Analysis R script (MigAnalysis.R) and SGE job-submission script (arrayjobsge)
- ResourceSelection - Documenting the availability and selection of compute resources to power the workflow.
- ContainerManagement - Building containers to house the jobs
Discussions
- Discussion-2008-08-06
- Discussion-2008-09-30
- Discussion-2009-10-27 - Boshao Impute-based workflow
Status Reports
Presentations
- Powering Statistical Genetics with the Grid: Using GridWay to Automate R-based Workflows
- Grid Enabling Workshop, Mardi Gras Conference, January 30, 2008, Baton Rouge, LA.
Resources
- SSG Homepage - UAB Biostatistics Section on Statistical Genetics
- SSG R Notes - Some notes on using and extending R
- OpenWetware R Overview - a light directory of R resources
- GridWay - information about the grid metascheduler
- R Project - the home site for the R statistical package
- R manuals - CRAN listing of R manuals
- R installation ME Wiki - notes on how R is maintained on Coosa and Cheaha
- SSG considerations on installing R - covers installing R in home dir (summary of official R document).
- Rmpi - how to install and use Rmpi
- Snow (Simple Network of Workstations) - cluster interface for parallel statistical codes
- OpenMPI - Newer platform for MPI, more likely to be available on distributed clusters than the older LAM/MPI.
- LAM/MPI implementation - foundation of Rmpi/Snow, now in maintenance mode and superseded by OpenMPI
- RWebServices - example implementation of web service interface to R related to caGrid project
- http://wiki.fhcrc.org/caBioc - caGrid BioConductor site for RWebServices
- Omega Project for Statistical Computing - site for the R & S Java linking toolkit SJava and other statistical tools.
- Bcfg2 - a configuration management tool being explored for use on UABgrid; it promises a solution to application maintenance across distributed clusters
References
R and the Grid
- Simple Parallel Statistical Computing in R (pdf) - paper which describes parallelization efforts of Rmpi and Snow. A critical read for understanding the issues related to adapting to cluster and grid environments.
- RWebServices - Related Work (pdf) - describes similar efforts at parallelizing R workflows using web services. Covers the shared library approach of RWebServices and the message stream approach of OSS and Rserve, which are akin to our current effort using GridWay as the distribution fabric.
- RWebServices - Lessons Learned (pdf) - describes some of the insights gained from the RWebServices shared library implementation approach. Section 3 describes considerations for adapting to web services. The coarse-grained workflow considerations are valid for all large distributed configurations.
- RWebServices - Connecting R to Java (pdf) - details considerations in linking loosely coupled and tightly coupled data systems. These concerns exist above the GridWay layer we are exploring but are valuable for understanding higher-level interface considerations.
Resource Configuration
- Bcfg2: Pay as You Go (pdf) - overview of the Bcfg2 configuration management system
- BCFG: A Configuration Management Tool for Heterogeneous Environments - introductory paper to BCFG and configuration considerations for heterogeneous environments.
- Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control - paper describing a "container" based approach to grid resource management. Interesting analogies to "containerization" ideas in our approach.
- SCOOP Storm Surge Model - documentation of SCOOP job model with relevant resource selection and application preparation approaches.
- SURAGrid SCOOP ADCIRC Resource Provider Setup Tips - a document to help compute resource providers understand what the Renaissance Computing Institute (RENCI) staff working on the SCOOP project look for when adding a new compute resource to their pool.
Support
Mailing Lists
- R Group - Working group to explore the migration of R workflows to the UABgrid collaboration environment
Attachments
- grid-enabling-r-final_30Jan08.pdf (163.2 kB) - Presentation at Mardi Gras Conference 30 January 2008, added by dls@uab.edu on 02/01/08 14:15:49.
