Welcome to the R Group
This project explores migrating R-based statistical analysis jobs to a grid-based workflow using the UABgrid instance of the GridWay meta-scheduler. Details and documentation will be recorded here as our exploration progresses.
- BasicTests - Run through these to confirm that you can use the meta-scheduler environment
- R Test Scripts - Make sure your cluster-specific accounts are configured to run R properly. Use these scripts to test your configuration.
- RNotes - Notes on running R
- GridWayNotes - Exploration notes on GridWay
- GettingStarted - A brief walkthrough of executing an example batch script and R batch scripts on SGE
- SSG R-Methodological Analysis Scripts - Notes on executing the R scripts provided by the SSG on Cheaha
- CommandLineProcessing - Exploring command line processing in R
- Installing R - Exploring installation of R on a Linux (openSUSE 10.3) desktop machine
- WorkflowLogic - An overview of the generic workflow structure and logic.
- ModifiedScripts - Modified versions of the SSG's R-Methodological analysis script (MigAnalysis.R) and SGE job-submission script (arrayjobsge)
- ResourceSelection - Documenting the availability and selection of compute resources to power the workflow.
- ContainerManagement - Building containers to house the jobs
- Powering Statistical Genetics with the Grid: Using GridWay to Automate R-based Workflows - Grid Enabling Workshop, Mardi Gras Conference, January 30, 2008, Baton Rouge, LA.
- UAB CI Day 2008 - Poster and presentation for a UAB research computing event discussing the outcomes of migrating the workflow to the campus grid, which improved the workflow's scalability.
March 2009 - We adapted the scaled campus workflow to the Open Science Grid (OSG) via the EngageVO. It took about two weeks of casual work to obtain user credentials and accounts and to deploy the software, one day to wrap the code in OSG submit wrappers, and one afternoon (4 hours) to harness 1000 CPU-hours of compute time. That is equivalent to a 4-hour allocation of 250 CPUs (1000 CPU-hours / 4 hours) for a production application, which would be difficult to acquire on campus.
- SSG Homepage - UAB Biostatistics Section on Statistical Genetics
- SSG R Notes - Some notes on using and extending R
- OpenWetware R Overview - a light directory of R resources
- GridWay - information about the grid metascheduler
- R Project - the home site for the R statistical package
- R manuals - CRAN listing of R manuals
- R installation ME Wiki - notes on how R is maintained on Coosa and Cheaha
- SSG considerations on installing R - covers installing R in a home directory (a summary of the official R documentation).
- Rmpi - how to install and use Rmpi
- Snow (Simple Network of Workstations) - cluster interface for parallel statistical codes
- OpenMPI - newer MPI implementation, more likely to be available on distributed clusters than the older LAM/MPI.
- LAM/MPI implementation - foundation of Rmpi/Snow? Now in maintenance mode, superseded by OpenMPI.
- RWebServices - an example implementation of a web service interface to R, related to the caGrid project
- http://wiki.fhcrc.org/caBioc - caGrid BioConductor site for RWebServices
- Omega Project for Statistical Computing - site for the R & S Java linking toolkit SJava and other statistical tools.
- Bcfg2 - a configuration management tool being explored for use on UABgrid, which promises a solution to application maintenance across distributed clusters
R and the Grid
- Simple Parallel Statistical Computing in R (pdf) - paper describing the parallelization efforts of Rmpi and Snow. A critical read for understanding the issues involved in adapting to cluster and grid environments.
- RWebServices - Related Work (pdf) - describes similar efforts at parallelizing R workflows using web services. Covers the shared library approach of RWebServices and the message stream approach of OSS and Rserve, which are akin to our current effort using GridWay as the distribution fabric.
- RWebServices - Lessons Learned (pdf) - describes some of the insights gained from the RWebServices shared library implementation approach. Section 3 describes considerations for adapting to web services. The coarse-grained workflow considerations apply to all large distributed configurations.
- RWebServices - Connecting R to Java (pdf) - details considerations for linking loosely coupled and tightly coupled data systems. These concerns exist above the GridWay layer we are exploring but are valuable for understanding higher-level interface considerations.
- Bcfg2: Pay as You Go (pdf) - overview of the Bcfg2 configuration management system
- BCFG: A Configuration Management Tool for Heterogeneous Environments - introductory paper to BCFG and configuration considerations for heterogeneous environments.
- Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control - paper describing a "container"-based approach to grid resource management. It offers interesting analogies to the "containerization" ideas in our approach.
- SCOOP Storm Surge Model - documentation of SCOOP job model with relevant resource selection and application preparation approaches.
- SURAGrid SCOOP ADCIRC Resource Provider Setup Tips - a document to help compute resource providers understand what the Renaissance Computing Institute (RENCI) team working on the SCOOP project looks for when adding a new compute resource to its pool.
- R Group - Working group to explore the migration of R workflows to the UABgrid collaboration environment