[[TOC]]

= Welcome to the R Group =

This project is exploring the migration of R-based statistical analysis jobs to a grid-based workflow using the UABgrid meta-scheduler instance of !GridWay. Details and documentation will be recorded here as we progress through our exploration.

 * BasicTests - Run through these to ensure you can use the meta-scheduler environment
 * [http://me.eng.uab.edu/wiki/index.php?title=R-userinfo R Test Scripts] - Make sure your cluster-specific accounts are configured to run R properly. Use these scripts to test your configuration.
 * [wiki:RNotes] - Notes on running R
 * GridWayNotes - Exploration notes on !GridWay
 * GettingStarted - A brief run-through of executing example batch and r-batch scripts on SGE
 * [wiki:ssg-scripts SSG R-Methodological Analysis Scripts] - Notes on executing the R scripts provided by the SSG on cheaha
 * CommandLineProcessing - Exploring command line processing in R
 * [wiki:install-R Installing R] - Exploring installation of R on a Linux (openSUSE 10.3) desktop machine
 * WorkflowLogic - An overview of the generic workflow structure and logic
 * [wiki:ssg-mig ModifiedScripts] - Modified versions of the SSG's R-Methodological Analysis R script (!MigAnalysis.R) and SGE job-submission script (arrayjobsge)
 * ResourceSelection - Documenting the availability and selection of compute resources to power the workflow
 * ContainerManagement - Building containers to house the jobs

== Discussions ==

 * [wiki:Discussion-08.06.2008 Discussion-2008-08-06]
 * [wiki:Discussion-09:30:2008 Discussion-2008-09-30]
 * [wiki:Discussion-2009-10-27] - Boshao Impute-based workflow

== Status Reports ==

 * [wiki:StatusReport-January2008]
 * [wiki:StatusReport-July2008]

== Presentations ==

 [attachment:wiki:WikiStart:grid-enabling-r-final_30Jan08.pdf Powering Statistical Genetics with the Grid: Using GridWay to Automate R-based Workflows]::
   Grid Enabling Workshop, [http://www.mardigrasconference.org Mardi Gras Conference], January 30, 2008, Baton Rouge, LA.
== Resources ==

 * [http://www.soph.uab.edu/ssg/ SSG Homepage] - UAB Biostatistics Section on Statistical Genetics
 * [http://www.ssg.uab.edu/wiki/display/swdev/R_Notes SSG R Notes] - Some notes on using and extending R
 * [http://openwetware.org/wiki/R_Statistics OpenWetware R Overview] - a light directory of R resources
 * [http://www.gridway.org GridWay] - information about the grid metascheduler
 * [http://www.r-project.org/ R Project] - the home site for the R statistical package
 * [http://cran.r-project.org/manuals.html R manuals] - CRAN listing of R manuals
 * [http://me.eng.uab.edu/wiki/index.php?title=R-admininfo R installation ME Wiki] - notes on how R is maintained on Coosa and Cheaha
 * [http://www.ssg.uab.edu/wiki/display/swdev/R+Installation SSG considerations on installing R] - covers installing R in a home directory (summary of the [http://cran.r-project.org/doc/manuals/R-admin.html official R document])
 * [http://www.biostat.ucsf.edu/biostat/sen/cluster/RMPI-cluster2.html Rmpi] - how to install and use Rmpi
 * [http://www.stat.uiowa.edu/~luke/R/cluster/uiowasnow.html Snow (Simple Network of Workstations)] - cluster interface for parallel statistical codes
 * [http://www.open-mpi.org/ OpenMPI] - newer MPI platform, more likely to be available on distributed clusters than the older LAM/MPI
 * [http://www.lam-mpi.org/ LAM/MPI implementation] - foundation of Rmpi/Snow, now in maintenance mode and superseded by OpenMPI
 * [http://bioconductor.org/packages/2.1/bioc/html/RWebServices.html RWebServices] - example implementation of a web service interface to R, related to the caGrid project
 * [http://wiki.fhcrc.org/caBioc] - caGrid !BioConductor site for RWebServices
 * [http://www.omegahat.org/ Omega Project for Statistical Computing] - site for the R & S Java linking toolkit SJava and other statistical tools
 * [http://trac.mcs.anl.gov/projects/bcfg2 Bcfg2] - a configuration management tool being explored for use on UABgrid; it promises a solution to application maintenance across distributed clusters

== References ==

=== R and the Grid ===

 * [http://www.bepress.com/cgi/viewcontent.cgi?article=1016&context=uwbiostat Simple Parallel Statistical Computing in R (pdf)] - paper that describes the parallelization efforts of Rmpi and Snow. A critical read for understanding the issues related to adapting to cluster and grid environments.
 * [http://bioconductor.org/packages/2.1/bioc/vignettes/RWebServices/inst/doc/RelatedWork.pdf RWebServices - Related Work (pdf)] - describes similar efforts at parallelizing R workflows using web services. Covers the shared library approach of RWebServices and the message stream approach of OSS and Rserve, which are akin to our current effort using !GridWay as the distribution fabric.
 * [http://bioconductor.org/packages/2.1/bioc/vignettes/RWebServices/inst/doc/LessonsLearned.pdf RWebServices - Lessons Learned (pdf)] - describes some of the insights gained from the RWebServices shared library implementation approach. Section 3 describes considerations for adapting to web services. The coarse-grained workflow considerations are valid for all large distributed configurations.
 * [http://bioconductor.org/packages/2.1/bioc/vignettes/RWebServices/inst/doc/RToJava.pdf RWebServices - Connecting R to Java (pdf)] - details considerations in linking loosely coupled and tightly coupled data systems. These concerns exist above the !GridWay layer we are exploring but are valuable for understanding higher-level interface considerations.
=== Resource Configuration ===

 * [ftp://ftp.mcs.anl.gov/pub/bcfg/papers/pay-as-you-go.pdf Bcfg2: Pay as You Go (pdf)] - overview of the Bcfg2 configuration management system
 * [ftp://ftp.mcs.anl.gov/pub/bcfg/papers/bcfg-cluster2003.pdf BCFG: A Configuration Management Tool for Heterogeneous Environments] - introductory paper on BCFG and configuration considerations for heterogeneous environments
 * [http://ieeexplore.ieee.org/iel5/4090162/4090163/04090194.pdf?tp=&isnumber=4090163&arnumber=4090194 Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control] - paper describing a "container"-based approach to grid resource management. Interesting analogies to the "containerization" ideas in our approach.
 * [http://www.sura.org/cookbook/gtcb/appendices/related-links/StormSurge.php SCOOP Storm Surge Model] - documentation of the SCOOP job model with relevant resource selection and application preparation approaches
 * [https://www.ccs.uky.edu/scoop/ SURAGrid SCOOP ADCIRC Resource Provider Setup Tips] - a document to help compute resource providers see what the Renaissance Computing Institute (RENCI) staff working on the SCOOP project look for when adding a new compute resource to their pool of resources

== Support ==

=== Mailing Lists ===

 * R Group - Working group to explore the migration of R workflows to the UABgrid collaboration environment
   * [https://vo.uabgrid.uab.edu/sympa/arc/r-group archive]
   * [https://vo.uabgrid.uab.edu/sympa/subscribe/r-group subscribe]
   * [https://vo.uabgrid.uab.edu/sympa/signoff/r-group unsubscribe]