Last modified on 11/26/12 19:00:04

Notes on running an R job on cheaha

  • First, edited my .bashrc file on cheaha (basically copied and pasted from the above-mentioned page).
  • To actually run an R job, copied the example R script from the same page: http://me.eng.uab.edu/wiki/images/8/85/Hpc_test.R
  • Saved that script as hpc_test.R in my current working directory.
  • Next, copied the shell script that submits hpc_test.R and saved it as sge-hpc-test.sh. I used the version that passes no external arguments to the R script:
    #!/bin/bash
    #$ -S /bin/bash 
    #$ -cwd 
    #$ -m beas
    #$ -N hpcdemo 
    #$ -M YOUR_EMAIL_ADDRESS
    # requesting 6hrs wall clock time 
    #$ -l h_rt=6:00:00
    # request 256 MB of RAM per process
    #$ -l vf=256M
    # Pass relevant environment variables to the job
    #$ -v PATH,R_HOME,R_LIBS,LD_LIBRARY_PATH,CWD
    
    # Load the module for the correct version of R
    # See "module avail R" command output for list of R versions
    # the example uses R 2.12.2
    source /etc/profile.d/modules.sh
    module load R/R-2.12.2
    
    R CMD BATCH --no-save --no-restore hpc_test.R
    
    
  • Did not edit the -M option (the e-mail address), but submitted the job anyway with qsub sge-hpc-test.sh
  • After submitting the job, I opened its output file (sge-hpc-test.sh.o58346).
  • The contents of sge-hpc-test.sh.o58514 were as follows:
     Warning: no access to tty (Bad file descriptor).
     Thus no job control in this shell.
    
  • The FAQ at http://me.eng.uab.edu/wiki/index.php?title=R-userinfo gives a solution to this problem:
     
     This warning can safely be ignored, or you can prevent it by using "-S /bin/sh" in your job script. For example, add this line near the top of the job script:
    
     #$ -S /bin/sh
    
    
  • After adding that line to sge-hpc-test.sh, submitted the script again with qsub sge-hpc-test.sh
  • This run generated three files: R-JOB-BATCH-MODE.o58514, R-JOB-BATCH-MODE.e58514, and hpc_test.Rout
  • The contents of R-JOB-BATCH-MODE.o58514 were:
      WARNING: ignoring environment value of R_HOME  
    

  • The contents of hpc_test.Rout were:
    R : Copyright 2006, The R Foundation for Statistical Computing
    Version 2.3.1 (2006-06-01)
    ISBN 3-900051-07-0
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.
    
      Natural language support but running in an English locale
    
    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.
    
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
    
    > invisible(options(echo = TRUE))
    > ##First read in the arguments listed at the command line
    > args=(commandArgs(TRUE))
    Error in commandArgs(TRUE) : unused argument(s) ( ...)
    Execution halted
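The banner in hpc_test.Rout reports R 2.3.1, even though the job script loads R/R-2.12.2, so the job apparently ran an older R than intended; an older commandArgs() that rejects the TRUE argument would be consistent with the "unused argument(s)" error. A minimal sketch for spotting such a mismatch, assuming only a POSIX shell with sed: rout_r_version reads the version from the .Rout banner, and check_rout_version compares it with the version you meant to load. Both are hypothetical helper names, not part of R or SGE.

```shell
# rout_r_version FILE: print the R version recorded in the banner of an
# R CMD BATCH output (.Rout) file, or nothing if no banner line is found.
# Handles the old banner form ("Version 2.3.1 (2006-06-01)") shown above
# and the "R version X.Y.Z (date)" form printed by newer R releases.
rout_r_version() {
    sed -n \
        -e 's/^R version \([0-9][0-9.]*\).*/\1/p' \
        -e 's/^Version \([0-9][0-9.]*\).*/\1/p' \
        "$1" | head -n 1
}

# check_rout_version FILE EXPECTED: hypothetical helper that fails (and
# prints a message on stderr) when FILE was produced by another R version.
check_rout_version() {
    rout=$1
    expected=$2
    actual=$(rout_r_version "$rout")
    if [ "$actual" != "$expected" ]; then
        echo "$rout was produced by R ${actual:-<unknown>}, expected R $expected" >&2
        return 1
    fi
    return 0
}
```

Run on the hpc_test.Rout above, rout_r_version hpc_test.Rout prints 2.3.1, so check_rout_version hpc_test.Rout 2.12.2 reports the mismatch.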
    
    

Updates to the above scripts

  • Changed hpc_test.R so that it no longer looks for command-line arguments; the variable assignments are now hard-coded in the script.
  • Changed sge-hpc-test.sh to the version that does not pass any external command-line arguments:
    #!/bin/bash 
    #$ -cwd 
    #$ -m beas
    #$ -N hpcdemo 
    #$ -M tapan@uab.edu 
    # requesting 6hrs wall clock time 
    #$ -l h_rt=6:00:00
    #$ -v PATH,R_HOME,R_LIBS,LD_LIBRARY_PATH,CWD 
    R CMD BATCH --no-save --no-restore hpc_test.R 
    
  • Cleared the session cookies and HTTP authentication from the browser.
    After making the above changes, I was able to submit and run the R job successfully.
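Since the first submission above went out with the -M placeholder still in place, a small check before qsub can catch that. A minimal sketch in POSIX shell, assuming the unedited template contains the literal text YOUR_EMAIL_ADDRESS; check_job_script is a hypothetical helper name, not an SGE command. It only warns about a missing "#$ -S /bin/sh" line, because the FAQ describes the resulting tty warning as harmless.

```shell
# check_job_script FILE: minimal pre-submission check for an SGE job script.
# Assumption: the unedited template contains the literal placeholder
# YOUR_EMAIL_ADDRESS on its "#$ -M" line, as in the example script above.
check_job_script() {
    script=$1
    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    # Fail if the -M placeholder was never replaced with a real address.
    if grep -q 'YOUR_EMAIL_ADDRESS' "$script"; then
        echo "$script: replace YOUR_EMAIL_ADDRESS on the '#\$ -M' line" >&2
        return 1
    fi
    # Only warn when "#$ -S /bin/sh" is absent: the R-userinfo FAQ suggests
    # it to avoid the harmless "no access to tty" message.
    if ! grep -q '^#\$ -S /bin/sh *$' "$script"; then
        echo "$script: no '#\$ -S /bin/sh' line; a harmless tty warning may appear" >&2
    fi
    return 0
}

# Example: only call qsub when the check passes.
#   check_job_script sge-hpc-test.sh && qsub sge-hpc-test.sh
```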

Notes on executing test scripts from r-group on cheaha and stage

  • Followed the instructions at http://projects.uabgrid.uab.edu/r-group/wiki/BasicTests
  • To set up an account for UABgrid, opened the link https://ca.uabgrid.uab.edu/user/custom_request_cert.php
  • This prompts you to log in through either UAB Authentication or OpenID.
  • After logging in through UAB Authentication, the UABgrid Certification Form opens, showing details such as the user's name, e-mail address, and organization.
  • The first time I clicked Submit Request, I got an error in the form:
    ERROR(S) IN FORM
    Missing e-mail Address. Please register to UABgrid first
    
  • So I followed the given link and registered with UABgrid, which takes you to myVocs. There, I clicked the Preferences tab and added my e-mail address.
  • After adding my e-mail address, I opened https://ca.uabgrid.uab.edu/user/custom_request_cert.php again.
  • The second time, my e-mail address was still blank.
  • So I closed the certification page and cleared all session cookies (with the Firefox add-on "Web Developer", you can click the Cookies tab and clear all your session cookies).
  • Opened https://ca.uabgrid.uab.edu/user/custom_request_cert.php a third time, and now my e-mail address appeared.
  • I then clicked Submit Request and got the following information:
    You are about to create a certificate using the following information:
    
    User's Name       ppreddy@uab.edu
    E-mail Address    ppreddy@uab.edu
    Organization      University of Alabama at Birmingham
    Department/Unit   UABgrid
    Locality          Birmingham
    State/Province    Alabama
    Country           US
    Certificate Life  1 Year
    
  • Created the certificate.
  • Then downloaded userkey.pem and usercert.pem to the desktop of my local machine.
  • My /home/ppreddy account on cheaha.ac.uab.edu did not have a .globus directory, so I created one and copied userkey.pem and usercert.pem into it with scp. However, when I tried to run GridWay commands such as gwhost and gwps on cheaha.ac.uab.edu, I got the following message:
        [ppreddy@cheaha ~]$ gwhost
        -bash: gwhost: command not found
    
  • So I logged on to stage.uabgrid.uab.edu, where my /home/ppreddy directory already had a .globus directory, and copied userkey.pem and usercert.pem into it with scp. GridWay commands do run on stage.uabgrid.uab.edu.
  • Created the example script testjob with the following contents:
       EXECUTABLE=/bin/uname
       ARGUMENTS=-a
       RESCHEDULE_ON_FAILURE=no
    
  • Running gwsubmit -t testjob gives the following message:
      [ppreddy@stage ~]$ gwsubmit -t testjob
      FAILED: failed could not register user (check proxy)
    
    This finding has been reported as ticket http://dev.uabgrid.uab.edu/ticket/48.
    Also, while my submitted job was hung, jobs submitted by any other user after me also hung.
    That is, the failure to register the user (proxy) cascaded to later jobs because of my failed job.
    So gwd had to be killed to clear the hung job. This required a hard kill (kill -9 <pid>), after which the lock file had to be removed (rm $GW_LOCATION/var/lock).
    gwd was started again after the hung job was cleared.
  • Without going through GridWay, I typed the testjob lines directly on the command line and got the following output:
      [ppreddy@stage ~]$ /bin/uname -a RESCHEDULE_ON_FAILURE=no
      Linux stage.uabgrid.uab.edu 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:30:58 EDT 2007 i686 i686 i386 GNU/Linux
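The credential steps above (create .globus if it is missing, copy in the two .pem files) and the testjob template can be scripted. A minimal sketch in POSIX shell; install_grid_credentials and write_uname_template are hypothetical helper names, and the 400/644 modes are an assumption based on the usual Globus requirement that the private key be readable only by its owner.

```shell
# install_grid_credentials SRC_DIR HOME_DIR: copy userkey.pem and
# usercert.pem from SRC_DIR into HOME_DIR/.globus, creating it if needed.
# Assumption: mode 400 on the private key and 644 on the certificate
# are acceptable to the local Globus/GridWay tools.
install_grid_credentials() {
    src=$1
    dest=$2/.globus
    for f in userkey.pem usercert.pem; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dest" || return 1
    # -f lets cp replace an existing read-only userkey.pem on a re-run.
    cp -f "$src/userkey.pem" "$src/usercert.pem" "$dest/" || return 1
    chmod 400 "$dest/userkey.pem" || return 1
    chmod 644 "$dest/usercert.pem" || return 1
}

# write_uname_template FILE: write the testjob template used above.
write_uname_template() {
    cat > "$1" <<'EOF'
EXECUTABLE=/bin/uname
ARGUMENTS=-a
RESCHEDULE_ON_FAILURE=no
EOF
}
```

On stage, after copying the two files into a hypothetical ~/certs directory, install_grid_credentials ~/certs "$HOME" followed by write_uname_template ~/testjob would reproduce the setup before running gwsubmit -t testjob.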
    
  1. Check whether your Java version is 1.4; if it is, upgrade to Java 1.5.

    The Java version in our environment is 1.5.

  2. Check whether gw was installed in multi-user mode, and run gw_em_mad_ws.

    I ran the gw_em_mad_ws command as described at http://dev.uabgrid.uab.edu/uabgrid-stage/wiki/BuildTheStage

       sudo -u ppreddy /opt/gw/bin/gw_em_mad_ws
    
    and got the message "Not authorized to do so".
    John Paul found out that I had not been added to the "gwusers" group, so I was not able to submit jobs to GridWay.
    I was added to the gwusers group with the command
      usermod -a -G gwusers ppreddy
    
    and now I am able to submit jobs to GridWay.
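The two checks above (Java 1.5 or newer, and membership in gwusers) can be run before trying any GridWay commands. A minimal sketch in POSIX shell; java_version_ok and in_group are hypothetical helper names, and the version parsing assumes the quoted version string that java -version prints on its first line.

```shell
# java_version_ok VERSION: succeed if VERSION is Java 1.5 or newer.
# Accepts old-style strings such as "1.4.2_19" or "1.5.0_22" and the
# newer style such as "11.0.2"; anything non-numeric is rejected.
java_version_ok() {
    ver=$1
    major=${ver%%.*}
    if [ "$major" = "1" ]; then
        rest=${ver#1.}
        feature=${rest%%[._]*}   # "5.0_22" -> "5"
    else
        feature=$major           # "11.0.2" -> "11"
    fi
    case $feature in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$feature" -ge 5 ]
}

# in_group GROUP: succeed if the current session's user is in GROUP.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx "$1"
}

# Example use on stage (not run here):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   java_version_ok "$v" || echo "Java $v is older than 1.5"
#   in_group gwusers || echo "not in gwusers; ask an admin to add $(id -un)"
```

Because id -nG without a user name reports the groups of the current login session, in_group gwusers will usually only succeed after logging in again following the usermod change.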