GPIR is a resource monitoring technology used in grid computing environments. It is used by SURAgrid to feed performance statistics to the SURAgrid Portal. Cheaha is listed as a resource on this portal, so it needs to send regular data updates.
The following notes document the installation of the GPIR resource monitor code on Cheaha. The instructions are specific to Cheaha and record all the steps required to get it running, along with the configuration used. This is high-level documentation for an eventual resource configuration script.
To install the GPIR providers, we combine the provider install instructions on the GPIR site with SURAgrid-specific configuration (step 8), and use the locally developed SGE provider modules, exported from the repository, to create the provider source tarball.
Create Monitor Account
The first step was to create a monitor account. We plan to use this for other monitoring, hence the generic name:
useradd -u 408 -d /export/home/gridmon -c "Monitoring for Grids" gridmon
The uid was chosen after scanning /etc/passwd for conflicts. /export/home was chosen to force creation of the home directory on a local drive. We assigned a password, which is recorded off-line.
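The /etc/passwd scan for a free uid can be sketched as a small shell function. This is an illustrative sketch, not the command actually used on Cheaha; the name `uid_in_use` is hypothetical, and on hosts that also take accounts from NIS or LDAP, `getent passwd` would be a more complete source than the local file.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if UID is already assigned in a passwd-format
# file (default /etc/passwd), exit 1 otherwise.
#   $1 = uid to check   $2 = optional passwd file
uid_in_use() {
    # Field 3 of each colon-separated passwd entry is the numeric uid.
    awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit found ? 0 : 1 }' "${2:-/etc/passwd}"
}

# Usage: only create the account if uid 408 is free.
# if uid_in_use 408; then
#     echo "uid 408 is already taken" >&2
# else
#     useradd -u 408 -d /export/home/gridmon -c "Monitoring for Grids" gridmon
# fi
```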
We verified that the Perl SOAP::Lite module, which the provider code needs, was installed with:
rpm -qa | grep perl
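`rpm -qa | grep perl` lists every installed package with "perl" in its name, so the check relies on spotting the SOAP::Lite package in the listing, and it misses a copy installed from CPAN rather than RPM. A more direct check, sketched below, is to ask perl to load the module. This assumes the perl first on PATH is the one that runs the providers; `perl_has_module` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if perl can load the named module,
# non-zero otherwise. "perl -MModule -e 1" loads the module and then
# runs an empty program.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Usage (prints the installed SOAP::Lite version if present):
# perl_has_module SOAP::Lite && perl -MSOAP::Lite -e 'print "$SOAP::Lite::VERSION\n"'
```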
Deploy the Provider
We copied the provider tarball, created from an export of the gpir-sge project, to Cheaha (see the gpir-sge wiki):
scp providers.tar gridmon@cheaha.ac.uab.edu:
Then, logged in as gridmon, we created a subdirectory for the GPIR files and unpacked the provider tarball there:
# as gridmon on cheaha
mkdir gpir
mv providers.tar gpir
cd gpir
tar -xf providers.tar
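The same deployment steps can be written as a function that stops at the first failure and leaves the caller's working directory unchanged. This is a sketch; `deploy_providers` and its arguments are hypothetical, and it assumes the tarball unpacks into a top-level `providers/` directory, as the paths used later on this page suggest.

```shell
#!/bin/sh
# Hypothetical helper: unpack a provider tarball into a GPIR directory.
#   $1 = path to providers.tar   $2 = target directory (e.g. ~/gpir)
deploy_providers() {
    dp_tar=$1
    dp_dest=$2
    # -p: do not fail if the directory already exists.
    mkdir -p "$dp_dest" || return 1
    mv "$dp_tar" "$dp_dest"/ || return 1
    # Extract in a subshell so the caller's working directory is unchanged.
    ( cd "$dp_dest" && tar -xf "$(basename "$dp_tar")" ) || return 1
    # Sanity check: the rest of this page expects providers/perl to exist.
    [ -d "$dp_dest/providers/perl" ]
}

# Usage, as gridmon on cheaha:
# deploy_providers ~/providers.tar ~/gpir
```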
Configure Provider
The provider config file needs to be updated to work with the SURAgrid GPIR service environment.
patch providers/perl/conf/providers.conf << EOF
49c49
< hostname=myremotehost.edu
---
> hostname=cheaha.ac.uab.edu
52,54c52,54
< motd.module=../modules/motd.pl
< load.module=<path-to-load-module>
< jobs.module=<path-to-jobs-module>
---
> motd.module=../modules/motd.pl
> load.module=../modules/load.sge.pl
> jobs.module=../modules/jobs.sge.pl
65c65
< gpir.contact=gpirserver.edu:8080
---
> gpir.contact=cuero.tacc.utexas.edu:12080/gpir/webservices
68c68
< admin.email=portaladmin@myorg.edu
---
> admin.email=jpr@uab.edu
EOF
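After patching, it is worth confirming that the edited keys hold the intended values, because the patch is tied to specific line numbers in the distributed file. A minimal sketch, assuming providers.conf uses plain `key=value` lines as the diff shows; `conf_value` is a hypothetical helper, not part of the GPIR distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a key=value config file.
# Comment lines starting with # are skipped; the last matching line wins.
# Exit status is 1 if the key is not present.
#   $1 = key   $2 = config file
conf_value() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }
        index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
        END { if (found) print val; exit found ? 0 : 1 }
    ' "$2"
}

# Usage: print the settings changed by the patch.
# conf=providers/perl/conf/providers.conf
# for k in hostname load.module jobs.module gpir.contact admin.email; do
#     printf '%s=%s\n' "$k" "$(conf_value "$k" "$conf")"
# done
```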
Note: The paths to the resource monitor scripts are relative to the directory containing the parent resource monitor programs, e.g. ~/gpir/providers/perl/src/core. Also, the scripts do not have execute permission by default. It is not clear whether this will pose a problem.
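Because the `*.module` paths are resolved relative to the core programs' directory, one way to confirm the configuration is to resolve each path from ~/gpir/providers/perl/src/core and check that the file exists. This sketch assumes `name.module=relative/path` lines as in the patch above and paths without spaces; `check_module_paths` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: resolve every *.module= path in a providers.conf
# relative to the core directory and report whether each file exists.
#   $1 = providers.conf   $2 = core directory (e.g. ~/gpir/providers/perl/src/core)
# Exit status is 0 only if every module file exists.
check_module_paths() {
    cmp_status=0
    # Strip the key, leaving values such as "../modules/load.sge.pl".
    for cmp_rel in $(sed -n 's/^[A-Za-z_]*\.module=//p' "$1"); do
        cmp_path="$2/$cmp_rel"
        if [ -f "$cmp_path" ]; then
            echo "OK       $cmp_path"
        else
            echo "MISSING  $cmp_path"
            cmp_status=1
        fi
    done
    return $cmp_status
}

# Usage, from ~/gpir:
# check_module_paths providers/perl/conf/providers.conf providers/perl/src/core
```

An unpatched file would report its `<path-to-load-module>` placeholders as MISSING, which makes this a quick way to spot a hunk that did not apply.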
Step 7 of the provider install instructions recommends testing the providers by running them manually. This step assumes the files are executable, which is not the default in the distribution, so we turned on execute permissions:
chmod +x ~/gpir/providers/perl/src/core/*
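To confirm the chmod took effect, or to spot files added later without the execute bit, `find` can list regular files whose owner execute bit is still unset. A sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR that lack the owner
# execute bit. No output means every file is executable by its owner.
list_non_executable() {
    # "-perm -u+x" matches files with u+x set; "!" inverts the test.
    find "$1" -type f ! -perm -u+x
}

# Usage:
# list_non_executable ~/gpir/providers/perl/src/core
```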
Test Provider
The first test of the provider is to run the control program in debug, non-ingest mode. (Ingesting is the process of pushing data up to the GPIR data store; the term reflects the data store's point of view.)
main.pl -f jobs -d -n
This initial test did not produce the expected XML output. The cause was a bug in the debug mode of main.pl, which has since been fixed.
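The manual test can be repeated for both configured providers, with a rough check that each one emits XML on standard output. This is a sketch under assumptions: that `-f` names the provider, `-d` enables debug output and `-n` suppresses ingest (as described above), that main.pl is run from ~/gpir/providers/perl/src/core, and that its output begins with an XML declaration or element. `MAIN_PL`, `looks_like_xml` and `test_providers` are hypothetical names.

```shell
#!/bin/sh
# Path to the control program; an assumed location, override as needed.
MAIN_PL=${MAIN_PL:-./main.pl}

# Hypothetical check: succeed if the first non-blank line on stdin
# starts with "<". Empty input fails.
looks_like_xml() {
    awk 'NF { first = $1; exit } END { if (first ~ /^</) exit 0; exit 1 }'
}

# Hypothetical driver: run each named provider in debug (-d), non-ingest (-n)
# mode and report whether its stdout looks like XML. Stderr is left alone so
# debug messages do not confuse the check.
test_providers() {
    tp_status=0
    for tp_name in "$@"; do
        if "$MAIN_PL" -f "$tp_name" -d -n | looks_like_xml; then
            echo "$tp_name: XML output"
        else
            echo "$tp_name: no XML output" >&2
            tp_status=1
        fi
    done
    return $tp_status
}

# Usage, from ~/gpir/providers/perl/src/core:
# test_providers load jobs
```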
Configure Crontab
The following crontab needs to be installed for the gridmon account. These instructions are specific to Cheaha. Note: installing this file replaces the entire existing crontab. If the account has other crontab entries, update the crontab manually instead.
For the SGE provider, the crontab must put the SGE binaries on PATH, set two SGE environment variables (SGE_ROOT and SGE_QMASTER_PORT), and run the jobs and load providers. Cron runs each command from the user's home directory using the default shell, /bin/sh.
cat > gpir/providers/perl/cron/compute.crontab << EOF
#
# Set access to SGE binaries
#
PATH=/opt/gridengine/bin/lx26-amd64:/bin:/usr/bin
SGE_ROOT=/opt/gridengine
SGE_QMASTER_PORT=536
#
# Run GPIR providers
#
*/15 * * * * (cd gpir/providers/perl; ./run.sh load) > ./gpir/providers/perl/logs/load.out 2> ./gpir/providers/perl/logs/load.err
*/15 * * * * (cd gpir/providers/perl; ./run.sh jobs) > ./gpir/providers/perl/logs/jobs.out 2> ./gpir/providers/perl/logs/jobs.err
EOF
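Installing the file with `crontab FILE` replaces the account's whole crontab. Where gridmon already has other entries, the existing crontab can instead be merged with the new lines. A minimal sketch; `install_gpir_crontab` is a hypothetical name, and in merge mode it drops existing lines that exactly match a line in the new file, so re-running it does not duplicate the GPIR entries (a side effect is that identical comment lines such as a lone `#` are also collapsed).

```shell
#!/bin/sh
# Hypothetical helper: install a crontab file for the current user.
#   $1 = crontab file   $2 = "replace" (default) or "merge"
install_gpir_crontab() {
    ic_file=$1
    case "${2:-replace}" in
        replace)
            # Overwrites the user's entire crontab with the file's contents.
            crontab "$ic_file"
            ;;
        merge)
            # Keep existing entries, minus exact duplicates of lines in the
            # new file, then append the new file. "crontab -l" fails when no
            # crontab exists yet, so its error message is discarded.
            { crontab -l 2>/dev/null | grep -vxF -f "$ic_file"; cat "$ic_file"; } | crontab -
            ;;
        *)
            echo "usage: install_gpir_crontab FILE [replace|merge]" >&2
            return 2
            ;;
    esac
}

# Usage, as gridmon from the home directory:
# install_gpir_crontab gpir/providers/perl/cron/compute.crontab merge
# crontab -l    # review the result
```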
