GPIR is a resource monitoring technology used in grid computing environments. SURAgrid uses it to feed performance statistics to the SURAgrid Portal. Cheaha is listed as a resource on this portal, so it needs to send regular data updates.

The following notes document the installation of the GPIR resource monitor code on Cheaha. The instructions are specific to Cheaha and record every step required to get it running, along with the configuration used. This is high-level documentation intended as the basis for an eventual resource configuration script.

To install the GPIR providers, we combine the provider install instructions on the GPIR site with the SURAgrid-specific configuration (step 8 of those instructions) and the locally developed SGE provider modules, using an export of the repository to create the provider source tarball.

Create Monitor Account

The first step was to create a monitor account. We plan to use this for other monitoring, hence the generic name:

useradd -u 408 -d /export/home/gridmon -c "Monitoring for Grids" gridmon

The UID was chosen after scanning /etc/passwd for conflicts. /export/home was chosen to force creation of the home directory on a local drive. We assigned a password, which is recorded off-line.
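The conflict scan can be scripted. This is a minimal sketch assuming a standard colon-separated passwd file; the function name uid_in_use is hypothetical and not part of GPIR or the system tools. On hosts that also pull accounts from NIS or LDAP, getent passwd 408 is the broader check, since it queries every configured source.

```shell
#!/bin/sh
# uid_in_use UID [PASSWD_FILE]   (hypothetical helper)
# Succeeds (exit 0) if UID appears in the third field of a passwd-format
# file. PASSWD_FILE defaults to /etc/passwd.
uid_in_use() {
    awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Example: check the UID chosen for gridmon before running useradd.
if uid_in_use 408; then
    echo "UID 408 is already in use; pick another" >&2
else
    echo "UID 408 is free"
fi
```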

We verified that the Perl SOAP::Lite module, which the provider code needs, was installed with:

rpm -qa | grep perl

Deploy the Provider

We copied the provider tarball, created from the [gpir-sge project] export, to Cheaha (see the gpir-sge wiki).

scp providers.tar gridmon@cheaha.ac.uab.edu:

Then, as gridmon, we created a subdirectory for the GPIR files and unpacked the provider tarball there:

# as gridmon on cheaha
mkdir gpir
mv providers.tar gpir
cd gpir
tar -xf providers.tar
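Before configuring anything, it can help to confirm that the unpacked tree contains the paths these notes rely on later. This is a sketch; the list of paths is taken from the configuration, test and crontab steps in these notes, and the assumption that the logs directory ships in the tarball is unverified. If it does not exist, the crontab redirections into it will fail, so create it with mkdir. The function name check_provider_tree is hypothetical.

```shell
#!/bin/sh
# check_provider_tree DIR   (hypothetical helper)
# Reports which paths used later in these notes are missing under DIR
# (normally ~/gpir). Returns non-zero if anything is missing.
check_provider_tree() {
    missing=0
    for p in providers/perl/conf/providers.conf \
             providers/perl/src/core \
             providers/perl/run.sh \
             providers/perl/cron/compute.crontab \
             providers/perl/logs; do
        if [ ! -e "$1/$p" ]; then
            echo "missing: $1/$p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, as gridmon on cheaha:
#   check_provider_tree ~/gpir
```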

Configure Provider

The provider configuration file must be updated to work with the SURAgrid GPIR service environment. The patch is applied as gridmon from ~/gpir:

patch providers/perl/conf/providers.conf << EOF
49c49
< hostname=myremotehost.edu
---
> hostname=cheaha.ac.uab.edu
52,54c52,54
< motd.module=../modules/motd.pl
< load.module=<path-to-load-module>
< jobs.module=<path-to-jobs-module>
---
> motd.module=../modules/motd.pl
> load.module=../modules/load.sge.pl
> jobs.module=../modules/jobs.sge.pl
65c65
< gpir.contact=gpirserver.edu:8080
---
> gpir.contact=cuero.tacc.utexas.edu:12080/gpir/webservices
68c68
< admin.email=portaladmin@myorg.edu
---
> admin.email=jpr@uab.edu
EOF
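After applying the patch, it is worth confirming that each setting actually changed. A minimal sketch, assuming providers.conf uses plain key=value lines with no spaces around the '=' (as in the patch); conf_value is a hypothetical helper, not part of the GPIR distribution.

```shell
#!/bin/sh
# conf_value FILE KEY   (hypothetical helper)
# Prints the value from the first line of FILE that starts with "KEY=".
# Prints nothing if KEY is absent. Commented-out lines do not match.
conf_value() {
    awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }' "$1"
}

# Run from ~/gpir after patching to show the SURAgrid-specific settings.
conf=providers/perl/conf/providers.conf
if [ -f "$conf" ]; then
    for key in hostname load.module jobs.module gpir.contact admin.email; do
        printf '%s = %s\n' "$key" "$(conf_value "$conf" "$key")"
    done
fi
```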

Note: the module paths in providers.conf are relative to the directory holding the parent resource monitor programs, ~/gpir/providers/perl/src/core. Also, the distributed files do not have execute permission, and it is not clear whether this will cause a problem.
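Both concerns can be checked at once by resolving each *.module entry from src/core. This is a sketch that assumes module entries look like name.module=relative/path with no spaces in the path; check_module_paths is a hypothetical helper. A missing file is treated as an error, while a missing execute bit is only reported, since it is not yet known whether it matters.

```shell
#!/bin/sh
# check_module_paths CORE_DIR CONF_FILE   (hypothetical helper)
# Resolves every "*.module=" path in CONF_FILE relative to CORE_DIR and
# reports whether the target exists and is executable. Returns non-zero
# only if a target file is missing.
check_module_paths() {
    status=0
    for rel in $(sed -n 's/^[a-z]*\.module=//p' "$2"); do
        target="$1/$rel"
        if [ ! -f "$target" ]; then
            echo "missing:        $rel -> $target"; status=1
        elif [ ! -x "$target" ]; then
            echo "not executable: $rel -> $target"
        else
            echo "ok:             $rel -> $target"
        fi
    done
    return "$status"
}

# Usage, as gridmon:
#   check_module_paths ~/gpir/providers/perl/src/core \
#       ~/gpir/providers/perl/conf/providers.conf
```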

Step 7 of the provider install instructions recommends testing the providers by running them manually. This step assumes the files are executable, which is not the default in the distribution, so we turned on execute permissions:

chmod +x ~/gpir/providers/perl/src/core/*

Test Provider

The first test of the provider is to run the control program, main.pl, in debug (-d) and non-ingest (-n) mode. Ingesting is the process of pushing data up to the GPIR data store; the term reflects the data store's point of view.

cd ~/gpir/providers/perl/src/core
./main.pl -f jobs -d -n

This initial test did not produce the expected XML output. The cause was a bug in the debug mode of main.pl, which has since been fixed.
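When repeating this test, a quick sanity check on whether the output is XML at all can save reading it by eye. This is only a heuristic sketch (it checks that the first non-blank character is '<'); looks_like_xml is a hypothetical helper, and where xmllint is installed, piping the output to xmllint --noout - performs a real well-formedness check.

```shell
#!/bin/sh
# looks_like_xml   (hypothetical helper)
# Reads provider output on stdin and succeeds if the first non-blank
# character is '<'. A rough heuristic, not XML validation.
looks_like_xml() {
    first=$(tr -d ' \t\r\n' | head -c 1)
    [ "$first" = "<" ]
}

# Usage, from ~/gpir/providers/perl/src/core:
#   ./main.pl -f jobs -d -n | looks_like_xml && echo "output looks like XML"
```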

Configure Crontab

The following crontab needs to be installed for the gridmon account; its settings are specific to Cheaha. Note: installing it replaces the account's entire existing crontab, so do not use it if the crontab already has other entries. Update the crontab manually instead (for example with crontab -e).

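For the SGE provider we need to put the SGE binaries on the PATH, set two SGE environment variables (SGE_ROOT and SGE_QMASTER_PORT), and run the load and jobs providers. Cron runs each command from the user's home directory using /bin/sh, so the paths in the crontab are relative to the gridmon home directory.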
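Commands that work in an interactive login often fail under cron because of its minimal environment. One way to catch this early is to run a crontab command by hand in a stripped-down environment. This is a sketch that approximates cron (it does not reproduce every variable a given cron daemon sets); run_like_cron is a hypothetical helper, and the variable values are copied from the crontab settings used for Cheaha.

```shell
#!/bin/sh
# run_like_cron COMMAND   (hypothetical helper)
# Runs COMMAND from $HOME under /bin/sh with an otherwise empty
# environment plus the variables set at the top of the gridmon crontab.
run_like_cron() {
    (cd "$HOME" && env -i HOME="$HOME" LOGNAME="$(id -un)" SHELL=/bin/sh \
        PATH=/opt/gridengine/bin/lx26-amd64:/bin:/usr/bin \
        SGE_ROOT=/opt/gridengine SGE_QMASTER_PORT=536 \
        /bin/sh -c "$1")
}

# Usage, as gridmon:
#   run_like_cron '(cd gpir/providers/perl; ./run.sh jobs)'
```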

# as gridmon on cheaha, from the home directory
cat > gpir/providers/perl/cron/compute.crontab << 'EOF'
#
# Set access to SGE binaries
#
PATH=/opt/gridengine/bin/lx26-amd64:/bin:/usr/bin
SGE_ROOT=/opt/gridengine
SGE_QMASTER_PORT=536

#
# Run GPIR providers
#
*/15 * * * * (cd gpir/providers/perl; ./run.sh load) > ./gpir/providers/perl/logs/load.out 2> ./gpir/providers/perl/logs/load.err
*/15 * * * * (cd gpir/providers/perl; ./run.sh jobs) > ./gpir/providers/perl/logs/jobs.out 2> ./gpir/providers/perl/logs/jobs.err
EOF
crontab gpir/providers/perl/cron/compute.crontab
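Because installing the file replaces the whole crontab, a quick check of its contents before running crontab is cheap insurance. This is a sketch; check_crontab is a hypothetical helper, and the patterns it looks for are simply the variable names and run.sh entries used in the Cheaha crontab.

```shell
#!/bin/sh
# check_crontab FILE   (hypothetical helper)
# Succeeds only if FILE sets SGE_ROOT and SGE_QMASTER_PORT and schedules
# both the load and jobs providers via run.sh.
check_crontab() {
    ok=0
    for var in SGE_ROOT SGE_QMASTER_PORT; do
        grep -q "^$var=" "$1" || { echo "no $var setting" >&2; ok=1; }
    done
    for prov in load jobs; do
        grep -q "run\.sh $prov)" "$1" || { echo "no $prov entry" >&2; ok=1; }
    done
    return "$ok"
}

# Usage, as gridmon from the home directory:
#   check_crontab gpir/providers/perl/cron/compute.crontab &&
#       crontab gpir/providers/perl/cron/compute.crontab
#   crontab -l    # confirm what cron will actually run
```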