Cheaha Performance Upgrade 2008
Cheaha will be upgraded through the acquisition of a Dell M1000e Blade Server configured with 24 blades, each with 3.0GHz quad-core Intel Xeon E5450 processors and 16GB RAM. This will provide 96 processing cores operating at nearly twice the clock speed of the existing AMD Opteron 242 CPUs. (Benchmarks will be published after the new equipment is in place.)
Because the upgrade involves a significant technology update, we are planning a phased implementation with three milestones:
- Physical install: install the rack in BEC155, connect it to the HPC network and the 10GigE research network, and power on the new rack.
- Software install: install ROCKS 5 and the existing scientific codes.
- Migrate identity: the new head node will take over production work from the existing cheaha and control all compute nodes.
The new hardware will run side-by-side with the existing hardware for a period of time. This will provide a smooth transition onto the new hardware and enable us to leverage existing capacity in addition to the upgraded capacity.
Upgrade Overview
As part of the compute capacity upgrade, we are planning several improvements to the configuration of cheaha based on the experience we've gained during the UABgrid pilot. Overall, the grid services on cheaha will be blended more closely with the existing production services to make grid integration more seamless. Some of the highlights:
- Integration of grid metascheduling services - the pilot metascheduling infrastructure based on GridWay will be installed directly on cheaha, enabling users to easily explore the expanded compute capacity available via UABgrid.
- 10GigE connectivity - the UABgrid 10GigE research network will be leveraged to facilitate staging of jobs across the UABgrid cluster pool and to provide direct connectivity to regional research networks and additional compute capacity via NLR.
- Improvements in cluster management software - updated versions of ROCKS and the SGE scheduler will be installed, improving the ability to manage allocations for different research and application needs.
- Introduction of compute node classes - the capacity upgrade immediately provides us with two compute node classes: the high-performance Dell M1000e blades based on the dual quad-core 3GHz Intel chips and the original nodes based on the dual-CPU 2.4GHz AMD chips. This enables us to provide two levels of performance for different classes of compute problems and to explore the delivery of different scheduling models based on service level agreements.
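
To illustrate how two node classes could map to two levels of performance, here is a minimal Python sketch. It is not the planned SGE configuration: the class names ("intel-blade", "amd-original"), the `NodeClass` type, and the `pick_node_class` helper are all hypothetical, and the clock speeds are simply the figures quoted in the list above. The sketch assumes a job states a minimum clock speed and is placed on the slowest class that satisfies it, which keeps the faster blades free for jobs that need them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClass:
    name: str         # hypothetical label, not an actual SGE queue name
    clock_ghz: float  # clock speed as quoted in the list above


# Two classes described in the upgrade overview; names are illustrative only.
NODE_CLASSES = [
    NodeClass("intel-blade", 3.0),
    NodeClass("amd-original", 2.4),
]


def pick_node_class(min_clock_ghz: float) -> NodeClass:
    """Return the slowest node class that meets the requested clock speed."""
    eligible = [c for c in NODE_CLASSES if c.clock_ghz >= min_clock_ghz]
    if not eligible:
        raise ValueError(f"no node class offers {min_clock_ghz} GHz or more")
    return min(eligible, key=lambda c: c.clock_ghz)


if __name__ == "__main__":
    for requested in (2.0, 2.8):
        print(requested, "->", pick_node_class(requested).name)
```

A real deployment would more likely express this policy through scheduler queues or resource requests than through application code; the sketch only shows the selection logic.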
Grid Software Installation
- Globus-4.0.8 - Installing Globus Toolkit version 4.0.8 on cheaha.uabgrid.uab.edu
- GridWay-5.4 - Installing GridWay metascheduler version 5.4 on cheaha.uabgrid.uab.edu
