SARA: High Performance Computing in the Netherlands

By Erik Kluit
July, 2002

SARA, the High Performance Computing (HPC) center of the Dutch universities and research centers, recently acquired six POWER4-based eServer pSeries 690 ("Regatta") supercomputers. The new machines will mainly be used by the institutes of the Netherlands Organization for Scientific Research (NWO), the Vrije Universiteit (VU) and individual researchers in the area of computational research (natural science and chemistry simulations). The new Regattas will deliver around five times the computational power of the present IBM supercomputer at SARA.

Wim Rijks (Systems Manager and Consultant HPC) and Jules Wolfrat (Systems Manager and Programmer HPC), both specialized in the parallelization and optimization of software, go back a long way with SARA and AIX. Before 1990 SARA hosted a 3090 mainframe with six processors, running AIX/370. In the early 1990s SARA started using one of the first RS/6000 models, a 7013-550 (41 MHz), and three 7013-590s (66 MHz). Nowadays SARA hosts an SP configuration that consists of three "towers" or "frames". Two of the frames, used for batch processing, together contain eight so-called "nighthawk-2" high nodes, each of which contains 16 Power3-II processors (375 MHz), with 1 Gbyte of memory per processor. The third frame, used for interactive access and job submission, contains three "winterhawk-2" wide nodes, each consisting of four Power3-II (375 MHz) processors with 1 Gbyte of memory per processor.
Pictured: Wim Rijks (left) and Jules Wolfrat in front of one of the Regattas.

SARA Computing and Networking Services is a center of expertise in the area of computers and networks, supplying a complete package of High Performance Computing and Networking (HPCN) services.

SARA was founded in 1971 by the Vrije Universiteit (Free University) in Amsterdam, the University of Amsterdam and the Mathematical Center.
In 1985 SARA acquired the status of National High Performance Computing Center and hosted the Dutch national supercomputer. Apart from this, SARA hosts a number of other supercomputers, some of them co-owned with other universities.
The Dutch universities are important SARA customers, and so are research institutes and industrial research centers, government and businesses. SARA is responsible for the operational management of the academic high speed network of SURFnet.

High Performance Computing (HPC) is technology used to solve problems that are very compute-intensive, that require huge amounts of data to be processed rapidly, or both. Supercomputers are used, for example, for fundamental, applied and industrial research in areas such as weather forecasting, climate modeling, computational fluid dynamics, chemistry and physics. Supercomputers are also used for rendering (high-quality visualization) of huge amounts of data.

Website: www.sara.nl

Arrival of the Regattas
Pictured: the arrival of the first two p690s at SARA.
The Regattas (eServer pSeries 690) will be introduced in three phases. The first two arrived at the beginning of January 2002. Each of these Regattas contains one node with 32 Power4 processors running at a clock speed of 1.1 GHz. This adds 280 GFlop/s in one go to the existing peak performance of 210 GFlop/s, more than doubling the capacity of the system. Each node contains 32 Gbyte of shared memory. Each tower has a local scratch space of 135 Gbyte. The home file systems are mounted from an external file server, just like on the "old" SP systems.
However, the new towers are not yet connected to the fast switch (SP Switch2), which means that the General Parallel File System (GPFS) is not yet available on them. GPFS is designed to provide high performance by "striping" I/O across multiple disks, and high scalability by using multiple servers, through the SP Switch2.
Wim Rijks and Jules Wolfrat point out that the GPFS file system is increasingly becoming a bottleneck: it cannot keep up with the speed of the other resources (processors and memory access).
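The idea behind striping can be sketched in a few lines of Python. This is a minimal illustration of round-robin striping in general, not GPFS's actual on-disk layout; the block size, the number of disks and the function name `stripe_location` are assumptions made up for this example.

```python
def stripe_location(offset, block_size, num_disks):
    """Map a byte offset in a file to (disk index, byte offset on that disk).

    Blocks are dealt out round-robin: block 0 to disk 0, block 1 to disk 1, ...
    """
    block = offset // block_size        # which file block holds this byte
    disk = block % num_disks            # round-robin choice of disk
    local_block = block // num_disks    # earlier blocks already on that disk
    return disk, local_block * block_size + offset % block_size


# Hypothetical layout: 256 KiB blocks striped over 4 disks.
BLOCK = 256 * 1024
for block_number in range(6):
    print(block_number, stripe_location(block_number * BLOCK, BLOCK, 4))
```

Because consecutive blocks land on different disks, a large sequential read or write can keep all disks busy at the same time instead of waiting for a single one.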
Gigabit Ethernet is available for connectivity to the outside world and to the Regatta servers.
By the end of 2002 the next two Regattas and the fast switch will be installed, replacing the "nighthawk-2" high nodes. With the installation of the last two machines in 2004, the configuration will consist of 204 processors (six Regattas and the "winterhawk-2" wide nodes), 204 Gbyte of RAM and around 1 TFlop/s of computational power.
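The peak figures in this section can be reproduced with simple arithmetic, assuming that both the Power3-II and the Power4 complete up to four floating-point operations per clock cycle (two fused multiply-add units). That per-cycle figure and the helper name `peak_gflops` are assumptions that are not stated in this article, and the final total also assumes that the later Regattas run at the same 1.1 GHz as the first two.

```python
FLOPS_PER_CYCLE = 4  # assumption: two fused multiply-add units per processor


def peak_gflops(processors, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFlop/s: processors x clock (GHz) x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


# Existing SP: eight 16-way nighthawk-2 nodes plus three 4-way winterhawk-2 nodes.
nighthawk = peak_gflops(8 * 16, 0.375)
winterhawk = peak_gflops(3 * 4, 0.375)
old_sp = nighthawk + winterhawk

# First phase: two Regattas with 32 Power4 processors each at 1.1 GHz.
two_regattas = peak_gflops(2 * 32, 1.1)

# Final configuration: six Regattas plus the winterhawk-2 nodes.
final_processors = 6 * 32 + 3 * 4
final_peak = peak_gflops(6 * 32, 1.1) + winterhawk

print(f"existing SP:  {old_sp:.1f} GFlop/s")
print(f"two Regattas: {two_regattas:.1f} GFlop/s")
print(f"final system: {final_processors} processors, {final_peak:.1f} GFlop/s")
```

Under these assumptions the existing SP works out to 210 GFlop/s and the first two Regattas to about 282 GFlop/s, in line with the figures quoted above. Six 1.1 GHz Regattas plus the winterhawk-2 nodes work out to roughly 863 GFlop/s, so "around 1 TFlop/s" should be read as a rounded figure unless the later machines run at a higher clock speed.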

The Top 10 Supercomputer Sites of June 2002 (www.top500.org) for comparison:

Rank	Manufacturer	Computer	Processors	Rmax (TFlop/s)	Installation site	Year
1	NEC	Earth-Simulator	5120	35.86	Earth Simulator Center, Yokohama, Japan	2002
2	IBM	ASCI White, SP Power3 375 MHz	8192	7.22	Lawrence Livermore National Laboratory, Livermore, USA	2000
3	Hewlett-Packard	AlphaServer SC ES45/1 GHz	3016	4.46	Pittsburgh Supercomputing Center, Pittsburgh, USA	2001
4	Hewlett-Packard	AlphaServer SC ES45/1 GHz	2560	3.98	Commissariat a l'Energie Atomique (CEA), Bruyeres-le-Chatel, France	2001
5	IBM	SP Power3 375 MHz 16 way	3328	3.05	NERSC/LBNL, Berkeley, USA	2001
6	Hewlett-Packard	AlphaServer SC ES45/1 GHz	2048	2.91	Los Alamos National Laboratory, Los Alamos, USA	2002
7	Intel	ASCI Red	9632	2.37	Sandia National Laboratories, Albuquerque, USA	1999
8	IBM	pSeries 690 Turbo 1.3 GHz	864	2.31	Oak Ridge National Laboratory, Oak Ridge, USA	2002
9	IBM	ASCI Blue-Pacific SST, IBM SP 604e	5808	2.14	Lawrence Livermore National Laboratory, Livermore, USA	1999
10	IBM	pSeries 690 Turbo 1.3 GHz	768	2.00	IBM/US Army Research Laboratory (ARL), Poughkeepsie, USA	2002

Keep it dry!
I was also invited to take a look at the computer floor. A single computer doesn't take up much space these days, true, but a lot of computers do! I was impressed by the size of the Dutch national supercomputer: a 1024-CPU system consisting of two 512-CPU SGI Origin 3800 systems (pictured). The machine is fitted with 500 MHz R14000 CPUs organized in 256 4-CPU nodes, and has a peak performance of 1 TFlop/s.
Other systems hosted by SARA are a CRAY Origin2000 (parallel system), an SGI Onyx2 RealityMonster (graphics system) and three Beowulf clusters (parallel systems).
Several systems at SARA are in the Top500 list of supercomputers.
As we walk down the stairs, Wim Rijks explains why the computers are on the first floor: water! In case of a flood (remember, we are in the Netherlands, where dikes can break) the computers stay dry!

Parallelization of Programs
Parallel computer architectures are common in the world of supercomputers. SARA has the skills needed to adapt existing programs to these parallel architectures or to build new ones for them: analyzing the opportunities for parallelization, choosing the parallelization method, optimizing the serial version of the program and parallelizing the program. The SARA website has a list of completed parallelization projects. Most of these projects were funded by NCF (the Dutch National Foundation for Computer Facilities). NCF is responsible for the HPC infrastructure for scientific research in the Netherlands.
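One common way to put a number on "the opportunities for parallelization" is Amdahl's law, which bounds the achievable speedup by the fraction of the run time that stays serial. The article does not say that SARA uses this particular estimate; the sketch below, with hypothetical parallel fractions and an illustrative function name `amdahl_speedup`, only shows why the serial part of a program deserves attention before it is spread over 32 processors.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical programs running on one 32-processor node.
for fraction in (0.50, 0.90, 0.99):
    print(f"{fraction:.0%} parallel -> at most {amdahl_speedup(fraction, 32):.1f}x")
```

Even a program that is 90% parallel cannot run more than about eight times faster on 32 processors, which is why the serial remainder matters so much.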

The Regattas are best suited for programs that combine parallelization (using up to 32 processors with shared memory) with a need for high processing power.
The SP configuration uses OpenMP for running shared-memory parallel programs (C/C++ and Fortran) on one node. OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications.
Parallel programs on the SP can span either up to eight 16-CPU nodes or two 32-CPU nodes, but such multi-node programs have to be parallelized with MPI. It is also possible to combine MPI with OpenMP compiler directives. Technically, a parallel job could involve all of the SP's nodes, but the current setup of the batch system does not support this.
MPI is an internationally accepted standard for a message-passing library for use in C and Fortran programs, designed for high performance on both massively parallel machines and workstation clusters.
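The difference between the two programming models can be sketched in plain Python. This is neither OpenMP nor MPI, which are used from C/C++ and Fortran. It only imitates the two styles with threads and a queue from the standard library, and the function names `shared_memory_sum` and `message_passing_sum` are made up for this illustration. In standard CPython builds the global interpreter lock also keeps these threads from computing simultaneously, so the sketch shows the structure of the two models, not a real speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def shared_memory_sum(data, workers=4):
    """Shared-memory style: threads read one common list; partial sums are reduced."""
    n = len(data)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]

    def partial_sum(lo_hi):
        lo, hi = lo_hi
        return sum(data[lo:hi])          # every thread reads the same shared list

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, bounds))   # reduction of the partial sums


def message_passing_sum(data, ranks=4):
    """Message-passing style: each 'rank' gets its own copy of a slice and
    sends its partial sum to rank 0 as an explicit message."""
    mailbox = queue.Queue()              # stands in for the interconnect to rank 0
    result = []

    def rank_main(rank, local_data):
        local_sum = sum(local_data)      # works only on its private copy
        if rank != 0:
            mailbox.put(local_sum)       # "send" to rank 0
        else:
            total = local_sum
            for _ in range(ranks - 1):
                total += mailbox.get()   # "receive" from any other rank
            result.append(total)

    threads = [
        threading.Thread(target=rank_main, args=(r, list(data[r::ranks])))
        for r in range(ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


numbers = list(range(1, 101))
print(shared_memory_sum(numbers), message_passing_sum(numbers))
```

In the first function all workers see the same data, which is the situation inside one node. In the second, data has to be handed out and results sent back explicitly, which is what a program spanning several nodes has to do.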

One of the main users, the Department of Theoretical Chemistry (a VU research group), will use the Regattas to run chemistry simulations with the Amsterdam Density Functional (ADF) software package, which uses MPI.