CERN openlab
for DataGrid applications
Contribution to
IT department Annual Report 2003
F. Fluckiger, S. Jarp
The CERN openlab for DataGrid applications
is a framework for evaluating and integrating cutting-edge technologies or
services in partnership with industry, focusing on potential solutions for the
LCG. The openlab invites industry members to join, contribute systems, resources or services, and carry out with CERN large-scale, high-performance evaluations of their solutions in an advanced integrated
environment.
In a nutshell, the major achievements in
2003 were: the successful incorporation of two new partners: IBM and Oracle; the
consolidation and expansion of the opencluster (a powerful compute and
storage farm); the start of the gridification process
of the opencluster; the 10 Gbps challenge
where very high transfer rates were achieved over LAN and WAN distances (the
latter in collaboration with other groups); the organization of three thematic
workshops, including one on Total Cost of Ownership; the creation of a new,
lighter category of sponsors called contributors; and the implementation of the
openlab student programme, which brought 11 students to CERN during the summer.
Management
The project is formally led by the IT
Department Head, seconded by the Associate Head, the Chief Technology Officer
and the Communication and Development officer (the latter function will be
provided in 2004 by the departmental Strategy and Communication unit).
Industrial Sponsors
The year 2003 started with three sponsors,
Enterasys Networks
(contributing high bit-rate network
equipment), Hewlett-Packard (computer servers and fellows) and Intel Corporation (64-bit
processor technology and 10 Gbps Network Interface Cards). In March 2003, IBM
joined the openlab (to contribute hardware and software disk storage solutions),
followed by Oracle Corporation (to contribute Grid technology and fellows).
The annual Board of Sponsors meeting was
successfully held on 13 June, and the annual report was issued on that
occasion. In addition, three Thematic Workshops were organized (on Storage and
Data Management, Fabric Management, and Total Cost of Ownership). On the latter
topic (TCO), a position paper establishing the facts and figures was produced.
In order to permit time-limited
incorporation of sponsors to fulfil specific technical missions, the concept of contributor
was devised and proposed to the existing sponsors. Contributor status (as opposed
to the partner status of existing sponsors) implies a lower financial commitment
and correspondingly lesser benefits in terms of influence
and exposure.
Technical
progress
The openlab is constructing the
opencluster, a pilot compute and storage farm based on HP's
dual-processor machines, Intel's Itanium Processor Family (IPF) processors, Enterasys's 10 Gbps switches, IBM's StorageTank system and
Oracle's 10g Grid solution.
In 2003, the opencluster was first expanded
with 32 servers (RX2600) equipped with second-generation IPF processors (1
GHz) and running Red Hat Enterprise Linux 2.1, openAFS and LSF. In
October, 16 servers equipped with third-generation IPF processors (1.3 GHz) were added. These are complemented by seven
development systems.
The concept of an openlab technical challenge
(where tangible objectives are jointly targeted by some or all of the partners)
was proposed to the sponsors. The first instantiation was the 10 Gbps Challenge,
a common effort by Enterasys, HP, Intel and CERN. In a first experiment, two
Linux-based HP computers with 1 GHz IPF processors, directly connected back-to-back
through 10 GbE Network Interface Cards over a 10 km fibre, reached 5.7
Gbps for single-stream memory-to-memory transfer. To extend the tests over WAN distances, collaborations took
place with the DataTag project and the ATLAS DAQ
group. Using openlab IPF-based servers as end-systems, DataTag
and Caltech established a new Internet2 land speed record. Extensive tests
with Enterasys's ER16 router demonstrated that 10
Gbps rates could only be achieved through multiple parallel streams. An upgrade
strategy, including the use of Enterasys's new N7 devices in 2004, was agreed
between Enterasys and CERN.
On the storage front, a 30 TB disk
sub-system (6 meta-data servers and 8 disk servers) was installed, using IBM's StorageTank solution. Performance tests will be conducted
in 2004.
The porting to IPF of physics applications (in
collaboration with EP/SFT) and CERN systems continued in 2003, including CASTOR,
CLHEP, GEANT4 and ROOT. Other groups
also ported their applications (including ALIROOT by the ALICE collaboration and CMSIM
by CMS US). Results of scalability tests with PROOF were reported at the CHEP2003
conference. As another example of collaboration with other groups, twenty of
the IPF servers were used by ALICE for their 5th Data Challenge.
The gridification
effort culminated in the porting of the LCG middleware (based on VDT and
EDG). After some difficulties, the porting was almost complete at the end of the
year. HP Labs' SmartFrog monitoring system was
evaluated; as the first results are promising, the effort will continue in 2004.
A full technical report is annexed.
Dissemination
and Development activities
In addition to the thematic workshops
organized in the framework of the technical programme, two papers were
published in the Proceedings of the CHEP2003 conference, one article was published in the CERN
Courier, and three joint press releases were issued.
The openlab also hosted two meetings of the First Tuesday Suisse
Romande series at CERN, with active participation of
openlab partners.
Following a series of meetings, a document
exploring the possibilities for development of the openlab in the field of
security was produced.
Based on a pilot programme run in 2002, a
CERN openlab student programme was run in the summer of 2003, involving 11
students from seven European countries. Four of these students contributed
directly to the opencluster activity; the others worked on the ATHENA
experiment and on the development of the Grid Café web site. The latter was
successfully demonstrated at the Telecom 2003 exhibition and at the SIS-Forum,
part of the World Summit on the Information Society event.
Resources
The openlab integrates technical and
managerial efforts from several IT groups: ADC (Technical Management;
opencluster via two fellows who joined in 2003; StorageTank);
CS (10 GbE networking); DB (Oracle 10g); DI (Project
management, communication).
Annex: Detailed
technical report
Basic systems
After receiving nine “development” systems
in the last part of 2002, the openlab Itanium Processor Family (IPF) cluster
was expanded with 32 “production” servers at the beginning of the year. These
are HP RX2600 Integrity servers equipped with Intel’s second-generation IPF
processors running at 1 GHz.
The chosen 64-bit software stack consisted
of Red Hat Enterprise Linux 2.1 (beta version) with openAFS for file access and the Load Sharing Facility (LSF)
for batch control. The installation process was fully aligned with CERN’s
standard procedures for installing and maintaining Linux on the standard 32-bit
PC systems.
Several compilers, most notably the GNU
and the Intel compilers for Itanium, were installed and updated at regular
intervals.
In October, 16 additional servers were
added to the cluster. These contained Intel’s third-generation Itanium
processors, now running at 1.3 GHz. Furthermore, two systems (in a
workstation frame) at 1.5 GHz were installed and used to obtain the
best possible benchmarking results. An agreement was reached with HP and Intel
that should allow all the installed systems to be upgraded to 1.5 GHz early in
2004.
10 Gbit NIC testing
In a relatively simple experiment,
involving Intel’s 10 Gbit Ethernet cards in two servers connected back-to-back,
we obtained record-breaking speeds. In a memory-to-memory test, using CERN’s
GENSINK test program, a single stream was measured at 5.7 Gbps (using large
frames). This caught the attention of both the DataTag
project and ATLAS DAQ, and both groups “borrowed” two Itanium systems in order
to saturate their transatlantic lines at 10 Gbps. Both groups demonstrated
their set-ups, at Telecom 2003 in October and at RSIS in December. DataTag generated new Internet2 land speed records, using
IPv4, in tests carried out between Geneva and Caltech.
In the openlab, where a third Enterasys ER16
router was installed, 10 Gbps LAN tests were also run. The results
showed that these routers could only reach speeds close to 10 Gbps when
aggregating traffic; single-stream traffic between two high-speed servers
remained limited to 1 Gbps due to the trunk-based design of these
routers. Enterasys and CERN have agreed
on an upgrade strategy for the openlab networking equipment that should allow
great improvements to be seen in 2004. In a first phase (towards the end of the
year), CERN installed four N7 high-speed switches that should allow more than
300 connections at 1 Gbps and half a dozen at 10 Gbps (with improved throughput
capabilities).
StorageTank
IBM joined the CERN openlab in April and,
as a result, we installed a large disk subsystem consisting of six meta-data
servers and eight disk servers with almost 30 TB of storage capacity.
Intensive tests of this iSCSI-based disk system were
carried out, and we believe that the first Data Challenges can be launched early
in 2004 to evaluate the performance of this novel approach to disk storage.
Application
porting, benchmarks, scalability tests
Early in the year, the porting of CERN’s
Data Management Package, CASTOR, was carried out
successfully. Additionally, several of CERN’s key software
packages, such as CLHEP, GEANT4 and ROOT, were ported to the Itanium systems in
collaboration with EP/SFT staff. The entire ALIROOT framework (including a
“private” port of CERNLIB) was also ported within the ALICE collaboration.
The CMS simulation program, CMSIM, was ported by CMS US,
and the entire reconstruction framework, ORCA, is in the process of being ported.
The latter required both the SEAL and POOL packages (supplemented by a large
number of external packages) to be ported as well, so the verification effort
is more complex.
The scalability of parallel ROOT queries
(via PROOF) was tested on the 32-node cluster and found to be linear. This
encouraging result was reported at CHEP2003, and the aim is to use 80 to
100 nodes next year for an expanded scalability test.
In a joint activity with Intel, two LHC
applications (ROOT and GEANT4) were used to scrutinize the quality of the code
generated by Intel’s C++ compiler. Dozens of code snippets (such as the
application of a rotation matrix) were used in the effort in order to understand
how the compiler could best optimize the CERN applications.
On the 1.5 GHz systems, ROOT version 3.10
was benchmarked at about 1000 ROOTmarks when using
the Intel compiler and aggressive optimization. A GEANT4 example was submitted
for inclusion in SPEC2004 (to increase the number of C++ applications inside
this benchmark suite). The final acceptance will be known by the middle of next
year.
GRID porting
In the summer, an ambitious effort to port
the LCG middleware, based on VDT and EDG, was started. The effort was
made unnecessarily complex by the way the original software had been generated
(with a large number of interdependencies and very complex generation
procedures), but by the end of the year all but one RPM (from EDG/WP1) had been
converted and several intermediate tests had been run successfully. The plan is
now to generate IPF-based Worker Nodes (WN) early in 2004 and gradually enhance
the software, so that IPF-based Computing Elements (CE) and Storage Elements
(SE) can also be deployed.
SmartFrog
HP Labs’ SmartFrog (Smart Framework for Object Groups) was
evaluated. Despite initially receiving only a beta version, we were quickly
able to generate a demo that started and monitored a remote Web service. A technical
student will strengthen the SmartFrog effort in 2004.
ALICE Data
Challenge V
Twenty of the IPF servers were “lent” to ALICE for their 5th
Data Challenge. The entire DAQ software, including the GDC (Global
Data Collector) environment, was transferred to the IPF architecture and ran
flawlessly during the whole exercise.
Fellows/Summer
students
Two fellows, largely funded by HP, started
working in the openlab in April. In July/August the openlab team hosted four summer
students, working on IPF compilers, 10 Gbps networking, StorageTank
testing, and GRID porting.