LHC Computing (Draft version)
Getting to grips with the Grid
The year 2002 was a turning point for the
construction of a computing Grid for the LHC. The LHC Computing Grid (LCG)
project was officially launched, with a mission to integrate thousands of
computers at dozens of participating centres worldwide into a global computing
resource. This technological tour-de-force will rely on software being developed in the CERN-led European DataGrid (EDG) project, the largest software development project ever funded by the EU. And it will benefit
from hardware developments initiated in the CERN openlab for DataGrid
applications, a novel partnership between CERN and industry.
The Grid may well be the computer buzzword of
the decade. Not since the World Wide Web was developed at CERN, over ten years
ago, has a new networking technology held so much promise for both science and
society. Once again, CERN is set to play a leading role in making the
technology a reality.
The name Grid was coined in analogy with the way
geographically distributed power stations supply power seamlessly to the
electrical grid. The philosophy of the Grid is to provide vast amounts of
computer power at the click of a mouse, by linking geographically distributed
computers and developing software to run
this network of computers as though it were a monolithic resource. Whereas the
Web gives access to distributed information, the Grid does the same for
distributed processing power and storage capacity.
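To make the idea concrete, here is a minimal Python sketch of that "monolithic resource" illusion. It is purely illustrative, not real Grid middleware; the site names and the run_on_grid helper are invented for the example. A scheduler fans independent analysis tasks out across several simulated sites and gathers the results, so the caller deals with one logical resource rather than many machines.

```python
# Toy illustration only: fan tasks out across simulated "sites" so the
# user sees a single logical resource. All names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

SITES = ["cern", "ral", "fnal", "in2p3"]  # invented participating centres

def analyse(event_batch):
    """Stand-in for a physics analysis task on one batch of events."""
    return sum(event_batch)  # trivial placeholder computation

def run_on_grid(batches):
    """Dispatch batches across the pool and collect results in order."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        return list(pool.map(analyse, batches))

print(run_on_grid([[1, 2, 3], [4, 5], [6], [7, 8, 9]]))  # -> [6, 9, 6, 24]
```

A real Grid scheduler must also handle authentication, data movement and fault recovery across administrative domains; the sketch keeps only the fan-out and gather pattern.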
There are many varieties of Grid technology. In
the commercial arena, Grids that harness the combined power of many
workstations within a single organisation are already common. Another popular Grid-like approach is the screensaver, exemplified by SETI@home, which uses spare time on home PCs to analyse scientific data. However, CERN’s objective is altogether more
ambitious. The amount of data that will pour out of the LHC experiments will be
of the order of 10 petabytes a year – the equivalent of over 10 million
CD-ROMs. Storing this data in a distributed fashion, and making it easily
accessible to thousands of scientists around the world, is one of the major
challenges for the LCG project.
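The CD-ROM comparison is easy to verify. Assuming 650 MB per disc (capacities of the era ranged from 650 to 700 MB) and decimal prefixes, a short calculation confirms the figure:

```python
# Back-of-the-envelope check of the "over 10 million CD-ROMs" claim,
# assuming 650 MB per disc and decimal (SI) prefixes throughout.
PETABYTE = 10**15            # bytes
CD_CAPACITY = 650 * 10**6    # bytes per CD-ROM (assumed)

annual_data = 10 * PETABYTE  # roughly 10 PB of LHC data per year
print(f"{annual_data / CD_CAPACITY:,.0f} CD-ROMs")  # about 15,384,615
```

At roughly 15 million discs a year, "over 10 million" is if anything an understatement.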
The LCG project will pool the power of thousands
of computers. In 2002, LCG began rapidly gearing up for this challenge, with
over 50 computer scientists and engineers from partner research centres around the world joining during the year. The focus in 2002 was on defining the
stringent data storage and processing requirements of the experiments. For
example, a key technical requirement is to ensure data "persistency": the guarantee that data remains stored and accessible at all times, even as the underlying network of computers evolves.
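One common way middleware provides such persistency is a replica catalogue: a stable logical file name maps to whichever physical copies currently exist, so data stays reachable as individual sites join, leave or fail. The sketch below is a toy illustration of that idea in Python, not EDG's actual interface; the class, methods and file names are all invented for the example.

```python
# Toy replica catalogue: a logical file name stays valid even as the
# physical copies behind it come and go. All names are hypothetical.
class ReplicaCatalogue:
    def __init__(self):
        self._replicas = {}  # logical name -> set of physical locations

    def register(self, lfn, pfn):
        """Record that physical copy pfn exists for logical file lfn."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def unregister(self, lfn, pfn):
        """Forget a copy, e.g. when a site drops out of the Grid."""
        self._replicas.get(lfn, set()).discard(pfn)

    def locate(self, lfn):
        """Return any surviving copy; callers never see the churn."""
        copies = self._replicas.get(lfn)
        if not copies:
            raise FileNotFoundError(f"no replica of {lfn} registered")
        return next(iter(copies))

cat = ReplicaCatalogue()
cat.register("lfn:/lhc/run1/events.dat", "site-a:/disk1/events.dat")
cat.register("lfn:/lhc/run1/events.dat", "site-b:/tape2/events.dat")
cat.unregister("lfn:/lhc/run1/events.dat", "site-a:/disk1/events.dat")
print(cat.locate("lfn:/lhc/run1/events.dat"))  # the site-b copy survives
```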
The European dimension
One of the challenges of building a Grid is that
the software needed to keep it ticking over - the middleware - barely exists. In
engineering terms, it is rather like trying to build a suspension bridge before
the technology for steel cables has been fully developed. CERN is not alone in
facing this challenge. Other disciplines, such as bioinformatics and Earth
observation, are also contemplating huge increases in computing and storage
requirements, demanding similar technology. This is why CERN, together with a
host of leading European research centres, took the initiative for the European
DataGrid (EDG) project, to develop a testbed for Grid technologies.
EDG builds on a software toolkit for Grid technology known as Globus, developed in the United States.
The success of EDG has generated strong support
for a follow-up effort to build a permanent European Grid infrastructure that
can serve a broad spectrum of applications reliably and continuously. Providing
such a service will require a much larger effort than setting up the current
testbed. So CERN has established a pan-European consortium to build a
production Grid infrastructure, in the context of the EU 6th Framework Programme. The potential benefits of such an infrastructure extend well beyond particle physics, to European science and society at large.
The year 2002 also saw the launch of several
other EU-funded Grid projects in which CERN plays a significant role. CERN is
leading DataTAG, which provides high-speed links and ensures middleware compatibility between Grids in Europe and the US.
Open for industry
In 2002, HP joined Intel and Enterasys Networks
in the CERN openlab for DataGrid applications. This partnership has launched an
ambitious project called CERN opencluster, which combines 64-bit processor
technology from Intel, computer clusters from HP, and a 10 gigabit/s switching
environment from Enterasys Networks. The objective is to build a cluster based on technologies well ahead of anything commercially available today.
The CERN openlab partnership allows CERN to peer
into the technological crystal ball and test technologies that may well be commercially
competitive when the LHC is up and running. The industrial partners view this
as a great opportunity to develop and test new technologies, which are still
far from the market, under the rigorous and demanding conditions that CERN's
advanced computing environment provides. In particular, the CERN opencluster
will be linked to the EDG testbed, to see how these new technologies perform in
a Grid environment. The results will also be closely monitored by the LCG
project, to determine how the new technologies fit into the project’s future
technology roadmap.
The CERN openlab provides the LHC with a vital
source of industrial sponsorship for long-term technology development. The
equipment for the CERN opencluster, as well as funding for some of the researchers
to develop it, is provided by the industrial partners as part of CERN openlab
membership requirements. The concept has proved very popular, with other major
computer and software manufacturers eager to join.
François Grey