CERN openlab II - Platform Competence Centre - Optimization
IO Testing
When the LHC is running, the experiments will deliver data
at rates of up to 1.2 GByte/s over long periods. This data
has to be handled efficiently in many different ways.
Between its arrival from the experiment and the final
analysis, the data has to be written to tape, exported to
the LHC Computing Grid (LCG) sites, re-reconstructed with
new calibration constants, and so on.
Naturally, there is a lot of interest in the most efficient
way of handling the data and, in order to stay efficient, in
what might become possible in the near future.
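To give a feeling for the scale, the quoted rate can be
turned into a data volume. The following is a minimal
sketch; the function name and the decimal units
(1 GByte = 1e9 bytes, 1 TB = 1e12 bytes) are assumptions
made for illustration.

```python
# Sustained rate quoted in the text, read as decimal GByte/s (assumption).
RATE_BYTES_PER_S = 1.2e9


def volume_tb(seconds: float, rate: float = RATE_BYTES_PER_S) -> float:
    """Return the data volume in TB (1e12 bytes) accumulated over `seconds`."""
    return rate * seconds / 1e12


if __name__ == "__main__":
    print(f"per hour: {volume_tb(3600):.2f} TB")
    print(f"per day:  {volume_tb(86400):.2f} TB")
```

At this rate a single day of running adds up to roughly
100 TB, all of which has to pass through the steps listed
above.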
The I/O capabilities of a server can be separated into disk
I/O and network I/O.
Disk-IO
The throughput of a particular disk-subsystem depends on a
vast number of parameters:
- the hard disks: transfer rates for a single disk vary by
  more than a factor of 2
- the controller: significant differences between
  controllers, especially with a large number of disks
- the CPU: how many interrupts can the CPU(s) handle
- the OS: how good are the I/O and memory management
  capabilities of the operating system
Network-IO
Another important distinction has to be made for network
I/O. A connection can be in the Local Area Network (LAN) or
in the Wide Area Network (WAN). LAN connections stay inside
a computer center, while a WAN connection links to a server
in another computer center somewhere in the world. Over the
last 10 years networking has evolved from 10 Mbit Ethernet,
via 100 Mbit Ethernet (Fast Ethernet) and 1 Gbit Ethernet,
to the broad availability of 10 Gbit Ethernet today.
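The Ethernet generations named above can be compared with
the 1.2 GByte/s experiment rate by converting their nominal
line rates into bytes per second. This is a rough sketch
only: it ignores protocol overhead, and the dictionary and
function names are hypothetical.

```python
# Nominal line rates in Mbit/s for the Ethernet generations named in the text.
ETHERNET_MBIT = {
    "10 Mbit Ethernet": 10,
    "100 Mbit Ethernet": 100,
    "1 Gbit Ethernet": 1_000,
    "10 Gbit Ethernet": 10_000,
}

# Experiment data rate from the text, in MByte/s (decimal units assumed).
EXPERIMENT_MBYTE_PER_S = 1200.0


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert a nominal line rate from Mbit/s to MByte/s (8 bits per byte)."""
    return mbit_per_s / 8


if __name__ == "__main__":
    for name, mbit in ETHERNET_MBIT.items():
        mbyte = mbit_to_mbyte(mbit)
        enough = mbyte >= EXPERIMENT_MBYTE_PER_S
        print(f"{name}: {mbyte:.2f} MByte/s nominal, one link sufficient: {enough}")
```

Under these assumptions only a 10 Gbit link has a nominal
capacity above 1.2 GByte/s, and then only by a small
margin.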
openlab-I
The initial I/O testing in openlab-I focused on 10 Gb
Ethernet Network Interface Cards (NICs) in a LAN
environment. It already became clear there that the main
bottleneck would be the PCI-X bus which connects the NIC to
the system: it was not fast enough to get the full 10 Gb out
of a single NIC. This problem has since been overcome with
the availability of NICs for the newer PCI Express bus.
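A back-of-the-envelope calculation illustrates the bus
limit. The sketch below assumes a 64-bit PCI-X slot clocked
at 133 MHz; the text does not say which PCI-X variant was
used, so both numbers and the function name are
assumptions. The result is a theoretical peak that a real
system would not fully reach.

```python
def bus_peak_gbit(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus in Gbit/s: one transfer per clock."""
    return width_bits * clock_mhz * 1e6 / 1e9


# Assumed slot parameters (not stated in the text): 64-bit PCI-X at 133 MHz.
PCIX_WIDTH_BITS = 64
PCIX_CLOCK_MHZ = 133.0

# Nominal line rate of a 10 Gb Ethernet NIC.
NIC_GBIT = 10.0

if __name__ == "__main__":
    peak = bus_peak_gbit(PCIX_WIDTH_BITS, PCIX_CLOCK_MHZ)
    print(f"PCI-X peak: {peak:.2f} Gbit/s, NIC line rate: {NIC_GBIT:.2f} Gbit/s")
    print("bus is the bottleneck:", peak < NIC_GBIT)
```

Even the theoretical peak of such a slot, about 8.5 Gbit/s,
stays below the 10 Gb line rate of the NIC.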
The next step was testing in the wide area network. These
tests were done in close collaboration with CalTech, the
DataTAG project and the TDAQ group of the ATLAS experiment.
A few achievements:
- a number of Internet2 Land Speed Records
- the first trans-European Ethernet connection (using WAN
  PHY technology)
- the first transatlantic Ethernet connection (using WAN
  PHY technology)
But all those tests transferred the data only from the
memory of one system into the memory of the other system. In
real life the data would come from disk, so we built an
affordable disk-subsystem capable of transfer rates
comparable to those of the 10 Gb NICs. This effort led to a
transfer rate of 700 MB/s when moving data from disk (at
CERN) into the memory of another server at CalTech, more
than 16000 km away.
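One reason long-distance transfers at this rate are
demanding is the amount of data that is in flight at any
moment. The sketch below estimates the bandwidth-delay
product for the figures quoted above. It assumes that
16000 km is the one-way path length and that signals travel
through fibre at about 200000 km/s (roughly two thirds of
the speed of light in vacuum); switching and routing delays
are ignored, and all names are hypothetical.

```python
# Assumed signal speed in optical fibre, in km/s (about 2/3 of c).
FIBRE_KM_PER_S = 200_000.0


def rtt_seconds(one_way_km: float, speed_km_per_s: float = FIBRE_KM_PER_S) -> float:
    """Round-trip time from propagation delay alone."""
    return 2 * one_way_km / speed_km_per_s


def bdp_bytes(rate_bytes_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link busy."""
    return rate_bytes_per_s * rtt_s


if __name__ == "__main__":
    rtt = rtt_seconds(16_000)      # path length quoted in the text
    bdp = bdp_bytes(700e6, rtt)    # 700 MB/s quoted in the text
    print(f"round-trip time: {rtt * 1e3:.0f} ms")
    print(f"data in flight:  {bdp / 1e6:.0f} MB")
```

Under these assumptions a window-based protocol would need
to keep on the order of 100 MB unacknowledged in flight to
sustain 700 MB/s over this distance.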
While these were mostly R&D projects looking into the (near)
future, openlab was also involved in setting up and running
the first "data transfer challenges" in direct preparation
of the data export service to the LCG Tier 1 sites. The
capabilities of the Itanium2 servers from the opencluster
were crucial to the success of these "Service Challenges".
openlab-II
All the 10 Gb WAN R&D projects had to be halted for a while
because the 10 Gb links were used for the Service
Challenges. Luckily, more and more additional 10 Gb links to
the different LCG Tier 1 sites are now becoming available,
so the projects will be able to resume. With the
availability of new hardware and software we will push the
limits again.
openlab will also work closely with other groups in IT
(especially FIO) to evaluate the possible use of 10 Gb
connections to a new generation of disk servers. Some of the
things which openlab tried and tested a few years ago are
now on their way to being used in the standard operations of
the CERN computer center.