In the last year, through its collaboration with Intel, CERN openlab published three benchmark reports, made publicly available on the openlab website. A standard approach was adopted so that findings are well documented and can easily be compared in future evaluations. The first
report, published in October 2009, focused on the
evaluation of the energy consumption and the
performance of Intel’s "Nehalem" architecture,
represented by the Intel® Xeon® processor 5500
series. The team evaluated three flavours of parts
with varying power needs and performance levels: the
low power L5520, the mid-range E5540 and the most
powerful of the "Nehalem" series, the X5570. Their
efficiency was evaluated by measuring their typical
power consumption, using standard benchmarks to put
stress on the different subsystems in the server.
The team also assessed the performance of the processors with the C++ subset of the SPEC CPU2006 benchmark suite, dubbed "HEPSPEC06".
The tests showed impressive results, with the low-power L5520 delivering a 36% energy-efficiency improvement over the previous-generation Xeon 5400 "Harpertown" servers, and the other 5500 variants reaching 30%.
Improved efficiency was not the only positive point: the "Nehalem" introduced Intel® Turbo Boost Technology and reintegrated Hyper-Threading Technology, Intel's SMT (Simultaneous Multi-Threading) implementation, which allows each core to execute two threads simultaneously by sharing the execution pipelines. SMT was
thoroughly evaluated and appeared promising for the Computer Centre, as it increased the throughput of processed jobs by 15 to 21% in the tests. The evaluation involved multiprocessing (using a Monte Carlo based benchmark, "test40") and a multithreaded benchmark ("tbb") based on the ALICE High Level Trigger and the Intel Threading Building Blocks, as well as a complete real-world framework (from ALICE), and compared the efficiency of different global scheduling policies.
A subsequent report, on the Intel® Xeon® processor 5600 series codenamed "Westmere", was published in April 2010. The methodology used
closely resembled the one established for the
"Nehalem" report, but some legacy benchmarks were
replaced with modern, real-world multi-threaded
code: a parallel prototype of the Geant4 framework
processing a simulation workload from the CMS
experiment, and a multi-threaded minimisation
application, built on the ROOT framework. It was
determined that the die-shrunk "Westmere"
capitalises on the rich enhancements of the
"Nehalem" microarchitecture through an increased
core count. With respect to the previous generation, performance per watt increased by up to 23%, overall system performance was between 39% and 61% better, and the benefit of SMT was practically unchanged.
Finally, a third report, also
published in April 2010, covered the Intel® Xeon®
processor 7500 series, designed for multi-socket
"Nehalem-EX" platforms. Using the same standard methodology as in the Xeon 5600 evaluation, CERN openlab found that the "Nehalem-EX" platform provides excellent, close-to-linear scalability with many of the tested applications. Compared to a Xeon
7400-based system, codenamed "Dunnington", the tested
solution excels in many areas. The HEPSPEC06
benchmark yields 3.5x more throughput, and the
throughput of other workloads has increased by
between 47% and 87%, depending on the application.
The Database Competence Centre has also tested the
7500 series and has found impressive scalability for
the LHC accelerator Oracle database workload. This enables CERN both to deploy the largest database applications in an optimal way and to consolidate many of the smaller database applications cost-efficiently.
In addition to the regular platform benchmarking
activities, research on Solid State Drives (SSDs) has been restarted following related developments at Intel.
SSD activities take place both in the Platform and
the Database Competence Centres, and a preliminary
SSD evaluation conducted on a Nehalem server has
shown very promising results. A detailed report on
the matter is expected in the upcoming year.
Multi-threading and many-core scalability
Another area where CERN openlab contributes is compiler optimisation, where the aim is to improve the performance of a wide range of different jobs by influencing the back-end code generator. Tests with the Intel C++ compiler, version 11.1, were performed using both Geant4 and ROOT benchmarks, for both Intel64 and IA64. The
project is directly related to the multi-core and
many-core revolution, which permits a significant
increase in computing power within a constant
processor power envelope. The move to multi-core
processors has already enabled CERN to benefit from
on-going improvements in overall performance,
without a corresponding increase in processor power
consumption. Although the amount of memory has to be
kept constant per core, the power savings compared
to a non-multi-core scenario have been impressive, which is highly beneficial to CERN. The openlab team
has continued to work on establishing how the new
multi-core architectures relate to High Energy
Physics (HEP) software. As almost all LHC programmes
(simulation, reconstruction, data analysis, etc.)
are written in-house by high energy physicists, it
is crucial to understand which modifications in the
code can provide the most benefit from a multi-core
or many-core architecture. Many experiments related
to this domain were carried out in summer 2009, and
have produced interesting results.
In recent months, parallelisation
efforts, although not widespread at CERN, have
started to bear fruit and are in large part actively supported by openlab. One such activity is
carried out by Northeastern University researchers: PhD student Xin Dong and Prof. Gene Cooperman. It
relates to a complete multi-threaded conversion of a
serial physics processing framework commonly used in
HEP. One of the prototypes resulting from this work
was passed to openlab for testing. Initial examinations showed good scalability on the various 8-core systems prevalent in CERN's Computer Centre at that time. Other tests, executed on a 24-core
Dunnington system provided by Intel, confirmed that
such a complex application does not always scale
automatically when moving to double-digit core counts. The openlab Platform Competence Centre team interfaced with the researchers from the USA, as
well as with local experts, in order to find ways to
make the software more scalable. Encouragingly, after many months of work by the team's American colleagues, the software now scales up to 32 cores with 93% efficiency – a remarkable and unprecedented result.
As openlab looks to the many-core future, it had the pleasure of welcoming Tim Mattson, an expert on multithreaded programming from Intel, who visited the team to share his views and gave an IT seminar at CERN on ‘OpenCL, design patterns and software for heterogeneous many-core platforms’. The move
to a multithreaded paradigm is still in progress at
CERN, although significant advances and
optimisations have already been made, and there are
noticeable improvements introduced by the development team based on suggestions from openlab. Furthermore,
in the light of the availability of 32-core
Nehalem-EX systems, openlab is looking forward to
expanding its work on scalability thanks to this new
hardware, and to gaining even more insight into the
scaling behaviour of High Energy Physics frameworks
on modern and future many-core architectures.
High-speed networking
Last year, several of Intel's 10 gigabit Network Interface Cards (NICs) were evaluated
within the CERN openlab framework and in
collaboration with other groups in the CERN IT
department for their potential use in the IT
production environment. The NICs surpassed all
expectations and, as a result, there are currently
100 disk servers connected at 10 gigabit in the CERN
Computer Centre. In addition, all tape servers,
which are used for permanently storing LHC data,
will be connected at 10 gigabit from now on. The
InfiniBand-based High Performance Computing cluster used for computational fluid dynamics (CFD) calculations and other engineering
applications (simulating cooling, temperature
distribution for the experiments, etc.) is being
replaced by a new cluster based on Intel Nehalem
CPUs and Intel NetEffect low-latency 10 gigabit
network cards.
Further activities
The PCC team participated
actively in numerous schools and workshops, listed
in the dedicated sections of this report, to share
and disseminate the knowledge created. For yet
another year, CERN openlab has been teaming up with
Intel to organise regular training for CERN’s
programmers. For these courses, Jeff Arnold, Senior Software Engineer at Intel, assisted the regular CERN openlab lecturers. In addition to the regular
quarterly courses on computer architecture,
performance tuning and multi-threading, three new
special courses have been organised for advanced
CERN users, and were taught by top-level Intel
experts. This activity was warmly welcomed and
created good opportunities for Intel, the physics
community and CERN openlab to share views and
exchange feedback on new technologies and
techniques.
The PCC team’s continued participation in key
conferences, such as Supercomputing and the Intel
Developer Forum, ensures awareness of the latest
technological developments and allows for valuable
meetings with partners. The team also maintains
active involvement in the CERN School of Computing,
holding lectures and exercises yearly at the School
and providing mentoring for lectures at the Inverted
CERN School of Computing held in spring 2010.