
The PCC Results (May 2009 - May 2010)

 

The Platform Competence Centre (PCC) was initially created in the context of CERN openlab-II. In CERN openlab-III it continues, in collaboration with Intel, to address crucial fields such as thermal optimisation, multi-core scalability, application tuning and benchmarking, and it places a strong emphasis on teaching.

Power and computing efficiency

Mastering energy consumption and thermal behaviour in large computing centres is one of the major challenges the information society now faces. This matter is particularly relevant to CERN, as the organisation has more than 6900 servers in its Computer Centre and another 6000 at the detector sites, processing the enormous amounts of data produced by the LHC. Since the CERN Computer Centre facilities are severely limited in terms of both electrical input and cooling capacity, intelligent power optimisation is paramount. To ensure that the systems deliver maximum performance per watt, so that the data centre power consumption stays within the 2.9 megawatt limit, power measurements of production and beta platforms are conducted on a regular basis.
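
To make the relationship between these quantities concrete, the short C++ sketch below (purely illustrative, with hypothetical numbers rather than figures from the actual measurements) shows how a performance-per-watt value and an estimated total power draw can be derived and checked against the power budget.

    // Illustrative sketch only: hypothetical numbers, not CERN measurements.
    #include <iostream>

    int main() {
        const double hepspec06_per_server = 80.0;   // assumed benchmark score of one server
        const double watts_per_server     = 250.0;  // assumed average wall-plug draw under load
        const int    n_servers            = 6900;   // servers in the Computer Centre
        const double power_budget_watts   = 2.9e6;  // the 2.9 megawatt limit

        const double perf_per_watt = hepspec06_per_server / watts_per_server;
        const double total_power   = n_servers * watts_per_server;

        std::cout << "Performance per watt: " << perf_per_watt << " HEPSPEC06/W\n";
        std::cout << "Estimated fleet draw: " << total_power / 1e6 << " MW"
                  << (total_power <= power_budget_watts ? " (within" : " (over")
                  << " the 2.9 MW budget)\n";
        return 0;
    }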

Furthermore, improving the performance of the Computer Centre facilities by even a small amount can be equivalent to saving millions of Swiss Francs in hardware purchases. This is why CERN openlab maintains a keen interest in monitoring the performance of both the most recent and upcoming hardware. Apart from standard performance optimisation activities conducted using various Intel and open-source software packages, the team published a paper summarising the performance optimisation strategies currently in use at CERN. This work was initially based on interviews with representatives of the four major experiments at CERN and of two commonly used software frameworks.

Benchmarking and optimisation

In the last year, through its collaboration with Intel, CERN openlab published three benchmark reports, made publicly available on the openlab website. A standard approach was adopted so that findings are documented consistently and can easily be compared in the future. The first report, published in October 2009, focused on the evaluation of the energy consumption and the performance of Intel’s "Nehalem" architecture, represented by the Intel® Xeon® processor 5500 series. The team evaluated three flavours of parts with varying power needs and performance levels: the low-power L5520, the mid-range E5540 and the most powerful of the "Nehalem" series, the X5570. Their efficiency was evaluated by measuring their typical power consumption, using standard benchmarks to put stress on the different subsystems of the server. The team also assessed the performance of the processors with the C++ subset of the SPEC2006 benchmarks, dubbed "HEPSPEC06".
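
SPEC-style scores of this kind are conventionally aggregated as the geometric mean of per-benchmark ratios (each ratio being the reference run time divided by the measured run time). The sketch below, using made-up ratios rather than any published HEPSPEC06 figures, illustrates that aggregation step.

    // Illustrative sketch of a SPEC-style aggregation; the ratios are invented.
    #include <cmath>
    #include <iostream>
    #include <vector>

    // Geometric mean: exponential of the mean of the logarithms.
    double geometric_mean(const std::vector<double>& ratios) {
        double log_sum = 0.0;
        for (std::size_t i = 0; i < ratios.size(); ++i)
            log_sum += std::log(ratios[i]);
        return std::exp(log_sum / ratios.size());
    }

    int main() {
        // Hypothetical per-benchmark ratios (reference time / measured time).
        std::vector<double> ratios;
        ratios.push_back(17.2);
        ratios.push_back(19.8);
        ratios.push_back(15.4);
        ratios.push_back(21.1);
        ratios.push_back(18.0);
        std::cout << "Aggregate score: " << geometric_mean(ratios) << std::endl;
        return 0;
    }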

The tests showed impressive results, with the more recent L5520 delivering a 36% energy efficiency improvement over the previous generation Xeon 5400 "Harpertown" servers, and the other 5500 flavours reaching 30%. Improved efficiency was not the only positive point: "Nehalem" introduced Intel® Turbo Boost Technology and reintroduced Hyper-Threading Technology, Intel’s SMT (Simultaneous Multi-Threading) implementation, which allows each processor to execute two threads per core simultaneously by sharing the execution pipelines. SMT was thoroughly evaluated and appeared to be promising for the Computer Centre, as the tests showed that it increases the throughput of processed jobs by 15 to 21%. This evaluation involved multiprocessing (using a Monte Carlo based benchmark, "test40"), a multithreaded benchmark ("tbb") based on the ALICE High Level Trigger and the Intel Threading Building Blocks, and a complete real-world framework (from ALICE), and it compared the efficiency of different global scheduling policies.
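
As a worked illustration of how such a throughput gain is expressed (the job counts below are assumed, not the actual "test40" or "tbb" measurements):

    // Illustrative sketch with assumed numbers, not the measured results.
    #include <iostream>

    int main() {
        const double jobs_per_hour_smt_off = 100.0;  // hypothetical baseline, SMT disabled
        const double jobs_per_hour_smt_on  = 118.0;  // hypothetical, two threads per core

        const double gain_percent =
            (jobs_per_hour_smt_on / jobs_per_hour_smt_off - 1.0) * 100.0;
        std::cout << "SMT throughput gain: " << gain_percent << " %" << std::endl;
        return 0;
    }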

A subsequent report, on the Intel® Xeon® processor 5600 series, codenamed "Westmere", was published in April 2010. The methodology closely resembled the one established for the "Nehalem" report, but some legacy benchmarks were replaced with modern, real-world multi-threaded code: a parallel prototype of the Geant4 framework processing a simulation workload from the CMS experiment, and a multi-threaded minimisation application built on the ROOT framework. It was determined that the die-shrunk "Westmere" capitalises on the enhancements of the "Nehalem" microarchitecture through an increased core count. With respect to the previous generation, performance per watt increased by up to 23%, overall system performance was between 39% and 61% better, and the benefit of SMT remained practically unchanged.

Finally, a third report, also published in April 2010, covered the Intel® Xeon® processor 7500 series, designed for multi-socket "Nehalem-EX" platforms. Using the same standard methodology as in the Xeon 5600 evaluation, CERN openlab found that the "Nehalem-EX" platform provides excellent, close-to-linear scalability with many of the tested applications. Compared to a Xeon 7400 based system, codenamed "Dunnington", the tested solution excels in many areas: the HEPSPEC06 benchmark yields 3.5x more throughput, and the throughput of other workloads increased by between 47% and 87%, depending on the application. The Database Competence Centre has also tested the 7500 series and found impressive scalability for the LHC accelerator Oracle database workload. This enables CERN both to deploy the largest database applications in an optimal way and to consolidate many of the smaller database applications cost-efficiently.

In addition to the regular platform benchmarking activities, research on Solid State Drives (SSDs) has been restarted following related developments at Intel. SSD activities take place in both the Platform and the Database Competence Centres, and a preliminary SSD evaluation conducted on a "Nehalem" server has shown very promising results. A detailed report on the matter is expected in the upcoming year.

Multi-threading and many-core scalability

Another area where CERN openlab contributes is compiler optimisation, where the aim is to improve the performance of a wide range of different jobs by influencing the back-end code generator. Tests with the Intel C++ compiler, version 11.1, were performed using both Geant4 and ROOT benchmarks, for both Intel64 and IA64. The project is directly related to the multi-core and many-core revolution, which permits a significant increase in computing power within a constant processor power envelope. The move to multi-core processors has already enabled CERN to benefit from ongoing improvements in overall performance without a corresponding increase in processor power consumption. Although the amount of memory per core has to be kept constant, the power savings compared to a non-multi-core scenario have been impressive, which is highly beneficial to CERN. The openlab team has continued to work on establishing how the new multi-core architectures relate to High Energy Physics (HEP) software. As almost all LHC programmes (simulation, reconstruction, data analysis, etc.) are written in-house by high energy physicists, it is crucial to understand which code modifications allow an application to benefit most from a multi-core or many-core architecture. Many experiments in this domain were carried out in summer 2009 and produced interesting results.
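
A typical modification of this kind is to spread an independent per-event loop over all available cores. The sketch below is not code from any CERN framework; it is a hedged illustration using the Intel Threading Building Blocks mentioned earlier, with process_event() as a hypothetical stand-in for real physics processing.

    // Hedged sketch, not from a CERN framework: parallelising an independent
    // event loop with Intel Threading Building Blocks.
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    double process_event(std::size_t i) {          // placeholder workload
        return std::sqrt(static_cast<double>(i));
    }

    struct EventLoop {
        std::vector<double>* results;
        void operator()(const tbb::blocked_range<std::size_t>& r) const {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                (*results)[i] = process_event(i);  // events are independent
        }
    };

    int main() {
        const std::size_t n_events = 1000000;
        std::vector<double> results(n_events);

        EventLoop body;
        body.results = &results;
        // Each worker thread processes a chunk of the event range in parallel.
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n_events), body);
        return 0;
    }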

In recent months, parallelisation efforts, although not yet widespread at CERN, have started to bear fruit and are in large part actively supported by openlab. One such activity is carried out by Northeastern University researchers: PhD student Xin Dong and Prof. Gene Cooperman. It concerns a complete multi-threaded conversion of a serial physics processing framework commonly used in HEP. One of the prototypes resulting from this work was passed to openlab for testing. Initial examinations showed good scalability on various 8-core systems, prevalent in CERN’s Computer Centre at that time. Other tests, executed on a 24-core "Dunnington" system provided by Intel, confirmed that such a complex application does not always scale automatically when moving to double-digit core counts. The openlab Platform Competence Centre team interfaced with the researchers from the USA, as well as with local experts, in order to find ways to make the software more scalable. Encouragingly, after many months of work by the team’s American colleagues, the software now scales up to 32 cores with 93% efficiency – a remarkable and unprecedented result.
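
For reference, a scaling-efficiency figure of this kind is derived from measured run times as in the short sketch below; the timings are hypothetical and only their ratio matters.

    // Worked example with hypothetical timings: speedup and parallel efficiency.
    #include <iostream>

    int main() {
        const double t_one_core = 3200.0;  // assumed wall-clock time on 1 core (s)
        const double t_32_cores = 107.5;   // assumed wall-clock time on 32 cores (s)
        const int    n_cores    = 32;

        const double speedup    = t_one_core / t_32_cores;    // ~29.8x
        const double efficiency = 100.0 * speedup / n_cores;  // ~93 %

        std::cout << "Speedup: " << speedup << "x, efficiency: "
                  << efficiency << " %" << std::endl;
        return 0;
    }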

As openlab looks into the many-core future, it had the pleasure of welcoming Tim Mattson, an expert on multithreaded programming from Intel, who visited the team to share his views and gave an IT seminar at CERN on ‘OpenCL, design patterns and software for heterogeneous many-core platforms’. The move to a multithreaded paradigm is still in progress at CERN, but significant advances and optimisations have already been made, and the development team has introduced noticeable improvements based on suggestions from openlab. Furthermore, with 32-core "Nehalem-EX" systems now available, openlab looks forward to expanding its work on scalability with this new hardware and to gaining even more insight into the scaling behaviour of High Energy Physics frameworks on modern and future many-core architectures.

High-speed networking

Last year, several of Intel’s 10 gigabit Network Interface Cards (NICs) were evaluated within the CERN openlab framework, in collaboration with other groups in the CERN IT department, for their potential use in the IT production environment. The NICs surpassed all expectations and, as a result, 100 disk servers are currently connected at 10 gigabit in the CERN Computer Centre. In addition, all tape servers, which are used for permanently storing LHC data, will from now on be connected at 10 gigabit. The InfiniBand-based High Performance Computing cluster used for Computational Fluid Dynamics (CFD) calculations and other engineering applications (simulating cooling, temperature distribution for the experiments, etc.) is being replaced by a new cluster based on Intel "Nehalem" CPUs and Intel NetEffect low-latency 10 gigabit network cards.

Further activities

The PCC team participated actively in numerous schools and workshops, listed in the dedicated sections of this report, to share and disseminate the knowledge created. For yet another year, CERN openlab teamed up with Intel to organise regular training for CERN’s programmers. For these courses, Jeff Arnold, a Senior Software Engineer from Intel, assisted the regular CERN openlab lecturers. In addition to the regular quarterly courses on computer architecture, performance tuning and multi-threading, three new special courses were organised for advanced CERN users and were taught by top-level Intel experts. This activity was warmly welcomed and created good opportunities for Intel, the physics community and CERN openlab to share views and exchange feedback on new technologies and techniques.

The PCC team’s continued participation in key conferences, such as Supercomputing and the Intel Developer Forum, ensures awareness of the latest technological developments and allows for valuable meetings with partners. The team also maintains active involvement in the CERN School of Computing, giving lectures and exercises at the School each year and providing mentoring for lectures at the Inverted CERN School of Computing held in spring 2010.

