Profiling
The goal of profiling is to analyse the behaviour of a program during execution. This is accomplished by collecting a wide variety of data, including hardware data from the CPU, such as CPU cycles, cache misses and branch counts, in addition to application-level data, such as the number of function calls or the call graph. These data can be used to build a detailed picture of a single application as well as of the whole system.
Software instrumentation
In order to collect data during a measurement period we can use different approaches, one of them being software instrumentation. The idea behind this solution is to add code snippets that collect the required data. Such pieces of code can be added either directly to the source code or to a binary. The first type of instrumentation can be done manually or with compiler assistance (gcc -pg, gprof). Binary instrumentation comes in two flavours: binary translation (performed before the program is executed) and dynamic instrumentation (where code snippets are added while the program runs). Both types of binary instrumentation suffer from very large overheads. For instance, when running a sample benchmark from a HEP library on a Xeon processor, the overhead of binary instrumentation is around 800% with PIN and exceeds 6000% with ATOM. On the positive side, software instrumentation is easily portable, at least across a family of processors. We consider using software instrumentation, for instance, to obtain information about the number of function calls.
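As an illustration of manual source instrumentation, the sketch below counts the calls to a routine of interest and dumps the counter at the end of the run. The routine name process_event and the counter are hypothetical examples, not taken from any profiled application.

/* Minimal sketch of manual source instrumentation: a counter is
 * incremented at the entry of the routine we want to measure.
 * process_event() and the counter are hypothetical examples. */
#include <stdio.h>

static unsigned long process_event_calls = 0;   /* instrumentation counter */

static void process_event(int event_id)
{
    process_event_calls++;                      /* added instrumentation */
    /* ... original work of the routine ... */
    (void)event_id;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        process_event(i);

    /* dump the collected data at the end of the run */
    printf("process_event was called %lu times\n", process_event_calls);
    return 0;
}

The compiler-assisted variant would instead be built with gcc -pg and analysed after the run with gprof (for example, gprof ./prog gmon.out).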
Hardware approach
This approach takes advantage of the performance monitors available in modern processors. In comparison with instrumentation, we get much more information, not only about problems in the application but possibly also about the source of these problems. For instance, on the Itanium processor CPU stalls can be related to their fundamental cause, such as a cache miss. Usually this approach is implemented by sampling the hardware counters at regular intervals, so-called statistical profiling. This solution has a lower overhead than instrumentation, but it is certainly not portable between processors. However, since the perfmon2 interface and the corresponding library cover more and more processors, a solution which takes advantage of hardware support becomes much more attractive. In openlab we work on profiling everything from small applications up to big frameworks. For small programs software instrumentation may sound reasonable, but for big application suites with all the associated libraries it can become quite complicated. Keep in mind that we do not always have access to the source code of the profiled applications. In openlab we generally use tools such as PerfSuite, oprofile, q-tools and caliper.
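To give an idea of what programmatic access to a hardware counter looks like, the sketch below counts the CPU cycles spent in a small loop. It uses the generic Linux perf_event_open system call as an illustration of hardware counter access; it is not the perfmon2/pfmon interface discussed above, nor our actual setup.

/* Sketch: counting CPU cycles around a code region with the Linux
 * perf_event_open syscall. Illustration only; NOT the perfmon2 API. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;   /* count CPU cycles */
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* open a counter for this process, on any CPU */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile double x = 0.0;                  /* the work to be measured */
    for (int i = 0; i < 1000000; i++)
        x += i * 0.5;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long cycles = 0;
    read(fd, &cycles, sizeof(cycles));        /* read the counter value */
    printf("CPU cycles: %lld\n", cycles);
    close(fd);
    return 0;
}

Statistical profiling as described above goes one step further: instead of reading the counter once, the kernel delivers a sample (typically an instruction address) every time the counter overflows a chosen period.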
Collaboration
We collaborate with the developers of the interface to the hardware resources by testing perfmon2 on machines with different processors. We have also started to contribute to pfmon, and we are currently working on improving the resolution of function names when profiling applications that are built with shared libraries. pfmon is going to be not only a simple counting tool, but will also become more robust in the area of profiling. This means that it looks promising as a universal tool, available on the various hardware platforms of relevance.
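To show the general idea behind mapping a sampled address inside a shared library back to a function name, the sketch below uses the dynamic linker's dladdr() call. This is only an illustration of the concept, not the actual pfmon or PerfSuite code.

/* Sketch: resolving an address inside a shared library back to a
 * symbol name with dladdr(). Illustrates the idea behind symbol
 * resolution for shared libraries; it is not the pfmon code.
 * Build with: gcc example.c -lm -ldl (symbols must not be stripped). */
#define _GNU_SOURCE
#include <stdio.h>
#include <math.h>
#include <dlfcn.h>

int main(void)
{
    void *addr = (void *)&cos;        /* an address inside libm */
    Dl_info info;

    if (dladdr(addr, &info) && info.dli_sname) {
        printf("address %p -> %s in %s (base %p)\n",
               addr, info.dli_sname, info.dli_fname, info.dli_fbase);
    } else {
        printf("no symbol found for %p\n", addr);
    }
    return 0;
}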
In the area of profiling we have so far been collaborating with two LHC experiments, Atlas and LHCb. We work together in order to better understand how their huge applications behave during long runs. So far, we have been working on simulation jobs as well as on reconstruction jobs. For the simulations, we focused on the Geant4 libraries, because they turn out to be a major consumer of CPU time. Most of our work has been done in the 32-bit environment, but we are currently preparing for the move to 64-bit mode. The main tool which we use is PerfSuite. We came across a few challenges with this tool, such as unpredictable behaviour with the AFS file system and incorrect resolution of function names from shared libraries. After some serious effort we obtained a tool which is portable across processors, but which is not easy to use without prior knowledge of the structure of the profiled application.
Resources
Our presentations from meetings
11th Geant4 Collaboration Workshop and User Conference, 9-14
Oct 2006, Lisbon
Meeting with the Atlas, LHCb and Geant4 teams, 18 May 2006, CERN
Results
Atlas simulation
Full event: 3 events, 10 events, 30 events
Minimum Bias: 3 events, 10 events, 30 events
LHCb simulation
10 events, 100 events, 1000 events
Atlas Reconstruction (inDetExample)
iPatRec: J5_Pt280_560, top500, Zmumu
New Tracking: J5_Pt280_560, top500, Zmumu, ZeeJimmy
Geant4
calorimeter, exampleN04