Bill Smith Computational Science and Engineering


MSc in High Performance Computing Computational Chemistry Module Introduction to Molecular Dynamics

  • Bill Smith

  • Computational Science and Engineering

  • STFC Daresbury Laboratory

  • Warrington WA4 4AD

MSc in High Performance Computing Computational Chemistry Module Lecture 4 – Parallel Performance

  • Paul Sherwood and Huub J J van Dam

  • CCLRC Daresbury Laboratory



  • Parallel vs. Serial

  • Identifying bottlenecks

  • Analysing bottlenecks

Performance analysis: Parallel vs. Serial

  • Performance analysis of parallel codes is similar to that of serial codes, but…

  • The new dimension in parallel codes is the performance as a function of the number of processors.

  • In parallel codes, performance is determined by:

    • The fraction of serial work (i.e. lack of parallelism; see Amdahl’s law below)
    • The efficiency of the communication
    • The balance between communication and work
  • The impact of these factors depends not only on the number of processors but also on the problem size.
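To make the interplay of serial work, communication cost, processor count and problem size concrete, here is a toy timing model. It is a sketch under loudly stated assumptions: the serial fraction, the cost per operation and the log2(p) communication term (as for a tree-based reduction) are all hypothetical parameters, not measurements of any real code.

```python
import math


def model_time(p: int, n: float, serial_fraction: float = 0.01,
               t_op: float = 1e-6, t_comm: float = 1e-3) -> float:
    """Toy wall-clock model; all parameters are hypothetical, not measured.

    n operations of t_op seconds each, of which serial_fraction cannot be
    parallelised, plus a communication cost growing like log2(p) (as for a
    tree-based reduction). For p == 1 the communication term is zero.
    """
    work = n * t_op
    serial = serial_fraction * work
    parallel = (1.0 - serial_fraction) * work / p
    comm = t_comm * math.log2(p)
    return serial + parallel + comm


def efficiency(p: int, n: float) -> float:
    """Parallel efficiency E = T(1) / (p * T(p))."""
    return model_time(1, n) / (p * model_time(p, n))


# Efficiency for a small and a large problem at several processor counts.
for n in (1e5, 1e7):
    row = ", ".join(f"p={p}: {efficiency(p, n):.2f}" for p in (1, 16, 256))
    print(f"n={n:.0e}  {row}")
```

With these made-up parameters, the efficiency on 256 processors is about 0.04 for the small problem but about 0.27 for the large one. The fixed communication cost is amortised over more work as the problem grows.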

  • Remember Amdahl’s law:

    $$T = \frac{1}{(1 - P) + P/S}$$

    • P is the proportion of the total time spent in a given step
    • S is the speedup achieved on this particular step
    • T is then the overall speedup
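Amdahl’s law in the form T = 1 / ((1 − P) + P/S) is easy to evaluate directly. The short sketch below does so, with P and S written as lowercase `p` and `s`. It shows how quickly the part that is not sped up limits the overall gain.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup T when a fraction p of the run time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Parallelising 90% of the run perfectly over 8 processors:
print(round(amdahl_speedup(0.9, 8), 2))     # 4.71
# Even with an effectively unlimited speedup, T cannot exceed 1 / (1 - P):
print(round(amdahl_speedup(0.9, 1e12), 2))  # 10.0
```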

Identifying bottlenecks

  • As in the serial case, the greatest gains come from optimising the most expensive steps.

  • Unlike in the serial case, a step that is unimportant at low processor counts may become the bottleneck at high processor counts.

  • A few tools are available, including:

    • gprof and xprofiler
    • Built-in timers (in codes developed by people who are serious about performance)
  • More tools are available if you are prepared to instrument the code, but then it makes sense to choose an instrumentation approach that also helps in analysing communications.
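Built-in timers need not be elaborate. Below is a minimal sketch of one; the class `SectionTimers` is a hypothetical helper, not taken from any particular code. It accumulates wall-clock time per named section and ranks the sections by cost. In a parallel run, each process would keep its own timers, and comparing the rankings across processor counts shows whether the bottleneck moves. The clock is injectable so the bookkeeping can be checked deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, List, Tuple


class SectionTimers:
    """Hypothetical built-in timer registry: total time and call count per section."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.totals: Dict[str, float] = defaultdict(float)
        self.calls: Dict[str, int] = defaultdict(int)

    @contextmanager
    def section(self, name: str):
        """Time the enclosed block and add the elapsed time to `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start
            self.calls[name] += 1

    def ranking(self) -> List[Tuple[str, float]]:
        """Sections sorted from most to least expensive."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    timers = SectionTimers()
    with timers.section("setup"):
        sum(range(10_000))
    with timers.section("main loop"):
        sum(i * i for i in range(1_000_000))
    for name, seconds in timers.ranking():
        print(f"{name:10s} {seconds:8.4f} s")
```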

Analysing bottlenecks

  • Finding out why certain sections of the program take so long

  • Often communication turns out to be a major component

  • Requires an approach that:

    • Reports on the performance of the communications
    • Can show the communication behaviour in relation to relevant sections of the program
  • On HPCx, some of this information is accessible through mpiprof.

  • For detailed information, however, you’ll need to instrument your code.

  • Tools available include:

    • Paraver
    • Vampir (now Intel Trace Analyzer)
    • OPT
    • None of these tools is free, though
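Relating communication to program sections amounts to splitting each section’s time into communication and computation. The sketch below does this over a hypothetical record format, `(rank, section, activity, seconds)`, where `"mpi"` marks time spent inside communication calls. Real tools gather the equivalent data from their own trace files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (rank, section, activity, seconds), where activity is
# "mpi" for time inside communication calls and "compute" for everything else.
Record = Tuple[int, str, str, float]


def comm_fraction_by_section(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of each section's time (summed over ranks) spent in communication."""
    total: Dict[str, float] = defaultdict(float)
    comm: Dict[str, float] = defaultdict(float)
    for _rank, section, activity, seconds in records:
        total[section] += seconds
        if activity == "mpi":
            comm[section] += seconds
    return {s: comm[s] / total[s] for s in total if total[s] > 0}


if __name__ == "__main__":
    records = [
        (0, "forces", "compute", 3.0), (0, "forces", "mpi", 1.0),
        (1, "forces", "compute", 2.0), (1, "forces", "mpi", 2.0),
        (0, "io", "compute", 1.0),
    ]
    for section, frac in comm_fraction_by_section(records).items():
        print(f"{section:8s} {frac:.1%} communication")
```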

Using vampir: introduction

  • Vampir

  • Vampirtrace

    • Trace library for MPI applications
    • Uses the MPI profiling interface and is therefore independent of a given MPI implementation
    • Includes instrumentation functions to identify code sections
    • Has extensions to instrument one-sided communication
    • Various filters available to reduce trace-file sizes
    • Uses MPI to gather data from all processors, so you always need some MPI to be able to use it!
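The MPI profiling interface works by interposition. A tool supplies its own `MPI_Send` that records an event and then calls the real implementation under the name `PMPI_Send`, so the application needs no source changes. The Python sketch below mimics only that wrapping pattern. It is an analogy, not Vampirtrace code, and `send` is a hypothetical stand-in for a communication routine.

```python
import functools
import time
from typing import Any, Callable, List, Tuple

# Event log: (function name, start time, end time). Purely illustrative.
trace: List[Tuple[str, float, float]] = []


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so each call is timed and logged around the real work."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)  # the "real" routine, like PMPI_Send
        finally:
            trace.append((func.__name__, start, time.perf_counter()))
    return wrapper


@traced  # callers keep calling send(); they never see the wrapper
def send(data: bytes, dest: int) -> int:
    """Hypothetical stand-in for a communication routine; returns bytes 'sent'."""
    return len(data)


if __name__ == "__main__":
    send(b"hello", dest=1)
    print(trace[-1][0])  # send
```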

Vampirtrace API

  • Switching tracing on/off

  • Specifying user-defined states

  • Entering/leaving user-defined states

  • Logging message send/receive events (undocumented)
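The first three kinds of API call above work together: states are defined once, then entered and left around code sections while tracing is on. The sketch below illustrates how they fit. `StateTracer` and its methods are hypothetical illustrations, not Vampirtrace’s actual function names or signatures, which are given in the Vampirtrace documentation.

```python
import time
from typing import Callable, Dict, List, Tuple


class StateTracer:
    """Hypothetical state tracer: on/off switch, state definitions, enter/leave events."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.enabled = True
        self._names: Dict[int, str] = {}
        self.events: List[Tuple[str, str, float]] = []  # (kind, state name, time)

    def trace_on(self) -> None:
        self.enabled = True

    def trace_off(self) -> None:
        # Switching off between enter() and leave() leaves an unmatched event,
        # one reason to toggle tracing sparingly and at section boundaries.
        self.enabled = False

    def define_state(self, state_id: int, name: str) -> None:
        self._names[state_id] = name

    def enter(self, state_id: int) -> None:
        if self.enabled:
            self.events.append(("enter", self._names[state_id], self._clock()))

    def leave(self, state_id: int) -> None:
        if self.enabled:
            self.events.append(("leave", self._names[state_id], self._clock()))


if __name__ == "__main__":
    FORCES = 1  # hypothetical state id
    tracer = StateTracer()
    tracer.define_state(FORCES, "forces")
    tracer.enter(FORCES)
    sum(i * i for i in range(100_000))  # stand-in for the force computation
    tracer.leave(FORCES)
    print(tracer.events)
```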


Instrumenting single-sided memory access

  • Approach 1: Instrument the puts, gets and data server

    • Advantage: robust and accurate
    • Disadvantage: one does not always have access to the source of the data server
  • Approach 2: Instrument the puts and gets only, cheating on the source and destination of the messages

    • Advantage: no instrumentation of the data server required
    • Disadvantage: timings of the messages are inaccurate for non-blocking operations
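One reading of approach 2 is sketched below, under the assumption that each put or get is logged as an ordinary send/receive pair. Only the calling (origin) process is instrumented, so both events carry origin-side timestamps. The "send" is attributed to whichever rank the data comes from, even though that rank executed no instrumented call. The record layout and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical event layout: (event, rank, partner rank, bytes, time).
Event = Tuple[str, int, int, int, float]


@dataclass
class OneSidedOp:
    """A put or get issued by `origin` against memory owned by `target`."""
    kind: str        # "put" or "get"
    origin: int
    target: int
    nbytes: int
    t_start: float   # time the call was issued on the origin
    t_end: float     # time the call returned on the origin


def as_message_pair(op: OneSidedOp) -> Tuple[Event, Event]:
    """Log a one-sided operation as a send/receive pair using origin-side times.

    For a non-blocking operation, t_end is when the call returned, not when the
    data actually moved, which is why the message timings are inaccurate.
    """
    if op.kind == "put":
        src, dst = op.origin, op.target
    elif op.kind == "get":
        src, dst = op.target, op.origin
    else:
        raise ValueError(f"unknown one-sided operation: {op.kind!r}")
    send: Event = ("send", src, dst, op.nbytes, op.t_start)
    recv: Event = ("recv", dst, src, op.nbytes, op.t_end)
    return send, recv


if __name__ == "__main__":
    print(as_message_pair(OneSidedOp("get", origin=0, target=3, nbytes=8,
                                     t_start=1.0, t_end=1.5)))
```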

Runtime tracing options

  • The tracing of states can be modified at runtime through a configuration file.

  • The tracing of messages cannot be changed.

  • VTTRACEON and VTTRACEOFF should be used sparingly.

  • Gotcha: if you don’t have VTTRACEOFF/VTTRACEON in your code, no states will be traced (but messages will).

  • The location of the configuration file can be specified through the environment variable VT_CONFIG.
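Runtime selection of traced states boils down to locating a configuration file through `VT_CONFIG` and matching state names against rules. In the sketch below, the `PATTERN ON|OFF` line format and the "last matching rule wins" policy are hypothetical stand-ins, not the actual Vampirtrace configuration syntax. Consult the Vampirtrace documentation for the real keywords.

```python
import fnmatch
import os
from typing import List, Tuple

Rule = Tuple[str, bool]  # (glob pattern for state names, traced?)


def load_state_filter(env_var: str = "VT_CONFIG") -> List[Rule]:
    """Read hypothetical 'PATTERN ON|OFF' rules from the file named by env_var."""
    path = os.environ.get(env_var)
    if not path:
        return []
    rules: List[Rule] = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and comments
            if len(fields) == 2 and fields[1].upper() in ("ON", "OFF"):
                rules.append((fields[0], fields[1].upper() == "ON"))
    return rules


def is_traced(state: str, rules: List[Rule], default: bool = True) -> bool:
    """The last matching rule wins; unmatched states use the default."""
    result = default
    for pattern, on in rules:
        if fnmatch.fnmatchcase(state, pattern):
            result = on
    return result


if __name__ == "__main__":
    rules = load_state_filter()  # [] unless VT_CONFIG names a rules file
    for state in ("solver_iter", "io_write"):
        print(state, is_traced(state, rules))
```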

Using Vampir

  • After instrumenting your code, simply run it as normal; you’ll see that it produces a number of files named .stf*

  • Launch vampir to bring up an initial timeline view:

    • vampir .stf
  • To get full functionality, load the whole trace file (this may take a little while):

    • Right-click on the timeline
    • Go to “Load”
    • Select “Whole Trace”

Vampir views

  • Identifying bottlenecks

    • Summary chart: summarises the time spent in each class of activity.
    • Summary timeline: shows how many processes are busy with a particular class of activity in a sequence of time bins.
  • Analysing bottlenecks

    • Global timeline: detailed view of all the activities as well as all the messages being passed.
    • Activity chart: shows the time spent in the different activities for each processor.
    • Message statistics: can display various statistics about messages passed between pairs of processors.
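The summary chart and summary timeline are aggregations over per-process activity intervals. This sketch reproduces the idea from a hypothetical interval format `(rank, activity, start, end)`. One design choice here is an assumption rather than Vampir’s documented behaviour: a process busy for only part of a time bin contributes that fraction of the bin.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical interval record: (rank, activity class, start time, end time).
Interval = Tuple[int, str, float, float]


def summary_chart(intervals: List[Interval]) -> Dict[str, float]:
    """Total time spent in each activity class, summed over all processes."""
    totals: Dict[str, float] = defaultdict(float)
    for _rank, activity, start, end in intervals:
        totals[activity] += end - start
    return dict(totals)


def summary_timeline(intervals: List[Interval], activity: str,
                     t0: float, t1: float, nbins: int) -> List[float]:
    """Average number of processes busy with `activity` in each of nbins time bins."""
    width = (t1 - t0) / nbins
    busy = [0.0] * nbins
    for _rank, act, start, end in intervals:
        if act != activity:
            continue
        for b in range(nbins):
            lo, hi = t0 + b * width, t0 + (b + 1) * width
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                busy[b] += overlap / width
    return busy


if __name__ == "__main__":
    intervals = [(0, "Calc", 0.0, 1.0), (0, "MPI", 1.0, 2.0), (1, "Calc", 0.0, 2.0)]
    print(summary_chart(intervals))
    print(summary_timeline(intervals, "Calc", 0.0, 2.0, nbins=2))
```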

Vampir/Vampirtrace installation details

  • The Vampir software sits in /usr/local/packages/vampir

  • Detailed documentation (PDFs) in vampir/doc

  • The vampir analyser lives in vampir/bin

  • There are 2 sets of vampirtrace libraries

    • For 32 bit codes use vampir/lib
    • For 64 bit codes (compiled with -q64) use vampir/lib64
  • Working examples are available in vampir/examples

  • Use the mpcc_r and mpxlf_r compilers and link libVT.a as the last library on your link line.
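The two linking rules above can be captured in a small helper: pick `lib` or `lib64` to match `-q64`, and put `libVT.a` last. The sketch below is a hypothetical helper, not part of the Vampir distribution. It only builds the argument list and does not run the compiler, and it names the archive by full path to avoid relying on `-l` search rules.

```python
from typing import List, Sequence

VAMPIR_ROOT = "/usr/local/packages/vampir"


def link_command(objects: Sequence[str], output: str, fortran: bool = False,
                 bits64: bool = False, extra_libs: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a link line with libVT.a as the last library."""
    compiler = "mpxlf_r" if fortran else "mpcc_r"
    libdir = f"{VAMPIR_ROOT}/lib64" if bits64 else f"{VAMPIR_ROOT}/lib"
    cmd = [compiler]
    if bits64:
        cmd.append("-q64")  # 64-bit objects need the 64-bit vampirtrace library
    cmd += ["-o", output, *objects, *extra_libs]
    cmd.append(f"{libdir}/libVT.a")  # must come after every other library
    return cmd


if __name__ == "__main__":
    print(" ".join(link_command(["md.o"], "md", fortran=True, bits64=True)))
```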
