Bill Smith Computational Science and Engineering

Date: 06.05.2018

MSc in High Performance Computing Computational Chemistry Module Introduction to Molecular Dynamics

Bill Smith
Computational Science and Engineering
STFC Daresbury Laboratory
Warrington WA4 4AD

MSc in High Performance Computing Computational Chemistry Module Lecture 4 – Parallel Performance

Paul Sherwood and Huub J J van Dam
CCLRC Daresbury Laboratory
p.sherwood@daresbury.ac.uk

Outline

Parallel vs. Serial
Identifying bottlenecks
Analysing bottlenecks

Communication tracing with Vampir

Performance analysis: Parallel vs. Serial

Performance analysis of parallel codes is similar to serial but…
The new dimension in parallel codes is the performance as a function of the number of processors.
In parallel codes the performance is determined by

The fraction of serial work (i.e. lack of parallelism, see Amdahl’s law below)
The efficiency of the communication
The balance between communication and work

The impact of the above is not only a function of the number of processors but also of the problem size
Remember Amdahl’s law:

T = 1 / ((1 - P) + P / S)

P is the proportion of the total time spent in a given step
S is the speedup achieved on this particular step
T is then the overall speedup
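
As a worked illustration, the sketch below evaluates the standard form of Amdahl's law, T = 1 / ((1 - P) + P / S), with P, S and T as defined above. The function name `amdahl_speedup` and the value P = 0.95 are assumptions chosen for the example, and S is simply set equal to the processor count, which presumes the step parallelises perfectly.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup T when a fraction p of the run time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Illustrative value only: 95% of the run time is in the parallelised step.
    p = 0.95
    for nproc in (1, 8, 64, 512):
        # Assumption: the step speeds up by exactly the processor count.
        print(f"{nproc:4d} processors: overall speedup {amdahl_speedup(p, nproc):6.2f}")
    print(f"upper bound 1/(1-P): {1.0 / (1.0 - p):.1f}")
```

However large S becomes, T cannot exceed 1 / (1 - P), which is why the serial fraction heads the list of performance factors.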

Identifying bottlenecks

As in the serial case the most gain is to be had by optimising the most expensive steps.
Unlike in the serial case a step that is unimportant on low processor counts may prove to be the bottleneck on high processor counts.
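
A toy model can make this shift of the bottleneck concrete. In the sketch below, one step scales as 1/N with the processor count while the other has a fixed cost; the step names and the timings (100 s and 1 s) are invented for illustration and are not measurements from any real code.

```python
def step_times(nproc: int, parallel_work: float = 100.0, fixed_cost: float = 1.0) -> dict:
    """Toy model: one step scales as 1/nproc, the other does not scale at all."""
    return {"compute": parallel_work / nproc, "fixed": fixed_cost}


def dominant_step(times: dict) -> str:
    """Name of the most expensive step."""
    return max(times, key=times.get)


if __name__ == "__main__":
    for nproc in (1, 16, 64, 256, 1024):
        t = step_times(nproc)
        share = t["fixed"] / sum(t.values())
        print(f"{nproc:5d} procs: fixed step is {share:5.1%} of the run, "
              f"bottleneck = {dominant_step(t)}")
```

In this model the fixed step is about 1% of the run on one processor, yet it dominates once the processor count exceeds the ratio of the two costs.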
A few tools are available including:

gprof and xprofiler
Built-in timers (in codes developed by people who are serious about performance)
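
A built-in timer can be as simple as a table of accumulated wall-clock times per named section. The following is a minimal sketch of that idea, not the timer of any particular code; the class name `SectionTimers` and the section names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimers:
    """Accumulates wall-clock time and call counts per named code section."""

    def __init__(self):
        self.elapsed = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the timed section raises.
            self.elapsed[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, seconds, calls) tuples, most expensive section first."""
        return sorted(
            ((n, self.elapsed[n], self.calls[n]) for n in self.elapsed),
            key=lambda row: row[1],
            reverse=True,
        )


if __name__ == "__main__":
    timers = SectionTimers()
    for _ in range(3):
        with timers.section("forces"):
            sum(i * i for i in range(200_000))
        with timers.section("update"):
            sum(range(10_000))
    for name, seconds, calls in timers.report():
        print(f"{name:10s} {seconds:10.6f} s in {calls} calls")
```

In a parallel run each process would keep its own table, and comparing the tables across processes and across processor counts is what reveals steps that stop scaling.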

More tools are available if you are prepared to instrument the code, but then it makes sense to choose an instrumentation approach that also helps to analyse communications.

Analysing bottlenecks

Finding out why certain sections of the program take so long
Often communication turns out to be a major component
Requires an approach that

Reports on the performance of the communications
Can show the communication behaviour in relation to relevant sections of the program

On HPCx some of this information is accessible through mpiprof
For detailed information however you’ll need to instrument your code.
Tools available include:

Paraver (http://www.cepba.upc.es/)
Vampir (was http://www.pallas.com/ now Intel Trace Analyzer)
OPT (http://www.allinea.com/)
No free tools though

Using Vampir: introduction

Vampir

Graphical front end for analysing trace files
Many different displays including

Timelines
Communication statistics
Etc.

Vampirtrace

Trace library for MPI applications
Uses the MPI profiling interface and is therefore independent of a given MPI implementation
Includes instrumentation functions to identify code sections
Has extensions to instrument one-sided communication
Various filters available to reduce trace-file sizes
Uses MPI to gather data from all processors, so you always need some MPI to be able to use it!

Vampirtrace API

Switching tracing on/off

SUBROUTINE VTTRACEOFF( )
SUBROUTINE VTTRACEON( )

Specifying user-defined states

SUBROUTINE VTCLASSDEF(CLASSNAME,CLASSHANDLE,IERR)
SUBROUTINE VTFUNCDEF(FUNCNAME, CLASSHANDLE, STATEHANDLE, IERR)

Entering/leaving user-defined states

SUBROUTINE VTBEGIN(STATEHANDLE, IERR)
SUBROUTINE VTEND(STATEHANDLE, IERR)

Logging message send/receive events (undocumented)

SUBROUTINE VTLOGSENDMSG( IME, ITO, ICNT, ITAG, ICOMMID, IERR)
SUBROUTINE VTLOGRECVMSG( IME, IFRM, ICNT, ITAG, ICOMMID, IERR)
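
The handle-based pattern behind the state routines (define a class, define a state within it, then bracket the code with begin/end calls) can be sketched with a toy stand-in. The Python class below is hypothetical and is not the Vampirtrace API: the names `ToyTracer`, `class_def`, `func_def`, `begin` and `end` are invented for illustration, and the rule that states must be left in reverse order of entry is an assumption of this sketch.

```python
import time


class ToyTracer:
    """Hypothetical stand-in for a handle-based tracing API (NOT Vampirtrace)."""

    def __init__(self):
        self.classes = []   # class handle -> class name
        self.states = []    # state handle -> (state name, class handle)
        self.events = []    # (timestamp, "begin" or "end", state handle)
        self._stack = []    # currently open states

    def class_def(self, name):
        """Register an activity class and return its handle."""
        self.classes.append(name)
        return len(self.classes) - 1

    def func_def(self, name, class_handle):
        """Register a state belonging to a class and return its handle."""
        if not 0 <= class_handle < len(self.classes):
            raise ValueError("unknown class handle")
        self.states.append((name, class_handle))
        return len(self.states) - 1

    def begin(self, state_handle):
        self._stack.append(state_handle)
        self.events.append((time.perf_counter(), "begin", state_handle))

    def end(self, state_handle):
        # Assumption of this sketch: states nest like function calls.
        if not self._stack or self._stack[-1] != state_handle:
            raise RuntimeError("state left out of order")
        self._stack.pop()
        self.events.append((time.perf_counter(), "end", state_handle))


if __name__ == "__main__":
    tracer = ToyTracer()
    scf = tracer.class_def("SCF")                 # define handles once, at start-up
    fock = tracer.func_def("fock_build", scf)
    tracer.begin(fock)                            # bracket the section of interest
    sum(i * i for i in range(100_000))
    tracer.end(fock)
    for stamp, kind, handle in tracer.events:
        print(f"{stamp:.6f} {kind:5s} {tracer.states[handle][0]}")
```

Defining handles once and passing only integers at begin/end keeps the per-event cost low, which matters when the bracketed sections are entered many times.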

Instrumenting single-sided memory access

Approach 1: Instrument the puts, gets and data server

Advantage: robust and accurate
Disadvantage: one does not always have access to the source of the data server

Approach 2: Instrument the puts and gets only, cheating on the source and destination of the messages

Advantage: no instrumentation of the data server required
Disadvantage: timings of the messages are inaccurate in case of non-blocking operations

Runtime tracing options

The tracing of states can be modified at runtime through a configuration file.
Tracing of messages cannot be changed.
VTTRACEON and VTTRACEOFF should be used sparingly.
Gotcha: if you don’t have VTTRACEOFF/VTTRACEON in your code, no states will be traced (but messages will).
The location of the configuration file can be specified by an environment variable VT_CONFIG

Using Vampir

After instrumenting your code, simply run it as normal; the run produces a number of trace files named .stf*

 Launch vampir to bring up an initial timeline view
 vampir .stf
 
To get the full functionality, load the whole trace file (this may take a little while):
 Right-click on the timeline
 Go to “Load”
 Select “Whole Trace”

 

 

Vampir views
 Identifying bottlenecks
 Summary chart: summarizes the time spent in each class of activity.
 Summary timeline: shows how many processes are busy with a particular class of activity in a sequence of time bins.
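
The idea behind a summary timeline can be sketched as a binning of activity intervals. The code below is an illustrative reconstruction, not Vampir's algorithm: the record layout `(process, activity, start, stop)` and the function name `summary_timeline` are assumptions, and each bin holds the average number of processes in an activity class over that bin.

```python
from collections import defaultdict


def summary_timeline(intervals, t_end, nbins):
    """Average number of processes in each activity class per time bin.

    intervals: iterable of (process, activity, start, stop) records with
               0 <= start <= stop <= t_end (hypothetical trace layout).
    Returns {activity: [one value per bin]}.
    """
    width = t_end / nbins
    table = defaultdict(lambda: [0.0] * nbins)
    for _proc, activity, start, stop in intervals:
        for b in range(nbins):
            lo, hi = b * width, (b + 1) * width
            overlap = min(stop, hi) - max(start, lo)
            if overlap > 0.0:
                # A process busy for the whole bin contributes exactly 1.
                table[activity][b] += overlap / width
    return dict(table)


if __name__ == "__main__":
    # Invented trace: process 0 computes then communicates, process 1 only computes.
    trace = [
        (0, "compute", 0.0, 2.0),
        (0, "MPI", 2.0, 4.0),
        (1, "compute", 0.0, 4.0),
    ]
    for activity, bins in summary_timeline(trace, t_end=4.0, nbins=2).items():
        print(activity, bins)
```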
 
Analysing bottlenecks
 Global timeline: detailed view of all the activities as well as all the messages being passed.
 Activity chart: shows the time spent in the different activities for each processor.
 Message statistics: can display various statistics about messages being passed between pairs of processors.
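
Message statistics of this kind amount to aggregating message records per sender/receiver pair. The sketch below assumes a hypothetical record layout `(sender, receiver, nbytes)` and is meant only to show what such a table contains, not how Vampir computes it.

```python
from collections import defaultdict


def message_statistics(messages):
    """Count, total bytes and mean bytes per (sender, receiver) pair.

    messages: iterable of (sender, receiver, nbytes) records
              (hypothetical layout, invented for this example).
    """
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for src, dst, nbytes in messages:
        entry = stats[(src, dst)]
        entry["count"] += 1
        entry["bytes"] += nbytes
    return {
        pair: {**entry, "mean_bytes": entry["bytes"] / entry["count"]}
        for pair, entry in stats.items()
    }


if __name__ == "__main__":
    # Invented records: two messages from rank 0 to rank 1, one short reply.
    records = [(0, 1, 100), (0, 1, 300), (1, 0, 8)]
    for (src, dst), entry in sorted(message_statistics(records).items()):
        print(f"{src} -> {dst}: {entry['count']} messages, "
              f"{entry['bytes']} bytes, mean {entry['mean_bytes']:.1f}")
```

A pair with many small messages or a strongly asymmetric byte count is a natural place to start looking in the global timeline.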
  

 

Vampir/Vampirtrace installation details
 The Vampir software sits in /usr/local/packages/vampir
 Detailed documentation (PDFs) in vampir/doc
 The vampir analyser lives in vampir/bin
 There are 2 sets of vampirtrace libraries
 For 32 bit codes use vampir/lib
 For 64 bit codes (compiled with -q64) use vampir/lib64
 
Working examples are available in vampir/examples
 Use the mpcc_r and mpxlf_r compilers and link libVT.a as the last library on your link line.
