




QDP++ and Chroma

  • Robert Edwards

  • Jefferson Lab

  • Collaborators:

  • Balint Joo


Lattice QCD – extremely uniform

  • Periodic or very simple boundary conditions

  • SPMD: Identical sublattices per processor
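The uniformity described above can be made concrete with a small sketch. It assumes a 4-D lattice whose global extents divide evenly among the processors; `Coord`, `periodic_neighbor`, and `sublattice_size` are hypothetical names chosen for illustration, not part of QDP++ or QMP.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4-D lattice geometry; all names here are illustrative only.
constexpr int Nd = 4;
using Coord = std::array<int, Nd>;

// Neighbour of `x` one step along direction `mu` (sign = +1 or -1),
// wrapping around at the edges (periodic boundary conditions).
inline Coord periodic_neighbor(Coord x, int mu, int sign, const Coord& size) {
    assert(mu >= 0 && mu < Nd && (sign == 1 || sign == -1));
    x[mu] = (x[mu] + sign + size[mu]) % size[mu];
    return x;
}

// SPMD decomposition: every processor owns a sublattice of identical shape.
// Returns the local extent, assuming each global extent divides evenly.
inline Coord sublattice_size(const Coord& global, const Coord& procs) {
    Coord local{};
    for (int mu = 0; mu < Nd; ++mu) {
        assert(procs[mu] > 0 && global[mu] % procs[mu] == 0);
        local[mu] = global[mu] / procs[mu];
    }
    return local;
}
```

Because every sublattice has the same shape, the same neighbour arithmetic runs unchanged on every processor.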



Software Infrastructure Goals

  • Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware.



Overlapping communications and computations

  • C(x)=A(x) * shift(B, + mu):

    • Send face forward non-blocking to neighboring node.
    • Receive face into pre-allocated buffer.
    • Meanwhile do A*B on interior sites.
    • “Wait” on receive to perform A*B on the face.
  • Lazy Evaluation (C style):

    • Shift(tmp, B, + mu);
    • Mult(C, A, tmp);
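The ordering of the steps above (interior first, face after the wait) can be sketched in a single process. This is only a model under stated assumptions: a 1-D sublattice, a forward shift by one site, and a hypothetical `PendingFace` callable standing in for a non-blocking receive handle. No real message passing takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Single-process stand-in for a non-blocking receive: calling it plays the
// role of "wait" and yields the neighbour's face value. Hypothetical, not QMP.
using PendingFace = std::function<double()>;

// C(x) = A(x) * B(x+1) on a 1-D sublattice of n sites.
// Interior sites (x < n-1) need only local data; the last site needs B from
// the forward neighbour, so it is computed after waiting on the receive.
inline std::vector<double> mult_shifted(const std::vector<double>& A,
                                        const std::vector<double>& B,
                                        const PendingFace& wait_for_face) {
    assert(A.size() == B.size() && !A.empty());
    const std::size_t n = A.size();
    std::vector<double> C(n);
    // The receive is notionally in flight here; work on the interior meanwhile.
    for (std::size_t x = 0; x + 1 < n; ++x) C[x] = A[x] * B[x + 1];
    // Block only when the face value is actually needed.
    C[n - 1] = A[n - 1] * wait_for_face();
    return C;
}
```

In a real multi-node run the callable would wrap a wait on a message handle, and the interior loop is what hides the communication latency.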




QMP Simple Example

  • char buf[size];

  • QMP_msgmem_t mm;

  • QMP_msghandle_t mh;

  • mm = QMP_declare_msgmem(buf,size);

  • mh = QMP_declare_send_relative(mm,+x);

  • QMP_start(mh);

  • // Do computations

  • QMP_wait(mh);

  • The receiving node follows the same steps, except that it declares a receive instead of a send:

  • mh = QMP_declare_receive_from(mm,-x);



Data Parallel QDP/C,C++ API

  • Hides architecture and layout

  • Operates on lattice fields across sites

  • Linear algebra tailored for QCD

  • Shifts and permutation maps across sites

  • Reductions

  • Subsets

  • Entry/exit – attach to existing codes
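As an illustration of subsets and reductions, the sketch below sums a field over all sites or over one checkerboard. It assumes a 2-D lattice in lexicographic storage; `Lattice2D`, `Subset`, and `sum` are hypothetical names for this example and do not reproduce the QDP interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D lattice stored in lexicographic order.
struct Lattice2D {
    int nx, ny;
    std::vector<double> data;  // data[x + nx * y]
};

enum class Subset { All, Even, Odd };

// True if site (x, y) belongs to the subset; parity is (x + y) mod 2.
inline bool in_subset(int x, int y, Subset s) {
    if (s == Subset::All) return true;
    const bool even = ((x + y) % 2 == 0);
    return (s == Subset::Even) == even;
}

// Reduction: sum of the field over the chosen subset of sites.
inline double sum(const Lattice2D& f, Subset s) {
    assert(f.data.size() == static_cast<std::size_t>(f.nx) * f.ny);
    double total = 0.0;
    for (int y = 0; y < f.ny; ++y)
        for (int x = 0; x < f.nx; ++x)
            if (in_subset(x, y, s)) total += f.data[x + f.nx * y];
    return total;
}
```

On a parallel machine the same reduction would end with a global sum across nodes; that step is omitted here.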





Data-parallel Operations

  • Unary and binary:

  • -a; a-b; …

  • Unary functions:

  • adj(a), cos(a), sin(a), …

  • Random numbers:

  • // platform independent

  • random(a), gaussian(a)



QDP Expressions



Linear Algebra Implementation

  • Naïve ops involve lattice temps – inefficient

  • Eliminate lattice temps with PETE (Portable Expression Template Engine)

  • Allows further combining of operations (adj(x)*y)

  • Overlap communications/computations

  • Full performance – expressions at site level
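To show how expression templates remove lattice temporaries, here is a toy version of the technique. It is written in the spirit of PETE but is not the PETE or QDP++ code: `Expr`, `Field`, `Add`, and `Mul` are hypothetical types, and the "lattice" is a plain vector of doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base so the operators below only match our own expression types.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Field : Expr<Field> {
    std::vector<double> v;
    explicit Field(std::size_t n, double init = 0.0) : v(n, init) {}
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }

    // Assignment evaluates the whole expression in one loop over sites,
    // so no intermediate lattice-sized temporary is ever created.
    template <class E>
    Field& operator=(const Expr<E>& e) {
        const E& ex = e.self();
        assert(ex.size() == v.size());
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = ex[i];
        return *this;
    }
};

// Nodes hold references only; they are meant to live within one statement.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
struct Mul : Expr<Mul<L, R>> {
    const L& l; const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }
```

With these definitions, `z = a * x + y;` builds a small tree of references and the assignment walks it once per site, which is the site-level evaluation the slide refers to.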



QDP++ Optimization

  • Optimizations “under the hood”

    • Select numerically intensive operations through template specialization.
    • PETE recognizes expression templates such as:
          • z = a * x + y
      • from type information at compile time.
    • It then calls a machine-specific optimized routine (axpyz).
    • Optimized routines can use assembler, reorganize loops, etc.
    • Optimized routines can be selected at configuration time.
    • Unoptimized fallback routines exist for portability.
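Selecting an optimized routine from the expression type alone can be sketched with a template specialization. Everything below is a hypothetical stand-in: `Leaf`, `Scaled`, `Sum`, `Evaluate`, and `assign` are invented for the example, and `axpyz` here is an ordinary loop with a call counter, not a tuned kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Minimal expression node types (stand-ins for an expression tree).
struct Leaf {
    const Vec& v;
    double operator[](std::size_t i) const { return v[i]; }
};

template <class E>
struct Scaled {  // a * e
    double a; E e;
    double operator[](std::size_t i) const { return a * e[i]; }
};

template <class L, class R>
struct Sum {  // l + r
    L l; R r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Stand-in for a machine-specific kernel; the counter only makes the
// dispatch observable in this example.
inline int& axpyz_calls() { static int n = 0; return n; }
inline void axpyz(Vec& z, double a, const Vec& x, const Vec& y) {
    assert(z.size() == x.size() && z.size() == y.size());
    ++axpyz_calls();
    for (std::size_t i = 0; i < z.size(); ++i) z[i] = a * x[i] + y[i];
}

// Primary template: portable, unoptimized fallback for any expression.
template <class E>
struct Evaluate {
    static void apply(Vec& z, const E& e) {
        for (std::size_t i = 0; i < z.size(); ++i) z[i] = e[i];
    }
};

// Specialization chosen at compile time from the expression type a*x + y.
template <>
struct Evaluate<Sum<Scaled<Leaf>, Leaf>> {
    static void apply(Vec& z, const Sum<Scaled<Leaf>, Leaf>& e) {
        axpyz(z, e.l.a, e.l.e.v, e.r.v);
    }
};

template <class E>
void assign(Vec& z, const E& e) { Evaluate<E>::apply(z, e); }
```

An expression of any other shape, such as a plain `Sum<Leaf, Leaf>`, falls through to the generic loop, which mirrors the fallback behavior listed above.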


Performance Test Case - Wilson Conjugate Gradient



Chroma

  • A lattice QCD toolkit/library built on top of QDP++

  • Library is a module – can be linked with other codes.

  • Features:

    • Utility libraries (gluonic measure, smearing, etc.)
    • Fermion support (DWF, Overlap, Wilson, Asqtad)
    • Applications:
      • Spectroscopy, Props & 3-pt funcs, eigenvalues
      • Heatbath, HMC
    • Optimization hooks – level 3 Wilson-Dslash for Pentium, QCDOC, BG/L, IBM SP-like nodes (via Bagel)


Software Map

  • Features:

    • Show dir structure


Chroma Lib Structure

  • Chroma Lattice Field Theory library

  • Support for gauge and fermion actions

    • Boson action support
    • Fermion action support
      • Fermion actions
      • Fermion boundary conditions
      • Inverters
      • Fermion linear operators
      • Quark propagator solution routines
    • Gauge action support
      • Gauge actions
      • Gauge boundary conditions
  • IO routines

    • Enums
  • Measurement routines

    • Eigenvalue measurements
    • Gauge fixing routines
    • Gluonic observables
    • Hadronic observables
    • Inline measurements
      • Eigenvalue measurements
      • Glue measurements
      • Hadron measurements
      • Smear measurements
    • Psibar-psi measurements
    • Schroedinger functional
    • Smearing routines
    • Trace-log support


Fermion Actions

  • Actions are factory objects (foundries)

    • Do not hold gauge fields – only params
    • Factory/creation functions with gauge field argument
      • Takes a gauge field - creates a State & applies fermion BC.
      • Takes a State – creates a Linear Operator (dslash)
      • Takes a State – creates quark prop. solvers
    • Linear Ops are function objects
      • E.g., struct Foo {int operator()(int x);} fred; // int z = fred(1);
      • Argument to CG, MR, etc. – simple functions
  • Created with XML
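The idea that a linear operator is a function object handed to a solver can be sketched as follows. The operator is a toy: a massive 1-D periodic Laplacian, chosen because it is symmetric positive definite so that plain conjugate gradient applies. `ToyLinOp`, `dot`, and `cg_solve` are hypothetical names, not Chroma classes or routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Linear operator as a function object:
// (M v)(x) = (2 + mass) v(x) - v(x+1) - v(x-1), periodic in x.
struct ToyLinOp {
    double mass;  // must be > 0 for positive definiteness
    Vec operator()(const Vec& v) const {
        const std::size_t n = v.size();
        Vec out(n);
        for (std::size_t x = 0; x < n; ++x)
            out[x] = (2.0 + mass) * v[x] - v[(x + 1) % n] - v[(x + n - 1) % n];
        return out;
    }
};

inline double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for M x = b. The solver only requires that M can be
// called on a vector, so any operator type can be passed in.
template <class LinOp>
Vec cg_solve(const LinOp& M, const Vec& b, double tol = 1e-12, int max_iter = 1000) {
    Vec x(b.size(), 0.0), r = b, p = b;
    double rr = dot(r, r);
    for (int k = 0; k < max_iter && rr > tol * tol; ++k) {
        const Vec Mp = M(p);
        const double alpha = rr / dot(p, Mp);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Mp[i];
        }
        const double rr_new = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    return x;
}
```

A call such as `cg_solve(ToyLinOp{0.5}, b)` treats the operator exactly like the `fred(1)` example: the solver never needs to know what is inside it.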



Fermion Actions - XML

  • Tag FermAct is key in lookup map of constructors

  • During construction, action reads XML

  • FermBC tag invokes another lookup
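A lookup map of constructors keyed by the FermAct tag might look like the sketch below. To stay self-contained it replaces the XML reader with a flat tag-to-text map, and every name (`Params`, `FermionAction`, `ToyWilson`, `register_fermact`, `create_fermact`) is hypothetical rather than taken from Chroma.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a parsed XML element: tag name -> text content.
using Params = std::map<std::string, std::string>;

struct FermionAction {
    virtual ~FermionAction() = default;
    virtual std::string describe() const = 0;
};

struct ToyWilson : FermionAction {
    double mass;
    // The action reads its own parameters during construction.
    explicit ToyWilson(const Params& p) : mass(std::stod(p.at("Mass"))) {}
    std::string describe() const override { return "WILSON"; }
};

using Creator = std::function<std::unique_ptr<FermionAction>(const Params&)>;

// Lookup map from the FermAct tag value to a constructor.
inline std::map<std::string, Creator>& fermact_registry() {
    static std::map<std::string, Creator> registry;
    return registry;
}

// Returns false if the name was already registered.
inline bool register_fermact(const std::string& name, Creator c) {
    assert(!name.empty());
    return fermact_registry().emplace(name, std::move(c)).second;
}

inline std::unique_ptr<FermionAction> create_fermact(const Params& p) {
    const auto it = fermact_registry().find(p.at("FermAct"));
    if (it == fermact_registry().end())
        throw std::runtime_error("unknown FermAct: " + p.at("FermAct"));
    return it->second(p);
}
```

A boundary-condition tag could be resolved the same way, with a second registry consulted from inside the action's constructor.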



HMC and Monomials

  • HMC built on Monomials

  • Monomials define Nf, gauge, etc.

  • Only provide Mom → deriv(U) and S(U). Pseudofermions are not visible.

  • Have Nf=2 and rational Nf=1

  • Both 4D and 5D versions.
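The division of labor described above, where each monomial exposes only an action and a derivative, can be sketched with a toy model. Here the "gauge field" is a single real number U and the monomials are simple polynomials. The interface (`Monomial`, `S`, `dsdq`, `ToyHamiltonian`, `leapfrog`) is a hypothetical simplification, not Chroma's class hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Each monomial exposes only its action and its derivative with respect to
// the field; any internal state (e.g. pseudofermions) would stay hidden.
struct Monomial {
    virtual ~Monomial() = default;
    virtual double S(double U) const = 0;     // contribution to the action
    virtual double dsdq(double U) const = 0;  // dS/dU
};

struct Quadratic : Monomial {  // S = (c/2) U^2
    double c;
    explicit Quadratic(double c_) : c(c_) {}
    double S(double U) const override { return 0.5 * c * U * U; }
    double dsdq(double U) const override { return c * U; }
};

struct Quartic : Monomial {  // S = (c/4) U^4
    double c;
    explicit Quartic(double c_) : c(c_) {}
    double S(double U) const override { return 0.25 * c * U * U * U * U; }
    double dsdq(double U) const override { return c * U * U * U; }
};

// The integrator only ever sees sums over monomials.
struct ToyHamiltonian {
    std::vector<std::unique_ptr<Monomial>> monomials;
    double action(double U) const {
        double s = 0.0;
        for (const auto& m : monomials) s += m->S(U);
        return s;
    }
    double force(double U) const {
        double f = 0.0;
        for (const auto& m : monomials) f -= m->dsdq(U);
        return f;
    }
};

// One leapfrog step of size dt for H = P^2/2 + S(U).
inline void leapfrog(const ToyHamiltonian& H, double& U, double& P, double dt) {
    assert(dt > 0.0);
    P += 0.5 * dt * H.force(U);
    U += dt * P;
    P += 0.5 * dt * H.force(U);
}
```

Adding a new term to the simulated action then means adding one more monomial to the list; the integrator is unchanged.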



Gauge Monomials

  • Gauge monomials:

    • Plaquette
    • Rectangle
    • Parallelogram
  • Monomial constructor will invoke constructor for Name in GaugeAction



Chroma – Inline Measurements

  • HMC has Inline meas.

  • Chroma.cc is Inline only code.

  • Former mainprogs now inline meas.

  • Meas. are registered with constructor call.

  • Meas. given gauge field – no return value.

  • Measurements communicate with each other only via disk (possibly via a memory buffer?)
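The measurement pattern above (take a gauge field, return nothing, pass results along through shared storage) can be sketched like this. The `Store` map is an in-memory stand-in for files on disk, and all names (`GaugeField`, `InlineMeas`, `measure_sum`, `measure_mean`, `run_inline`) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using GaugeField = std::vector<double>;       // toy stand-in for a gauge field
using Store = std::map<std::string, double>;  // stands in for files on disk

// A measurement receives the gauge field and returns nothing; results are
// written to the shared store under a name.
using InlineMeas = std::function<void(const GaugeField&, Store&)>;

inline void measure_sum(const GaugeField& u, Store& store) {
    double total = 0.0;
    for (double v : u) total += v;
    store["sum"] = total;
}

// Depends on an earlier measurement only through the store.
inline void measure_mean(const GaugeField& u, Store& store) {
    assert(!u.empty());
    store["mean"] = store.at("sum") / static_cast<double>(u.size());
}

// Runs the registered measurements in order on one configuration.
inline void run_inline(const std::vector<InlineMeas>& list,
                       const GaugeField& u, Store& store) {
    for (const InlineMeas& m : list) m(u, store);
}
```

Because later measurements see earlier ones only through named entries, the same list can be run from a standalone driver or from inside an HMC trajectory.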



Binary File/Interchange Formats

  • Metadata – data describing data; e.g., physics params

  • Use XML for metadata

  • File formats:

    • Files are mixed mode: XML (ASCII) plus binary
    • Using DIME (similar to e-mail MIME) to package
    • Use BinX (Edinburgh) to describe binary
  • Replica-catalog web-archive repositories



For More Information

  • U.S. Lattice QCD Home Page:

  • http://www.usqcd.org/

  • The JLab Lattice Portal:

  • http://lqcd.jlab.org/

  • High Performance Computing at JLab

  • http://www.jlab.org/hpc/


