Continuous Program Optimization (CPO): Update of CGO’06 Vision



Date: 07.11.2018




Static compilation system






Static Compilers

  • Traditional compilation model for C, C++, Fortran, …

  • Extremely mature technology

  • Lots of interaction between compiler development and processor design

  • Static design point allows for extremely deep and accurate analyses supporting sophisticated program transformation for performance.

  • ABI (application binary interface) enables a useful level of language interoperability

  • But…



Static compilation…the downsides

  • Backward compatibility is a big concern

  • Difficult or impossible to evolve language implementation (e.g. C++ object model support for multiple inheritance)

  • CPU designers restricted by requirement to deliver increasing performance to applications that will not be recompiled

    • slows down the uptake of new ISA and micro-architectural features
    • constrains the evolution of CPU design by discouraging radical changes
  • It does (or at least should) make CPU architects think very carefully about adding anything new because

    • you can almost never get rid of anything you add
    • it takes a long time to find out for sure whether anything you add is a good idea or not


Static compilation…the downsides

  • Largely unable to satisfy our increasing desire to exploit dynamic traits of the application

  • Profile-directed feedback can help but still has its limitations

  • Even link-time is too early to be able to catch some high-value opportunities for performance improvement

  • Whole classes of speculative optimizations are infeasible without heroic efforts



Profile-Directed Feedback (PDF)

  • Two-step optimization process:

  • First pass instruments the generated code to collect statistics about the program execution

    • Program compiled with -qpdf1
    • Developer exercises this program with representative inputs to collect representative data
    • Program may be executed multiple times to reflect variety of representative inputs
  • Second pass re-optimizes the program based on the profile data collected

    • Program compiled with -qpdf2


Data collected by PDF

  • Basic block execution counters

    • How many times each basic block in the program is reached
    • Used to derive branch and call frequencies
  • Value profiling

    • Collects a histogram of values for a particular attribute of the program
    • Used for specialization
  • Inlining

    • Uses call frequencies to prioritize inlining sites
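
The two kinds of data listed above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the block names, the `run_instrumented` helper and the training inputs are hypothetical and do not describe how any real PDF implementation stores its counters.

```python
from collections import Counter


def run_instrumented(inputs):
    """Toy 'instrumented program': counts block executions and divisor values."""
    block_counts = Counter()  # basic block execution counters
    divisor_hist = Counter()  # value profile for one attribute (the divisor)
    for x, d in inputs:
        block_counts["entry"] += 1
        divisor_hist[d] += 1
        if x < 0:
            block_counts["negative_path"] += 1
        else:
            block_counts["non_negative_path"] += 1
    return block_counts, divisor_hist


def branch_probability(block_counts, taken_block, source_block):
    """Derive a branch frequency from two basic block counters."""
    return block_counts[taken_block] / block_counts[source_block]


# Hypothetical "representative" training inputs: (x, divisor) pairs.
training_inputs = [(5, 8), (7, 8), (-1, 8), (3, 2)]
blocks, divisors = run_instrumented(training_inputs)

print(branch_probability(blocks, "negative_path", "entry"))  # 0.25
print(divisors.most_common(1))  # [(8, 3)]
```

In this toy run the negative path is taken on one of four executions, and the divisor histogram shows 8 as the dominant value, which is the kind of fact a specializing optimizer could act on.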


Optimizations affected by PDF

  • Function partitioning

    • Groups the program into cliques of routines with high call affinity
  • Speculation

    • Forces evaluation of expressions guarded by branches determined to be infrequently taken
  • Specialization triggered by value profiling

    • Arithmetic ops, built-in function calls, pointer calls
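
Specialization of an arithmetic op can be sketched as a guarded fast path with a general fallback. The sketch below is an assumption-laden illustration, not the compiler's actual transformation: `make_specialized_divide` is a hypothetical helper, and the hot divisor 8 stands in for a value that profiling might report as dominant.

```python
def make_specialized_divide(hot_divisor):
    """Return a floor-divide routine specialized for one profiled divisor.

    Only positive powers of two get the shift fast path in this sketch.
    """
    is_power_of_two = hot_divisor > 0 and (hot_divisor & (hot_divisor - 1)) == 0
    shift = hot_divisor.bit_length() - 1

    def divide(x, d):
        if is_power_of_two and d == hot_divisor:  # runtime guard on the profiled value
            return x >> shift                     # specialized fast path
        return x // d                             # general fallback for all other values

    return divide


# Assume value profiling reported 8 as the most common divisor.
divide = make_specialized_divide(8)
print(divide(64, 8))  # fast path
print(divide(10, 3))  # fallback path
```

The guard keeps the specialized version correct for every input, so a misleading profile costs performance only, never correctness.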


Optimizations triggered by PDF

  • Extended basic block creation

    • Organizes code to frequently fall-through on branches
  • Specialized linkage conventions

    • Treats all registers as non-volatile for infrequent calls
  • Branch hinting

    • Sets branch-prediction hints available on the ISA
  • Dynamic memory reorganization

    • Groups frequently accessed heap storage
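
Organizing code to fall through on frequent branches can be sketched as a greedy layout over profiled edge counts. This is a simplified stand-in for extended basic block creation, not the algorithm the compiler uses; `layout_blocks`, the block names and the edge counts are all hypothetical.

```python
def layout_blocks(entry, edge_counts):
    """Greedy layout: after each block, place its hottest unplaced successor,
    so the most frequent branch outcome becomes the fall-through."""
    order, placed = [], set()
    pending = [entry]
    while pending:
        block = pending.pop()
        # Follow the chain of hottest successors from this starting block.
        while block is not None and block not in placed:
            order.append(block)
            placed.add(block)
            successors = sorted(
                edge_counts.get(block, {}).items(),
                key=lambda kv: kv[1],
                reverse=True,
            )
            unplaced = [name for name, _ in successors if name not in placed]
            # Colder successors are deferred; the hottest of them is tried first.
            pending.extend(reversed(unplaced[1:]))
            block = unplaced[0] if unplaced else None
    return order


# Hypothetical profile: the A -> B -> D path dominates, C is rarely reached.
edge_counts = {
    "A": {"B": 90, "C": 10},
    "B": {"D": 90},
    "C": {"D": 10},
    "D": {},
}
print(layout_blocks("A", edge_counts))  # ['A', 'B', 'D', 'C']
```

With this profile the hot path A, B, D is laid out contiguously and the cold block C is moved out of line.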


Impact of PDF on SPECint2000* (chart not reproduced here)



Sounds great…what’s the problem?

  • Only the die-hard performance types use it (e.g. HPC, middleware)

  • It’s tricky to get right…you only want to train the system to recognize things that are characteristic of the application and somehow ignore artifacts of the input set

  • In the end, it’s still static, and runtime checks and multiple versions can only take you so far

  • Undermines the usefulness of benchmark results as a predictor of application performance when upgrading hardware

  • In summary…it’s a usability/socialization issue for developers that shows no sign of going away anytime soon



Dynamic Compilation System



Dynamic Compilation

  • Traditional model for languages like Java

  • Rapidly maturing technology

  • Exploitation of current invocation behaviour on exact CPU model

  • Recompilation and other dynamic techniques enable aggressive speculations

  • Profile feedback to optimizer is performed online (transparent to user/application)

  • Compile time budget is concentrated on hottest code with the most (perceived) opportunities

  • But…



Dynamic compilation…the downsides

  • Some important analyses not affordable at runtime even if applied only to the hottest code

  • Non-determinism in the compilation system can be problematic

    • For some users, it severely challenges their notions of quality assurance
    • Requires new approaches to RAS (reliability, availability, serviceability) and to getting reproducible defects for the compiler service team
  • Introduces a very complicated code base into each and every application

  • Compile time budget is concentrated on hottest code and not on other code, which in aggregate may be as important a contributor to performance

    • What do you do when there’s no hot code?


Our vision: The best of both worlds






More boxes, but is it better?

  • If ubiquitous, could enable a new era in CPU architectural innovation by reducing the load of the dusty deck millstone

    • Deprecated ISA features supported via binary translation or recompilation from “IL-fattened” binary
    • No delay before the value of a new ISA feature becomes visible
    • New feature mistakes become relatively painless to undo


There’s more

  • Transparently bring the benefits of dynamic optimization to traditionally static languages while still leveraging the power of static analysis and language-specific semantic information

    • All of the advantages of dynamic profile-directed feedback (PDF) optimizations with none of the static PDF drawbacks
      • No extra build step
      • No input artifacts skewing specialization choices
      • Code specialized to each invocation on exact processor model
      • More aggressive speculative optimizations
      • Recompilation as a recovery option
    • Static analyses inform value profiling choices
      • New static analysis goal of identifying the inhibitors to optimizations for later dynamic testing and specialization


Break through the layers

  • Abstraction is both the cause of and the solution to many software problems

  • Language and programming model design communities have been adding abstractions to solve their problems and thereby creating new problems for underlying software and hardware implementations

  • Inter-language barriers

    • Inline and optimize across the JNI boundary (VM ’05 IBM paper)
  • Web Services or other loosely coupled systems

    • Eliminate high dispatch costs when local or especially when in-process
  • Application-OS boundaries

    • Optimize and specialize OS user space code into the application calling it
  • Common thread is the need for higher level semantic input to the compilation and runtime systems



There’s always a rub

  • Non-trivial amount of work to bring this technology to full fruition

  • Socialization of dynamic compilation in domains where it has never been accepted is a daunting task

    • Only works when it is based on merit
    • Courage required to start
    • No quick fix here…it just takes time for people to change their views
  • Benchmarking community needs to deal thoughtfully with this kind of system

    • Naïve reaction is that these are benchmark buster technologies
    • Need run rules, benchmarks and input sets that discourage hacking while rewarding techniques and implementations that provide real differentiation for real codes


Today…

  • Compile all methods with dynamic compiler

    • Keep track of all external references
    • Keep track of all internal references
  • Load the result

    • Load everything into writable memory; ultimately, we’ll need OS support
    • Keep track of where “everything” is
    • “Manually” link all of the .o files
      • Intra-.o file is what we’re looking for
      • Calls to libc need to be handled


…Today

  • Also load

    • The “linker” itself
    • A really simple timer/monitor
      • The degree of sophistication of this unit is unbounded
    • The compiler itself
  • Allow the code to run for some amount of time

  • Use the timer/monitor to decide which routine is “hot”

  • Recompile a “hot” method

    • From the address, find the W-Code
    • Re-compile the W-Code directly into storage
    • Link all references in the generated code (as before)
    • Find all references to the old version and re-direct them
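
The monitor, recompile and redirect loop described above can be pictured with a toy Python runtime. This is a sketch under loud assumptions, not the prototype itself: the `Runtime` class and its methods are hypothetical, a call counter stands in for the timer/monitor, a dispatch table stands in for patching references in loaded code, and "recompilation" is simulated by swapping in a hand-written faster function rather than compiling W-Code.

```python
class Runtime:
    """Toy runtime: every call goes through a table so code can be swapped."""

    def __init__(self, hot_threshold):
        self.hot_threshold = hot_threshold
        self.table = {}       # routine name -> currently installed code
        self.optimizers = {}  # routine name -> function producing a better version
        self.calls = {}       # routine name -> call count (stand-in for the monitor)
        self.recompiled = set()

    def load(self, name, code, optimizer):
        self.table[name] = code
        self.optimizers[name] = optimizer
        self.calls[name] = 0

    def call(self, name, *args):
        self.calls[name] += 1
        result = self.table[name](*args)  # run whichever version is installed now
        if self.calls[name] >= self.hot_threshold and name not in self.recompiled:
            # "Recompile" the hot routine and redirect the table entry to it.
            self.table[name] = self.optimizers[name](self.table[name])
            self.recompiled.add(name)
        return result


def sum_to_n_baseline(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def optimize_sum_to_n(_old_code):
    # Hypothetical "recompilation": returns a closed-form version.
    return lambda n: n * (n + 1) // 2


rt = Runtime(hot_threshold=3)
rt.load("sum_to_n", sum_to_n_baseline, optimize_sum_to_n)
results = [rt.call("sum_to_n", 100) for _ in range(5)]
print(results)
```

Routing every call through one table makes redirection trivial in the sketch; in a real system the old version's call sites would have to be found and patched, which is the step the last bullet describes.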


Summary

  • A crossover point has been reached between dynamic and static compilation technologies.

  • They need to be converged/combined to overcome their individual weaknesses

  • Mounting software abstraction complexity forces the scope of compilation to higher levels in order to deliver efficient application performance realizable by non-heroic developers

  • Hardware designers struggle under the mounting burden of maintaining high performance backwards compatibility

  • We’ve started prototyping



Questions


