

CS 152 Computer Architecture and Engineering Lecture 1 - Introduction

  • Krste Asanovic

  • Electrical Engineering and Computer Sciences

  • University of California at Berkeley

  • http://www.eecs.berkeley.edu/~krste

  • http://inst.eecs.berkeley.edu/~cs152


Computing Devices Then…

  • EDSAC, University of Cambridge, UK, 1949



Computing Devices Now



What is Computer Architecture?



Abstraction Layers in Modern Systems



Uniprocessor Performance



The End of the Uniprocessor Era

  • Single biggest change in the history of computing systems



Conventional Wisdom in Computer Architecture

  • Old Conventional Wisdom: Power is free, transistors expensive

  • New Conventional Wisdom: “Power wall”: Power expensive, transistors free (can put more on chip than can afford to turn on)

  • Old CW: Can keep increasing Instruction-Level Parallelism sufficiently via compiler and hardware innovation (pipelining, superscalar, out-of-order, speculation, VLIW, …)

  • New CW: “ILP wall”: law of diminishing returns on more HW for ILP

  • Old CW: Multiplies are slow, memory access is fast

  • New CW: “Memory wall”: Memory slow, multiplies fast (200 clock cycles to DRAM, 4 clocks for a multiply)

  • Old CW: Uniprocessor performance 2X / 1.5 yrs

  • New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall

    • Uniprocessor performance now 2X / 5(?) yrs
  • Sea change in chip design: multiple “cores” (2X processors per chip / ~2 years)

    • More, simpler processors are more power efficient
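The two growth rates above can be compared with a quick back-of-the-envelope calculation; this is just a sketch that converts the slide's round doubling periods into annual improvement rates:

```python
# Annual performance growth implied by a given doubling period (in years).
def annual_growth(doubling_years):
    return 2 ** (1 / doubling_years)

old_rate = annual_growth(1.5)  # pre-"brick wall" era: 2X every 1.5 years
new_rate = annual_growth(5.0)  # post-"brick wall" era: 2X every ~5 years

print(f"Old era: {(old_rate - 1) * 100:.0f}% per year")  # ~59% per year
print(f"New era: {(new_rate - 1) * 100:.0f}% per year")  # ~15% per year
```

The gap between roughly 59% and 15% per year is why the switch to multiple cores per chip was such a discontinuity for software.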


Sea Change in Chip Design

  • Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10-micron PMOS, 11 mm² chip



Déjà vu all over again?

  • Multiprocessors imminent in 1970s, ‘80s, ‘90s, …

  • “… today’s processors … are nearing an impasse as technologies approach the speed of light …”

  • David Mitchell, The Transputer: The Time Is Now (1989)

  • Transputer was premature ⇒ Custom multiprocessors tried to beat uniprocessors ⇒ Procrastination rewarded: 2X seq. perf. / 1.5 years

  • “We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”

  • Paul Otellini, President, Intel (2004)

  • Difference is all microprocessor companies have switched to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2+ CPUs) ⇒ Procrastination penalized: 2X sequential perf. / 5 yrs ⇒ Biggest programming challenge: from 1 to 2 CPUs



Problems with Sea Change

  • Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, … not ready to supply Thread-Level Parallelism or Data-Level Parallelism for 1000 CPUs / chip

  • Architectures not ready for 1000 CPUs / chip

    • Unlike Instruction-Level Parallelism, cannot be solved by computer architects and compiler writers alone, but also cannot be solved without participation of architects
  • Need a reworking of all the abstraction layers in the computing system stack



Abstraction Layers in Modern Systems



The New CS152

  • New CS152 focuses on interaction of software and hardware

    • more architecture and less digital engineering.
  • No FPGA design component

    • There is now a separate FPGA design lab class (CS194 in Fall 2008), where you can try building some of the architectural ideas we’ll explore this semester (100% digital engineering)
  • Much of the material you’ll learn this term was previously in CS252

    • Some of the current CS61C, I first saw in CS252 nearly 20 years ago!
    • Maybe every 10 years, shift CS252 → CS152 → CS61C?
  • Class contains labs based on various different machine designs

    • Experiment with how architectural mechanisms work in practice on real software.


CS 152 Course Focus

  • Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century



The New CS152 Executive Summary



CS152 Administrivia

  • Instructor: Prof. Krste Asanovic

  • Office: 579 Soda Hall, krste@eecs

  • Office Hours: M 1:30-2:30PM (email to confirm), 579 Soda

  • T. A.: Scott Beamer, sbeamer@eecs

  • Office Hours: TBD

  • Lectures: Tu/Th, 5:00-6:30PM, 320 Soda (may change)

  • Section: F 12-1pm, 258 Dwinelle (room may change)

  • Text: Computer Architecture: A Quantitative Approach, 4th Edition (Oct 2006)

  • Readings assigned from this edition; don’t use earlier editions.

  • Web page: http://inst.eecs.berkeley.edu/~cs152

  • Lectures available online before noon, day of lecture



CS152 Structure and Syllabus

  • Six modules

    • Simple machine design (ISAs, microprogramming, unpipelined machines, Iron Law, simple pipelines)
    • Memory hierarchy (DRAM, caches, optimizations)
    • Virtual memory systems, exceptions, interrupts
    • Complex pipelining (scoreboarding, out-of-order issue)
    • Explicitly parallel processors (vector machines, VLIW machines, multithreaded machines)
    • Multiprocessor architectures (cache coherence, memory models, synchronization)


CS152 Course Components

  • 20% Problem Sets (one per module)

    • Intended to help you learn the material. Feel free to discuss with other students and instructors, but you must turn in your own solutions. Grading based mostly on effort, but quizzes assume that you have worked through all problems. Solutions released after PSs handed in.
  • 40% Quizzes (one per module)

    • In-class, closed-book, no calculators or computers.
    • Based on lectures, problem sets, and labs
  • 40% Labs (one per module)

    • Labs use advanced full system simulators (Virtutech Simics)
    • Directed plus open-ended sections to each lab


CS152 Labs

  • Each lab has directed plus open-ended assignments

    • Roughly 50/50 split of grade
  • Directed portion is intended to ensure students learn main concepts behind lab

    • Each student must perform own lab and hand in their own lab report
  • Open-ended assignment is to allow you to show your creativity

    • Roughly a one day “mini-project”
      • E.g., try an architectural idea and measure potential, negative results OK (if explainable!)
    • Students can work individually or in groups of two or three
    • Group open-ended lab reports must be handed in separately
    • Students can work in different groups for different assignments


Related Courses



Computer Architecture: A Little History

  • Throughout the course we’ll use a historical narrative to help understand why certain ideas arose

  • Why worry about old ideas?

  • Helps to illustrate the design process, and explains why certain decisions were taken

  • Because future technologies might be as constrained as older ones

  • Those who ignore history are doomed to repeat it

    • Every mistake made in mainframe design was also made in minicomputers, then microcomputers, where next?


Charles Babbage 1791-1871 Lucasian Professor of Mathematics, Cambridge University, 1827-1839



Charles Babbage

  • Difference Engine 1823

  • Analytic Engine 1833

    • The forerunner of modern digital computer!


Difference Engine A machine to compute mathematical tables

  • Weierstrass:

    • Any continuous function can be approximated by a polynomial
    • Any polynomial can be computed from difference tables
  • An example

      • f(n) = n² + n + 41
      • d1(n) = f(n) − f(n−1) = 2n
      • d2(n) = d1(n) − d1(n−1) = 2
      • f(n) = f(n−1) + d1(n) = f(n−1) + (d1(n−1) + 2)
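The recurrence above can be checked with a short sketch that tabulates f(n) using only additions, which is exactly the trick the Difference Engine mechanized:

```python
# Compute f(n) = n^2 + n + 41 using only additions, via its difference table.
def difference_table(n_max):
    f, d1 = 41, 0          # f(0) = 41; d1(0) = 2*0 = 0
    values = [f]
    for n in range(1, n_max + 1):
        d1 += 2            # d1(n) = d1(n-1) + 2  (second difference is constant)
        f += d1            # f(n)  = f(n-1) + d1(n)
        values.append(f)
    return values

print(difference_table(4))  # [41, 43, 47, 53, 61]
```

No multiplication ever occurs, yet the table matches n² + n + 41 at every step; that is why constant-difference machinery sufficed for polynomial tables.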


Difference Engine

  • 1823

    • Babbage’s paper is published
  • 1834

    • The paper is read by Scheutz & his son in Sweden
  • 1842

    • Babbage gives up the idea of building it; he is onto Analytic Engine!
  • 1855

    • Scheutz displays his machine at the Paris World’s Fair
    • Can compute any 6th degree polynomial
    • Speed: 33 to 44 32-digit numbers per minute!


Analytic Engine

  • 1833: Babbage’s paper was published

    • conceived during a hiatus in the development of the difference engine
  • Inspiration: Jacquard Looms

    • looms were controlled by punched cards
      • The set of cards with fixed punched holes dictated the pattern of weave ⇒ program
      • The same set of cards could be used with different colored threads ⇒ numbers
  • 1871: Babbage dies

    • The machine remains unrealized.


Analytic Engine The first conception of a general-purpose computer

  • The store in which all variables to be operated upon, as well as all those quantities which have arisen from the results of the operations are placed.

  • The mill into which the quantities about to be operated upon are always brought.



The first programmer Ada Byron aka “Lady Lovelace” 1815-52



Babbage’s Influence

  • Babbage’s ideas had great influence later primarily because of

    • Luigi Menabrea, who published notes of Babbage’s lectures in Italy
    • Lady Lovelace, who translated Menabrea’s notes into English and thoroughly expanded them.
      • “... Analytic Engine weaves algebraic patterns....”
  • In the early twentieth century - the focus shifted to analog computers but

    • Harvard Mark I built in 1944 is very close in spirit to the Analytic Engine.


Harvard Mark I

  • Built in 1944 in IBM Endicott laboratories

    • Howard Aiken – Professor of Physics at Harvard
    • Essentially mechanical but had some electro-magnetically controlled relays and gears
    • Weighed 5 tons and had 750,000 components
    • A synchronizing clock that beat every 0.015 seconds (~67 Hz)


Linear Equation Solver John Atanasoff, Iowa State University

  • 1930’s:

    • Atanasoff built the Linear Equation Solver.
    • It had 300 tubes!
    • Special-purpose binary digital calculator
    • Dynamic RAM (stored values on refreshed capacitors)
  • Application:

    • Linear and Integral differential equations
  • Background:

    • Vannevar Bush’s Differential Analyzer
    • --- an analog computer
  • Technology:

    • Tubes and Electromechanical relays


Electronic Numerical Integrator and Computer (ENIAC)

  • Inspired by Atanasoff and Berry, Eckert and Mauchly designed and built ENIAC (1943-45) at the University of Pennsylvania

  • The first completely electronic, operational, general-purpose analytical calculator!

    • 30 tons, 72 square meters, 200 kW
  • Performance

    • Read in 120 cards per minute
    • Addition took 200 s, Division 6 ms
    • 1000 times faster than Mark I
  • Not very reliable!



Electronic Discrete Variable Automatic Computer (EDVAC)

  • ENIAC’s programming system was external

    • Sequences of instructions were executed independently of the results of the calculation
    • Human intervention required to take instructions “out of order”
  • Eckert, Mauchly, John von Neumann and others designed EDVAC (1944) to solve this problem

    • Solution was the stored program computer
    •  “program can be manipulated as data”
  • First Draft of a report on EDVAC was published in 1945, but just had von Neumann’s signature!

    • In 1973 the court of Minneapolis attributed the honor of inventing the computer to John Atanasoff


Stored Program Computer

  • manual control: calculators

  • automatic control

    • external (paper tape): Harvard Mark I, 1944; Zuse’s Z1, WW2
    • internal
      • plug board: ENIAC 1946
      • read-only memory: ENIAC 1948
      • read-write memory: EDVAC 1947 (concept)
        • The same storage can be used to store program and data


Technology Issues



Dominant Problem: Reliability



Commercial Activity: 1948-52

  • IBM’s SSEC (follow on from Harvard Mark I)

    • Selective Sequence Electronic Calculator
    • 150 word store.
    • Instructions, constants, and tables of data were read from paper tapes.
    • 66 Tape reading stations!
    • Tapes could be glued together to form a loop!
    • Data could be output in one phase of computation and read in the next phase of computation.


And then there was IBM 701



Computers in mid 50’s

  • Hardware was expensive

  • Stores were small (1000 words)

    • ⇒ No resident system software!
  • Memory access time was 10 to 50 times slower than the processor cycle

    • ⇒ Instruction execution time was totally dominated by the memory reference time.
  • The ability to design complex control circuits to execute an instruction was the central design concern as opposed to the speed of decoding or an ALU operation

  • Programmer’s view of the machine was inseparable from the actual hardware implementation



The IBM 650 (1953-4)



Programmer’s view of the IBM 650



The Earliest Instruction Sets



Programming: Single Accumulator Machine



Self-Modifying Code



Index Registers Tom Kilburn, Manchester University, mid 50’s



Using Index Registers



Operations on Index Registers



Evolution of Addressing Modes



Variety of Instruction Formats

  • Two-address formats: the destination is the same as one of the operand sources

    • (Reg × Reg) to Reg: RI ← (RI) + (RJ)
    • (Reg × Mem) to Reg: RI ← (RI) + M[x]
    • x can be specified directly or via a register
    • effective address calculation for x could include indexing, indirection, ...
  • Three-address formats: one destination and up to two operand sources per instruction

    • (Reg × Reg) to Reg: RI ← (RJ) + (RK)
    • (Reg × Mem) to Reg: RI ← (RJ) + M[x]


More Instruction Formats

  • Zero-address formats: operands on a stack

    • add: M[sp−1] ← M[sp] + M[sp−1]
    • load: M[sp] ← M[M[sp]]
    • Stack can be in registers or in memory (usually top of stack cached in registers)
  • One-address formats: accumulator machines

    • Accumulator is always the other implicit operand
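To make the zero-address style concrete, here is a toy stack-machine sketch (a hypothetical mini-interpreter for illustration, not a model of any real ISA): the add instruction names no operands at all, because both come from the implicit stack.

```python
# Toy zero-address (stack) machine: operands come from an implicit stack.
# Illustrative sketch only; addresses and values are made-up examples.
memory = {100: 7, 101: 35}   # a tiny "memory" holding two values

stack = []

def push_imm(v):   # push a literal (here used to push an address)
    stack.append(v)

def load():        # load: replace top of stack with M[top]
    stack.append(memory[stack.pop()])

def add():         # add: pop two operands, push their sum (no operand fields!)
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

# Compute M[100] + M[101]; note that add itself carries no addresses:
push_imm(100); load()
push_imm(101); load()
add()
print(stack[-1])  # 42
```

The same computation on a one-address accumulator machine would instead name one memory operand per instruction (load 100; add 101), with the accumulator as the implicit second operand.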


Data Formats and Memory Addresses



Software Developments



Compatibility Problem at IBM



IBM 360 : Design Premises Amdahl, Blaauw and Brooks, 1964

  • The design must lend itself to growth and successor machines

  • General method for connecting I/O devices

  • Total performance: answers per month rather than bits per microsecond ⇒ programming aids

  • Machine must be capable of supervising itself without manual intervention

  • Built-in hardware fault checking and locating aids to reduce down time

  • Simple to assemble systems with redundant I/O devices, memories etc. for fault tolerance

  • Some problems required floating point words larger than 36 bits



IBM 360: A General-Purpose Register (GPR) Machine

  • Processor State

    • 16 General-Purpose 32-bit Registers
      • may be used as index and base register
      • Register 0 has some special properties
    • 4 Floating Point 64-bit Registers
    • A Program Status Word (PSW)
      • PC, Condition codes, Control flags
  • A 32-bit machine with 24-bit addresses

    • But no instruction contains a 24-bit address!
  • Data Formats

    • 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
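The "no instruction contains a 24-bit address" point comes from the S/360's base-plus-displacement addressing: an instruction encodes only small register numbers and a 12-bit displacement, and the 24-bit effective address is formed at run time. A sketch of that calculation (the register contents below are made-up example values):

```python
# Sketch of S/360 RX-format effective-address generation:
# EA = (base register) + (index register) + 12-bit displacement, mod 2^24.
# Specifying register 0 as base or index contributes 0, not R0's contents.
def effective_address(regs, base, index, disp):
    assert 0 <= disp < 4096            # displacement is a 12-bit field
    b = regs[base] if base != 0 else 0
    x = regs[index] if index != 0 else 0
    return (b + x + disp) % (1 << 24)  # addresses are 24 bits

regs = [0] * 16
regs[12] = 0x008000   # hypothetical base register value
regs[3]  = 0x000100   # hypothetical index register value
print(hex(effective_address(regs, base=12, index=3, disp=0x44)))  # 0x8144
```

This is why 16-register, 24-bit-address programs fit in compact instructions, and it is also the "special properties" of register 0 noted above: as a base or index it always means zero.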


IBM 360: Initial Implementations



IBM 360: 45 years later… The zSeries z10 Microprocessor

  • 4.4 GHz in IBM 65 nm SOI CMOS technology

  • 994 million transistors in 454 mm²

  • 64-bit virtual addressing

    • original S/360 was 24-bit, and S/370 was 31-bit extension
  • Quad core design

  • Dual-issue in-order superscalar CISC pipeline

  • Out-of-order memory accesses

  • Redundant datapaths

    • every instruction performed in two parallel datapaths and results compared
  • 64KB L1 I-cache, 128KB L1 D-cache on-chip

  • 3MB private L2 unified cache per core, on-chip

  • Off-chip L3 cache of up to 48MB

  • 10K-entry Branch Target Buffer

    • Very large buffer to support commercial workloads
  • Hardware for decimal floating-point arithmetic

    • Important for business applications


And in conclusion …

  • Computer Architecture >> ISAs and RTL

  • CS152 is about interaction of hardware and software, and design of appropriate abstraction layers

  • Computer architecture is shaped by technology and applications

    • History provides lessons for the future
  • Computer Science at the crossroads from sequential to parallel computing

    • Salvation requires innovation in many fields, including computer architecture
  • Thursday is “Intro to Simics” section with Scott

  • Read Chapter 1, then Appendix B for next time!



Acknowledgements

  • These slides contain material developed and copyright by:

    • Arvind (MIT)
    • Krste Asanovic (MIT/UCB)
    • Joel Emer (Intel/MIT)
    • James Hoe (CMU)
    • John Kubiatowicz (UCB)
    • David Patterson (UCB)
  • MIT material derived from course 6.823

  • UCB material derived from course CS252


