
Assembly Languages
COMS W4995-02
Prof. Stephen A. Edwards
Fall 2002
Columbia University
Department of Computer Science

Assembly Languages
One step up from machine language
Originally a more user-friendly way to program
Now mostly a compiler target
Model of computation: stored program computer

Assembly Language Model
ALU ↔ Registers ↔ Memory
            ...
            add r1,r2
            sub r2,r3
            cmp r3,r4
    PC →    bne I1
            sub r4,1
       I1:  jmp I3
            ...

Assembly Language Instructions
Built from two pieces:
    add R1, R3, 3
  • Opcode (add): what to do with the data
  • Operands (R1, R3, 3): where to get the data

Types of Opcodes
Arithmetic, logical
  • add, sub, mult
  • and, or
  • cmp
Memory load/store
  • ld, st
Control transfer
  • jmp
  • bne
Complex
  • movs

Operands
Each operand taken from a particular addressing mode:
Examples:
    Register       add r1, r2, r3
    Immediate      add r1, r2, 10
    Indirect       mov r1, (r2)
    Offset         mov r1, 10(r3)
    PC Relative    beq 100
Reflect processor data pathways
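
The addressing modes listed under Operands can be illustrated with a small C sketch. This is only a toy model, not any real instruction set: the ToyMachine type, the AddrMode names, and read_operand are hypothetical, and PC-relative mode is left out because it produces a branch target rather than a data value.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 8, MEM_WORDS = 64 };

/* Hypothetical toy machine: a few registers and a small word-addressed memory. */
typedef struct {
    int32_t regs[NUM_REGS];
    int32_t mem[MEM_WORDS];
} ToyMachine;

typedef enum { MODE_REGISTER, MODE_IMMEDIATE, MODE_INDIRECT, MODE_OFFSET } AddrMode;

/* Fetch one source operand according to its addressing mode. */
int32_t read_operand(const ToyMachine *m, AddrMode mode, int reg, int32_t constant)
{
    int32_t addr;
    switch (mode) {
    case MODE_REGISTER:            /* r2      : value held in the register   */
        return m->regs[reg];
    case MODE_IMMEDIATE:           /* 10      : constant in the instruction  */
        return constant;
    case MODE_INDIRECT:            /* (r2)    : memory at address in r2      */
        addr = m->regs[reg];
        break;
    case MODE_OFFSET:              /* 10(r3)  : memory at address r3 + 10    */
        addr = m->regs[reg] + constant;
        break;
    default:
        return 0;
    }
    assert(addr >= 0 && addr < MEM_WORDS);  /* toy memory is tiny */
    return m->mem[addr];
}
```

Each case corresponds to a different path through the processor: register and immediate operands never touch memory, while indirect and offset operands need an address computation followed by a memory read.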

Types of Assembly Languages
Assembly language closely tied to processor architecture
At least four main types:
CISC: Complex Instruction-Set Computer
RISC: Reduced Instruction-Set Computer
DSP: Digital Signal Processor
VLIW: Very Long Instruction Word

CISC Assembly Language
Developed when people wrote assembly language
Complicated, often specialized instructions with many effects
Examples from x86 architecture
  • String move
  • Procedure enter, leave
Many, complicated addressing modes
So complicated, often executed by a little program (microcode)
Examples: Intel x86, 68000, PDP-11

RISC Assembly Language
Response to growing use of compilers
Easier-to-target, uniform instruction sets
“Make the most common operations as fast as possible”
Load-store architecture:
  • Arithmetic only performed on registers
  • Memory load/store instructions for memory-register transfers
Designed to be pipelined
Examples: SPARC, MIPS, HP-PA, PowerPC

DSP Assembly Language
Digital signal processors designed specifically for signal processing algorithms
Lots of regular arithmetic on vectors
Often written by hand
Irregular architectures to save power, area
Substantial instruction-level parallelism
Examples: TI 320, Motorola 56000, Analog Devices

VLIW Assembly Language
Response to growing desire for instruction-level parallelism
Using more transistors cheaper than running them faster
Many parallel ALUs
Objective: keep them all busy all the time
Heavily pipelined
More regular instruction set
Very difficult to program by hand
Looks like parallel RISC instructions
Examples: Itanium, TI 320C6000

Example: Euclid’s Algorithm
    int gcd(int m, int n)
    {
        int r;
        while ((r = m % n) != 0) {
            m = n;
            n = r;
        }
        return n;
    }

i386 Programmer’s Model
32-bit registers (bits 31–0):
    eax, ebx, ecx, edx    Mostly general-purpose registers
    esi                   Source index
    edi                   Destination index
    ebp                   Base pointer
    esp                   Stack pointer
    eflags                Status word
    eip                   Instruction pointer
16-bit segment registers (bits 15–0):
    cs                    Code segment
    ds                    Data segment
    ss                    Stack segment
    es                    Extra segment
    fs                    Data segment
    gs                    Data segment

Euclid on the i386
        .file "euclid.c"            # Boilerplate
        .version "01.01"
    gcc2_compiled.:
        .text                       # Executable
        .align 4                    # Start on 16-byte boundary
        .globl gcd                  # Make “gcd” linker-visible
        .type gcd,@function
    gcd:
        pushl %ebp
        movl %esp,%ebp
        pushl %ebx
        movl 8(%ebp),%eax
        movl 12(%ebp),%ecx
        jmp .L6
        .p2align 4,,7

Stack Before Call
              n          8(%esp)
              m          4(%esp)
    %esp →    R.A.       0(%esp)

Stack After Entry
              n         12(%ebp)
              m          8(%ebp)
              R.A.       4(%ebp)
    %ebp →    old ebp    0(%ebp)
    %esp →    old ebx   −4(%ebp)

Euclid on the i386
        jmp .L6               # Jump to local label .L6
        .p2align 4,,7         # Skip ≤ 7 bytes to a multiple of 16
    .L4:
        movl %ecx,%eax        # m = n
        movl %ebx,%ecx        # n = r
    .L6:
        cltd                  # Sign-extend eax to edx:eax
        idivl %ecx            # Compute edx:eax / ecx (quotient in eax, remainder in edx)
        movl %edx,%ebx        # r = remainder
        testl %edx,%edx       # AND of edx and edx
        jne .L4               # Branch if edx was ≠ 0
        movl %ecx,%eax        # Return n
        movl -4(%ebp),%ebx
        leave                 # Move ebp to esp, pop ebp
        ret                   # Pop return address and branch

SPARC Programmer’s Model
32-bit registers (bits 31–0):
    r0                 Always 0
    r1–r7              Global registers
    r8/o0–r15/o7       Output registers (r14/o6 is the stack pointer)
    r16/l0–r23/l7      Local registers
    r24/i0–r31/i7      Input registers (r30/i6 is the frame pointer, r31/i7 the return address)
    PSW                Status word
    PC                 Program counter
    nPC                Next PC
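
One way to read the i386 loop (labels .L4 and .L6) is to translate it back into C one instruction at a time. The sketch below is an illustration, not compiler output: the function name gcd_like_asm is hypothetical, and the variables are named after the registers they stand for.

```c
/* Mirrors the control flow of the i386 listing: enter at .L6, loop via .L4. */
int gcd_like_asm(int m, int n)
{
    int eax = m;            /* movl 8(%ebp),%eax   */
    int ecx = n;            /* movl 12(%ebp),%ecx  */
    int edx, ebx;

    goto L6;                /* jmp .L6             */
L4:
    eax = ecx;              /* movl %ecx,%eax : m = n */
    ecx = ebx;              /* movl %ebx,%ecx : n = r */
L6:
    edx = eax % ecx;        /* cltd; idivl %ecx leaves the remainder in edx */
    ebx = edx;              /* movl %edx,%ebx : r = remainder */
    if (edx != 0)           /* testl %edx,%edx; jne .L4 */
        goto L4;
    eax = ecx;              /* movl %ecx,%eax : return value is n */
    return eax;
}
```

The loop test sits at the bottom, which is why the code first jumps to .L6: the remainder has to be computed once before the loop body (m = n; n = r) can run.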

SPARC Register Windows
The output registers of the calling procedure become the inputs to the called procedure
The global registers remain unchanged
The local registers are not visible across procedures
[Figure: three overlapping register windows, each holding r8/o0–r15/o7, r16/l0–r23/l7, and r24/i0–r31/i7; the r8/o0–r15/o7 of one window line up with the r24/i0–r31/i7 of the next.]

Euclid on the SPARC
        .file     "euclid.c"      # Boilerplate
    gcc2_compiled.:
        .global .rem              # Make .rem linker-visible
        .section ".text"          # Executable code
        .align 4
        .global gcd               # Make gcd linker-visible
        .type gcd, #function
        .proc     04
    gcd:
        save  %sp, -112, %sp      # Next window, move SP
        mov   %i0, %o1            # Move m into o1
        b     .LL3                # Unconditional branch
        mov   %i1, %i0            # Move n into i0
    .LL5:
        mov   %o0, %i0            # n = r
    .LL3:
        mov   %o1, %o0            # Compute the remainder of
        call  .rem, 0             # m / n, result in o0
        mov   %i0, %o1
        cmp   %o0, 0
        bne   .LL5
        mov   %i0, %o1            # m = n (always executed)
        ret                       # Return (actually jmp i7 + 8)
        restore                   # Restore previous window
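
The register-window overlap described above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: the RegFile type and its functions are hypothetical, the window count of 8 is chosen only for illustration, and window overflow and underflow traps are not modelled.

```c
#include <assert.h>
#include <stdint.h>

enum { NWINDOWS = 8, WINDOWED = NWINDOWS * 16 };  /* window count is an assumption */

typedef struct {
    int32_t globals[8];           /* r0..r7, shared by every procedure      */
    int32_t windowed[WINDOWED];   /* storage behind all outs, locals, ins   */
    int cwp;                      /* current window pointer                 */
} RegFile;

/* Map architectural register r (0..31) of the current window to its storage. */
static int32_t *reg_slot(RegFile *rf, int r)
{
    assert(r >= 0 && r < 32);
    if (r < 8)
        return &rf->globals[r];
    /* r8..r15 outs, r16..r23 locals, r24..r31 ins; the ins of window w
       share storage with the outs of window w + 1. */
    return &rf->windowed[(rf->cwp * 16 + (r - 8)) % WINDOWED];
}

int32_t reg_read(RegFile *rf, int r)
{
    return r == 0 ? 0 : *reg_slot(rf, r);   /* r0 always reads as 0 */
}

void reg_write(RegFile *rf, int r, int32_t value)
{
    if (r != 0)
        *reg_slot(rf, r) = value;
}

/* save: move to the next window (the callee's view). */
void window_save(RegFile *rf)    { rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS; }

/* restore: return to the caller's window. */
void window_restore(RegFile *rf) { rf->cwp = (rf->cwp + 1) % NWINDOWS; }
```

In this model a caller that writes an argument to r8/o0 and then executes save finds the same value in r24/i0, with no copying; this is the same mechanism that lets the gcd listing read its arguments from %i0 and %i1 after its save instruction.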

Digital Signal Processor Apps.
Low-cost embedded systems
  • Modems, cellular telephones, disk drives, printers
High-throughput applications
  • Halftoning, base stations, 3-D sonar, tomography
PC based multimedia
  • Compression/decompression of audio, graphics, video

Embedded Processor Requirements
Inexpensive with small area and volume
Deterministic interrupt service routine latency
Low power: ≈50 mW (TMS320C54x uses 0.36 µA/MIPS)

Conventional DSP Architecture
Harvard architecture
  • Separate data memory/bus and program memory/bus
  • Three reads and one or two writes per instruction cycle
Deterministic interrupt service routine latency
Multiply-accumulate in single instruction cycle
Special addressing modes supported in hardware
  • Modulo addressing for circular buffers for FIR filters
  • Bit-reversed addressing for fast Fourier transforms
Instructions to keep the pipeline (3-4 stages) full
  • Zero-overhead looping (one pipeline flush to set up)
  • Delayed branches

Conventional DSPs
                     Fixed-Point             Floating-Point
    Cost/Unit        $5–$79                  $5–$381
    Architecture     Accumulator             Load-store
    Registers        2–4 data, 8 address     8–16 data, 8–16 address
    Data Words       16 or 24 bit            32 bit
    Chip Memory      2–64K data+program      8–64K data+program
    Address Space    16–128K data            16M–4G data
                     16–64K program          16M–4G program
    Compilers        Bad C                   Better C, C++
    Examples         TI TMS320C5x            TI TMS320C3x
                     Motorola 56000          Analog Devices SHARC

Conventional DSPs
Market share: 95% fixed-point, 5% floating-point
Each processor comes in dozens of configurations
  • Data and program memory size
  • Peripherals: A/D, D/A, serial, parallel ports, timers
Drawbacks
  • No byte addressing (needed for image and video)
  • Limited on-chip memory
  • Limited addressable memory on most fixed-point DSPs
  • Non-standard C extensions to support fixed-point data

Example
Finite Impulse Response filter (FIR)
Can be used for lowpass, highpass, bandpass, etc.
Basic DSP operation
For each sample, computes
    y_n = \sum_{i=0}^{k} a_i x_{n+i}
where a_0, ..., a_k are filter coefficients, x_n is the nth input sample, and y_n is the nth output sample.
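
The FIR sum can be written directly as a loop. The following is a minimal sketch in portable C using floating point for readability; the function name fir_output and its parameters are hypothetical, and a fixed-point DSP would use fractional integer arithmetic instead.

```c
#include <assert.h>
#include <stddef.h>

/* Compute one output sample: the sum over i = 0..k of a[i] * x[n + i].
   a holds the k + 1 filter coefficients, x holds x_len input samples. */
double fir_output(const double *a, size_t k,
                  const double *x, size_t x_len, size_t n)
{
    assert(n + k < x_len);          /* every x[n + i] must exist */
    double acc = 0.0;
    for (size_t i = 0; i <= k; i++)
        acc += a[i] * x[n + i];     /* one multiply-accumulate per tap */
    return acc;
}
```

Each loop iteration is one multiply-accumulate plus two memory reads, which is the pattern that single-cycle multiply-accumulate hardware and multiple data buses are built to speed up.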

56000 Programmer’s Model
Source registers:
    x1 (bits 47–24), x0 (bits 23–0)
    y1, y0
Accumulators:
    a2 (bits 55–48), a1 (bits 47–24), a0 (bits 23–0)
    b2, b1, b0
Address registers (16 bits each, bits 15–0):
    r0 ... r7, n0 ... n7, m0 ... m7
16-bit control registers (bits 15–0):
    Program Counter, Status Register, Loop Address, Loop Count
PC Stack (15 ... 0), SR Stack (15 ... 0), Stack pointer

56001 Memory Spaces
Three memory regions, each 64K:
  • 24-bit Program memory
  • 24-bit X data memory
  • 24-bit Y data memory
Idea: enable simultaneous access of program, sample, and coefficient memory
Three on-chip memory spaces can be used this way
One off-chip memory pathway connected to all three memory spaces
Only one off-chip access per cycle maximum

56001 Address Generation
Addresses come from pointer register r0 ... r7
Offset registers n0 ... n7 can be added to pointer
Modifier registers cause the address to wrap around
Zero modifier causes reverse-carry arithmetic

    Address     Notation    Next value of r0
    r0          (r0)        r0
    r0 + n0     (r0+n0)     r0
    r0          (r0)+       (r0 + 1) mod m0
    r0 - 1      -(r0)       (r0 - 1) mod m0
    r0          (r0)-       (r0 - 1) mod m0
    r0          (r0)+n0     (r0 + n0) mod m0
    r0          (r0)-n0     (r0 - n0) mod m0
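
The wrap-around pointer updates in the address generation table can be sketched in C as follows. The helper names wrap_update and read_post_increment are hypothetical, and the sketch makes two simplifying assumptions: the buffer starts at index 0, and the modulus is passed in directly, following the table's "mod m0" notation rather than whatever encoding the hardware modifier register actually uses.

```c
#include <assert.h>

/* Next pointer value after adding step, wrapped into 0..modulus-1. */
int wrap_update(int r, int step, int modulus)
{
    assert(modulus > 0);
    int next = (r + step) % modulus;  /* may be negative when the sum is negative */
    if (next < 0)
        next += modulus;
    return next;
}

/* Like the (r0)+ mode: use the current address, then advance circularly. */
int read_post_increment(const int *buf, int *r, int modulus)
{
    int value = buf[*r];
    *r = wrap_update(*r, 1, modulus);
    return value;
}
```

On the DSP this update happens in the address generation hardware alongside the memory access, so stepping through a circular buffer costs no extra instructions.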

FIR Filter in 56001
    n        equ 20                # Define symbolic constants
    start    equ $40
    samples  equ $0
    coeffs   equ $0
    input    equ $ffe0             # Memory-mapped I/O
    output   equ $ffe1

             org p:start           # Locate in prog. memory
             move #samples, r0     # Pointers to samples
             move #coeffs, r4      #   and coefficients
             move #n-1, m0         # Prepare circular buffer
             move m0, m4

FIR Filter in 56001
    movep y:input, x:(r0)                         # Load sample into memory
    clr   a         x:(r0)+, x0   y:(r4)+, y0     # Clear accumulator A; load a sample into x0; load a coefficient into y0
    rep   #n-1                                    # Repeat next instruction n-1 times
    mac   x0,y0,a   x:(r0)+, x0   y:(r4)+, y0     # a += x0 × y0; next sample; next coefficient
    macr  x0,y0,a   (r0)-
    movep a, y:output                             # Write output sample

TI TMS320C6000 VLIW DSP
Eight instruction units dispatched by one very long instruction word
Designed for DSP applications
Orthogonal instruction set
Big, uniform register file (16 32-bit registers)
Better compiler target than 56001
Deeply pipelined (up to 15 levels)
Complicated, but more regular, datapath
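
The structure of the 56001 FIR listing (store the new sample, run n multiply-accumulates over two circular buffers, step the sample pointer back by one) can be paraphrased in C. This is a rough sketch rather than a cycle-accurate model: FirState, fir_step, and the tap count of 4 are hypothetical, plain integers stand in for the DSP's fractional fixed-point data, and the rounding done by macr is ignored.

```c
enum { TAPS = 4 };  /* stands in for n in the listing; 4 keeps the example small */

typedef struct {
    int samples[TAPS];  /* circular sample buffer (X memory in the listing) */
    int pos;            /* plays the role of r0 */
} FirState;

/* Process one input sample and return one output sample. */
long fir_step(FirState *s, const int coeffs[TAPS], int input)
{
    s->samples[s->pos] = input;                   /* movep y:input, x:(r0)    */
    long acc = 0;                                 /* clr a                    */
    int r0 = s->pos;
    for (int i = 0; i < TAPS; i++) {
        acc += (long)s->samples[r0] * coeffs[i];  /* mac / macr x0,y0,a       */
        r0 = (r0 + 1) % TAPS;                     /* (r0)+ with modulo wrap   */
    }
    /* After TAPS increments r0 is back where it started; stepping back one
       slot, as (r0)- does, makes the next input overwrite the oldest sample. */
    s->pos = (r0 + TAPS - 1) % TAPS;
    return acc;
}
```

Because both pointers wrap, no samples are ever shifted in memory: each call touches every tap exactly once and moves the write position by a single slot.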

Pipelining on the C6
One instruction issued per clock cycle
Very deep pipeline
  • 4 fetch cycles
  • 2 decode cycles
  • 1-10 execute cycles
Branch in pipeline disables interrupts
Conditional instructions avoid branch-induced stalls
No hardware to protect against hazards
  • Assembler or compiler’s responsibility

FIR in One ’C6 Assembly Instruction
    FIRLOOP:
              LDH .D1   *A1++, A2     ; Fetch next sample
    ||        LDH .D2   *B1++, B2     ; Fetch next coeff.
    || [B0]   SUB .L2   B0, 1, B0     ; Decrement count
    || [B0]   B   .S2   FIRLOOP       ; Branch if non-zero
    ||        MPY .M1X  A2, B2, A3    ; Sample × Coeff.
    ||        ADD .L1   A4, A3, A4    ; Accumulate result
Annotations:
  • LDH: load a halfword (16 bits)
  • .D1: do this on unit D1
  • .M1X: use the cross path
  • [B0]: predicated instruction (only if B0 non-zero)
  • ||: run these instructions in parallel

Peripherals
Often the whole point of the system
Memory-mapped I/O
  • Magical memory locations that make something happen or change on their own
Typical meanings:
  • Configuration (write)
  • Status (read)
  • Address/Data (access more peripheral state)
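
In C, memory-mapped peripheral registers are usually reached through volatile pointers, so the compiler does not cache or drop accesses to locations that can change on their own. The sketch below is hypothetical throughout: the DeviceRegs layout, the bit assignments, and the function names are invented for illustration and do not describe any real peripheral.

```c
#include <stdint.h>

/* Hypothetical register block with the three typical roles. */
typedef struct {
    volatile uint32_t config;   /* configuration: written by software          */
    volatile uint32_t status;   /* status: read by software, set by the device */
    volatile uint32_t data;     /* data: access to more peripheral state       */
} DeviceRegs;

enum { CONFIG_ENABLE = 1, STATUS_READY = 1 };  /* assumed bit assignments */

/* Turn the device on by setting a configuration bit. */
void device_enable(DeviceRegs *dev)
{
    dev->config |= CONFIG_ENABLE;
}

/* Read one data word if the device reports it is ready; returns 1 on success. */
int device_try_read(DeviceRegs *dev, uint32_t *out)
{
    if ((dev->status & STATUS_READY) == 0)
        return 0;               /* nothing available yet */
    *out = dev->data;
    return 1;
}
```

On real hardware the DeviceRegs pointer would be a fixed address taken from the chip's memory map; for testing on a host machine, an ordinary DeviceRegs variable can stand in for the device.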