# P55C Micro-Architecture: The First Implementation of the MMX<sup>™</sup> Technology

Michael Kagan
Intel Corporation
Israel Design Center

**August 20, 1996** 



#### **Outline**

- Micro Architecture Overview
  - Instruction Decode
  - Pairing
  - MMX™ Technology Execution Units
  - Pipeline
- Performance
- Summary

### Implementation Goals and Challenges

- Significant performance improvement of multimedia and communications applications
- No impact on device speed
- Maximize usage of existing microarchitecture
  - New instructions to decode
  - Unique execution units



### MMX<sup>™</sup> Technology Implementation



MMX<sup>™</sup> Technology Added in Parallel to the Existing Integer and Floating-Point Hardware



## Decoding MMX<sup>™</sup> Technology Instructions

- 0F prefix becomes a mainstream opcode
  - All MMX<sup>™</sup> technology instructions start with 0F
- Length decoder extended to 4 bytes
  - Quadruples the decode bandwidth for 0F-prefixed instructions
- Capable of issuing two MMX technology instructions per cycle

Decoder Supports Full Bandwidth for New Instructions
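
The role of the 0F byte can be illustrated in software with a minimal sketch. The opcode bytes listed are the documented IA-32 encodings for a handful of MMX instructions; the table is deliberately incomplete, and the names `MMX_SECOND_BYTE` and `classify` are hypothetical helpers introduced here, not part of any Intel tool. The P55C length decoder itself is hardware, so this only mirrors the idea that every MMX instruction is recognized by a leading 0F.

```python
# Illustrative subset of MMX opcodes: the byte that follows the 0F escape byte.
# This is NOT the full MMX opcode map.
MMX_SECOND_BYTE = {
    0x6E: "MOVD",     # mm <- r/m32
    0x6F: "MOVQ",     # mm <- mm/m64
    0x77: "EMMS",
    0x7E: "MOVD",     # r/m32 <- mm
    0x7F: "MOVQ",     # mm/m64 <- mm
    0xD5: "PMULLW",
    0xE5: "PMULHW",
    0xF5: "PMADDWD",
    0xFC: "PADDB",
    0xFD: "PADDW",
    0xFE: "PADDD",
}


def classify(code: bytes) -> str:
    """Classify an instruction by its first two bytes (hypothetical helper)."""
    if len(code) >= 2 and code[0] == 0x0F:
        # Every MMX instruction begins with 0F; other 0F opcodes exist too.
        return MMX_SECOND_BYTE.get(code[1], "other 0F-escaped instruction")
    return "not 0F-escaped"


if __name__ == "__main__":
    print(classify(bytes([0x0F, 0xFD, 0xC1])))  # encoding of PADDW mm0, mm1
    print(classify(bytes([0x0F, 0x77])))        # EMMS
    print(classify(bytes([0x90])))              # NOP, no 0F escape
```

Because every MMX instruction starts with the same escape byte, a decoder that handles 0F slowly would throttle all of them, which is why the slide stresses full decode bandwidth for 0F.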



## MMX<sup>™</sup> Technology Execution Pipe

- MMX<sup>™</sup> technology instructions use the integer pipe
- After the Execute stage, the MMX technology pipe continues to the Mex and WM stages
  - Multiply instructions continue in the MMX technology pipe to the M2, M3 and WMul stages





#### MMX<sup>™</sup> Technology Execution Units

All MMX<sup>™</sup> technology instructions can be issued every clock

| Operation               | # Of Units | Latency | Throughput | Pipes   |
|-------------------------|------------|---------|------------|---------|
| ALU                     | 2          | 1       | 1          | U and V |
| Multiplier              | 1          | 3       | 1          | U or V  |
| Shift/Pack/Unpack       | 1          | 1       | 1          | U or V  |
| Memory Access           | 1          | 1       | 1          | U       |
| Integer Register Access | 1          | 1       | 1          | U       |

Maximum Throughput for New Instructions
Most Instructions Can Be Paired
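
The table above implies a simple structural test for whether two MMX instructions can issue together. The sketch below encodes only what the table states (unit counts and pipe restrictions). It assumes that "U" means the instruction may issue only in the U pipe, and it ignores data dependencies and any other pairing rule the real P55C applies. The names `UNITS` and `can_pair` are hypothetical.

```python
from collections import Counter

# (number of units, pipes the operation may issue in), copied from the table.
UNITS = {
    "alu":    (2, "UV"),
    "mul":    (1, "UV"),
    "shift":  (1, "UV"),   # shift / pack / unpack
    "mem":    (1, "U"),
    "intreg": (1, "U"),    # integer register access
}


def can_pair(u_op: str, v_op: str) -> bool:
    """Return True if u_op (U pipe) and v_op (V pipe) can issue in one clock.

    Structural check only: unit availability and pipe restrictions.
    Data dependencies between the two instructions are NOT modeled.
    """
    # The second instruction must be allowed to issue in the V pipe.
    if "V" not in UNITS[v_op][1]:
        return False
    # Together the two instructions must not need more units than exist.
    demand = Counter([u_op, v_op])
    return all(count <= UNITS[op][0] for op, count in demand.items())


if __name__ == "__main__":
    for pair in [("alu", "alu"), ("mul", "mul"), ("mul", "shift"),
                 ("mem", "alu"), ("alu", "mem")]:
        print(pair, can_pair(*pair))
```

Under these assumptions two ALU operations pair, two multiplies do not (there is a single multiplier), and a memory access pairs only when it sits in the U pipe.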



### MMX<sup>™</sup> Technology Instructions Execution

**IDCT Inner Loop Execution** 



Utilize SIMD and Micro-Architecture Parallelism to Achieve Performance
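
The SIMD side of that parallelism can be illustrated by emulating one packed instruction. The slide does not list the instructions in the IDCT inner loop, so the choice of PMADDWD (packed multiply-add of signed 16-bit words) is an assumption, picked because multiply-accumulate is the core operation of a DCT-style kernel. The helpers `pack_words`, `unpack_dwords`, and `pmaddwd` are hypothetical names for this sketch.

```python
import struct


def pack_words(words):
    """Pack four signed 16-bit values into one 64-bit integer (word 0 lowest)."""
    return int.from_bytes(struct.pack("<4h", *words), "little")


def unpack_dwords(value):
    """Split a 64-bit integer into two signed 32-bit values (low, high)."""
    return struct.unpack("<2i", value.to_bytes(8, "little"))


def pmaddwd(a: int, b: int) -> int:
    """Emulate PMADDWD: multiply four signed word pairs, add adjacent products.

    One instruction performs four multiplies and two additions, producing
    two 32-bit sums in a single 64-bit result.
    """
    aw = struct.unpack("<4h", a.to_bytes(8, "little"))
    bw = struct.unpack("<4h", b.to_bytes(8, "little"))
    lo = aw[0] * bw[0] + aw[1] * bw[1]
    hi = aw[2] * bw[2] + aw[3] * bw[3]
    # Wrap each sum to 32 bits, as the hardware result register would.
    packed = struct.pack("<2I", lo & 0xFFFFFFFF, hi & 0xFFFFFFFF)
    return int.from_bytes(packed, "little")


if __name__ == "__main__":
    samples = pack_words([1, 2, 3, 4])
    coeffs = pack_words([5, 6, 7, 8])
    print(unpack_dwords(pmaddwd(samples, coeffs)))
```

Per the execution-unit table, a multiply has a latency of 3 clocks and a throughput of 1 per clock, so independent multiply-adds like this one can be issued back to back while ALU work pairs alongside them.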



### P55C Multimedia Kernels Performance



### Summary

- P55C with MMX<sup>™</sup> technology will provide significant performance improvement for multimedia and communications applications
- Clean MMX technology implementation without sacrificing device speed
- P55C maximizes usage of existing Pentium<sup>®</sup> Processor micro-architecture
- Product introduction expected in Q1'97

Intel's MMX technology: for the next generation of multimedia and communications

