Article 2742 of dec.notes.technology.dechips:
Title: 21164PC announced?
Reply Title: 21164PC in Byte

Below is the final draft that I sent to Byte. Some editing was done by Byte, but it should be close to the published version.

Pete

Alpha 21164PC -- Leadership Performance for Windows NT Desktop Systems
By Pete Bannon

Smaller, cheaper and faster are the watchwords for semiconductor manufacturers today as they scramble to satisfy the enormous appetites of multimedia, CAD, and data manipulation applications on the desktop. Corporations and small businesses are using desktop systems for video conferencing, voice synthesis, and enterprise-wide data access. Home users are surfing the Web, running sophisticated video games and creating home movies on their PCs. As each breakthrough drives the imagination toward new horizons, this trend is not likely to abate anytime soon.

Designed with these applications in mind, Digital Semiconductor's new Alpha 21164PC microprocessor delivers more CPU cycles and greater data bandwidth in a smaller package than any other microprocessor on the market. For example, the 533-MHz 21164PC has a smaller die size than the 200-MHz Pentium with MMX and provides significantly higher performance. The 21164PC outperforms the Pentium chip by more than two times in SPECint95 performance and by more than three times in SPECfp95 performance.

The 21164PC, which was co-designed by Digital Semiconductor and Mitsubishi, supports an astonishing 2.1 BIPS (2133 MIPS) and 1066 MFLOPS. The estimated performance of the 21164PC configured with 2MB of external cache and 125ns main memory is 14 SPECint95 and 17 SPECfp95. These characteristics and the 21164PC's price points make an Alpha processor solution the clear leader for Windows NT PCs.

Full Windows compatibility provides an additional 21164PC edge in the desktop market.
In addition to a large number of native Windows NT applications, DIGITAL FX!32, a breakthrough software translation technology, gives Alpha system users access to the full suite of 32-bit x86 Windows and Windows NT applications, running with high performance. Further, as the first Alpha processor to implement Motion Video Instructions (MVI), the 21164PC dramatically increases the Alpha processor advantage over competing products in motion video applications. For example, the 21164PC supports full-frame-rate DVD playback and high-quality video conferencing in software, eliminating the need for dedicated multimedia hardware and reducing overall system cost.

Innovation, Small Die Size Target PC Market

A full implementation of the Alpha architecture, the 21164PC leverages the design of Digital Semiconductor's Alpha 21164, a processor that has maintained performance leadership in the industry since its introduction. The 21164PC, depicted in Figure 1, draws from this technology and incorporates innovation and a smaller die size to achieve its advanced design. Implemented in Digital Semiconductor's 0.35-micron CMOS process, the 21164PC features a die size of 8.5 mm by 16.2 mm and contains 3.5 million transistors. This small die size (a 30% reduction from previous Alpha processor implementations) enables significant manufacturing cost savings, which translate directly to more affordable PCs for a broader market.

The Alpha Architecture

The Alpha architecture is a 64-bit load and store RISC architecture that is designed with particular emphasis on speed, multiple instruction issue, and multiple processors. It focuses on uncompromised support for many operating systems, including Windows NT, Digital UNIX, Linux, and OpenVMS. All registers are 64 bits long and all operations are performed between these registers. Alpha instructions are 32 bits long, and memory operations are either load or store operations of data that is 8, 16, 32, or 64 bits in length.
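
The peak figures quoted in the introduction (2133 MIPS and 1066 MFLOPS at 533 MHz) can be reproduced with simple arithmetic. The sketch below rests on assumptions that the article does not state outright: that the nominal 533-MHz clock is exactly 1600/3 MHz, that peak MIPS counts four instructions issued every cycle (the chip is described in this article as quad-issue), and that peak MFLOPS counts one result per cycle from each of the two floating point pipelines. All names in the code are illustrative.

```python
from fractions import Fraction

# Assumption: the nominal "533 MHz" clock is 1600/3 MHz (533.33...).
CLOCK_MHZ = Fraction(1600, 3)

ISSUE_WIDTH = 4  # quad-issue: up to four instructions per cycle
FP_PIPES = 2     # one floating point add pipeline, one multiply pipeline


def peak_rate(clock_mhz, per_cycle):
    """Peak rate in millions per second, truncated to a whole number."""
    return int(clock_mhz * per_cycle)


peak_mips = peak_rate(CLOCK_MHZ, ISSUE_WIDTH)
peak_mflops = peak_rate(CLOCK_MHZ, FP_PIPES)

print(peak_mips, "MIPS")      # 2133 MIPS
print(peak_mflops, "MFLOPS")  # 1066 MFLOPS
```

These are upper bounds that assume every issue slot is filled on every cycle; the SPECint95 and SPECfp95 estimates are the better guide to sustained performance.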
The 21164PC takes full advantage of the Alpha architecture in a quad-issue implementation featuring a 7-stage integer pipeline and a 9-stage floating point pipeline. The 21164PC has a large 16KB instruction cache (Icache) and features a bandwidth of 8000MB per second to the instruction issue unit. This exceptional data transfer capacity plus an aggressive instruction pre-fetch scheme keep the chip's pipelines full. The 21164PC's Icache pre-fetches 96 bytes ahead of the current program counter, providing significant performance improvements in long code sequences where instructions can be fetched 300% faster than is possible without pre-fetching.

Streamlined Instruction Issue and Execution

The 21164PC's instruction unit comprises an instruction buffer, slotter, and issue unit. The simple instruction issue design maximizes the 21164PC's clock frequency with little impact on the number of instructions that can be issued in a cycle. The microprocessor's instruction buffer holds two sets of four instructions, facilitating the chip's quad-issue operation. The buffer optimizes the flow of instructions into the slotter unit by removing bubbles from the pipeline that are caused by taken branches. The slotter attempts to assign four instructions to the pipelines each cycle and refills when all the instructions have been assigned. The issue unit allows the instructions to execute after assuring the availability of the required system resources.

The integer execution unit contains a register file and several functional units in four stages of two parallel pipelines. The pipelines contain differing sets of units, with the 64-bit adder, logical and load units being common to both pipelines. The 21164PC's instructions execute in one cycle, with the exception of loads and conditional move instructions, which require two cycles.
In addition, the 21164PC incorporates a special hardware feature that allows the common code sequence, COMPARE followed by DEPENDENT BRANCH, to execute in one cycle instead of two, thereby streamlining application performance.

The new MVI instructions -- PERR, MAX/MIN and PACK/UNPACK -- are implemented in the integer unit, saving die space and reducing cost. This efficient implementation delivers an impressive 400% improvement in MPEG-2 compression for the very low cost of 0.6% of the 21164PC's area. This design is possible because the 21164PC's 31 64-bit integer registers provide sufficient storage to support the chip's issue bandwidth of 533 million MVI instructions per second concurrently with 533 million additional integer instructions per second. In addition, the supporting instructions used by MVI, including compares, adds, shifts, loads, and stores, already exist in the integer unit, eliminating the need for additional instruction logic on the 21164PC chip.

The 21164PC's floating point unit allows for exceptional performance in floating point-intensive applications such as 3-D graphics on the desktop. The unit is made up of two 64-bit execution units -- an add pipeline that executes all floating point instructions except multiply, and a multiply pipeline. Both units are fully pipelined and have a latency of four cycles. To maximize performance further, the floating point unit incorporates two dedicated floating point load data pipelines that allow floating point load and store instructions to be executed in parallel with floating point operate instructions.

Memory Unit Delivers High Throughput

The 21164PC's memory unit, which features very high data bandwidth, maximizes operational efficiency and CPU utilization. The 8KB data cache (Dcache) -- a dual-ported, fully pipelined, non-blocking cache -- allows the 21164PC to move rapidly through programs that process large amounts of data.
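
As an aside on the MVI instructions named earlier in this section, a small software model shows what PERR computes. As defined in the Alpha architecture, PERR treats each 64-bit source register as eight unsigned bytes and returns the sum of the absolute differences of corresponding bytes, which is the inner operation of block matching in video motion estimation. The sketch below models only the result, not the hardware; the function name and the pixel values are illustrative.

```python
MASK64 = (1 << 64) - 1


def perr(ra, rb):
    """Model of PERR: sum of absolute differences of the eight
    unsigned bytes packed in two 64-bit values."""
    ra &= MASK64
    rb &= MASK64
    total = 0
    for lane in range(8):
        a = (ra >> (8 * lane)) & 0xFF  # byte from the first operand
        b = (rb >> (8 * lane)) & 0xFF  # matching byte from the second
        total += abs(a - b)
    return total


# Eight pixels from a current block and from a candidate reference block,
# each packed into one 64-bit value (illustrative data).
current = int.from_bytes(bytes([10, 20, 30, 40, 50, 60, 70, 80]), "little")
reference = int.from_bytes(bytes([12, 18, 30, 45, 50, 55, 70, 90]), "little")

print(perr(current, reference))  # 2 + 2 + 0 + 5 + 0 + 5 + 0 + 10 = 24
```

Without such an instruction, the same result takes a sequence of byte extracts, subtracts, and adds for every group of eight pixels, which is why a handful of MVI instructions can pay off in video encoding.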
Because the Dcache is non-blocking (up to 21 loads can miss), the processor can continue to operate uninterrupted when cache misses occur. In addition, the design interleaves cache fills from memory with processor operations. These design characteristics give the 21164PC a significant advantage compared with other processor designs. For example, the peak data bandwidth of the 21164PC operating at 533 MHz is ten times higher than the peak bandwidth of a Pentium operating at 200 MHz.

Further, due to the 21164PC's robust write buffer, more things can happen simultaneously in memory. The write buffer has six 32-byte entries, with each entry providing an opportunity to collapse multiple writes to the same address into a single transaction.

The 21164PC's L2 cache controller helps maximize application performance by streamlining L2 cache accesses. The cache controller, which is also non-blocking, does this by ordering requests to the L2 cache to achieve an optimal balance between bandwidth utilization and access latency.

PC-Compatible Motherboard

Figure 2 depicts the block diagram of the AlphaPC 164SX motherboard, a sample design incorporating the 21164PC. Featuring an ATX form factor and outstanding price/performance, the motherboard is ideally suited for Windows NT desktop systems. The design database is available, at no charge, from Digital Semiconductor.

The AlphaPC 164SX motherboard satisfies all ATX requirements for hole placement, component spacing and component height. The six-layer AlphaPC 164SX module can be installed in any ATX enclosure and uses standard ATX power supplies.

The AlphaPC 164SX also offers extensive flexibility, allowing PC manufacturers to cost-effectively configure systems that satisfy a broad range of applications. The AlphaPC 164SX's 413-pin ZIF socket accepts 400-MHz and 533-MHz 21164PCs, giving PC manufacturers two high-performance processor choices.
The motherboard also accepts L2 caches sized from 512KB to 4MB and operating at speeds of 66MHz to 133MHz, offering a spectrum of data handling capabilities.

The 21174 Core Logic Chipset that is configured on the motherboard provides high-speed access to memory and PCI peripheral devices. The 21174 features support for 16Mbit or 64Mbit SDRAM memory in configurations from 32MB to 512MB. Using the chipset's PCI interface, the motherboard accommodates a full range of I/O device configurations. The AlphaPC 164SX contains two 64-bit and two 32-bit PCI slots. Operating at 33 MHz, the 64-bit PCI interfaces provide up to 260MB per second of I/O bandwidth, satisfying the high-performance demands of I/O devices such as graphics, ATM and RAID. The 21164PC is also fully compatible with existing chipsets, such as Digital Semiconductor's 21172 for the Alpha 21164, allowing manufacturers to design in these products.

To enable easy, cost-effective configuration of traditional ISA devices, the AlphaPC 164SX motherboard incorporates a Cypress CY82C693 PCI to ISA bridge and two ISA slots. This part provides on-board USB, IDE and keyboard/mouse interfaces.

Availability

Alpha 21164PC microprocessors will be available from Digital Semiconductor in Q2 of this year. Two versions of the 21164PC -- 400-MHz and 533-MHz -- will be offered for approximately $1/MHz.

For more information regarding these and other Digital Semiconductor Alpha products, contact your local semiconductor distributor or call the Digital Semiconductor Information Line: United States and Canada 1-800-332-2717; outside North America +1-508-628-4760. Or visit the Digital Semiconductor Alpha Web site: www.alpha.digital.com.

********************************************************************

Biography for Peter Bannon

Peter J. Bannon
Consulting Engineer
Digital Semiconductor
Digital Equipment Corporation

Peter J. Bannon is a hardware consulting engineer with Digital Equipment Corporation.
He has participated in the design and/or verification of several CISC and RISC microprocessor chips and was a member of the Alpha 21164 architecture team. He joined Digital in 1984 after receiving a Bachelor of Science in computer system design from the University of Massachusetts. Bannon holds three patents for VAX CPU designs and has filed six patent applications for the Alpha 21164 microprocessor.

###