Released in 1991 and marketed as a floating-point coprocessor for the Intel 486 SX, the Intel 80487 was actually a fully featured Intel 486 DX with a slightly different pinout. Intel added an unconnected 169th pin as a mechanical key for the 487 Socket. Another pin known as “MP#” (Math Present) was used to entirely disable the original 486 SX by triggering its “back-off” (from bus) mode. Being almost 100% compatible with the 486 DX, supporting the 486 SX with the Universal Chip Analyzer was trivial. I bought many Socket 168 socket and I just drilled a 1 mm hole and it worked immediately.
According to the 487’s datasheet, it was rated at 25 MHz maximum, but it also run fine at 33 MHz. It is possible to detect a B0-step Intel 487 by its unique CPUID (0x421). AFAIK, all retail 487s are B0-Step. A0-step are Engineering Sample only (with an unknown CPUID, maybe 0x420).
While testing the 487, I noticed a strange behavior that will deserve more investigation later. It seems the 487SX needs a longer reset period to initialize properly compared to a standard 486. Technically, it makes sense: this additional delay might be needed to let the original 486SX disable itself and back off properly from the bus (before the 487SX takes full control).
Early in the development process, “UCA” meant “Universal CPU Analyzer”. Then I thought it could also be used to test non-CPU like FPUs, Bitslicers or RAM chips and I finally changed the name for “Universal Chip Analyzer”. The 8087 FPU is the first supported IC that’s not a CPU or MCU. Released in 1980, it’s a much more complex chip than its companion 8086. While the later is built with 29.000 transistors, the 8087 integrates 50% more of them for a total of roughly 45.000! It handles various floating-point arithmetic operations (additions, multiplication, square root, etc.) as well as transcendental functions from exponential to trigonometric calculations. The 8087 was the very first FPU to implement the draft of what was to become the initial IEEE 754 standard (circa 1985).
Building an adapter for the 8087 starting with the iAPX-86 code already done was quite easy. Emulating the CPU with the FPGA was technically feasible, but this would have limited the complexity of the x86/87 ASM code able to run. Fortunately, 8086s are still widely available for cheap and every collectors have spares.
The UCA 8087 FPU Adapter requires any 8086 with a rated speed of 10 MHz of more (the fastest 8087 is clocked at 10 MHz). While the standard 8086/8088 UCA Shield configures the CPU in the simplified “MIN” mode, this adapter requires the “MAX” mode with additional bus decoding stages. The original Intel 8288 Bus Controller had been translated in Verilog HDL and implemented in the FPGA. After some tuning, everything was running properly :
An option to automatically subtract the power consumption of the 8086 (to show only the one from the 8087) will be added later . Target frequencies are 4, 6, 8 and 10 MHz.
The Intel 80186 is one of the lesser known early x86 CPUs. In February 1982, 4 years after the 8086’s introduction, Intel released its successor, the 80286 (or “286”). Simultaneously, Intel also quietly released the 80186 to target different markets. While the 286 is a generic microprocessor like the 8086 was, but based on a new microarchitecture, the 186 could be considered as the first x86-based microcontroller. The difference between a microprocessor (CPU) and a microcontroller (MCU) is the level of integration inside the chip. A microprocessor requires a lot of support components (memory controller, bus arbitration logic, etc.) and is primarily used to build computers. A microcontroller integrates many of these components along with a (less powerful) microprocessor and is used for embedded purposes.
The 80186 integrates an enhanced 8086 CPU with a 16-bit bus and many support components: a clock generator, various controllers (DMA, Interrupt, bus, etc.), programmable timers, wait-state generator, chip-select logic, and even more. All these features greatly reduce the overall component count and the complexity of the board. Here is the original 186’s block diagram:
The 80186 is basically a hybrid concept that has been used in embedded applications as a microcontroller, but also as a CPU to build cheap 8086-class computers. For example, it was at the heart of the Tandy 2000 PC released in 1983, but also buried inside the Intel 14.4EX Modem to compute complex algorithms. They later used the 80188, an even cheaper offshoot almost identical to the 80186 but based on an external 8-bit bus (like the 8088). As 8086-class CPUs, both the 80186 and the 80188 can be linked to the 8087 FPU, but this association was almost never found in real-world products. Original 80188/80186s were built on Intel’s HMOS 3 µm process at 6 MHz, 8 MHz and 10 MHz. They came in 3 different packages: PGA-68, leadless ceramic (CLCC-68) and leadless plastic (PLCC-68). The Universal Chip Analyzer is now able to test and run code on all these CPUs:
In 1987, Intel released the 80C188 and 80C186, built on Intel’s 1.5 µm CMOS process. Clock speeds reached 16 MHz and power consumption was vastly reduced. Some features were also added: a power-save mode, a refresh controller to handle RAM refresh cycle without external components and a FPU interface to support the newly released 80C187 (support for the old 8087 was dropped). The uncommon 80C187 is essentially a 80387 repackaged into a DIP-40 or PLCC-44 package. The UCA is able to test and detect 80C186 and 80C188 in various packages:
In 1991 (the 486 was available at that date), Intel released the improved “XL” variant. Thanks to the CMOS 1 µm process, the 80C186XL and 80C188XL were able to reach up to 25 MHz at a lower power consumption. They now use a static design (able to be clocked down to DC for even more power reduction) while the 80C18x were based on a dynamic design (with a minimum clock frequency needed to retain internal register values). The UCA can also test all members of the “XL” family and even detect their stepping (A-/B- or C-step) :
The maximum frequency for the UCA is 20 MHz because 186/188 requires a clock doubled input and I wanted to avoid an external PLL to keep cost low (the 186 adapter is a simple 2-layer PCB). Adding support for it to reach 25 MHz (or much more) is trivial but that will almost double the BOM price for the adapter (from ~$10 to ~$20).
Intel also released the 80C186EB (5V) and 80L186EV (3V) in 1990 and the 80C186/188EA & 80C186/188EC one year later (also available in ‘L’ version). The 80C186EA in PLCC-68 package is very close to the 80C186XL. The main differences are some more advanced power saving modes and TTL-level inputs compatibility (while the XL requires CMOS-level inputs). I’m still looking for one, but they should work fine on the UCA. The “EB” line adds an improved chip-select unit, two UART for serial communication and 16 GPIO. While electrically able to run on the UCA, they come in a bigger PLCC-84 and PGA-88 packages and don’t fit physically. The “EC” line adds even more GPIOs and is only available in SMD QFP-100 packaging. Designing an adapter for EB and EC 186/188s is not planned at this time.
Stay tuned for another big UCA milestone in the next few days!
While developing the 486 Adapter for the Universal Chip Analyzer, I was worrying about how to distinguish between early CPUs from AMD and Intel (the ones without CPUID instruction support). There is no way to distinguish them because they’re basically the exact same chip: same microcode, same architecture, same power consumption, etc. AMD used the Intel’s die for its whole early 486 line and only the external packaging was different. Thus, no BIOS nor any software detection tool can distinguish between an early AMD Am486DX2-66NV8T and an Intel 486 DX2-66. Both even share the same ID set in EDX register at boot.
I carefully read the datasheets and finally found a small difference between them. It’s located in the JTAG controller, embedded in all AMD 486s and Intel 486s starting with the DX-50. The JTAG controller is used as an internal test tool since the late 80s, standardized in 1990 as IEEE 1149.1 (“Standard Test Access Port and Boundary-Scan Architecture“). It’s now an industry-standard feature present in all complex ICs for debugging purposes. JTAG was commonly used in the 90s to remotely sense the state of all hardware pins with the ability to toggle them individually between 0, 1 and High-Z (floating).
The JTAG controller is generally totally isolated from the CPU: you can’t access any of the internal test features nor test registers from the code running on the CPU. (Some years ago, Intel added a feature to access JTAG from USB, which caused some serious vulnerabilities). Back in the 90s, JTAG access had to be done from dedicated CPU pins called the TAP (Test Access Port). The TAP uses 3 input pins (TMS for Chip Select, TCK for Clock and TDI for Data Input) and one output pin (TDO for Data Output). JTAG has been designed to daisy chain many ICs (boundary-scan).
The basic early JTAG implementation in 486s supports 5 instructions:
(0000b) EXTEST – Arbitrary setting of pins on the CPU to a given state (0, 1, Z)
(0001b) SAMPLE – Poll and report the status of all CPU pins.
(0010b) IDCODE – Used for chip identification
(1000b) RUNBIST – Launch the internal self-test, built-in on all CPUs since the 386s
(1111b) BYPASS – Connect TDI with TDO to bypass the chip (when talking with another IC in the chain)
According to Intel’s datasheet, the IDCODE instruction reports a 32-bit register with the following format:
The Manufacturer Identity is a 11-bit value linked to the chip manufacturer: 0x09 for Intel and 0x01 for AMD. That’s how you can distinguish between an Intel and AMD 486. JTAG is not available on Cyrix, TI and UMC 486s, but these CPUs don’t use the Intel Microcode and they have other identification methods. Accessing the IDCODE register to distinguish AMD and Intel 486s requires specific hardware. Due to limitations in I/O lines available from the FPGA and the tiny space available on the PCB, I chose to add an extremely tiny ATMega328P-MN (0.5 mm pitch!) on the 486 Adapter to access the JTAG port:
The code for bit-banging JTAG commands and communicating with the JTAG controller was quickly written, thanks to this blog that published a nice proof-of-concept many years ago. I then added the link between the FPGA and the outside world to grab the JTAG data from the Windows companion tool. I took the opportunity to rewrite almost all the communication stack between the Universal Chip Analyzer, its integrated MCU and the FPGA. Let’s try with some real-world 486s!
* AMD Am486DX2-66NV8T
The JTAG IDCODE register reported (0x00432003) strictly follows Intel’s datasheet:
Bit = 1 (JTAG constant)
Bit[11:1] = 0x01 (AMD’s Manufacturer ID)
Bit[27:12] = 0x0432 (Part Number = CPUID Family/Model/Revision)
Bit[31:28] = 0 (Revision not set)
As expected, the Part Number filed by AMD is the same as the value reported in the DX register just after boot. All Am486s I tested follow this scheme. I noted that the value reported on the JTAG IDCODE register changes with features activated (2x or 3x multiplier, WT or WB cache mode) just like the CPUID value.
* Intel 486DX2-66
Here is the most interesting part. For some reason, Intel does not follow its own public datasheet on most of its CPUs. Many Intel’s 486-era datasheets show the JTAG bit order as previously described, but the real value returned by many CPUs I tested often reports a totally different organization (only described properly on a printed Intel Datasheet I own).
The raw JTAG IDCODE register value reported on an early i486 DX2-66 (SX626) is 0x00432013 as expected, but a late one (SX955) returns another encoding: 0x40285013. It decodes as follows:
The Model Code reported in the 5-bit field in bits 16:12 is different than the 4-bit “Model” code read in DX at reset. Here is what I noted:
0x01 = 486 DX
0x02 = 486 SX
0x05 = 486 SX2 or DX2
0x07 = 486 DX2 w/ WT Cache
0x08 = 486 DX4
Support for JTAG is definitely an interesting feature to dive deeper in the 486 architecture. As for today and as far I know, the Universal Chip Analyzer is the only hardware or software tool to distinguish between an Am486 and an Intel 486.
Another milestone – albeit not the hardest one – has been reached! The 80286 is the 2nd gen 16-bit x86 CPU introduced by Intel in 1982. The most important improvement was the use of a separated data and address bus. Its predecessor, the famous Intel 8086, used the same pins to send address and then data. The lack of that slow time-multiplexed bus on the 286 allowed a major performance boost, sometimes more than 100% at a similar clock speed. The microarchitecture also evolved with a more advanced (dedicated) address calculation unit and a faster multiplier. The 80286 was also able to support up to 16 MB of RAM, thanks to its 24-bit address bus.
The 80286 also introduced the protected mode, designed to allow much more advanced memory management, with the ability to build multi-user systems using multitasking applications. Unfortunately, due to several limitations in that first implementation, along with several hardware errata found in earlier stepping, protected mode wasn’t really used by software developers on the 286. Intel only solved all these issues with the 80386. The Intel 80286 was initially released at 4, 6 and 8 MHz on nMOS 1.5 µm process. Later released reached 12.5 MHz in 1 µm CMOS process. Several other companies produced CPU fully based on Intel’s 286 microcode like AMD, Siemens and Harris, with speed up to 25 MHz!
Three common 68-pin packages were used for the vast majority of the 80286s ever produced: the original ceramic PGA, a leadless LCC (also ceramic) and a plastic PLCC. On the picture above, you can see an AMD R80286-8 (LCC) and a Harris CS80C286-25 (PLCC). The Universal Chip Analyzer is able to test all three packages just by plugging the related socket on the PGA DIP Socket. Frequencies available (by DIP Switches or software) are 4, 8, 10, 12.5, 16 and 20 MHz.
Why not 25 MHz? Because the 80286 requires a clock-doubled input and feeding a 50 MHz clock to get the 25 MHz core frequency would have required an external PLL. Not a big deal, but there is only one rare 286-class CPU that supports this frequency (the Harris/Intersil CS80C286-25 pictured above) and its timings is not fully compliant with the 286 specifications. Designing a special UCA adapter just for this chip is trivial, but quite useless because the 286 Adapter is already able to test it at 20 MHz. Speaking about “high” frequencies, using the right socket is crucial
The UCA 286 Adapter is fitted with a high-quality DIP Socket. Directly plugging a PGA 286 CPU is possible but not convenient for testing multiple CPUs in a row. I was able to secure some ZIF Sockets for 68-pin PGA like the blue one pictured here (from AMP) and also some 3M LCC sockets complete with top cap. About PLCCs, I first tried some cheap socket from eBay. That was a disaster: contact pins were too weak and bent after 2-3 insertions. Worst, the maximum frequency allowed was 8-10 MHz. Replacing these crappy sockets with other ones from Foxconn or 3M solved all the issues. I also bought some awesome Yamaichi test Socket for PLCC (on the right), but unfortunately, they use a specific pinout. As the 286 Adapter uses a simple 2-layer PCB, I will consider designing a specific PCB just for them.
With the hardware finished, I’ll later tune the software-testing code to see if I can detect various stepping, and maybe also the manufacturer.
Adding support for Cyrix & TI 486s was supposed to be a matter of hours. It finally took almost one month and gave me many headaches. I almost burned everything to the ground several times in rage, begged for help from FPGA’s gurus who told me what I’m trying to achieve was like squaring the circle, but I did not give up. Let’s try to explain why it was so hard.
— always(@TLDR; Technical stuff) —
FPGAs are synchronous beasts used to create finite states machines: almost everything inside a FPGA is synchronized to a clock signal. Each time the clock is ticking, the HDL code analyzes inputs and sets a pre-defined state (that itself defines registers, outputs, the next state, …). To add support for a CPU, you must read the datasheet and write some HDL code that will provide the correct outputs (from the FPGA to the CPU) within the required timings. All these timings are linked to the base clock. A synchronization between the CPU and the FPGA is crucial. For all other CPUs I’ve worked on for the UCA, the FPGA provides the base clock to the CPU. Both the FPGA and the CPU are sharing the same clock and synchronization is easy. But 386s require a clock-doubled input (80 MHz for a 386DX-40 MHz) that I’m not able to provide directly from the FPGA because the 3.3-to-5 volt translators are too slow. So I use an external clock-doubler PLL, but doing so prevents the FPGA from having access to the CPU clock. That’s the root of all issues I had.
Fortunately, using an external phase-locked loop (PLL) means the clock input phase is synchronized with the clock-doubled output signal: the rising edge of both clocks occurs at the same time. Knowing the transmission delays added by the voltage converters at a given frequency, you can still synchronize your FPGA with the CPU without having access to the base clock. That works fine as long as you don’t change the frequency. But that was too easy: I want to be able to switch frequency on-the-fly and within a large range (from 12.5 MHz to 40 MHz to cover all 386s). That’s still possible if you build many bitfiles (compiled HDL “FPGA firmware”), one for each frequency. Nah! I want to use the same bitfile for everything, including support for both microarchitectures (Cyrix & Intel) despite the different timing’s requirements. That’s hell but I almost succeeded.
The actual firmware is not perfect but I’m quite happy with it because it works as expected in most cases. The remaining issue is a hole between ~21 and ~28 MHz where the FPGA can’t reliably catch the required inputs from the CPU at the rising or falling edge of the clock. My Logic Analyzer is unfortunately too slow to solve this but it’s not a big deal. The HDL code works fine at 12.5 MHz, 16 MHz, 20 MHz, 33 MHz and 40 MHz. The only “retail” frequency I’m not able to do is 25 MHz. I built another bitfile for this frequency only and I’ll hope to find a way to merge everything in the same bitfile later. To avoid losing my mind, I’ll wait to have enough money to buy a faster logic analyzer (like the lovely DSLogic U3Pro32) to work on this again.
— End —
But here it is: the UCA supports all Cyrix-based 386 like the 486DLC. Here are the ones I used for the test:
Unlike 386-class CPUs from AMD, which are based on Intel’s microcode and are exact clones, the Cx486DLC introduced in June 1992 uses a custom microarchitecture built from scratch by Cyrix. While still using the 32-bit 386 bus, they come with 486-class features like an embedded L1 cache and some new instructions. The Cyrix 486DLC is not a perfect pin-to-pin replacement for Intel 386s as timings are a bit different and cache control lines must be handled by the chipset. Compatibility issues are well known with many – especially older – motherboards. The original 486DLC was available at 25 MHz, 33 MHz and 40 MHz. All of these were manufactured by Texas Instruments on the 0.8µm CHMOS node. Ti also launched their own, rebranded 486DLC chips, which were exactly the same except for the marking. Please notice the vicious 90° rotation between printings and pin 1 on the Ti486DLC. Fortunately, the Universal Chip Analyzer have strong short-circuit protection built-in…
Cyrix also later released a special, clock-doubled version called the 486DRx². It was available at 16/32, 20/40 MHz, 25/50 MHz and even 33/66 MHz. This later one was the fastest PGA132 CPU ever released.
The original Cyrix 486DLC exists with two steppings: the earliest one with CPUID 0x420 and a later one with CPUID 0x421. The proprietary “DIR” identification registers available on Cyrix’s CPU is only available on newer CPUs. None of the 486DLC tested have them. The 486DRx² is the only one to have DIR registers and reports itself as Model = 0x07. The UCA happily tested the 486DLC at 40 MHz and was even able to overclock my 486DRx2 25/50 at 33/66 MHz for a short time. Cyrix 486s run hot and deserve a proper heatsink. Power consumption is as high as a 486 DX2 and can go as high as 4 watts (4 times higher than a later Intel 386 DX-33)!
Much later in the development process, I feel confident enough to try a blank 486DLC Engineering sample I got many years ago.
This ES is not a clock-doubled CPU like the DRx² and was able to run properly at 33 MHz. CPUID is 0x421 and – surprise! – it has DIR registers, identifying itself at Model = 0x01 (the expected value for a Cyrix 486DLC) and stepping 0x22, with seems to match the handwritten value (2/2) marked on top. The DRx2 25/50 tested above comes with stepping 0x21, so this ES seems newer. I don’t know at this point if any 486DLCs were released commercially with this stepping – or even if any retail 486DLCs have DIR registers enabled.
Let’s now talk about the Ti 486SXL. After having simply renamed the Cyrix 486DLC to Ti 486DLC, Texas Instruments released a new, reworked core they called the “486SXL”. It was available with PGA132 (386) and PGA168 (486) pinouts. Two models were released for PGA132 Socket: the TI 486 SXL40 and the TI 486 SXL2-50. Here they are:
They come with two major differences compared to the Cyrix 486DLC. First, TI boosted the L1 cache from 1 KB to 8 KB (same size as the Intel 486). Then, the clock-doubling feature (also available on the SXL-40 despite its name) is not always activated by default like on the DRx². It must be enabled after boot by software. You basically have to mess with internal proprietary registers to enable the clock doubling mode.
Very few 386 motherboards support the Ti 486SXL but the UCA happily tested it with and without clock-doubling. Just for fun, I ran some benchmarks on all 386s now supported by the Universal Chip Analyzer. The code is not really well-tuned and is only based on some register manipulations and a lot of math integer operations (add, sub, imult and idiv). Here are the results:
Intel 386s appear as the slowest of them all. AMD 386s performances are exactly the same as expected but their famous 40 MHz model offers a 20% boost versus the Intel 386 DX-33. Cyrix 486DLC are much faster. When introduced, they claimed “up to 2x faster than 386DX at same clock frequency”. Our test showed a ~50% improvement between the Intel 386DX-33 and the Cyrix 486DLC-33. The 486DLC-40 is ~80% faster than the fastest Intel 386.
Anyway, the most impressive performance come from the DRx²: the rare 33/66 MHz version is actually ~7x faster than the original Intel 386 DX released at 12.5 MHz in 1986! Results from the TI486SXL show it’s entirely based on the Cyrix 486DLC core with no tuning at all on the microarchitecture. The effect of the increased 8 KB cache is invisible because the UCA has an extremely fast RAM without any wait-states (similar to the L1 cache). Anyway, even real-world applications don’t benefit from a big gain (no more than 3-5% at best).
The elusive Intel RapidCAD Engineering CoProcessor is a weird and rare 2-chip set designed to upgrade 386 computers. It has been released in February 1992 for $499 and sold as a coprocessor. Technically, the RapidCAD is a 486DX assembled inside a 132-pin ceramic package that plugs into a standard 386 Socket. It features an integrated FPU but Intel removed the 8KB L1 cache and the 486-specific instructions. A second chip (RapidCAD-2) plugs into the 387 Socket, is only needed to provide the #FERR signal used to handle FPU exceptions.
This early sample has been assembled in April 1992 with dies from December 1991. The RapidCAD is able to work at any frequencies from 16 to 33 MHz. The lack of L1 cache and the slower 386 bus used does not provide a significant boost in Integer performances, but the FPU is the fastest available for 386s. The Universal Chip Analyzer is now able to fully test RapidCAD up to 33 MHz.
For some reasons, my sample was unable to run at 12.5 MHz, but works fine from 16 to 33 MHz. It’s probably due to the modification on the internal PLL needed to adapt a 486 CPU (1x clock signal expected) to a 386 Socket (2x clock required). PLLs often have limited top/bottom frequency lock range.
The reported CPUID is 0x340 and the power consumption is quite high (~2W typical in INT, ~2.5W in FPU) for a 386. I ran some INT benchmark only at 33 MHz and I got a score of 105.7 while a standard Intel 386DX-33 (or Am386DX-33) got 99.6. That’s only a 6% increase. The RapidCAD is much faster on FPU, being up to 70% faster than an Intel 387.
As like all previous microprocessors, Intel licensed the i386 design to third parties. AMD was the only one legally allowed to sell Intel-based 386s to customers (as bare CPUs), but IBM was granted the right to produce some Intel 386s for its own use. They don’t look like a standard ceramic CPUs : IBM used a plastic substrate and a metal cover to protect the die and help with thermal dissipation. Here is how they look like.
If the packaging is different, the internal die is the same as on Intel 386s. 7 different IBM part-numbers are actually known: 23F7189 (?? MHz), 32G6633 (25 MHz), 51F0352 (20 MHz), 51F1783 (20? MHz), 51F1784 (20 MHz), 51F1797 (25 MHz) and 63F7615 (25 MHz).I was able to put my hand on a 51F1784 and the later 63F7615. I tested both on the Universal Chip Analyzer. There is no “Pin 1” mark so I had to guess where is pin 1. Fortunately, the UCA has strong over-current and short-circuit protection. Let’s start with the 63F7615 :
This one is able to work fine up to 33 MHz, with a CPUID set at 0x305, similar to Intel 386 based on the D0-stepping. I don’t know for sure the real rated frequency, but it’s probably only 25 MHz. The other one (51F1784ESD) is not able to work at 33 MHz and not even at 25 MHz. The actual (early) UCA firmware only has 16/25/33/40 MHz, so I can only confirm that that my 51F1784ESD works at 16 MHz but not at 25 MHz. According to various sources, it’s probably rated at 20 MHz.
AMD also produced a lot of Am386DX at 20, 25, 33 and 40 MHz. While the microcode is 100% from Intel, the manufacturing process is different and they had lower power consumption (thanks to the 0.8µm process used by AMD instead of Intel’s 1µm CMOS-IV on the latest i386s).
Let’s start with the standard Am386 DX/DXL. I tested one Am386 DX/DXL-25 “B-Step”, one Am386 DX/DXL-33 “D-Step” and another Am386 DX/DXL-40 “C-Step”. All came in the 23936 package from Kyocera.
The UCA tool is not yet able to detect them as AMD, but I’m working on a new algorithm based on power consumption to distinguish them from Intel 386s. The B-Step identifies itself as 0x305, the same CPUID used on Intel’s 386 D0-Step. The DXL-25 was able to work up to 33 MHz. Both C- and D-Step have a CPUID set at 0x308, like the later Intel 386s (D1 step and up).
The last CPU to try was the Am386DE-33, an uncommon embedded model. Like the Am386DXL, it uses a fully-static design, meaning it can be clocked down to DC (0 Hz) while retaining all its internal registers content. The biggest difference between the usual Am386DX/DXL and the Am386DE is the disabled Paging Unit in protected mode on the latter. Bit 31 of CR0 (used to enable paging) is reserved on Am386DE. Another difference only available on the Am386DE is the ability to work at its rated frequency with a much lower voltage (down to 3.0V). And it works fine:
At 3.3V, the power needed drops by a huge margin, from 1.1 Watt to as low as 461 mW (0.46W). That’s a -60% power reduction!
Another milestone has been reached. A new iconic CPU family is now supported by the UCA: the Intel 80386, the very first 32-bit x86 microprocessor! The i386 was originally released in 1985 at 12.5 MHz and 16 MHz. It added a 3-stage instruction pipeline and an on-chip MMU (Memory Management Unit) able to address up to 4 GB of RAM. A giant capacity for that time. The original Intel 386 – then renamed 386DX – comes in a PGA132 package and its frequency was later upgraded to 20 MHz, 25 MHz and finally 33 MHz. Later clones from AMD & Cyrix – not yet supported by the UCA – were also released at frequencies up to 40 MHz.
The design of the UCA 386 Adapter was challenging because of the high frequencies involved. While all 486s work with a standard clock input, 386s require a clock-doubled signal with a strong voltage swing (CMOS) between 0 and +5V. Generating frequencies up to 80 MHz (for a 386DX-40) with these requirement was not possible with the UCA architecture, so the UCA 386 Adapter includes an external clock-doubling PLL. For the first try, I used a NB3N511 clock multiplier from On Semiconductor but I was unable to reliably run 386s at more than 20 MHz. The UCA is based on a FPGA and timings are crucial. With the NB3N511, I was unable to successfully match timings because of the lack of phase synchronization between the input clock generated by the FPGA and the doubled frequency fed to the CPU. So I needed another PLL, with 0-delay between input and output.
After some research, I gave the ICS570A a try and it worked perfectly fine. I was able to sync the internal FPGA logic to the clock-doubled CPU signal up to 40/80 MHz, without having direct access to that signal. Some high-quality ceramic decoupling caps were also mandatory to achieve the highest frequencies. I had a bad surprise while looking at ZIF socket for the 386 Adapter: for some reasons, PGA132 ZIF Socket are extremely expensive and I was unable to source them at a decent price. I bought some at ~$25 each, but if you can help me find more at a lower price, please drop me an email!
The 4 standard test frequencies for 386 are set to 12.5 MHz, 25 MHz, 33 MHz and 40 MHz. Right now, only Intel 386 are supported, but I’m working on AMD, Cyrix, etc. clones and I’m confident the UCA will support them all very soon. Here are the i386 I have for testing:
The first one is an early 386 clocked at 16 MHz and produced in 1986. The ΣΣ sign engraved shows that it has been tested free of the infamous 32-bit multiplier bug (more on this on a later post). According to this source, the S40344 S-Spec is a B1 stepping. This is confirmed by the CPUID displayed by the UCA Analyzer tool: 0x303. At 12.5 MHz, this CPU requires about 188 mA at 5V (a bit less than 1W).
The second one is very similar to the first one, except the rated speed at 20 MHz. Still the same B1-Stepping, same CPUID, same ΣΣ, same power consumption, same everything. It also can’t be overclocked at 25 MHz.
The third one is also rated at 20 MHz but don’t have any S-Spec. It has been assembled in November 1988, more than one year after the previous one. The CPUID is different at 0x305, which indicates a D0 stepping. Surprisingly, the power consumption is 10% higher, at ~212 mA for 5V at 12.5 MHz. Maybe Intel added some logic to solve the numerous erratas in previous stepping, maybe it’s just sample variation. Anyway, the D0 stepping still uses the CHMOS III (1.5 µm) process. This chip can be overclocked at 25 MHz
The fourth one is a 16 MHz marked SX236 and build with the more advanced CHMOS IV process. The CPUID is 0x308, which is used by Intel for D1, D2 and E Stepping. The last stepping is usually marked on the chip, so we can guess it’s a D1 or D2 stepping. Power consumption drops from 212 mA to 147 mA, thanks to the 1 µm process (instead of 1.5 µm). Unfortunately, it can’t work at 25 MHz.
The fifth one is a A80386DX-33 made in 1992 with s-spec SX366. It’s the faster clock speed Intel released for a 386DX. The CPUID is still set at 0x308 but the stepping is clearly more advanced: the power consumption drops to 126 mA while using the same CHMOS IV process than the previous one. This particular CPU requires 126 mA at 12.5 MHz, 206 mA at 25 MHz and up to 261 mA at 33 MHz. It can’t be overclocked to 40 MHz.
The last one is much newer than the others. It was manufactured in 2000 and uses the E-Stepping, as stated by the last letter of the lot code. Aside from this, all specs look identical to the SX366. Measures are also the same and it doesn’t work at 40 MHz.
Last but not least, UMC’s 486s can now be tested on the Universal Chip Analyzer. All 486s ever manufactured are now supported! UMC is a Taiwanese semiconductor company still active today, albeit much smaller than its well-known competitors (TSMC, GlobalFoundries, …). In the mid-90s, UMC produced some rare 486s compatible CPU named “UMC Green CPU”. They were in-house design and not Intel-licensed like AMD 486s. Almost immediately after the initial announcement in 1993, Intel sued UMC and its distributors over patent infringements (including the infamous ‘338 patent, read more here). In response, UMC filled an anti-trust suit against Intel, but finally choose to give up and cease production of x86 compatible CPUs. Here are the retail UMC 486s I have:
The most common one was the U5SX (sometimes marked U5S) clocked at 25, 33 or 40 MHz and without FPU. Some UMC Green CPUs labeled “U5SD” were also released. Contrary to what you might think, they don’t include a FPU – verified with the UCA – but are just supposed to use the “DX pinout”. The meaning of this mention is unclear because both the i486SX and the i486DX shares the same layout for non-FPU related pins. Some guys have claimed that the difference is a relocation of the NMI pin, but I have not noticed that. UMC also released extremely rare U5D (with FPU) and U486DX2 (clock doubled with FPU), but only a couple samples are known today worldwide.
The Universal Chip Analyzer is able to test all UMC 486s, even if I still have an issue at 33 MHz and 40 MHz to grab every I/O. Code fetch works fine, but the output on I/O ports are sometimes dropped. It’s probably easy to solve, but I really need a copy of the UMC 486 manual / datasheet. Unfortunately, it seems nobody has one in the CPU collector’s community. If you can help, you’re more than welcome!
Looking at threads about the UMC 486 on cpu-world, vogons and vcfed, I saw that some people were wondering if the CPU marked “SUPER” were different from the “non-SUPER” ones. So, I ran some INT benchmark with the UCA.
As you can see, all UMC CPUs (U5S-SUPER, U5SX and U5SD) offer the exact same results when clocked at the same frequency. You also probably noticed the awesome relative performance of the UMC 486 versus the Intel 486s. Clocked at 33 MHz, the UMC 486 is as fast as an Intel 486DX2-66 and, when clocked at 40 MHz, it’s almost as fast as an Intel DX4-75. That’s not a bug. The UCA doesn’t have any wait-state on memory subsystem and actually uses a mix of heavy Integer divide and multiplication instructions for the benchmark. This result comes from the ultra-fast ALU designed by UMC, especially on divides. While Intel 486s requires 40 cycles to perform a INT divide, UMC 486s only need 7 cycles. That’s more than 5 times faster! However, in real-world applications, the UMC 486-SUPER at 40 MHz was on par with an Intel i486SX2-66. Still excellent!