ATX2AT Smart Converter – Firmware 1.21 released

I’ve just released a new firmware (1.21) for the ATX2AT Smart Converter and an update (0.4b) for the Windows companion tool (ATX2AT Configuration tool). Both are available as source and binary on the GitHub page.

Here is the changelog:

    • Added a configuration option for AT-Style push button
    • Added a “firmware outdated” version check at startup
    • Added a firmware update feature within the Configuration tool for easy update
    • Solved an issue with Infinite (disabled) screensaver setting
    • Solved an issue with log display

Basically, you just need to download the ATX2AT Configuration tool v0.4b binary package and use the “FW Update” button located in the bottom-right corner. The tool should be able to auto-detect the ATX2AT Smart Converter, switch it to bootloader mode, then use the embedded avrdude to flash the new firmware. If all goes well, you will see your new Firmware Revision as 1.21:

You will notice a new option called “Power Button Type” that defaults to the standard ATX-style (momentary push button). Some users asked for a way to use the ATX2AT Smart Converter with a genuine AT case using the standard switch (SPST). So here it is. With the Power Button Type set to “AT”, it’s now possible to wire a standard AT button on the 2-pin EXT_PWR connector (2.54 mm / 0.1″ header).
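The difference between the two settings can be sketched in a few lines of Python. This is only an illustrative model, not the actual ATX2AT firmware: the class name `PowerButton`, the `update()` method and the string values "ATX" and "AT" are all hypothetical. A momentary ATX button toggles the power state on each press, while a latching AT switch simply dictates the state for as long as its contact stays closed.

```python
from dataclasses import dataclass


@dataclass
class PowerButton:
    """Hypothetical model of the two "Power Button Type" settings."""
    button_type: str              # "ATX" (momentary) or "AT" (latching SPST)
    power_on: bool = False
    _last_closed: bool = False    # previous contact state, for edge detection

    def update(self, contact_closed: bool) -> bool:
        """Feed the current contact state, return the resulting power state."""
        if self.button_type == "AT":
            # Latching switch: power simply follows the switch position.
            self.power_on = contact_closed
        elif contact_closed and not self._last_closed:
            # Momentary button: toggle once on each new press.
            self.power_on = not self.power_on
        self._last_closed = contact_closed
        return self.power_on


atx = PowerButton("ATX")
atx_states = [atx.update(c) for c in (True, False, True, False)]

at = PowerButton("AT")
at_states = [at.update(c) for c in (True, True, False)]
```

In this model, two presses of the ATX button turn the supply on and then off again, whereas the AT switch keeps it on only while the switch is closed.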

Universal Chip Analyzer v2 disclosed!

With the development of the PGA Shields (now able to support all Intel CPUs from 80186 to 80486) and rising demand from collectors, it was time to think about producing a batch of the Universal Chip Analyzer. In January, I finally decided to rebuild everything from scratch to get rid of old issues and restart from a “clean” foundation. The original Mojo v3 board I had used since the very beginning was a fantastic tool, but after way too many patches, I encountered “hard” limitations which would have become major issues later. As I don’t want to rework the base FPGA board or the main interface (IF) board for years to come, the solution was to build the perfect PCBs once and for all.

So, let me introduce the Universal Chip Analyzer v2!

UCA FPGA Base Board

The Mojo V3 was a great tool, but it’s a 2013 Kickstarter product tailored as a development board. I hesitated for a long time over replacing the Xilinx Spartan-6 FPGA with a “new-gen” Spartan-7 or even an Artix-7. I finally decided to stay with the Spartan-6 for several reasons.

    1. Xilinx 7-Series FPGAs are only available in BGA and not in QFP packaging. That means a more complex PCB and higher manufacturing costs.
    2. While 6-series FPGAs are happy with two simple 3.3V and 1.2V linear regulators, 7-series FPGAs require 3.3V/1.8V/1.35V and 1.0V. That means noisy DC-DC buck converters, more filtering, and ultimately MUCH higher BOM and assembly costs.
    3. The speed and logic cell count of the Spartan-6 XC6SLX9-2 are enough for all current and future uses I can think of. I could have used more Block RAM, but it’s not a limitation.
    4. Xilinx announced that this FPGA is a “long term product” that will be manufactured at least until 2027. It’s also quite cheap now (< $10).

Switching to a Spartan-7 or Artix-7 would have just significantly increased the price and overall complexity without adding any feature. The only interesting point I will miss is related to the development toolchain. I could have finally got rid of the infamous Xilinx ISE in favor of the new Xilinx Vivado Design Suite. But after all, I’m now quite comfortable with all the damn ISE’s bugs, so…

Here is the new Universal Chip Analyzer board next to the old one.

I kept the overall form factor, just a bit (6 mm) taller, but many components changed.

    1. ARM Main microcontroller – The original 8-bit ATMEGA32U4 (at 16 MHz, with 32 KB Flash & 2.5 KB SRAM) has been replaced with a 32-bit ARM-based ATSAMD21G18. The new MCU is clocked at 48 MHz, Flash capacity is 8x higher (256 KB) and SRAM is now upgraded to 32 KB. It’s also MUCH faster and I have room for many future improvements. While the ATMEGA32U4 was 80% full, the new ATSAMD21G18 is under 20% after a full code rewrite, and with more features added!
    2. 512 Mb Flash Memory – The original Mojo v3 used a 4 Mb SPI Flash able to store a single FPGA configuration file. With the first UCA, I upgraded the flash to 128 Mb to store up to 40 different configuration bit-files. The final UCA now uses a 512 Mb Flash to store more than 150 configuration files simultaneously.
    3. EEPROM – A small 64 Kb I2C EEPROM to store calibration constants, configuration, serial numbers, etc. has been added.
    4. USB-C Connector – The good old Micro-USB connector is becoming obsolete. The new reversible USB-C connector will soon become the standard. It is also more robust.
    5. Better XO – The main 50 MHz oscillator has been upgraded to a 20 ppm, low-power one for lower jitter and better stability at high frequency.
    6. Stronger power filtering – The filtering/decoupling stage was limited on the previous board. It is now much stronger, allowing higher noise immunity and better switching speed for fast CPUs like 486s. Thermal dissipation has also been vastly improved.
    7. Power Connector – The first prototype of the old UCA v2 used a tiny 1.35mm jack located on the IF board. The final one comes with a standard 2.1mm jack with reverse-polarity protection. An additional 9V or 12V power supply is mandatory for all supported ICs. I tested some USB to 9V/12V adapters, and they work fine, making testing from a power bank in the field possible.

There are also many layout changes, allowing for example I2C communication from the MCU to IF to Adapter boards.
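As a quick sanity check of the flash sizes quoted in the list above, the arithmetic can be sketched as follows. The per-configuration slot size is an assumption derived from the “128 Mb for up to 40 bit-files” figure, not a value taken from the UCA firmware, and the helper names are hypothetical.

```python
KBIT_PER_MBIT = 1024


def slot_size_kbit(flash_mbit: int, slots: int) -> int:
    """Flash budget available per configuration slot, in kilobits."""
    return flash_mbit * KBIT_PER_MBIT // slots


def slot_count(flash_mbit: int, slot_kbit: int) -> int:
    """How many slots of a given size fit in a flash of flash_mbit."""
    return flash_mbit * KBIT_PER_MBIT // slot_kbit


# Assumption: each slot gets the same budget as on the first UCA
# (128 Mb shared between 40 bit-files, i.e. about 3.2 Mb per slot).
slot = slot_size_kbit(128, 40)

for flash in (4, 128, 512):
    print(f"{flash:3d} Mb flash -> {slot_count(flash, slot)} slot(s)")
```

Under that assumption, the 4 Mb part holds a single configuration and the 512 Mb part holds about 160, which is consistent with the “more than 150” figure.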

UCA Interface (IF) Board

The final IF board has been upgraded to perfectly fit on top of the FPGA board. The PCB has been enhanced for reliability while lowering BOM cost. All but one of the tantalum capacitors have been replaced by MLCC (ceramic) caps. Layout has also been improved for better decoupling efficiency. Along with the main voltage transceivers, the UCA IF board includes a 2A DC-DC voltage converter, precision voltage and current monitoring, and adjustable fast overcurrent protection. Voltage can be set by software (25 mV steps). A standard 3-pin fan header is available for high-power CPUs like DX4s.

The slightly bigger PCB height allowed an optional 0.91″ 128×64 OLED display to fit on top of the board. It will be used later to display additional information about the test status. Right now, it shows the selected CPU Family and the voltage/current used.

UCA Adapters

The pinout on both 50-pin connectors located on the IF Board has slightly changed to accommodate previous modifications. I added some new signals to avoid future limitations. For example, the I2C is now passed from the ARM MCU to the adapter boards. Adapter IDs also changed to their final values, so all currently designed adapters required a small layout change.

Let’s see the currently designed adapters and their current status.

    • UCA 80486 Adapter

The 486 adapter has recently been upgraded to support JTAG reading. From a hardware point of view, the adapter is almost finished. There is still a small side feature I would like to add, but it’s a minor modification. The 486 adapter is able to test all 486s ever released, from the Intel 486 SX-16 to the Cyrix 5×86-P133, but also 487s, AMD 586, Ti, UMC and IBM 486s.

    • UCA 80386 Adapter

The 386 Adapter has been the most difficult one to build so far. While the hardware is now almost fine, it still needs some work on the FPGA code to fine-tune some timings.

    • UCA 80286 Adapter

Almost finished and working as expected with all kinds of 286s. The internal MCU code must be rewritten to accommodate the new communication protocol, but it’s not a very complex task.

    • UCA 80186 Adapter

The 186 Adapter was the first adapter to be built directly for the new UCA “v2”. It was used to debug the new communication protocol between the different parts of the UCA. Both hardware and software are now done. The only missing feature is the automatic detection of 186 vs 188 (currently, you have to select the correct bus type with the DIP Switch).

    • UCA DIP40 Adapter (8088/8086 & more) 

The “iAPX-86 Adapter” has been renamed the “DIP40” Adapter as it is also able to test various other DIP40 ICs. Along with 8086 & 8088, the UCA DIP40 Adapter can also test 8085s, NSC800s, MCS48 and MCS51 MCUs, and RCA “COSMAC” CDP1802s without the need for any adapter. With a specific adapter that plugs on top of the DIP40 ZIF, it can also test Zilog Z80s, 8080s, MOS 65xx and Motorola 68xx.

    • UCA 8087 Adapter

The 8087 Adapter has been quickly developed to show the UCA’s ability to also test FPUs. It requires a fixed 8086-compatible CPU that runs in MAX mode (while the DIP40 Adapter uses the MIN Mode).

    • UCA 8080 Adapter

After discussion with fellow CPU collectors, I developed a standalone adapter for Intel 8080s. The price and features of this one are the same as those of the adapter that fits on top of the DIP40 Adapter:


At this time, I’m not sure which solution is best. Maybe the standalone version is better to avoid mistakes with DIP Switches… Leave a comment to give your thoughts!

    • UCA Debug Adapters

These Adapters are just for internal use, but I wanted to share some pictures just for fun.

The left one is fitted with many precision power resistors and is needed to calibrate the power monitoring IC at various current loads (10 mA, 50 mA, 100 mA, 250 mA and 2×500 mA). The right one is mainly used to test all signals of newly assembled FPGA/IF boards. It can detect shorts to VCC, open circuits or adjacent-signal shorts. A backplate “Firmware Programmer” board with tiny pogo-pins has also been developed to flash the initial bootloader inside a blank UCA.

Stay tuned for more news about IC support and UCA production soon!

 

The UCA now supports Intel 487 SX

Released in 1991 and marketed as a floating-point coprocessor for the Intel 486 SX, the Intel 80487 was actually a fully featured Intel 486 DX with a slightly different pinout. Intel added an unconnected 169th pin as a mechanical key for the 487 Socket. Another pin known as “MP#” (Math Present) was used to entirely disable the original 486 SX by triggering its “back-off” (from bus) mode. As the 487 SX is almost 100% compatible with the 486 DX, supporting it with the Universal Chip Analyzer was trivial. I bought several 168-pin sockets, drilled a 1 mm hole for the extra pin, and it worked immediately.

 

 

According to the 487’s datasheet, it was rated at 25 MHz maximum, but it also runs fine at 33 MHz. It is possible to detect a B0-step Intel 487 by its unique CPUID (0x421). AFAIK, all retail 487s are B0-step. A0-step parts are engineering samples only (with an unknown CPUID, maybe 0x420).

While testing the 487, I noticed a strange behavior that deserves more investigation later. It seems the 487SX needs a longer reset period to initialize properly compared to a standard 486. Technically, it makes sense: this additional delay might be needed to let the original 486SX disable itself and back off properly from the bus (before the 487SX takes full control).

 

The UCA now supports 8087 FPUs

Early in the development process, “UCA” meant “Universal CPU Analyzer”. Then I thought it could also be used to test non-CPU chips like FPUs, bit-slice processors or RAM chips, and I finally changed the name to “Universal Chip Analyzer”. The 8087 FPU is the first supported IC that’s not a CPU or MCU. Released in 1980, it’s a much more complex chip than its companion 8086. While the latter is built with 29,000 transistors, the 8087 integrates about 50% more, for a total of roughly 45,000! It handles various floating-point arithmetic operations (addition, multiplication, square root, etc.) as well as transcendental functions from exponential to trigonometric calculations. The 8087 was the very first FPU to implement the draft of what was to become the initial IEEE 754 standard (circa 1985).

Building an adapter for the 8087, starting from the iAPX-86 code already done, was quite easy. Emulating the CPU with the FPGA was technically feasible, but this would have limited the complexity of the x86/87 ASM code able to run. Fortunately, 8086s are still widely available for cheap and every collector has spares.

The UCA 8087 FPU Adapter requires any 8086 with a rated speed of 10 MHz or more (the fastest 8087 is clocked at 10 MHz). While the standard 8086/8088 UCA Shield configures the CPU in the simplified “MIN” mode, this adapter requires the “MAX” mode with additional bus decoding stages. The original Intel 8288 Bus Controller was translated into Verilog HDL and implemented in the FPGA. After some tuning, everything was running properly:

An option to automatically subtract the power consumption of the 8086 (to show only that of the 8087) will be added later. Target frequencies are 4, 6, 8 and 10 MHz.

 

The UCA now supports Intel 80186 & 80188

The Intel 80186 is one of the lesser known early x86 CPUs. In February 1982, 4 years after the 8086’s introduction, Intel released its successor, the 80286 (or “286”). Simultaneously, Intel also quietly released the 80186 to target different markets. While the 286 is a generic microprocessor like the 8086 was, but based on a new microarchitecture, the 186 could be considered as the first x86-based microcontroller. The difference between a microprocessor (CPU) and a microcontroller (MCU) is the level of integration inside the chip. A microprocessor requires a lot of support components (memory controller, bus arbitration logic, etc.) and is primarily used to build computers. A microcontroller integrates many of these components along with a (less powerful) microprocessor and is used for embedded purposes.

The 80186 integrates an enhanced 8086 CPU with a 16-bit bus and many support components: a clock generator, various controllers (DMA, Interrupt, bus, etc.), programmable timers, wait-state generator, chip-select logic, and even more. All these features greatly reduce the overall component count and the complexity of the board. Here is the original 186’s block diagram:

The 80186 is basically a hybrid concept that has been used in embedded applications as a microcontroller, but also as a CPU to build cheap 8086-class computers. For example, it was at the heart of the Tandy 2000 PC released in 1983, but also buried inside the Intel 14.4EX Modem to compute complex algorithms. They later used the 80188, an even cheaper offshoot almost identical to the 80186 but based on an external 8-bit bus (like the 8088). As 8086-class CPUs, both the 80186 and the 80188 can be linked to the 8087 FPU, but this association was almost never found in real-world products. Original 80188/80186s were built on Intel’s HMOS 3 µm process at 6 MHz, 8 MHz and 10 MHz. They came in 3 different packages: PGA-68, leadless ceramic (CLCC-68) and leadless plastic (PLCC-68). The Universal Chip Analyzer is now able to test and run code on all these CPUs:

UCA testing an original A80188 (PGA-68) at 8 MHz

In 1987, Intel released the 80C188 and 80C186, built on Intel’s 1.5 µm CMOS process. Clock speeds reached 16 MHz and power consumption was vastly reduced. Some features were also added: a power-save mode, a refresh controller to handle RAM refresh cycles without external components and an FPU interface to support the newly released 80C187 (support for the old 8087 was dropped). The uncommon 80C187 is essentially an 80387 repackaged into a DIP-40 or PLCC-44 package. The UCA is able to test and detect 80C186 and 80C188 in various packages:

UCA testing an Intel A80C186-16 (PGA-68) at 16 MHz

In 1991 (the 486 was already available by then), Intel released the improved “XL” variant. Thanks to the 1 µm CMOS process, the 80C186XL and 80C188XL were able to reach up to 25 MHz at a lower power consumption. They now use a static design (able to be clocked down to DC for even more power reduction) while the 80C18x were based on a dynamic design (with a minimum clock frequency needed to retain internal register values). The UCA can also test all members of the “XL” family and even detect their stepping (A-, B- or C-step):

UCA testing a R80C188XL-25 (CLCC-68) at 20 MHz

The maximum frequency for the UCA is 20 MHz because the 186/188 requires a clock-doubled input and I wanted to avoid an external PLL to keep cost low (the 186 adapter is a simple 2-layer PCB). Adding support for it to reach 25 MHz (or much more) is trivial, but that would almost double the BOM price for the adapter (from ~$10 to ~$20).
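The clock-doubling constraint is simple arithmetic: the input clock must run at twice the core frequency. Here is a minimal sketch; the function name `input_clock_mhz` is hypothetical and the frequency list is only illustrative.

```python
def input_clock_mhz(core_mhz: float) -> float:
    """Input clock needed by a CPU that divides its clock input by two."""
    return 2 * core_mhz


# A few core frequencies and the input clock each one requires.
for core in (8, 16, 20, 25):
    print(f"{core} MHz core -> {input_clock_mhz(core)} MHz input clock")
```

The same rule applies to the 286 and 386 mentioned elsewhere on this page: a 25 MHz 286 needs a 50 MHz input and a 386DX-40 needs 80 MHz.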

Intel also released the 80C186EB (5V) and 80L186EV (3V) in 1990 and the 80C186/188EA & 80C186/188EC one year later (also available in ‘L’ version). The 80C186EA in PLCC-68 package is very close to the 80C186XL. The main differences are some more advanced power saving modes and TTL-level input compatibility (while the XL requires CMOS-level inputs). I’m still looking for one, but they should work fine on the UCA. The “EB” line adds an improved chip-select unit, two UARTs for serial communication and 16 GPIOs. While electrically able to run on the UCA, they come in bigger PLCC-84 and PGA-88 packages and don’t fit physically. The “EC” line adds even more GPIOs and is only available in SMD QFP-100 packaging. Designing an adapter for EB and EC 186/188s is not planned at this time.

Stay tuned for another big UCA milestone in the next few days!

PS: PLCCs 80188/186 are also supported!

JTAG Support Added to the UCA 486 Adapter

While developing the 486 Adapter for the Universal Chip Analyzer, I was worried about how to distinguish between early CPUs from AMD and Intel (the ones without CPUID instruction support). Software has no way to tell them apart because they’re basically the exact same chip: same microcode, same architecture, same power consumption, etc. AMD used Intel’s die for its whole early 486 line and only the external packaging was different. Thus, no BIOS or software detection tool can distinguish between an early AMD Am486DX2-66NV8T and an Intel 486 DX2-66. Both even share the same ID set in the EDX register at boot.

I carefully read the datasheets and finally found a small difference between them. It’s located in the JTAG controller, embedded in all AMD 486s and Intel 486s starting with the DX-50. JTAG has been used as an internal test tool since the late 80s and was standardized in 1990 as IEEE 1149.1 (“Standard Test Access Port and Boundary-Scan Architecture”). It’s now an industry-standard feature present in all complex ICs for debugging purposes. JTAG was commonly used in the 90s to remotely sense the state of all hardware pins with the ability to toggle them individually between 0, 1 and High-Z (floating).

The JTAG controller is generally totally isolated from the CPU: you can’t access any of the internal test features or test registers from the code running on the CPU. (Some years ago, Intel added a feature to access JTAG from USB, which caused some serious vulnerabilities.) Back in the 90s, JTAG access had to be done from dedicated CPU pins called the TAP (Test Access Port). The TAP uses 3 input pins (TMS for Test Mode Select, TCK for Clock and TDI for Data Input) and one output pin (TDO for Data Output). JTAG has been designed to daisy-chain many ICs (boundary-scan).

The basic early JTAG implementation in 486s supports 5 instructions:

    • (0000b) EXTEST – Arbitrary setting of pins on the CPU to a given state (0, 1, Z)
    • (0001b) SAMPLE – Poll and report the status of all CPU pins.
    • (0010b) IDCODE – Used for chip identification
    • (1000b) RUNBIST – Launch the internal self-test, built-in on all CPUs since the 386s
    • (1111b) BYPASS – Connect TDI with TDO to bypass the chip (when talking with another IC in the chain)
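To give an idea of what loading one of these instructions involves, here is a minimal Python sketch of the standard IEEE 1149.1 TAP state machine and of the TMS/TDI sequence that shifts an opcode into the instruction register. It is not the code running on the UCA: the names `load_instruction` and `walk` are hypothetical, and the 4-bit instruction register length is assumed from the opcodes listed above.

```python
# IEEE 1149.1 TAP controller: next state for (TMS=0, TMS=1).
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

# Opcodes as listed above for the 486 JTAG implementation.
INSTRUCTIONS = {"EXTEST": 0b0000, "SAMPLE": 0b0001, "IDCODE": 0b0010,
                "RUNBIST": 0b1000, "BYPASS": 0b1111}


def load_instruction(opcode, ir_bits=4):
    """Return one (TMS, TDI) pair per TCK pulse, starting in Run-Test/Idle."""
    # Run-Test/Idle -> Select-DR -> Select-IR -> Capture-IR -> Shift-IR
    seq = [(1, 0), (1, 0), (0, 0), (0, 0)]
    for i in range(ir_bits):
        last = i == ir_bits - 1
        # Bits go out LSB first; TMS=1 on the last bit leaves Shift-IR.
        seq.append((1 if last else 0, (opcode >> i) & 1))
    # Exit1-IR -> Update-IR -> Run-Test/Idle
    seq += [(1, 0), (0, 0)]
    return seq


def walk(seq, state="Run-Test/Idle"):
    """Replay a sequence; collect the TDI bits clocked in while in Shift-IR."""
    shifted = []
    for tms, tdi in seq:
        if state == "Shift-IR":
            shifted.append(tdi)
        state = TAP[state][tms]
    return state, shifted


sequence = load_instruction(INSTRUCTIONS["IDCODE"])
final_state, ir_bits_sent = walk(sequence)
```

A real bit-banging routine would drive TMS and TDI on GPIO pins and pulse TCK once per pair; reading the 32-bit IDCODE afterwards follows the same pattern through the Shift-DR state while sampling TDO.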

According to Intel’s datasheet, the IDCODE instruction reports a 32-bit register with the following format:

The Manufacturer Identity is an 11-bit value linked to the chip manufacturer: 0x09 for Intel and 0x01 for AMD. That’s how you can distinguish between an Intel and AMD 486. JTAG is not available on Cyrix, TI and UMC 486s, but these CPUs don’t use the Intel microcode and they have other identification methods. Accessing the IDCODE register to distinguish AMD and Intel 486s requires specific hardware. Due to limitations in I/O lines available from the FPGA and the tiny space available on the PCB, I chose to add an extremely tiny ATMega328P-MN (0.5 mm pitch!) on the 486 Adapter to access the JTAG port:

The code for bit-banging JTAG commands and communicating with the JTAG controller was quickly written, thanks to this blog that published a nice proof-of-concept many years ago. I then added the link between the FPGA and the outside world to grab the JTAG data from the Windows companion tool. I took the opportunity to rewrite almost all the communication stack between the Universal Chip Analyzer, its integrated MCU and the FPGA.  Let’s try with some real-world 486s!

* AMD Am486DX2-66NV8T

The JTAG IDCODE register reported (0x00432003) strictly follows Intel’s datasheet:

      • Bit[0] = 1 (JTAG constant)
      • Bit[11:1] = 0x01 (AMD’s Manufacturer ID)
      • Bit[27:12] = 0x0432 (Part Number = CPUID Family/Model/Revision)
      • Bit[31:28] = 0 (Revision not set)

As expected, the Part Number set by AMD is the same as the value reported in the DX register just after boot. All Am486s I tested follow this scheme. I noted that the value reported in the JTAG IDCODE register changes with the features activated (2x or 3x multiplier, WT or WB cache mode), just like the CPUID value.
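The field split described above is easy to reproduce. The following Python sketch decodes a raw IDCODE value using the bit positions listed for the AMD part; the function names are hypothetical, and the family/model/stepping split of the part number assumes the usual nibble layout of the 486 reset signature.

```python
MANUFACTURERS = {0x01: "AMD", 0x09: "Intel"}


def decode_idcode(value: int) -> dict:
    """Split a 32-bit JTAG IDCODE into its standard fields."""
    return {
        "marker": value & 0x1,                   # Bit[0], always 1
        "manufacturer": (value >> 1) & 0x7FF,    # Bit[11:1]
        "part_number": (value >> 12) & 0xFFFF,   # Bit[27:12]
        "version": (value >> 28) & 0xF,          # Bit[31:28]
    }


def split_part_number(part: int) -> dict:
    """Assumed layout: one nibble each for family, model and stepping."""
    return {
        "family": (part >> 8) & 0xF,
        "model": (part >> 4) & 0xF,
        "stepping": part & 0xF,
    }


amd = decode_idcode(0x00432003)      # Am486DX2-66NV8T value quoted above
print(MANUFACTURERS[amd["manufacturer"]], hex(amd["part_number"]))
print(split_part_number(amd["part_number"]))
```

With the value quoted above, the manufacturer field decodes to 0x01 (AMD) and the part number to 0x0432, i.e. family 4, model 3, stepping 2.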

 * Intel 486DX2-66

Here is the most interesting part. For some reason, Intel does not follow its own public datasheet on most of its CPUs. Many of Intel’s 486-era datasheets show the JTAG bit order as previously described, but the real value returned by many CPUs I tested follows a totally different organization (only described properly in a printed Intel datasheet I own).

 

The raw JTAG IDCODE register value reported on an early i486 DX2-66 (SX626) is 0x00432013 as expected, but a late one (SX955) returns another encoding: 0x40285013. It decodes as follows:

      • Bit[0] = 1 (JTAG constant)
      • Bit[11:1] = 0x09 (Intel’s Manufacturer ID)
      • Bit[16:12] = 0x05 (Proprietary Model Code)
      • Bit[20:17] = 0x04 (CPUID Family, 0x04 = 486)
      • Bit[26:21] = 0x01 (Intel Architecture Type, 1 = x86)
      • Bit[27] = 0x00 (Core Voltage – 1 = 3.3V / 0 = 5V)
      • Bit[31:28] = 0x04 (Proprietary Revision Code)

The Model Code reported in the 5-bit field in bits 16:12 is different from the 4-bit “Model” code read in DX at reset. Here is what I noted:

      • 0x01 = 486 DX
      • 0x02 = 486 SX
      • 0x05 = 486 SX2 or DX2
      • 0x07 = 486 DX2 w/ WT Cache
      • 0x08 = 486 DX4

Support for JTAG is definitely an interesting feature to dive deeper into the 486 architecture. As of today, and as far as I know, the Universal Chip Analyzer is the only hardware or software tool able to distinguish between an Am486 and an Intel 486.

More UCA news soon!

 

The Universal Chip Analyzer now supports Intel 286

Another milestone – albeit not the hardest one – has been reached! The 80286 is the 2nd gen 16-bit x86 CPU introduced by Intel in 1982. The most important improvement was the use of separate data and address buses. Its predecessor, the famous Intel 8086, used the same pins to send the address and then the data. Dropping that slow time-multiplexed bus on the 286 allowed a major performance boost, sometimes more than 100% at a similar clock speed. The microarchitecture also evolved with a more advanced (dedicated) address calculation unit and a faster multiplier. The 80286 was also able to support up to 16 MB of RAM, thanks to its 24-bit address bus.

The 80286 also introduced protected mode, designed to allow much more advanced memory management, with the ability to build multi-user systems using multitasking applications. Unfortunately, due to several limitations in that first implementation, along with several hardware errata found in earlier steppings, protected mode wasn’t really used by software developers on the 286. Intel only solved all these issues with the 80386. The Intel 80286 was initially released at 4, 6 and 8 MHz on a 1.5 µm nMOS process. Later releases reached 12.5 MHz on a 1 µm CMOS process. Several other companies, like AMD, Siemens and Harris, produced CPUs fully based on Intel’s 286 microcode, with speeds up to 25 MHz!

UCA 286 Adapter testing LCC (left) & PLCC (right) 286s

Three common 68-pin packages were used for the vast majority of the 80286s ever produced: the original ceramic PGA, a leadless LCC (also ceramic) and a plastic PLCC. On the picture above, you can see an AMD R80286-8 (LCC) and a Harris CS80C286-25 (PLCC). The Universal Chip Analyzer is able to test all three packages just by plugging the related socket on the PGA DIP Socket. Frequencies available (by DIP Switches or software) are 4, 8, 10, 12.5, 16 and 20 MHz.

Why not 25 MHz? Because the 80286 requires a clock-doubled input and feeding a 50 MHz clock to get the 25 MHz core frequency would have required an external PLL. Not a big deal, but there is only one rare 286-class CPU that supports this frequency (the Harris/Intersil CS80C286-25 pictured above) and its timings are not fully compliant with the 286 specifications. Designing a special UCA adapter just for this chip is trivial, but quite useless because the 286 Adapter is already able to test it at 20 MHz. Speaking of “high” frequencies, using the right socket is crucial.

The UCA 286 Adapter is fitted with a high-quality DIP Socket. Directly plugging a PGA 286 CPU is possible but not convenient for testing multiple CPUs in a row. I was able to secure some ZIF sockets for 68-pin PGA like the blue one pictured here (from AMP) and also some 3M LCC sockets complete with top cap. As for PLCCs, I first tried some cheap sockets from eBay. That was a disaster: contact pins were too weak and bent after 2-3 insertions. Worse, the maximum usable frequency was 8-10 MHz. Replacing these crappy sockets with other ones from Foxconn or 3M solved all the issues. I also bought some awesome Yamaichi test sockets for PLCC (on the right), but unfortunately, they use a specific pinout. As the 286 Adapter uses a simple 2-layer PCB, I will consider designing a specific PCB just for them.

With the hardware finished, I’ll later tune the software-testing code to see if I can detect the various steppings, and maybe also the manufacturer.

 

The UCA 386 Adapter supports Ti & Cyrix 486s

Adding support for Cyrix & TI 486s was supposed to be a matter of hours. It finally took almost one month and gave me many headaches. I almost burned everything to the ground several times in rage and begged for help from FPGA gurus, who told me that what I was trying to achieve was like squaring the circle, but I did not give up. Let’s try to explain why it was so hard.

— always(@TLDR; Technical stuff) —

FPGAs are synchronous beasts used to create finite-state machines: almost everything inside an FPGA is synchronized to a clock signal. Each time the clock ticks, the HDL code analyzes inputs and sets a pre-defined state (that itself defines registers, outputs, the next state, …). To add support for a CPU, you must read the datasheet and write some HDL code that will provide the correct outputs (from the FPGA to the CPU) within the required timings. All these timings are linked to the base clock. Synchronization between the CPU and the FPGA is crucial. For all other CPUs I’ve worked on for the UCA, the FPGA provides the base clock to the CPU. Both the FPGA and the CPU share the same clock and synchronization is easy. But 386s require a clock-doubled input (80 MHz for a 386DX-40) that I’m not able to provide directly from the FPGA because the 3.3-to-5 volt translators are too slow. So I use an external clock-doubler PLL, but doing so prevents the FPGA from having access to the CPU clock. That’s the root of all the issues I had.

Fortunately, using an external phase-locked loop (PLL) means the clock input phase is synchronized with the clock-doubled output signal: the rising edge of both clocks occurs at the same time. Knowing the transmission delays added by the voltage converters at a given frequency, you can still synchronize your FPGA with the CPU without having access to the base clock. That works fine as long as you don’t change the frequency. But that was too easy: I want to be able to switch frequency on-the-fly and within a large range (from 12.5 MHz to 40 MHz to cover all 386s). That’s still possible if you build many bitfiles (compiled HDL “FPGA firmware”), one for each frequency. Nah! I want to use the same bitfile for everything, including support for both microarchitectures (Cyrix & Intel) despite the different timing requirements. That’s hell but I almost succeeded.
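Why a fixed delay is easy to compensate at one frequency but not across a range can be illustrated with a small Python sketch. The 10 ns delay used here is a made-up figure for illustration only, not a measured value from the UCA, and the function name is hypothetical. The same delay lands at a very different point of the clock-doubled period depending on the CPU frequency.

```python
def phase_offset_deg(delay_ns: float, cpu_mhz: float) -> float:
    """Phase shift caused by a fixed delay, relative to the doubled clock."""
    clk2_mhz = 2 * cpu_mhz              # 386-class CPUs take a 2x clock input
    period_ns = 1000.0 / clk2_mhz       # one period of the doubled clock
    return (delay_ns % period_ns) / period_ns * 360.0


ASSUMED_DELAY_NS = 10.0   # hypothetical translator delay, illustration only

for mhz in (12.5, 16, 20, 25, 33, 40):
    offset = phase_offset_deg(ASSUMED_DELAY_NS, mhz)
    print(f"{mhz:>5} MHz: {offset:5.1f} degrees")
```

With this assumed delay, the offset is a quarter of a period at 12.5 MHz, half a period at 25 MHz and most of a period at 40 MHz, so a sampling point tuned for one frequency is not automatically valid for the others.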

The current firmware is not perfect but I’m quite happy with it because it works as expected in most cases. The remaining issue is a hole between ~21 and ~28 MHz where the FPGA can’t reliably catch the required inputs from the CPU at the rising or falling edge of the clock. My logic analyzer is unfortunately too slow to solve this but it’s not a big deal. The HDL code works fine at 12.5 MHz, 16 MHz, 20 MHz, 33 MHz and 40 MHz. The only “retail” frequency I’m not able to do is 25 MHz. I built another bitfile for this frequency only and I hope to find a way to merge everything into the same bitfile later. To avoid losing my mind, I’ll wait until I have enough money to buy a faster logic analyzer (like the lovely DSLogic U3Pro32) before working on this again.

— End —

But here it is: the UCA supports all Cyrix-based 386-class CPUs like the 486DLC. Here are the ones I used for the test:

Cyrix 486DLC & DRx2

Unlike 386-class CPUs from AMD, which are based on Intel’s microcode and are exact clones, the Cx486DLC introduced in June 1992 uses a custom microarchitecture built from scratch by Cyrix. While still using the 32-bit 386 bus, it comes with 486-class features like an embedded L1 cache and some new instructions. The Cyrix 486DLC is not a perfect pin-to-pin replacement for Intel 386s as timings are a bit different and cache control lines must be handled by the chipset. Compatibility issues are well known with many – especially older – motherboards. The original 486DLC was available at 25 MHz, 33 MHz and 40 MHz. All of these were manufactured by Texas Instruments on the 0.8 µm CHMOS node. Ti also launched their own, rebranded 486DLC chips, which were exactly the same except for the marking. Please notice the vicious 90° rotation between the printing and pin 1 on the Ti486DLC. Fortunately, the Universal Chip Analyzer has strong short-circuit protection built in…

Cyrix also later released a special, clock-doubled version called the 486DRx². It was available at 16/32 MHz, 20/40 MHz, 25/50 MHz and even 33/66 MHz. The latter was the fastest PGA132 CPU ever released.

Cyrix 486DLC-40 & Cyrix 486DRx²-25/50 Tested on the UCA

The original Cyrix 486DLC exists in two steppings: the earliest one with CPUID 0x420 and a later one with CPUID 0x421. The proprietary “DIR” identification registers are only available on newer Cyrix CPUs. None of the 486DLCs I tested have them. The 486DRx² is the only one to have DIR registers and reports itself as Model = 0x07. The UCA happily tested the 486DLC at 40 MHz and was even able to overclock my 486DRx² 25/50 to 33/66 MHz for a short time. Cyrix 486s run hot and deserve a proper heatsink. Power consumption is as high as a 486 DX2 and can go as high as 4 watts (4 times higher than a later Intel 386 DX-33)!

Much later in the development process, I felt confident enough to try a blank 486DLC engineering sample I got many years ago.

This ES is not a clock-doubled CPU like the DRx² and was able to run properly at 33 MHz. CPUID is 0x421 and – surprise! – it has DIR registers, identifying itself as Model = 0x01 (the expected value for a Cyrix 486DLC) and stepping 0x22, which seems to match the handwritten value (2/2) marked on top. The DRx² 25/50 tested above comes with stepping 0x21, so this ES seems newer. I don’t know at this point if any 486DLCs were released commercially with this stepping – or even if any retail 486DLCs have DIR registers enabled.

Let’s now talk about the Ti 486SXL. After having simply renamed the Cyrix 486DLC to Ti 486DLC, Texas Instruments released a new, reworked core they called the “486SXL”. It was available with PGA132 (386) and PGA168 (486) pinouts. Two models were released for the PGA132 socket: the TI 486 SXL40 and the TI 486 SXL2-50. Here they are:

They come with two major differences compared to the Cyrix 486DLC. First, TI boosted the L1 cache from 1 KB to 8 KB (same size as the Intel 486). Second, the clock-doubling feature (also available on the SXL-40 despite its name) is not permanently enabled as it is on the DRx². It must be enabled by software after boot: you basically have to mess with internal proprietary registers to turn on the clock-doubling mode.

Very few 386 motherboards support the Ti 486SXL, but the UCA happily tested it with and without clock-doubling. Just for fun, I ran some benchmarks on all 386s now supported by the Universal Chip Analyzer. The code is not really well-tuned and is only based on some register manipulations and a lot of integer math operations (add, sub, imul and idiv). Here are the results:

386-class CPUs benchmark

Intel 386s appear as the slowest of them all. AMD 386s perform exactly the same at equal clock speed, as expected, but their famous 40 MHz model offers a 20% boost over the Intel 386 DX-33. Cyrix 486DLCs are much faster. When they were introduced, Cyrix claimed “up to 2x faster than 386DX at same clock frequency”. Our test showed a ~50% improvement between the Intel 386DX-33 and the Cyrix 486DLC-33. The 486DLC-40 is ~80% faster than the fastest Intel 386.

The most impressive performance, however, comes from the DRx²: the rare 33/66 MHz version is actually ~7x faster than the original Intel 386 DX released at 12.5 MHz in 1986! Results from the TI486SXL show it’s entirely based on the Cyrix 486DLC core with no tuning at all on the microarchitecture. The effect of the larger 8 KB cache is invisible because the UCA has extremely fast RAM without any wait states (similar to an L1 cache). Even real-world applications don’t gain much from it (no more than 3-5% at best).

Stay tuned for more exciting news from the UCA!

 

The UCA 386 Adapter now supports Intel RapidCAD

The elusive Intel RapidCAD Engineering CoProcessor is a weird and rare 2-chip set designed to upgrade 386 computers. It was released in February 1992 for $499 and sold as a coprocessor. Technically, the RapidCAD is a 486DX assembled inside a 132-pin ceramic package that plugs into a standard 386 socket. It features an integrated FPU, but Intel removed the 8 KB L1 cache and the 486-specific instructions. A second chip (RapidCAD-2), which plugs into the 387 socket, is only needed to provide the #FERR signal used to handle FPU exceptions.

This early sample was assembled in April 1992 with dies from December 1991. The RapidCAD is able to work at any frequency from 16 to 33 MHz. Because of the missing L1 cache and the slower 386 bus, it does not provide a significant boost in integer performance, but its FPU is the fastest available for 386s. The Universal Chip Analyzer is now able to fully test the RapidCAD up to 33 MHz.

For some reason, my sample was unable to run at 12.5 MHz, but it works fine from 16 to 33 MHz. It’s probably due to the modification of the internal PLL needed to adapt a 486 CPU (1x clock signal expected) to a 386 socket (2x clock required). PLLs often have a limited frequency lock range at both the top and bottom ends.

The reported CPUID is 0x340 and the power consumption is quite high (~2W typical in INT, ~2.5W in FPU) for a 386. I ran the INT benchmark at 33 MHz only and got a score of 105.7, while a standard Intel 386DX-33 (or Am386DX-33) got 99.6. That’s only a 6% increase. The RapidCAD is much faster on FPU, being up to 70% faster than an Intel 387.
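The 6% figure follows directly from the two scores quoted above. Here is a minimal Python sketch of the arithmetic; the function name `percent_gain` is only an illustration and is not part of the UCA software:

```python
def percent_gain(score: float, baseline: float) -> float:
    """Return how much higher `score` is than `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0


# INT benchmark scores quoted in the text, both measured at 33 MHz
rapidcad_score = 105.7
i386dx33_score = 99.6

gain = percent_gain(rapidcad_score, i386dx33_score)
print(f"RapidCAD vs 386DX-33: +{gain:.1f}%")  # prints: RapidCAD vs 386DX-33: +6.1%
```

The same helper can be applied to any pair of scores from the benchmark chart, as long as both were measured with the same test code.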

The Odd Story of Factory-Downgraded 486s

Counterfeit CPUs were very common in the mid-90s. The worst period was between 1993 (just after the launch of the Intel 486 DX2) and 1998 (when the Pentium II started to be multiplier-locked). It was extremely easy for tricksters to remove the original marking and reprint another one with a higher frequency rating. Many DX4-75s were remarked as DX4-100s, and even more Pentium 133/150s were remarked as Pentium 166s or 200s.

Genuine factory-remarked CPUs also exist, but they’re generally uncommon. The most well-known example is the double-sigma (ΣΣ) sign added on early 386s after they had been tested and found free of the infamous 32-bit multiplier bug. Some rare Intel 486 SXs were also later remarked with a higher speed grade. Here are two of them:

As with all factory remarks, the addition is quite obvious. Intel probably binned these CPUs a second time at the request of a big customer (IBM?) and added the second rating later. Today’s story about factory remarks is much more unusual because it concerns standard models.

Am486DX4-100SV8B (remarked 5×86)

After I published this analysis some weeks ago, a reader told me he had a strange Am486DX4-100 that seemed to be an AMD 5×86. After a careful look at the printing, which looked 100% genuine at first sight, he was kind enough to lend it to me for further investigation with the UCA. Here it is:

The “9626” date code tells us it was manufactured in late June or early July 1996, which is quite late for an Am486DX4. I immediately noticed the 25544 package code, only used for the 350 nm die. This die was the basis of all Am486DX5 and Am5x86. The “C” stepping was also unusual, as the Am5x86 is based on the A-step (from November 1995) or B-Step (from March 1997). A “C” stepping built in 1996 is inconsistent with the 5×86 line, but very consistent with the 486DX4 (late 486DX4s in the final “C” stepping were built on the 25498 package in May/June 1996). So it was time for a test on the Universal Chip Analyzer:
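Reading a four-digit date code like “9626” is simply a matter of splitting it into a two-digit year and a two-digit week. Here is a minimal Python sketch; it assumes two-digit years map to 19xx and that weeks follow ISO 8601 numbering, which may differ by a few days from the work-week calendar a given manufacturer actually used. The function name `decode_date_code` is hypothetical.

```python
from datetime import date, timedelta


def decode_date_code(code: str) -> tuple[date, date]:
    """Turn a 4-digit YYWW date code into (Monday, Sunday) of that week.

    Assumptions (not taken from any vendor documentation):
    - the two-digit year belongs to the 1900s;
    - week numbers follow ISO 8601.
    """
    year = 1900 + int(code[:2])
    week = int(code[2:])
    monday = date.fromisocalendar(year, week, 1)  # requires Python 3.8+
    return monday, monday + timedelta(days=6)


start, end = decode_date_code("9626")
print(start, end)  # 1996-06-24 1996-06-30
```

Under these assumptions, week 26 of 1996 covers the last week of June, which fits the reading given above.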

 

WOW! There is no doubt: this CPU is really based on the standard 350 nm die with a fully enabled 16 KB Write-Back L1 cache and a working 4x multiplier. Actually, it can even be overclocked easily to 133 MHz. All specs, including power consumption and CPUID (0x4F4), make it indistinguishable from an AMD 5×86. This CPU can of course also work with a 3x multiplier like an AMD 486DX4-100 (CPUID drops to 0x494).
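To see what changes between the two CPUID values, it helps to split them into fields. The sketch below assumes the conventional 486-class signature layout (family in bits 11-8, model in bits 7-4, stepping in bits 3-0). It is only an illustration of the decoding, not the code the UCA actually runs, and the names `Signature` and `decode_signature` are made up for this example.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_signature(value: int) -> Signature:
    """Split a 12-bit CPU signature into its three 4-bit fields."""
    return Signature(
        family=(value >> 8) & 0xF,
        model=(value >> 4) & 0xF,
        stepping=value & 0xF,
    )


# The two values reported by the same chip at 4x and 3x multiplier
for raw in (0x4F4, 0x494):
    sig = decode_signature(raw)
    print(f"0x{raw:03X}: family {sig.family}, model 0x{sig.model:X}, stepping {sig.stepping}")
# 0x4F4: family 4, model 0xF, stepping 4
# 0x494: family 4, model 0x9, stepping 4
```

Only the model field differs between the two readings: family and stepping stay the same, which is what you would expect from a single die reporting a different model depending on the multiplier selected.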

After some research, it seems that all CPUs based on the 25544/C package are marked as 486DX4-100SV8B while really being DX5 SV16B (5×86). AMD produced them for quite some time between February 1996 and March 1997. They probably stopped the production of the old 500 nm die in early ’96 but still had some demand from customers for DX4s, so they just used the new 350 nm die and marked these CPUs as DX4-100s. As long as you use the default 3x multiplier, they behave exactly like the old ones… except for the cache size.

Has Intel also done such weird things? I would have sworn it hadn’t. I was wrong…

Intel 486DX2-66 SK080 (remarked DX4)

The same reader also sent me a DX2-66 that could be “really a DX4-100”. That sounded odd and really unlikely to me because Intel has a strict policy on S-Specs. Intel DX4s also have a specific CPUID to help distinguish them from DX2s by software. Unlike AMD 486s, this CPUID does NOT change with the multiplier used, so it’s strange to have a DX2 with a DX4’s CPUID. Here is the original CPU:

Everything looks genuine here. SK080 is one of the least common S-Specs for Intel DX2s. The only other S-Spec beginning with “SK” is the extremely rare SK058. The SK080 is a 3.3V SL-Enhanced part which seems to have been produced only between WW18’94 (May 1994) and WW48’94 (November 1994). Let’s plug it into the UCA:

Awesome! This is really a DX4 factory-downgraded to DX2-66. The 0x480 CPUID leaves no doubt about the original die used here. The usual power consumption and the ability to work fine at 3.3V at 100 MHz lead me to think it’s probably a DX4-100. With the multiplier set to 2x, the SK080 also works at 2×33 MHz, as expected for a CPU marked as a DX2-66. To be 100% sure, I was able to find another sample to confirm these findings.