The Performance Edge: There is only one optimal bus speed for each CPU speed and perhaps fastest is not best!

A Newer Technology White Paper

Overview

There is only one optimal bus speed for each given CPU speed for G3 and G4 processors. This is a particularly important consideration when upgrading computers that use FPM (Fast Page Mode) or EDO (Extended Data Out) memory. Some G3 and G4 upgrade cards offer user-adjustable bus speed selections. This is a carry-over from older system designs and is not appropriate for designs based on the G3 or G4 PowerPC processor.

A common misconception is that the faster the bus speed, the better the performance. It may seem intuitive that clocking your computer's system bus faster would improve performance. However, for G3 and G4 based computer systems this may actually reduce performance.

In G3 and G4 systems the L2 cache is located on a dedicated bus. This leaves the system memory (DRAM) as the most performance-critical device on the system bus. Any access to DRAM involves a large number of wait states, so the trick is to use the time efficiently. To do this, one needs to pick bus frequencies whose clock period (1/frequency) divides evenly into the time required to access the memory. If you pick an unfortunate value, say one that requires 4.1 clock periods, you must set the memory timing to 5 clock periods to avoid violating timing requirements, thus wasting 0.9 clock periods. Therefore, there are certain "sweet spots" in bus timing that give the best performance - a faster or slower frequency is worse.
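The rounding effect described above can be sketched in a few lines of Python. This is only an illustration: the function name memory_cycles and the sample values (an 82ns access on a 50 MHz bus, chosen so that the access needs exactly 4.1 clock periods) are assumptions made for the example, not figures from any particular machine.

```python
import math

def memory_cycles(access_ns, bus_mhz):
    """Return (whole_cycles, wasted_ns) for one memory access.

    access_ns and bus_mhz are illustrative inputs, not measured values.
    """
    period_ns = 1000.0 / bus_mhz             # clock period = 1 / frequency
    exact = access_ns / period_ns            # e.g. 4.1 clock periods
    whole = math.ceil(exact)                 # timing must be a whole number of periods
    wasted_ns = (whole - exact) * period_ns  # time spent waiting for the next clock edge
    return whole, wasted_ns

# Hypothetical case: 82ns access on a 50 MHz bus (20ns period) needs 4.1 periods.
cycles, wasted = memory_cycles(access_ns=82.0, bus_mhz=50.0)
print(cycles, round(wasted, 1))
```

In this hypothetical case the access is rounded up from 4.1 to 5 clock periods, so 0.9 of a period (18ns) is spent doing nothing.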

This was not true with the traditional "look-aside" L2 caches used in machines based on the PowerPC 601, 603 or 604 processor. In these machines the L2 cache was on the main system bus and ran at a fixed ratio to the bus, so faster was always better and DRAM timing was only a secondary effect.

Detailed Explanation

Fast system bus speeds may cause a reduction in both performance and stability when upgrading a PowerPC 601, 603 or 604 based Mac to a G3 or G4 PowerPC processor. To make sense of this confusing issue, a basic understanding of the dual bus G3 and G4 architecture is necessary.

Single Bus Processors (601, 603, 604)

Increasing system bus speed improves performance.

The PowerPC 601, 603 and 604 processors have one bus. The I/O, cache memory and system memory (SIMMs and DIMMs) all share this bus. That is to say, these devices share the same physical connections for communication with the processor. By far the fastest device on the bus is the L2 cache.

L2 cache is a relatively small, high-speed static memory. The cache controller predicts what data is likely to be needed from the slower DRAM based system memory and preloads this data into the faster cache memory.

A "cache hit" occurs when data requested by the processor is found in the cache. In the case of a cache hit the data will be accessed at full bus speed, usually between 25ns (40 MHz bus) and 17ns (60 MHz bus). For a single bus system, increasing the bus speed gives better performance.

A "cache miss" occurs if the requested data is not found in the cache memory. In the case of a cache miss the data must be fetched from the slower system memory. In a single bus system the fast cache memory along with the slower memory, and other slower I/O devices, are connected to a common bus. This mismatch in device speed and bus speed is handled by adding "wait states" when accesses are made to the slower devices. Once a bus cycle has started, wait states are added by holding the state of the bus for one or more additional bus clock cycles.

The likelihood of any particular access being a cache hit or miss will vary depending on the cache load algorithm and on the application itself. A typical system will usually average a 70% to 80% cache hit rate, making fast cache accesses the majority of bus activity.
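A rough way to see why the hit rate matters is a weighted average of hit and miss times. The sketch below is a simplified model, not a measurement: the function name average_access_ns and the 20ns hit and 100ns miss times are hypothetical values chosen only to show the arithmetic. The 70% and 80% hit rates come from the text above.

```python
def average_access_ns(hit_rate, hit_ns, miss_ns):
    """Weighted average access time for a simple hit-or-miss model.

    hit_ns and miss_ns are assumed figures supplied by the caller.
    """
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

# Hypothetical timings: 20ns for a cache hit, 100ns for a miss served by DRAM.
for hit_rate in (0.70, 0.80):
    avg = average_access_ns(hit_rate, hit_ns=20.0, miss_ns=100.0)
    print(f"hit rate {hit_rate:.0%}: average access about {avg:.0f}ns")
```

On a single bus both terms scale with the bus clock, which is why a faster clock helps there. Once the cache moves to its own bus, only the miss term depends on the system bus.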

Dual Bus Processors (G3, G4)

Bus speed should be optimized for memory access time.

Upgrade cards based on the G3 or G4 processor have two buses: one for the traditional I/O and system memory, and a second dedicated solely to the L2 cache. Because the second bus is dedicated to high-speed memory, it allows much faster L2 cache accesses than a single bus system could achieve. This high-speed memory is soldered directly to the G3 or G4 upgrade card, so access times of 3ns to 8ns are possible. This eliminates the need for a fast system bus clock because the L2 cache is no longer on the system bus.

System DRAM is rated at either 60ns or 70ns access time, so wait states are required to access it. A faster system bus clock may actually reduce performance due to the additional wait states it requires. For best performance with a dual bus system such as a G3 or G4 upgrade card, it is more important to match, or sync, the system bus frequency with the speed of the system memory. Wait states can only be added in increments equal to one period of the system bus clock. There is also a fixed overhead of about 5ns, due to the memory controller and other system components, which must be accounted for.
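The following Python sketch illustrates how a faster bus clock can lengthen a DRAM access. It rests on a simplifying assumption that is not spelled out in this paper: an access is modeled as one base bus cycle plus whole wait states, and the total must cover the memory rating plus the roughly 5ns of overhead. The function name dram_access is hypothetical.

```python
import math

OVERHEAD_NS = 5.0  # "about 5ns" of fixed overhead, taken from the text

def dram_access(bus_mhz, memory_ns, overhead_ns=OVERHEAD_NS):
    """Return (wait_states, total_access_ns) under a simplified timing model."""
    period_ns = 1000.0 / bus_mhz
    # Whole bus cycles needed to cover the memory rating plus overhead.
    cycles = math.ceil((memory_ns + overhead_ns) / period_ns)
    wait_states = cycles - 1  # assumption: one base cycle, the rest are wait states
    return wait_states, cycles * period_ns

for mhz in (40.0, 45.0, 50.0):
    ws, total = dram_access(mhz, memory_ns=60.0)
    print(f"{mhz:.0f} MHz: {ws} wait states, {total:.1f}ns per access")
```

Under this model, 60ns memory needs two wait states at 45 MHz (about 66.7ns per access) but three at 50 MHz (80ns per access), so the faster clock gives the slower memory access.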

How To Select The Optimal Bus Speed

The idea is to pick a system bus clock where a given number of wait states, minus about 5ns of overhead, accesses system memory as close as possible to the access time rating of the installed memory modules. This must be done without violating the memory timing margins. For 60ns memory the best bus clock is around 45 MHz with two wait states. Increasing the bus clock much beyond 47 MHz will require an additional wait state, reducing overall performance.
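As a rough cross-check of these figures, the calculation can be turned around to ask for the highest bus clock that a given number of wait states can support. The sketch below assumes an access takes one base bus cycle plus the wait states and must cover the memory rating plus 5ns of overhead. That model and the function name max_bus_mhz are assumptions made for illustration, not the method used by any particular card.

```python
def max_bus_mhz(memory_ns, wait_states, overhead_ns=5.0):
    """Highest bus clock (MHz) that still meets memory timing.

    Simplified model: (wait_states + 1) bus periods must be at least
    memory_ns + overhead_ns. Real designs also leave timing margin.
    """
    cycles = wait_states + 1
    min_period_ns = (memory_ns + overhead_ns) / cycles
    return 1000.0 / min_period_ns

for memory_ns in (60.0, 70.0):
    limit = max_bus_mhz(memory_ns, wait_states=2)
    print(f"{memory_ns:.0f}ns memory, 2 wait states: up to about {limit:.1f} MHz")
```

For 60ns memory with two wait states this simplified model puts the limit at roughly 46 MHz, in the same neighborhood as the figures quoted above. A practical setting sits slightly below the limit to preserve timing margin.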

MAXpowr SmartSet

Because wait states can be confusing for many users, the MAXpowr G3 and G4 cards come with SmartSet technology. SmartSet is part of the control panel, which allows the user to specify the speed of the slowest memory installed in the system. The SmartSet hardware and software automatically configure the card and motherboard system memory controller for the best performance. It turns out that for any given CPU speed there is only one preferred, or optimized, bus speed to achieve the best possible performance.

SDRAM Systems

The newest generation of computers from Apple Computer use SDRAM. The need to optimize for system memory access remains important. However, for these systems the bus clock speed is determined by the motherboard, not the CPU upgrade card. Apple has designed the SDRAM memory system and bus speed for the best performance. These designs are optimized for moving blocks of data at full bus speed, taking advantage of the faster clock. They do, however, require several wait states for random data access.

Additional Bus Speed Notes

Some upgrade card manufacturers may not have the expertise or the technology to optimize the bus speed and system memory timing for best performance. Worse yet, they may inadvertently be pushing the system bus faster without making the necessary corrections to the system memory timing. This would over-clock the memory, which could result in unreliable operation or data corruption.

Not all benchmark tools will indicate a performance increase from optimized memory access, because these tools tend to run with a high L2 cache hit rate. However, real world applications will benefit from optimized system memory timing.

Conclusion

In a single bus system the advantage of a faster bus outweighs the reduced system memory access speed due to added wait states. However, when these systems are upgraded to a G3 or G4 processor, the system bus clock speed should be selected for optimal system memory access. Faster will not always provide the best performance.

The MAXpowr SmartSet technology, included with all Newer Technology MAXpowr G3 and G4 upgrade cards, automatically optimizes memory timing without cumbersome switches or knobs. MAXpowr upgrades offer the best possible performance, stability and ease of use - exactly what every Mac user expects and deserves.

This is Newer Technology's perspective. What's yours?
