
G4s Can Use Them, But Are They Worth the Additional Cost? The 2MB Backside Cache Option Explored

Don Engstrom, Reviews Editor

Introduction: A while back we reviewed XLR8's G4/350 upgrade card, which, like all other current G4 upgrades, sported a 1MB backside cache. As many of you know, the G4 processor can support a backside cache of up to 2MB. As we have noted in the past, there is a significant performance difference between a 512k backside cache and a full 1MB backside cache. Conventional wisdom would then suggest that doubling the cache again, from 1MB to 2MB, would yield similarly impressive gains. To help us test this theory, XLR8 sent us a prototype card with a G4/350 and a 2MB backside cache. Note that this card is not currently in production, and given what we discovered below, it is doubtful that you will see one offered any time soon.

Test Machine Configuration

Our test machine was a 9500 with 96MB of RAM running OS 9. We tested with an extension set made up of all OS extensions plus those installed by XLR8 and, in some cases, PowerLogix. For the MacBench tests, virtual memory was turned off and the disk cache was set to 512k. These settings are consistent with those used on the MacBench base reference machine, a beige G3/300. For the real-world tests we turned virtual memory on and set it to 97MB.

MacBench 5.0 Scores

MacBench 5.0 is a subsystem-level benchmark that measures the performance of a Mac's processor, disk, and graphics subsystems, among others. MacBench normalizes all scores relative to the base machine, a Power Macintosh G3/300, which receives a score of 1000. For all MacBench tests, higher numbers mean better performance. Remember that MacBench 5.0 came out well before the G4 processor and consequently was not written to take advantage of, or test, the AltiVec (AKA Velocity Engine) instruction set. Almost all of the scores below fall within MacBench's 5% margin of error.
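MacBench's exact scoring formula isn't described here. As a rough sketch, assuming a subsystem score simply scales inversely with elapsed time on the same test, normalizing against the base G3/300 (pinned at 1000) and checking whether two cards differ by more than the 5% margin might look like this. The function names and timings are hypothetical, not measured data.

```python
# Hypothetical sketch: MacBench-style normalization, ASSUMING a score scales
# inversely with elapsed time on the same test. Not MacBench's actual code.
BASE_SCORE = 1000  # the base G3/300 is defined to score 1000


def macbench_score(base_seconds: float, test_seconds: float) -> float:
    """Score relative to the base machine; higher is better."""
    return BASE_SCORE * base_seconds / test_seconds


def within_margin(a: float, b: float, margin: float = 0.05) -> bool:
    """True if two scores differ by no more than `margin` (5% by default) of the larger one."""
    return abs(a - b) <= margin * max(a, b)


# Hypothetical timings for one test, not the article's data.
card_1mb = macbench_score(base_seconds=10.0, test_seconds=5.0)
card_2mb = macbench_score(base_seconds=10.0, test_seconds=4.9)
print(round(card_1mb), round(card_2mb), within_margin(card_1mb, card_2mb))  # 2000 2041 True
```

A gap of about 2% like this one sits inside the 5% margin, which is why such differences can't be read as a real advantage for either card.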




"Real World" Tests
(Shorter bars are better)

Time to scroll a 574-page AppleWorks document from top to bottom.

Using the same document as above, we ran a search/replace to change the word "the" to "macbench": 12,900 occurrences in total! We are puzzled by the poor performance of the 2MB card and will run this test again.

Photoshop 4 "Real World" Test Results

All scores are relative to the stock 9500, which was assigned a score of 100. Lower numbers and shorter bars are better. All tests were run using the AltiVec plug-in provided by XLR8.
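These relative scores run the opposite way from MacBench's. Assuming each score is simply the card's elapsed time as a percentage of the stock 9500's time, a card that finishes a filter in a quarter of the stock time would score 25, or four times faster. A minimal sketch with hypothetical function names and timings:

```python
# Hypothetical sketch: relative score = elapsed time as a percentage of the
# stock 9500's time (ASSUMED formula). The stock machine scores 100; lower is better.
STOCK_SCORE = 100


def relative_score(stock_seconds: float, card_seconds: float) -> float:
    """Elapsed time as a percentage of the stock 9500's time."""
    return STOCK_SCORE * card_seconds / stock_seconds


def speedup(score: float) -> float:
    """How many times faster than stock a given relative score represents."""
    return STOCK_SCORE / score


# Hypothetical timings for one Photoshop filter, not the article's data.
score = relative_score(stock_seconds=60.0, card_seconds=15.0)
print(score, speedup(score))  # 25.0 4.0
```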







Render Boy 2.2.0

Time to render the "Pool Table" example file.

SoundJam MP3 Encode

Time to encode a 4-minute, 26-second CD track. Shorter times are better.

Conclusions: I was surprised to discover that the additional 1MB of backside cache proved to be of little or no benefit. Considering the significant performance increase of a 1MB cache over a 512k cache, it is not unreasonable to assume that a 2MB cache would offer a similarly striking improvement. Perhaps the problem is akin to running software that isn't AltiVec-enhanced: some code may need to be rewritten to make the above applications aware of the extra cache space. Your comments and thoughts on this subject are more than welcome. Post them on our discussion board or mail them to Don.


There is one reason I can see for reduced performance with a larger cache: the time needed to flush the cache and refill it after a cache miss. The 2MB cache would be optimal when the data set resides completely in the cache along with all the changed (dirty) data. If there is a cache miss, the complete contents of the cache, all 2MB, must be flushed (dirty data written out to RAM and new data loaded).

The 1MB cache may have a higher miss rate, but it only has to reload 1MB, which would take half the time of refilling a 2MB cache. Unless the 1MB cache's miss rate is at least twice that of the 2MB cache, the 2MB cache could be slower.
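The argument above can be written as a toy cost model. The sketch encodes the comment's own assumption that every miss flushes and refills the entire cache, so the penalty per miss grows with cache size; the miss rates and the per-MB refill cost are hypothetical parameters. A conventional cache typically replaces only the single line that missed, so this illustrates the reasoning in the comment rather than modeling the G4's actual L2 behavior.

```python
# Toy model of the comment's argument (HYPOTHETICAL, not G4 hardware behavior):
# every miss flushes and refills the WHOLE cache, so the cost per miss is
# proportional to cache size.


def miss_cost_per_access(miss_rate: float, cache_mb: float, refill_cost_per_mb: float = 1.0) -> float:
    """Average refill cost per memory access when each miss reloads cache_mb megabytes."""
    return miss_rate * cache_mb * refill_cost_per_mb


def larger_cache_wins(miss_rate_1mb: float, miss_rate_2mb: float) -> bool:
    """Under this model the 2MB cache only comes out ahead if the 1MB cache
    misses more than twice as often; at exactly twice the two tie."""
    return miss_cost_per_access(miss_rate_2mb, 2.0) < miss_cost_per_access(miss_rate_1mb, 1.0)


# Hypothetical miss rates.
print(larger_cache_wins(miss_rate_1mb=0.03, miss_rate_2mb=0.02))  # False: 0.04 > 0.03
print(larger_cache_wins(miss_rate_1mb=0.05, miss_rate_2mb=0.02))  # True: 0.04 < 0.05
```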

If the OS supported it, the cache could be broken up into sections (ideally of variable, application-determined sizes), and prefetching could be done. This would probably be done in support of a multiprocessor or multiprocessing OS (one of the Unix OSes or, possibly, Mac OS X).

Because this is a RISC processor, the extra space could also be used to extend the registers (which many modern RISC chips have trimmed down to cut costs). Registers are where RISC chips gain their speed, balanced against the larger number of simpler instructions they must fetch from memory compared with CISC chips.

Ron Skoog