[Discuss] AMD FX-8120 update

Mon Mar 5 12:51:48 EST 2012

On 3/5/2012 10:02 AM, markw at mohawksoft.com wrote:
>>
>> http://www.richweb.com/cpu_info
>>    If the number of cores = the number of siblings for a given physical
>>    processor, then hyperthreading is OFF.
>>
>> I didn't think AMD did Hyperthreading...
>
> It doesn't.
>
>>
>> http://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29
>>    ...by eliminating some of the "redundant" elements that naturally
>>    creep into multicore designs, AMD has hoped to take better advantage
>>    of its hardware capabilities, while using less power.
>>
>> So does that mean it isn't just L2 cache or FPU that's being shared
>> among cores, but other more significant components of the CPU, which,
>> like Hyperthreading, are more likely to result in thread contention?
>>
>> Be interesting to see some sort of a VM benchmark compared between this
>> CPU and an Intel equivalent.
>
> I am by no means well versed yet, but my current understanding is
> that the chip has 8 full cores but they are organized as 4 pairs that
> share a math-coprocessor.
>
> I'm not entirely sure that the Linux kernel fully understands the chip yet.

This page has some info (including a useful diagram) that explains the 
Bulldozer architecture: 
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-3.html

As you can see, four things are shared by the two cores of each pair: 
the instruction fetch unit (including the L1 instruction cache), the 
decoder, the FPU, and the L2 cache. Each core gets its own integer 
execution unit and its own L1 data cache.

The shared fetch, decode, and FPU mean that there is some performance 
penalty when both cores in a pair are busy, though it is much smaller 
than the penalty of running two Intel HyperThreads on the same core. 
Against that, the shared L2 cache could be an advantage if the cores are 
running correlated tasks, like two threads of the same program. Finally, 
if two or more core pairs are idle the processor can increase the clock 
speed of the busy ones.

To get optimal performance out of Bulldozer, the OS process scheduler 
will need to be aware of the architecture. For maximum performance you 
generally want to spread tasks among the four core pairs, and only 
double up in a pair if you have more than four things running. If you 
start doubling up you probably want to pair threads of the same process 
rather than different processes whenever possible.

Optimal power management would call for a different strategy, pairing up 
tasks whenever possible and keeping as many of the core pairs as 
possible idle. I don't know if any OS even has the hooks to change the 
scheduling algorithm based on the choice of power management settings.