If you’re serious about buying a MacBook Pro and do some web research, you’re going to come across some issues MBPs tend to exhibit. There’s even a website entirely dedicated to keeping track of all the defects.
One of the two predominant problems is the so-called whine.
Now, there’s some confusion over what the whine actually is.
Three types of noise are usually mentioned:
- A high-pitched noise that can be heard only when the LCD panel brightness is set to somewhere between the minimum and the maximum level. This indicates a failed inverter and supposedly many early models were subject to this flaw.
I won’t cover this one, since it is an obvious hardware defect. Had you bought a faulty model, you would already have had it repaired under warranty, or should have done so.
- A sound coming from below the right side of the keyboard, known as “The Moo”. This is apparently the sound of a fan engaging and disengaging. Reports disagree: some people suggest the problem is in the SMC firmware that controls the temperature thresholds at which the fans engage, while others argue that the fans are defective.
The latter hypothesis seems more plausible: if there were an error in the firmware, all the models would be affected, which is not the case. I don’t really know if Apple acknowledges this as a hardware problem, but it’s only logical they should.
- A high-pitched squeal coming from the left side of the laptop, just near the power adapter socket. This is often — semi-correctly — described as “the processor noise”, as the sound varies (sometimes to subside completely) depending on the processor load.
Now this one is a little tricky and demands a more detailed explanation, so tune in for the gory details below.
ACPI, The Other Brain-damage
Apple’s transition to x86 architecture is a mixed blessing. Obviously, you get the chip volume that IBM couldn’t deliver, you get better performance or performance per Watt, whatever. On the other hand, you fully embrace all the legacy crud that x86 carries. This includes ACPI.
ACPI (short for Advanced Configuration and Power Interface) is a system devised to facilitate the transition from all the legacy devices (think serial ports, ISA bus devices) to the new and fancy PnP stuff (USB, PCI). That’s the configuration part. The power management part is about giving more control to the operating system, replacing the obsolete APM that was almost completely transparent to the OS [1].
The ACPI specification is publicly available; it’s a tough but interesting read.
When it comes to processor power management, ACPI defines several types of processor states, which Intel processors and chipsets fully implement (Intel being one of the four major companies behind ACPI). They are referred to by a capital letter (‘T’, ‘P’ or ‘C’) and a number, ranging from 0 to n, where 0 is the state of the least power saving and the greatest performance. The states can be combined to achieve optimal energy savings.
Tx states
‘T’ is short for throttling. By throttling the processor, the OS tells it to do nothing for a short period of time — i.e. it sends a hlt instruction on x86 hardware. There are usually 8 levels of throttling on Intel chips (with 12.5% increments), thus the state T2 means the processor does real work 75% of the time, and does nothing in the remaining 25%. Consequently, in T7 the processor slacks off for 87.5% of the time.
It is important to note that, say, 50% throttling doesn’t necessarily imply that there is exactly one hlt instruction issued after every other (i.e. real work) instruction; the periods are usually longer by several orders of magnitude.
Another popular misconception is that throttling actually saves power. In 99% of cases it does not. That is because throttling reduces neither the CPU frequency nor the voltage, and the amount of work the processor has to do is more or less fixed anyway. At 50% throttling your machine does the same amount of work, it just takes twice as long.
Throttling is instead used as a thermal management system, as processors usually emit less heat when halting. Thus e.g. when the cooling system fails and the temperature rises above a certain threshold (known as the passive cooling threshold), the chip is slowed down up to T7 to reduce the heat it produces. See this famous THG video for a nice demonstration.
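To put numbers on the Tx arithmetic, here is a minimal Python sketch. It assumes the 8 levels in 12.5% steps mentioned above; the function names and the 10-second job are made up for illustration and don't come from any real API.

```python
# Illustrative only: assumes 8 throttling levels (T0..T7) in 12.5% steps.
LEVELS = 8


def duty_cycle(t_state: int) -> float:
    """Fraction of time the CPU does real work in state T<t_state>."""
    if not 0 <= t_state < LEVELS:
        raise ValueError(f"T-state must be between 0 and {LEVELS - 1}")
    return 1.0 - t_state / LEVELS


def stretched_runtime(seconds_at_t0: float, t_state: int) -> float:
    """Wall-clock time for a fixed job once the CPU is throttled."""
    return seconds_at_t0 / duty_cycle(t_state)


if __name__ == "__main__":
    for t in range(LEVELS):
        print(f"T{t}: works {duty_cycle(t):.1%} of the time, "
              f"a 10 s job takes {stretched_runtime(10.0, t):.1f} s")
```

The job never gets smaller, it only gets stretched, which is why throttling by itself is a thermal tool rather than a power-saving one.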
Px states
This is what the marketing refers to as SpeedStep or Enhanced Intel SpeedStep (EIST for short, Cool&Quiet on the AMD’s part). This method involves changing the CPU frequency and, consequently, voltage. This is where the real power saving begins.
Why is that, you may ask? Surely, even when the CPU slows down, it still has the same amount of work to do, as per the above example, and it just takes it longer to do that work, doesn’t it?
Remember Ohm’s Law? P = U²/R, that is, power is equal to the voltage squared over resistance. The idea here is that while the time needed to do a defined amount of work (and hence the total energy consumed at a given power draw) increases linearly with a given decrease in CPU frequency, the power the chip draws during this time decreases quadratically, and usually faster.
“Usually” my ass. I was totally sure it was true, but when I did the math for a 2.0 GHz Yonah (that’s Core Duo T2500 officially) it turned out that for a given work unit it is better (in terms of total energy consumption) to keep it at 2.0 GHz until the work is done, even on battery power. Go figure. Surely, my calculations were simplistic, but probably not that far off. Anyway, please remember this little fact as it will play an important role in the next post.
Anyway, consider now an idle chip. For reference, we’re talking the half-a-second-between-keypresses type of idle and not the left-alone-for-half-an-hour one. An idle chip that is throttled (say, T7) does exactly the same work as an unthrottled one — hlt all the time. No cookie. An idle chip that is stepped down to half of its frequency and 75% of its voltage does exactly the same, but draws less power and generates significantly less heat. Yummy!
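For the curious, here is a rough Python sketch of that arithmetic. It assumes dynamic power scales with frequency times voltage squared, plus a fixed "static" draw that doesn't scale at all. Every wattage below is invented for illustration and is not a measurement of a Yonah or of anything else.

```python
def dynamic_power_ratio(freq_ratio: float, volt_ratio: float) -> float:
    """Dynamic power relative to full speed, assuming P_dyn ~ f * V^2."""
    return freq_ratio * volt_ratio ** 2


def energy_over_window(freq_ratio, volt_ratio, *, dyn_full_w, static_w,
                       idle_w, job_s_full, window_s):
    """Joules used in a fixed time window: run the job, then idle."""
    busy_s = job_s_full / freq_ratio          # slower clock, longer job
    if busy_s > window_s:
        raise ValueError("job does not fit in the window")
    busy_w = dyn_full_w * dynamic_power_ratio(freq_ratio, volt_ratio) + static_w
    return busy_w * busy_s + idle_w * (window_s - busy_s)


# Hypothetical numbers: 20 W dynamic at full speed, 2 W when idle,
# a job that takes 1 s at full speed, observed over a 2 s window.
PARAMS = dict(dyn_full_w=20.0, idle_w=2.0, job_s_full=1.0, window_s=2.0)

if __name__ == "__main__":
    # Half the frequency at 75% of the voltage: about 28% of the dynamic power.
    print(f"dynamic power ratio: {dynamic_power_ratio(0.5, 0.75):.5f}")
    for static_w in (8.0, 12.0):
        fast = energy_over_window(1.0, 1.0, static_w=static_w, **PARAMS)
        slow = energy_over_window(0.5, 0.75, static_w=static_w, **PARAMS)
        print(f"static {static_w:.0f} W: full speed {fast:.2f} J, "
              f"stepped down {slow:.2f} J")
```

With 8 W of static draw the stepped-down run wins (27.25 J against 30 J); bump it to 12 W and full speed wins (34 J against 35.25 J). Which side comes out ahead depends entirely on the made-up constants, which is roughly why the "usually" above deserves its quotation marks.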
Cx states
Both Tx and Px states are runtime states, i.e. they take effect when the processor is doing the Real Work. Cx states, on the other hand, are only used when the processor is idle (again, half-a-second idle).
More specifically, C0 is the normal operational state of the processor (think Real Work again). C1, on the other hand, is when the processor receives the hlt instruction (remember that?). These two states are mandatory for all ACPI-compliant systems and virtually no power savings occur at these.
C2, called stop grant on Intel chips, is where the savings come, as processor core voltage drops and several clocks are turned off at this point. C3 (usually called Deep Sleep, but there’s some confusion over that term) and C4 (Deeper Sleep) continue to disable key functions of the CPU and the chipset (e.g. cache snooping) and the voltage drops down to approx. half of that of C0. We’re talking massive (up to 80%) heat and energy consumption reductions, and not only for the CPU, but for the whole system.
Usually up to five (C0–C4) C-states are defined, plus variations thereof. However, due to diminishing returns, the benefits of those below C3 are usually less significant. Arguably the most important step is C3, as it requires some significant work on the chipset’s part, and also provides the biggest power saving delta of all the C-states.
The key limitation of the Cx states is that the processor needs to be idle to make use of them. Thus, in order to make C-states actually usable, the transition latency (i.e. the time it takes to change from one C-state to another) must be very low, so that the user is not aware of what is really happening. If you had to wait for half a second for a letter you typed to appear on the screen, would you find that comfortable?
C1 latency is 0, by definition (as to why, it is left as an exercise for the reader). With modern chips, C2 latency (the time it takes to go from C2 to C0) is usually negligible. Due to the aforementioned technical issues, C3 latency is much higher, around tens of microseconds, but usually still low enough to consider an aggressive C-states usage policy in the operating system.
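The latency trade-off can be sketched as a toy policy in Python: pick the deepest C-state whose wake-up latency still fits a given budget. This is not how Mac OS X (or any real kernel) decides. Only "C1 is 0" and "C3 is tens of microseconds" come from the text above; the C2 and C4 figures and all the names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: float  # time to get back to C0


# Ordered from shallowest to deepest. Latencies are placeholders:
# C1 = 0 and C3 = "tens of microseconds" follow the text, the rest is guessed.
C_STATES = (
    CState("C1", 0.0),
    CState("C2", 1.0),
    CState("C3", 50.0),
    CState("C4", 100.0),
)


def deepest_allowed(latency_budget_us: float, states=C_STATES) -> CState:
    """Deepest state whose exit latency fits the budget (C1 as fallback)."""
    chosen = states[0]
    for state in states:
        if state.exit_latency_us <= latency_budget_us:
            chosen = state
    return chosen


if __name__ == "__main__":
    for budget in (0, 10, 50, 1000):
        print(f"budget {budget:>4} us -> {deepest_allowed(budget).name}")
```

A tighter budget keeps the CPU in the shallow states, which is exactly what the workarounds discussed further down end up forcing.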
The Core of the Problem, Literally
You’ve probably just been bored to death by all the technicalities, my dear reader, so let me just cut it short and say:
The whine you’re hearing or reading about is the sound of Cheap Hardware.
Surprised? Or maybe not? The point is, with the aggressive utilization of P- and C-states, the voltage regulators and the capacitors that stabilize and deliver the CPU core voltage endure an enormous strain of voltage fluctuations, from 50% to 100% several hundred times per second. Incidentally, overloaded capacitors “sing” or buzz, depending on the frequency of voltage changes. So all it takes is for a batch of capacitors to come even slightly below spec (read: cheap), and voilà, there’s your whine. A 1″-thick laptop is not an ATX mainboard where you can add excess capacitors to compensate.
Whether this is simply a batch of bad hardware, a design flaw or a QA omission is irrelevant; what it all boils down to is that if you want to fix the root cause of the whine, you have to exchange the mainboard. Incidentally, that’s what Apple started doing recently.
Workarounds
“But the Mirror Widget works equally well!”, you might say. Well, not really. Save for a replacement mainboard, the only way to reduce the whine is to reduce the load on the capacitors, and that means lowering the voltage fluctuation amplitude. If there were a way to somehow disable the states C3 and lower, the CPU core voltage wouldn’t have to drop that low and the capacitors would no longer be overloaded.
Lo and behold, it appears that this is exactly what the Mirror Widget does. For reasons I won’t discuss here, any significant USB activity will prevent the CPU from entering C-states C3 and lower. And guess what, open up System Profiler and click on USB. There’s an entry below “USB High-Speed Bus” that says “Built-in iSight”. So, according to this, every time you use iSight, the whine will be gone? Open Photo Booth and check for yourself [2].
Then there are lots of other tools that explicitly manipulate the available C-states. I even saw a screenshot of a panel where you could specify which C-states to use. The idea here is the same: to reduce the voltage amplitude, and hence the stress on the capacitors.
The Catch
“Wait a sec, didn’t you say that it’s C3 that provides the most power savings?”, you might ask, and you will not be at all mistaken. Workarounds are called so for a reason: they come at a price, and the price here is steep. A significant reduction in battery life, for starters. While a fresh MBP with Bluetooth off and AirPort on might last about 4 hours on one charge, the same machine with iSight on all the time (think any significant USB activity) will only last for about 2.5 hours or less. Add to that the excess heat the machine will produce, and you get the idea.
So, unless you get your mainboard exchanged, you’ll be stuck with either a noisy laptop, or a hot and battery-draining one. Call that a trade-off.
On the bright side, if you don’t find the whine particularly disturbing, you can use it as an indicator of energy consumption. If you hear it continuously, then your CPU is apparently making heavy use of C3 and C4. If you can’t hear it, something is using up your USB bandwidth and your battery is being drained.
Remember: modern processors are power-hungry. Hearing your processor whine about more power is the key to long battery life on Mactel laptops. Get your CPU on the revolutionary no-USB diet (as seen on TV, er, the Internets), starting today!
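To see what those hours mean in watts, here is a back-of-the-envelope Python sketch. The 60 Wh battery capacity is an assumption picked to keep the numbers round; only the 4 h and 2.5 h runtimes come from the paragraph above.

```python
def average_draw_w(capacity_wh: float, runtime_h: float) -> float:
    """Average system power draw implied by a full-charge runtime."""
    return capacity_wh / runtime_h


CAPACITY_WH = 60.0  # assumed battery capacity, not a measured value

normal_w = average_draw_w(CAPACITY_WH, 4.0)       # whine and all
workaround_w = average_draw_w(CAPACITY_WH, 2.5)   # iSight on all the time

if __name__ == "__main__":
    print(f"normal:     {normal_w:.1f} W")
    print(f"workaround: {workaround_w:.1f} W")
    print(f"increase:   {workaround_w / normal_w - 1:.0%}")
```

Whatever the real capacity is, the ratio stays the same: going from 4 hours to 2.5 hours means the machine draws about 60% more power on average.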
Congratulations, you’ve made it to the end. I hope you liked what you read, or at least it helped a bit. Stay tuned for the next piece, where I explain where all the heat comes from.
[1] This is an entirely x86-centric oversimplification, I’m aware of that.
[2] The reason the Mirror Widget was so widely used to eliminate the noise was that its effect lasted even while it was off-screen and apparently not running. This was actually a bug in USB handling that was supposedly fixed in the 10.4.7 update, hence the Mirror Widget stopped working. Please consider that although it seemed to work in versions before 10.4.7, the catch described under “The Catch” still applied.