August 9, 2007 by jimstogdill
Data Center Energy Use
I read the EPA’s just released report on data center efficiency with interest tonight. I was expecting to have to wade through a really dull tome, but it is actually quite an interesting read (yes, I am that much of a geek). Here’s a nice crib sheet if you don’t want to read all 133 pages; but I’m warning you, you’ll be missing all the good stuff.
A few interesting tidbits:
* ~1.5% of all electricity consumed in the U.S. is consumed by data centers, at a cost of ~$4.5B. 10% of that is consumed by Federal government data centers. (They neglect to mention how much of that is consumed at Ft. Meade.) We spend more on data center electricity than we do on color televisions.
* Within data centers, the distribution of power consumption is approximately 50% IT gear and 50% facilities (cooling, power conversion, etc.).
* The distribution of power load among the IT components is roughly 10% for network equipment, 11% for storage equipment, and 79% for servers.
* Within a typical server at peak load, power consumption looks like this:
* CPU 80 watts
* Memory 36 watts
* Disks 12 watts
* Peripheral slots 50 watts
* Motherboard 25 watts
* Fan 10 watts
* PSU losses 38 watts
* Total ~251 watts. Note that the CPU consumes only ~31% of the total in this typical server.
* Best current “volume” server designs consume ~25% less energy than similarly productive normal volume servers.
* The report estimates that server consolidation and virtualization may result in 20% energy savings in a typical data center.
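The per-server wattage figures above can be sanity-checked with a quick sketch (Python here purely for illustration; the component names are my own labels for the report's categories):

```python
# Per-server power breakdown at peak load, in watts, per the figures above.
components = {
    "cpu": 80,
    "memory": 36,
    "disks": 12,
    "peripheral_slots": 50,
    "motherboard": 25,
    "fan": 10,
    "psu_losses": 38,
}

total_watts = sum(components.values())       # 251 W for the whole box
cpu_share = components["cpu"] / total_watts  # ~0.32, i.e. roughly a third

print(f"server total: {total_watts} W, CPU share: {cpu_share:.1%}")
```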
Interestingly, they missed the emergent trend of hybrid drives and the potential impact they may have on storage efficiency.
Based on the numbers above, a data center of 10,000 highly utilized typical “volume” servers would consume approximately 6.52 megawatts, roughly distributed as follows:
* Servers ~2.58 MW (of which CPUs alone ~800 kW)
* Storage ~350 kW
* Network ~330 kW
* Total IT gear ~3.26 MW
* Cooling, power conversion, and facilities ~3.26 MW
* Data center total ~6.52 MW
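As a back-of-the-envelope check, the totals above can be rebuilt from the 251 W/server breakdown and the report's 79/11/10 IT-load split. Note this sketch comes out slightly below the post's rounded figures (~6.4 MW vs. ~6.52 MW), since the list above uses ~2.58 MW for the server total rather than 10,000 × 251 W:

```python
# Rough data-center arithmetic for 10,000 "volume" servers.
n_servers = 10_000
server_mw = n_servers * 251 / 1e6   # ~2.51 MW of server load
it_mw = server_mw / 0.79            # servers are ~79% of IT gear -> ~3.18 MW
storage_mw = 0.11 * it_mw           # ~0.35 MW
network_mw = 0.10 * it_mw           # ~0.32 MW
dc_mw = 2 * it_mw                   # facilities consume as much again -> ~6.4 MW
cpu_mw = n_servers * 80 / 1e6       # 0.80 MW of raw CPU load
cpu_fraction = cpu_mw / dc_mw       # only ~12-13% of total power computes

print(f"data center total ~{dc_mw:.2f} MW, CPU fraction ~{cpu_fraction:.1%}")
```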
With the hierarchy of overheads layered on top of the CPUs, only about 12% of the power is consumed by the CPUs to do computational work. This is at least a partial explanation for why Google goes to such pains to build its own servers out of commodity CPUs combined into unusually compact configurations. If they can get significantly more CPU per unit of server overhead (shared power supplies, motherboards, etc.), computationally bound applications can be much more productive per watt consumed. The problem with all this overhead power consumption is that even significant chip-level improvements, such as multi-core designs and dynamic voltage and frequency scaling, tend to be damped and their overall impact limited (though any change that reduces CPU energy consumption will generally also reduce the required cooling load).
There are a number of other things in this report I’d like to comment on, but to keep the rest of this post reasonably short I’ll focus on one area in particular.
A significant portion of the report details a variety of energy saving approaches and then defines three major improvement scenarios based on those best practices: Improved Operation, Best Practice, and State of the Art. It then goes on to suggest a standard data center energy efficiency rating that can be used for comparison between data centers. The problem is that, as the rating is defined, data centers doing better or worse jobs of implementing best practices can end up with the same overall rating.
The difficulty lies in defining an objective measure of efficiency with a consistent definition of useful computing work against which to normalize energy consumption. Servers are used for wildly differing purposes, so no such consistent measurement of server productivity exists. The report details these difficulties but then, instead of coming up with a best-guess proxy for server productivity, it punts. The efficiency measure they fall back to essentially ignores the efficiency of the servers and whether they are being productively employed, and simply measures the efficiency of the data center at delivering power to the servers (by dividing the power delivered to the IT gear by the total power consumed in the data center).
This is a useful measure in some ways, and will drive important behaviors that improve the efficiency of cooling and power conversion systems, but it seems to me that it will do little to encourage efficient productivity from the servers themselves. It would be like the CAFE standards focusing on the efficiency of drilling, refining, and distributing gasoline but ignoring the gas mileage of the vehicles consuming it.
Given that virtualization is one of the report’s mechanisms for improved efficiency, and that virtualization tends to drive utilization up, it seems to me that they could have used CPU utilization as a reasonable proxy for output productivity and arrived at an efficiency rating of normalized average CPU utilization per watt consumed. Such a standard would be imperfect (data centers running heterogeneous applications might have a harder time scoring well than CPU-intensive single-application farms, for example), but it would at least incent the kinds of improvements in core computing efficiency that the best practice section encourages.
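To make the proposed proxy concrete, here is one way such a metric could be computed. The function name, the per-server normalization, and the sample figures are all my own illustration, not anything defined in the EPA report:

```python
def utilization_per_watt(avg_cpu_utilization: float,
                         total_watts: float,
                         n_servers: int) -> float:
    """Average CPU utilization (0..1) per watt, normalized per server."""
    return avg_cpu_utilization / (total_watts / n_servers)

# Two hypothetical 10,000-server farms drawing the same ~6.52 MW:
# a virtualized farm at 60% average utilization scores four times
# better than an idle-heavy one at 15%.
busy = utilization_per_watt(0.60, 6_520_000, 10_000)
idle = utilization_per_watt(0.15, 6_520_000, 10_000)
print(busy > idle)  # True
```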