Building an Ecosystem from mSA?

Recently BEA has been talking a lot about their new MicroServices Architecture. By leveraging OSGi they contend they will be able to more rapidly and effectively compose middleware environments. The way it is described though, it seems to me that the advantages are all internal. From the point of view of a customer, I think a valid question might be, “who cares?”

I think BEA could be leveraging this new MicroServices Architecture (mSA) much more effectively to build a community ecosystem in much the same way that Firefox has. By opening up the OSGi bundle-level APIs, or even better, the source itself, BEA could encourage the development of an entire ecosystem around their core middleware services.

Within the defense space I can think of a number of other companies that are embracing OSGi for the same fundamental reasons BEA is touting: rapid composability of capabilities at run-time. I believe those companies would leverage core BEA capabilities for networking, thread pooling, standards interoperability, etc. if they could do it at the Chinese-menu level.

Maybe BEA *is* thinking about it this way but their marketing to date doesn’t express it that way. It seems to talk more in terms of the internal benefits to BEA of mSA as a product strategy; much the way Ford might talk about their “world car” platform.

• • •

Another Business Model for Software Companies?

I attended a meeting last week with Peter Bostrom, BEA’s Federal CTO. During a discussion of open source in the DoD, he made the valid point that integrators aren’t the best suited for developing large scale software products. They don’t have the product management capability, standards body interaction, versioning expertise and all of the other product-oriented DNA that is necessary to effectively develop software products.

I’ve posted before that I think widely used infrastructure projects like the Army’s SOSCOE (the software underpinnings of Future Combat Systems) should be developed under a “funded open source model.” In the future I would like to see contracts written that still expect delivery of functionality within a particular time frame, but with additional deliverables that require the contractor to develop and host an open source community around the product.

Today traditional software product companies like BEA participate in DoD software projects rather passively. As subcontractors they do little more than provide software licenses (and perhaps some focused integration expertise) to the integrators. Would it be reasonable to believe that they could alter their business model to un-bundle their software development expertise from their products and in the future sub to the integrators not as a mere license provider, but as the funded open source developer? In other words, could a company like BEA be hired to be the JBOSS for SOSCOE or other infrastructural products?

I really can’t say whether this model is workable (especially for firms like BEA, which would have to change culturally to be successful in open source development). But I think the continued erosion of sales of proprietary middleware is inevitable, and a defensive posture just isn’t going to work in the long run. Better to go on the offensive and try things that leverage the core strengths of the firm than to sit back building a Maginot Line (unless the prospect of Vichy politics suits you).

• • •

JBI, RSS, and Continuous Integration

About to clear out for the weekend, but I decided to quickly go through some blog posts I’ve been meaning to read. I have to send props to Jeff Black (who works for the same company as I do) for his blog post on aggregating RSS feeds from multiple continuous integration environments via an RSS binding component in JBI. It’s a cool example of how “web”-oriented JBI binding components (e.g. RSS BC, XMPP BC, SIP BC, etc.) might be used to bridge the gap between “web” and “Enterprise.”
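I haven’t seen Jeff’s code, but the core merge step such an RSS binding component would perform can be sketched in a few lines of stdlib Python. The feed contents below are fabricated; a real BC would also handle namespaces, date-based ordering, and delivery through the JBI normalized message router:

```python
import xml.etree.ElementTree as ET

def merge_feeds(feeds_xml, title="Aggregated CI Builds"):
    """Merge the <item> elements from several RSS 2.0 feed documents
    into one combined feed (newest-first sorting omitted for brevity)."""
    combined = ET.Element("rss", version="2.0")
    channel = ET.SubElement(combined, "channel")
    ET.SubElement(channel, "title").text = title
    for xml_text in feeds_xml:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            channel.append(item)
    return ET.tostring(combined, encoding="unicode")

# Two fabricated CI feeds, one build item each:
feed_a = "<rss><channel><item><title>build 101 OK</title></item></channel></rss>"
feed_b = "<rss><channel><item><title>build 17 FAILED</title></item></channel></rss>"
merged = merge_feeds([feed_a, feed_b])
```

The interesting part isn’t the merge itself; it’s that wrapping this as a binding component lets any JBI service engine consume build status the same way it would consume any other message.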

• • •

An Optimization Approach to Data Center Efficiency

Today I listened to The Green Grid’s recent WebEx presentation (you’ll need a WebEx player to listen to it). It was an interesting presentation and it raised some points I hadn’t really considered before.

For example, the efforts of The Green Grid are focusing quite heavily on the data acquisition problem inside the data center; sort of an equivalent to distributed automated metering. Given that my company does work in the automated meter reading and data management space, I can’t account for why that didn’t occur to me before. Getting old, I guess.

I’m not sure where this focus will lead based on what they disclosed, but it seems like it will be interesting. Maybe some kind of SNMP-like protocol that captures a “meter interval,” current rate of consumption, and associated average and point CPU utilization? SNMP may be overkill, but in any case I assume it will have to be more than just an IP-based self-reporting meter, since it will be useful to have some correlated usage statistics (e.g. CPU utilization) to go with the power consumption. If collection is distributed out to every component (server, storage device, etc.), this will be quite a data storm to collect, persist, and analyze.
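For the sake of argument, a single reading under such a protocol might look something like the sketch below. The field set and names are purely my guess, not anything The Green Grid has published:

```python
from dataclasses import dataclass

@dataclass
class PowerSample:
    """One hypothetical 'meter interval' record per component."""
    device_id: str
    interval_seconds: int   # length of the meter interval
    watts_avg: float        # average draw over the interval
    watts_peak: float       # point (instantaneous) peak draw
    cpu_util_avg: float     # correlated average CPU utilization, 0..1
    cpu_util_peak: float    # point CPU utilization, 0..1

    def joules(self):
        # energy consumed over the interval
        return self.watts_avg * self.interval_seconds

# A made-up five-minute sample from one server:
sample = PowerSample("rack3-srv17", 300, 212.0, 251.0, 0.63, 0.97)
```

Even this minimal record, at one sample per five minutes per component, is ~288 records per device per day; across tens of thousands of components, that’s the data storm.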

On a different note, if you look at the list of participants in The Green Grid, you may be surprised, like I was, by the lack of software participants at the contributor level. It is only natural that this effort would be led by the hardware guys, but too much of a hardware emphasis misses the role that software plays in the efficiency equation at both the component and system level.

For example, at the component level there is growing interest in software efficiency in areas like kernel design. How long can it be before you’ll be able to load a CPU’s energy consumption characteristics and your predicted kilowatt-hour price into DTrace and see not just the performance impact of coding decisions, but the predicted electrical cost impact as well?
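DTrace doesn’t do this today, of course, but the arithmetic such a feature would perform is simple to sketch. The wattage and price figures below are invented:

```python
def cpu_energy_cost(cpu_seconds, cpu_watts, price_per_kwh):
    """Estimate the electrical cost of a code path, given the CPU time
    it consumed (e.g. from a profiler), the CPU's draw under load,
    and the local electricity price."""
    kwh = cpu_seconds * cpu_watts / 1000.0 / 3600.0
    return kwh * price_per_kwh

# A hot function burning 2 CPU-hours/day on an 80 W CPU at $0.10/kWh:
daily_cost = cpu_energy_cost(2 * 3600, 80.0, 0.10)
```

The per-box number is tiny (fractions of a cent per day), which is exactly why it needs to surface in the profiler: it only becomes a design input when multiplied across 10,000 servers and their cooling load.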

However, I think software participation is even more important when data-center-as-a-system efficiency is considered.

Automobiles, like data centers, are systems. Automobile manufacturers can model a new vehicle as the product of the sub-efficiencies of its power plant (thermodynamic, volumetric, mechanical, …), drive train, aerodynamic characteristics, and so on. With these formulas, many components of which are empirically based (e.g. aerodynamic efficiency, which relies on turbulent flow), manufacturers can determine maximum acceleration, most efficient speed, efficiency at highway speeds, and more before they ever build the vehicle.

They can compare various power plant configurations (normally aspirated, flex-fueled, Miller cycle with hybrid drive, etc.) and predict the most efficient speed and the efficiency at typical speeds for each configuration. By including cost-to-build and cost-to-operate data they can develop an optimization objective function to drive design choices. Because they do this, not every car ships with a 300HP V-8 (well, that’s not the only reason; but in data centers it seems like everything needs a V-8).

Data center designers should have a similar set of theoretical and empirical tools on hand to allow them to optimize component selection for a particular kind of data center workload. What are critical latency constraints on an application-by-application basis? What are the component costs? What are the operational costs associated with direct electrical consumption and electrical consumption related to cooling and other secondary systems? What about the cost of floor space?
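As a toy illustration of what such an objective function might look like, here is a sketch that picks the cheapest configuration over a holding period. Every number in it is invented, and treating cooling as a flat doubling of electrical cost is a deliberate simplification:

```python
def total_cost(config, years=4, price_per_kwh=0.10, cooling_overhead=1.0):
    """Toy objective function: capital cost plus electricity for the
    server itself and the cooling load it induces (cooling_overhead=1.0
    assumes facilities roughly double the electrical cost)."""
    hours = years * 365 * 24
    energy_kwh = config["watts"] / 1000.0 * hours * (1 + cooling_overhead)
    return config["capital"] + energy_kwh * price_per_kwh

# Invented configurations: a conventional "V-8" box vs a mobile-class one
configs = [
    {"name": "fast V-8 server", "capital": 6000, "watts": 251},
    {"name": "mobile-class server", "capital": 4500, "watts": 150},
]
best = min(configs, key=total_cost)
```

A real version would add the latency and floor-space constraints from the questions above as terms or hard limits; the point is only that the framework is the same one the automobile designers already use.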

In an interconnected system like a data center, a faster processor doesn’t just consume more electricity directly, it creates more heat which results in more cooling load. If it is on but not operating, it consumes some base electrical and cooling load; but as CPU utilization increases, does heat dissipation increase linearly or greater than linearly? As cooling load increases, what is the shape of the cost response curve associated with servicing that cooling load? Theoretical or empirical models for these and other components would permit us to answer these questions, and then broader system level questions like:

– In the same way that automobiles have a most efficient speed, do servers in the context of a broader data center system have a most efficient CPU utilization (from an energy consumption point of view)? If it exists, how does that most efficient utilization correspond to equipment cost and user latency constraints? Are they like a car that is most efficient at 25mph or 55mph?

– In a data center that had many processors running a similar application in a job dispatching approach (e.g. Google), would the data center efficiency on a joules-per-transaction basis be better with fewer processors running at higher utilization or more processors running at lower utilization? For a given cooling system, physical layout, power distribution system, and computing workload characteristics, what would the optimum dispatched CPU utilization be?

– Would the overall energy consumption of a given data center with a given workload go up or down if all of its processors could be replaced with a greater number of slower-clock-speed but higher-efficiency processors designed for mobile technology, such that user response latency was unaffected (considering all important factors from cooling load to floor space to power supply losses, etc.)?

– Given an existing data center design, would an investment of $X be better spent on upgrading the cooling sub-system, storage sub-systems, or servers?

– What is the predicted financial impact of replacing the OS kernel on X machines with one that has better power management characteristics while maintaining a constant workload?
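The dispatch-utilization question above can be played with in a toy model. The power curve here (a high idle floor plus a linear term) is an assumption, not empirical data, but it is enough to show that the shape of that curve decides the answer:

```python
def server_watts(util, idle_watts=170.0, span_watts=81.0):
    """Assumed power curve: a high idle floor plus a linear
    utilization-dependent term (real curves may be nonlinear)."""
    return idle_watts + span_watts * util

def joules_per_txn(n_servers, txn_per_sec, txn_capacity=100.0):
    """Energy per transaction for a fleet evenly sharing one workload."""
    util = txn_per_sec / (n_servers * txn_capacity)  # average utilization
    fleet_watts = n_servers * server_watts(util)
    return fleet_watts / txn_per_sec  # watts per (txn/s) == joules/txn

# Same 2,000 txn/s workload on 25 hot servers vs 40 cooler ones:
hot = joules_per_txn(25, 2000)    # fleet runs at 80% utilization
cool = joules_per_txn(40, 2000)   # fleet runs at 50% utilization
```

With this assumed curve the high idle floor favors fewer, hotter servers; hardware with a lower idle floor (or adding a cooling cost that rises faster than linearly with heat) could flip the answer, which is exactly why the empirical models matter.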

I’d like to consider more how software for job and/or VM dispatching will be important in the context of a most efficient utilization, but this post is running way long already so maybe another time…

• • •

SOA as Panacea

To deliver effective solutions the IT landscape needs to cover a lot of ground: enterprise integration, user experience, application architecture, “stacks”, languages (Java isn’t the only one, even now), data standards, and so on. In 1996 most IT trade magazines seemed to forget about everything but the web and the thin layer of technologies it was implemented on. Today the Department of Defense feels sort of the same way, as all discussion of anything other than SOA has been drowned out. SOA is the panacea of the moment. “We won’t build applications anymore, we’ll just have services out in the cloud and people, machines, and whatever will discover them and consume them.”

Naturally, I don’t think that statement makes sense as the real world is more complex and nuanced than that. If you don’t believe me, just tell your users they don’t need applications and that they can just go to your UDDI repository, find services, and start mentally de-serializing SOAP envelopes.

But that isn’t really what triggered this post. This article about the complexity of enterprise software that Nick Carr discussed today did. It is a skeptical (if dour) take on SOA as rescuer from complexity. As if it weren’t enough that SOA alone just doesn’t cover all the bases, it may be that SOA adds as much complexity as it attempts to contain.

• • •

OSCON 2007 Wrap Up

If you weren’t able to attend OSCON in Portland two weeks ago you might enjoy some of the links here. Most or all of the presentation materials are online and at least the keynotes are available as video.

My favorites include Simon Wardley’s “Commoditisation of IT…” (you’ll need to download the slides to be able to follow his jokes, the slides aren’t visible in the video), Steve Yegge’s extemporaneous slide-free riff on “How to ignore Marketing and become irrelevant…“, Robin Hanson’s “Overcoming Bias“, and finally… my absolute favorite, James Larsson with “Pimp my Garbage.”

Lots of people seem to be watching this one too: Simon Peyton-Jones talks about Haskell and its applicability to parallel programming.

• • •

Data Center Energy Use

I read the EPA’s just released report on data center efficiency with interest tonight. I was expecting to have to wade through a really dull tome, but it is actually quite an interesting read (yes, I am that much of a geek). Here’s a nice crib sheet if you don’t want to read all 133 pages; but I’m warning you, you’ll be missing all the good stuff.

A few interesting tidbits:

* ~ 1.5% of all electricity consumed in the U.S. is consumed by data centers at a cost of ~$4.5B. 10% of that is consumed by Federal government data centers. (they neglect to mention how much of that is consumed at Ft. Meade). We spend more on data center electricity than we do on color television.
* Within data centers, the distribution of power consumption is approximately 50% IT gear and 50% facilities (cooling, power conversion, etc.).
* The distribution of power load among the IT components is roughly 10% for network equipment, 11% for storage equipment, and 79% for Servers.
* Within a typical server at peak load, power consumption looks like:
    * CPU: 80 watts
    * Memory: 36 watts
    * Disks: 12 watts
    * Peripheral slots: 50 watts
    * Motherboard: 25 watts
    * Fan: 10 watts
    * PSU losses: 38 watts
    * Total: ~251 watts – note that the CPU consumes only ~31% of the total in this typical server.
* Best current “volume” server designs consume ~25% less energy than similarly productive normal volume servers.
* The report estimates that server consolidation and virtualization may result in 20% energy savings in a typical data center.

Interestingly, they missed the emergent trend of hybrid drives and the potential impact they may have on storage efficiency.

Based on the numbers above, a data center of 10,000 highly-utilized typical “volume” servers would consume approximately 6.52 megawatts roughly distributed like:

* CPUs only: ~800 kW (servers total: ~2.58 MW)
* Storage: ~350 kW
* Network: ~330 kW
* Total IT gear: ~3.26 MW
* Cooling, power conversion, and facilities: ~3.26 MW
* Data center total: ~6.52 MW
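That breakdown is just arithmetic on the per-server figures earlier in this post; a quick sketch reproduces it (expect small rounding differences from the figures I quoted):

```python
SERVERS = 10_000
WATTS_PER_SERVER = 251   # typical server at peak, per the report
CPU_WATTS = 80           # CPU portion of that server

servers_mw = SERVERS * WATTS_PER_SERVER / 1e6  # ~2.5 MW of servers
cpu_mw = SERVERS * CPU_WATTS / 1e6             # ~0.8 MW of CPUs
it_mw = servers_mw / 0.79    # servers are ~79% of the IT load
storage_mw = it_mw * 0.11    # ~0.35 MW
network_mw = it_mw * 0.10    # ~0.33 MW
dc_mw = it_mw * 2            # facilities roughly double the IT load
cpu_share = cpu_mw / dc_mw   # fraction of total power doing CPU work
```

`cpu_share` lands at roughly 12%, which is the number the next paragraph turns on.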

With the hierarchy of overheads layered on top of the CPUs, only 12% of the power is being consumed by the CPUs to do computational work. This is at least a partial explanation for why Google goes to such pains to build their own servers out of commodity CPUs combined into unusually compact configurations. If they can get significantly more CPU out of their server overhead (shared power supplies, motherboards, etc.), computationally bound applications can be much more productive per watt consumed. The problem with all this overhead power consumption is that even significant improvements from multi-core / dynamic frequency and voltage scaling chips tend to be damped and their overall impact limited (though any change that reduces CPU energy consumption will generally also reduce the required cooling load).

I have a number of other things in this report that I’d like to comment on; but to keep the rest of this post reasonably short I’ll focus on one area in particular.

A significant amount of the report details a variety of energy saving approaches and then defines three major improvement scenarios based on those best practices: Improved Operation, Best Practice, and State of the Art. It then goes on to suggest a standard data center energy efficiency rating that can be used for comparison between data centers. The problem is, the way the rating is defined, different data centers that are doing better or worse jobs of implementing best practices will be able to get the same overall rating.

The problem is defining an objective measure of efficiency with a consistent definition of useful computing work against which to normalize the energy consumption. Servers are used for wildly differing purposes so no such consistent measurement of server productivity exists. The report details these difficulties but then, instead of coming up with a best guess proxy for server productivity, it punts. The efficiency measure they fall back to essentially ignores the efficiency of the servers and whether they are being productively employed and simply measures the efficiency of the data center at delivering power to the servers (by dividing the power delivered to the IT gear by the total power consumed in the data center).

This is a useful measure in some ways, and it will drive important behaviors that improve the efficiency of cooling and power conversion systems, but it seems to me that it will do little to focus on the efficient productivity of the servers themselves. It would be like the CAFE standards focusing on the efficiency of drilling, refining, and distributing gasoline but ignoring the gas mileage of the vehicles consuming it.

Given that virtualization is one of the mechanisms for improved efficiency and the fact that virtualization tends to drive utilization up, it seems to me that they could have just used CPU utilization as a reasonable proxy for output productivity and achieved an efficiency rating standard of normalized average CPU utilization per watt consumed. Such a standard would be imperfect (data centers with heterogeneous applications might have a harder time achieving it than CPU-intensive single application farms for example) but it would at least incent the kinds of improvements in core computing efficiency that the best practice section encourages.
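To make the contrast concrete, here is a sketch of the report’s delivery-efficiency measure next to the utilization-per-watt proxy I’m suggesting. The two facilities are invented, and `util_per_kw` is my construction, not anything in the report:

```python
def delivery_efficiency(it_watts, total_watts):
    """The report's fallback metric: the fraction of facility power
    that actually reaches the IT gear (higher is better)."""
    return it_watts / total_watts

def util_per_kw(avg_cpu_util, it_watts):
    """A possible alternative: normalized average CPU utilization
    per kW delivered to the IT gear."""
    return avg_cpu_util / (it_watts / 1000.0)

# Two facilities with identical power delivery but different utilization:
a = {"it": 3_260_000, "total": 6_520_000, "util": 0.60}  # virtualized
b = {"it": 3_260_000, "total": 6_520_000, "util": 0.15}  # idle sprawl
same_rating = (delivery_efficiency(a["it"], a["total"])
               == delivery_efficiency(b["it"], b["total"]))
```

Under the report’s measure the two facilities rate identically; the utilization-weighted measure rewards the virtualized one, which is the behavior the report’s own best-practice section is trying to encourage.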

• • •

Getting used to it

In this article from National Defense Magazine, Roger Smith of the Army’s Program Executive Office tells gaming companies to “get used to” the DoD’s cumbersome bureaucracy. At issue is the cost for small companies to participate in a market that has tremendous bureaucratic hurdles, weighed against the military’s more complex requirements.

Smith’s argument is that the government is going to use technologies that it buys now for decades so the process to buy them is much longer. The government simply won’t “throw away” the systems it already has to use the newer technology.

I think this line of reasoning is flawed for two reasons. First, because it assumes that a set of technologies changing as rapidly as gaming technologies should continue in service for decades, and second, because it doesn’t acknowledge that the bureaucratic burden doesn’t just cost those vendors; it costs the Department in lost opportunity. If you make yourself difficult to sell to, you get more expensive, less capable stuff, slower.

Smith makes the point that the military spends money the way it does because it uses it to build things like tanks; which may remain in service for 20, 40, or even 50 years. That may be true for mature platform technologies, but it wasn’t true for platforms like tanks and airplanes earlier in their development lifecycles. When those platforms were going through rapid periods of innovation some of them didn’t even stay in service for five years.

For example, the Air Force today spends more than a decade to procure a jet aircraft, but during the single decade of the 1960’s, the 100-series airframe designators for fighter aircraft turned over like the numbers on a gas pump. If the DoD had purchased rapidly evolving fighter aircraft in the 1960’s the way it purchases mature platforms now, we would never have stayed ahead of the capability curve. Ditto for tanks in the period between the wars, when the then field-grade Eisenhower and Patton were preparing for their big moments by working on the rapid evolution of armor tactics and platforms.

To address the other point: what capabilities from what vendors are the services simply not getting access to because the process is too taxing? What companies might try to contribute but can’t figure out how, or can’t afford the time it takes to pursue? What innovative players are simply opting out because the market isn’t deemed worth pursuing once the “tax” of playing is factored in?

Where the DoD has legitimately different requirements, it needs to do acquisition in a manner that addresses those requirements. However, it is to its advantage to take a facilitative / incubation-oriented approach to address those requirements with the minimum burden necessary on those fast, capable, and innovative firms whose technology it wants to access. Instead of waiting for every firm to “get it” and turn into “defense contractors”, let’s figure out how to better facilitate the involvement of everyone else.

• • •