Data Center Capacity Management: Monitoring & Reporting

Posted by Chris Parlee on Nov 25, 2014

The first of a two-part series on data center capacity management.
By Rob Aldrich, CTO and Founder of EcoLibrium Services, LLC

Several best practices that belong in a sound capacity management strategy for today’s data center operations are often treated as an afterthought. Here are a few low- or no-cost ways to improve your approach to power and cooling capacity management. This is the first in a series of posts on data center capacity management.

Reporting
Do
Federate reporting out to both facilities and IT departments using a consistent framework for data access, aggregation and reporting. That framework should present capacity measurements for each domain in terms the domain managers find intuitive. For example, a server manager will want to see total and average server utilization, which is typically measured as a blend of processor capacity, input/output (I/O) and memory. A facilities manager, by contrast, wants to see the utilization of electrical and mechanical systems such as cooling and battery backup. In both domains, utilization can be expressed as a percentage of total system capacity. This reporting should be delivered through a centralized portal that both facilities and IT managers can bookmark, and where they can set up notifications for when capacity thresholds are exceeded. The best example of a high-level dashboard like this was implemented by Dean Nelson at eBay, as part of a management framework called Digital Service Efficiency (DSE) that supports capacity management, among other disciplines. DSE is a free and open model that any organization can adopt today through the Data Center Pulse LinkedIn group.
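To make the "percentage of total system capacity" idea concrete, here is a minimal sketch of how a shared reporting layer might put server and facilities metrics on the same 0 to 100% scale and raise threshold notifications. Everything in it is an assumption for illustration: the equal CPU/I/O/memory weights, the 80% threshold, and the names `server_utilization`, `facility_utilization`, `Metric` and `over_threshold` are hypothetical, not part of DSE or any particular product.

```python
from dataclasses import dataclass

# Hypothetical blend weights: server utilization is described as a blend of
# processor, I/O and memory, but no weights are given, so equal weights are
# only a placeholder.
SERVER_WEIGHTS = {"cpu": 1 / 3, "io": 1 / 3, "memory": 1 / 3}


def server_utilization(cpu: float, io: float, memory: float) -> float:
    """Blend per-resource utilization fractions (0..1) into one percentage."""
    blended = (SERVER_WEIGHTS["cpu"] * cpu
               + SERVER_WEIGHTS["io"] * io
               + SERVER_WEIGHTS["memory"] * memory)
    return 100.0 * blended


def facility_utilization(load_kw: float, capacity_kw: float) -> float:
    """Express a facilities load (cooling, UPS, etc.) as a percentage of capacity."""
    if capacity_kw <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * load_kw / capacity_kw


@dataclass
class Metric:
    domain: str    # e.g. "servers", "cooling", "ups"
    name: str
    percent: float


def over_threshold(metrics: list[Metric], threshold_pct: float = 80.0) -> list[Metric]:
    """Return metrics above a (hypothetical) notification threshold."""
    return [m for m in metrics if m.percent > threshold_pct]


# Both domains end up on the same 0-100% scale, so one portal can list them together.
metrics = [
    Metric("servers", "web-cluster", server_utilization(0.9, 0.6, 0.75)),
    Metric("cooling", "CRAH-2", facility_utilization(170.0, 200.0)),
    Metric("ups", "UPS-A", facility_utilization(300.0, 500.0)),
]
for m in over_threshold(metrics):
    print(f"ALERT {m.domain}/{m.name}: {m.percent:.1f}%")  # prints: ALERT cooling/CRAH-2: 85.0%
```

Because every domain reports a plain percentage, the same threshold and notification logic can serve both facilities and IT managers; only the way each percentage is computed differs by domain.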

Don’t
Where possible, avoid excessive compartmentalization of capacity measures across the electrical, mechanical, spatial, network, compute and storage domains. One byproduct of poor alignment between facilities and IT operations is a collection of custom databases and spreadsheets that are shared only within a given team. Many of our clients end up with project delays caused by a lack of electrical and/or mechanical capacity, because IT teams call facilities teams at the 11th hour looking for a new power whip or additional cooling capacity. If IT teams are not aware of the capacity available on the facilities side of the operation, there is a good chance the new servers or storage arrays won’t make it past the loading dock.
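As an illustration of what a shared capacity view enables, the sketch below lets an IT team check a planned deployment against available power and cooling headroom before calling facilities. It is a minimal sketch under stated assumptions: the `FacilityCapacity` record, the `can_deploy` helper and the 10% reserve margin are hypothetical, and it treats the new IT load in kW as the heat load to be cooled, since nearly all power drawn by IT equipment is dissipated as heat.

```python
from dataclasses import dataclass


@dataclass
class FacilityCapacity:
    """Shared view of facilities capacity for one room or row (hypothetical)."""
    power_capacity_kw: float
    power_used_kw: float
    cooling_capacity_kw: float
    cooling_used_kw: float


def can_deploy(cap: FacilityCapacity, new_load_kw: float,
               reserve_fraction: float = 0.1) -> tuple[bool, list[str]]:
    """Check a planned deployment against power and cooling headroom.

    reserve_fraction keeps a hypothetical safety margin of each capacity unused.
    The same kW figure is checked against cooling, assuming IT power becomes heat.
    """
    problems = []
    power_limit = (cap.power_capacity_kw - cap.power_used_kw
                   - reserve_fraction * cap.power_capacity_kw)
    cooling_limit = (cap.cooling_capacity_kw - cap.cooling_used_kw
                     - reserve_fraction * cap.cooling_capacity_kw)
    if new_load_kw > power_limit:
        problems.append(f"power: need {new_load_kw:.1f} kW, {power_limit:.1f} kW available")
    if new_load_kw > cooling_limit:
        problems.append(f"cooling: need {new_load_kw:.1f} kW, {cooling_limit:.1f} kW available")
    return (not problems, problems)


row_7 = FacilityCapacity(power_capacity_kw=120, power_used_kw=95,
                         cooling_capacity_kw=110, cooling_used_kw=90)
ok, reasons = can_deploy(row_7, new_load_kw=12)
print("OK to deploy" if ok else "Call facilities first: " + "; ".join(reasons))
```

In this example the row has enough electrical headroom but not enough cooling, which is exactly the kind of gap that otherwise surfaces at the 11th hour.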

Phase Loading & Management
Do
Standardize on smart, rack-based power distribution units (rPDUs) that provide local and network-based phase-loading dashboards and alerts. Many rPDUs can be configured to hold unused power outlets in an “always off” state. A simple label instructing users to call their facilities representative helps ensure that no new IT assets (or power tools) are plugged into the rack’s power distribution without review. Color-coding each phase with stickers next to each outlet helps as well. Labels and color-coding are fairly low-tech ways to manage power allocations and phase balancing.
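The kind of phase-loading alert a smart rPDU dashboard raises can be sketched in a few lines. This is a minimal sketch, assuming per-phase current readings in amps have already been collected from a three-phase rPDU (how they are polled is vendor-specific and not shown). The `phase_report` name, the 80% loading threshold and the 10% imbalance limit are hypothetical defaults; imbalance here uses one common definition, the largest deviation from the average phase current as a percentage of that average.

```python
def phase_report(currents_a: dict[str, float], rating_a: float,
                 alert_pct: float = 80.0,
                 max_imbalance_pct: float = 10.0) -> list[str]:
    """Return alert strings for one three-phase rPDU.

    currents_a maps phase name to measured current in amps (hypothetical readings).
    """
    alerts = []
    # Per-phase loading as a percentage of the phase/breaker rating.
    for phase, amps in currents_a.items():
        pct = 100.0 * amps / rating_a
        if pct > alert_pct:
            alerts.append(f"{phase} at {pct:.0f}% of {rating_a:.0f} A rating")
    # Imbalance: largest deviation from the average phase current, as % of the average.
    avg = sum(currents_a.values()) / len(currents_a)
    if avg > 0:
        imbalance = 100.0 * max(abs(a - avg) for a in currents_a.values()) / avg
        if imbalance > max_imbalance_pct:
            alerts.append(f"phase imbalance {imbalance:.0f}%")
    return alerts


for line in phase_report({"L1": 20.0, "L2": 12.0, "L3": 13.0}, rating_a=24.0):
    print(line)
```

Here phase L1 is near its limit while L2 and L3 have room, so the rack is both close to tripping on one phase and badly balanced, two conditions a single total-power number would hide.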

Don’t
Rely solely on an initial, projected assessment of phase loading. Such an assessment often assumes that only facilities professionals who understand phase balancing will deploy new IT equipment. That is not always the case: many users on the IT side of the house will use any open outlet that is available, without much thought about which phase it is on.
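One way to act on measured rather than projected phase loading is to suggest a phase for each new device from current readings. The sketch below is hypothetical (`pick_phase` and the 80% limit are illustrative assumptions) and assumes each outlet draws from a single phase, as with line-to-neutral wiring; in a line-to-line configuration each outlet loads two phases and the bookkeeping differs.

```python
from typing import Optional


def pick_phase(measured_a: dict[str, float], new_device_a: float,
               rating_a: float, limit_pct: float = 80.0) -> Optional[str]:
    """Suggest a phase for a new device based on measured per-phase current.

    Only the least-loaded phase needs checking: if the device does not fit
    there, it cannot fit on a more heavily loaded phase with the same rating.
    """
    phase = min(measured_a, key=measured_a.get)
    projected_pct = 100.0 * (measured_a[phase] + new_device_a) / rating_a
    return phase if projected_pct <= limit_pct else None


measured = {"L1": 20.0, "L2": 12.0, "L3": 13.0}  # e.g. current readings from the rPDU
choice = pick_phase(measured, new_device_a=3.5, rating_a=24.0)
print(choice or "No phase has room: call facilities")
```

Because it works from live readings, the suggestion stays valid even after other users have plugged equipment into whatever outlets were open.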

In conclusion, here are some key points to keep in mind when looking to improve your capacity management approach:

1. Make things visible through federated, extensible reporting of capacity metrics.
2. Use smart rPDUs to manage electrical provisioning and three-phase load balancing.
3. Use the automated outlet control capabilities of rPDUs to manage the provisioning of power inside the rack.

Rob Aldrich, a corporate sustainability and data center energy-efficiency expert, founded EcoLibrium Services, where he focuses on delivering IP-enabled energy management services for large enterprises globally. Mr. Aldrich founded Cisco’s Data Center and Efficiency Assurance Programs and EnergyWise Optimization Services, and co-founded Cisco EnergyWise. More about Rob’s approach to data center capacity management can be found at http://www.eclservices.com and www.robaldrich.com.

Tags: Cooling, data center energy, Power, PDU, capacity