By Rob Aldrich, CTO and Founder of EcoLibrium Services, LLC.
The second of a two-part series on data center capacity management.
Our best practices series continues. This post focuses on audits and getting organizational alignment to improve data center capacity. Best practices can be implemented by taking a holistic approach when evaluating your data center needs.
Audits
Do
Conduct quarterly or even monthly audits of electrical and mechanical system loading. Use the data from these audits to establish trend lines on electrical and heat removal requirements for your data center. If you are not doing these already, the first month’s readings will serve as a baseline against which later audits can be compared. From there, it becomes a question of how you structure the data so that a range of users across Facilities and IT operations can interpret it.
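As a rough illustration of the baseline-and-trend idea, here is a minimal Python sketch. The readings, their units (kW) and the function names are hypothetical, and a straight-line least-squares fit is only one simple way to draw a trend line; your own audit data may call for something different.

```python
# Hypothetical monthly audit readings of total electrical load, in kW.
# The first entry is the baseline month.
monthly_load_kw = [410.0, 418.0, 431.0, 436.0, 449.0, 455.0]


def load_trend(readings):
    """Least-squares straight line through the readings.

    Returns (slope, intercept): slope is the change per audit period,
    intercept is the fitted value for the baseline period (index 0).
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two audit readings to fit a trend")
    periods = range(n)
    mean_x = sum(periods) / n
    mean_y = sum(readings) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, readings))
    sxx = sum((x - mean_x) ** 2 for x in periods)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def change_from_baseline(readings):
    """Percent change of the latest reading relative to the baseline reading."""
    baseline = readings[0]
    return (readings[-1] - baseline) / baseline * 100.0


slope, intercept = load_trend(monthly_load_kw)
print(f"Trend: {slope:+.1f} kW per month")
print(f"Change from baseline: {change_from_baseline(monthly_load_kw):+.1f}%")
```

The same two functions could be run against cooling load or any other audited reading, which keeps the numbers comparable when Facilities and IT review them together.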
Once you have a data stream established on the Facilities side of your operations, you can work with your IT counterparts to start tracking data sets on IT asset utilization. IT asset utilization should be roughly proportionate to electrical and mechanical systems utilization. The balance between facilities (power and cooling) and IT asset utilization (compute, network, storage and other) is what’s really at the heart of data center capacity management.
Even more important than balancing capacities across the sub-systems in a data center is the overall system capacity of this critical environment. Overall system capacity has a direct impact on the data center’s redundancy and, ultimately, its availability. Uptime for a data center has always trumped efficiency. This is changing somewhat with the advent of cloud architectures and virtual machine motion, but for most data center operators, capacity management directly supports total system availability.
If you are a facilities professional and unsure how to engage your IT counterparts on this, here is a simple metrics cheat sheet that might help:
Capacity Metric | Description
Utilization | Just like it sounds, this is a measure of how much a given system (servers, network and storage) is being used. What makes this a little complex is that each of these systems measures utilization differently.
PUE (Power Usage Effectiveness) | This is a measure of how efficiently a data center uses energy; specifically, how much energy is used by the computing equipment (in contrast to cooling and other overhead).
DCIE (Data Center Infrastructure Efficiency) | This is a performance improvement metric used to calculate the energy efficiency of a data center. DCIE is the percentage value derived by dividing information technology equipment power by total facility power.
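To make the arithmetic concrete: DCIE, the last metric in the cheat sheet, is IT equipment power divided by total facility power, expressed as a percentage, and its reciprocal is commonly reported as PUE (power usage effectiveness). A minimal Python sketch follows; the function names and the sample power figures are hypothetical.

```python
def dcie_percent(it_power_kw, total_facility_power_kw):
    """DCIE: IT equipment power as a percentage of total facility power."""
    if total_facility_power_kw <= 0:
        raise ValueError("total facility power must be positive")
    return it_power_kw / total_facility_power_kw * 100.0


def pue(it_power_kw, total_facility_power_kw):
    """PUE: total facility power divided by IT equipment power (reciprocal of DCIE)."""
    if it_power_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_power_kw / it_power_kw


# Hypothetical audit reading: 500 kW of IT load in a facility drawing 800 kW.
it_kw, facility_kw = 500.0, 800.0
print(f"DCIE: {dcie_percent(it_kw, facility_kw):.1f}%")  # 62.5%
print(f"PUE:  {pue(it_kw, facility_kw):.2f}")            # 1.60
```

Both numbers come from the same two readings, so a Facilities audit that already captures total facility power only needs the IT load from your IT counterparts to produce them.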
Don’t
If you don’t have some rigor around understanding where the balance between the facilities and IT domains sits for your data center, you will likely face unwelcome surprises in the future. Don’t look at your regular facilities auditing as a finite project. Look at the data from these audits as the first step in an expansive effort to bring multiple data streams into a capacity management dashboard that all relevant teams see regularly. Furthermore, don’t underestimate the effort required to bring all this data together if you want to do it in-house. It will take some time to get the dashboard(s) up and running. You might consider partnering with an external entity either to get it off the ground or to optimize it once early revisions are up and running.
Organizational Alignment
Do
Make capacity management everyone’s responsibility. Each domain will define capacity in different ways, but each domain lead should know at any given time where he or she is on the utilization scale. Some basic governance should be put in place to review data center capacities as a group at least once per quarter. Many organizations run these meetings monthly and keep a close eye not just on real-time utilization but on how that utilization is trending over time. This utilization trend can then be compared to performance metrics and business growth. A monthly or quarterly capacity review typically takes no longer than 60 minutes under normal operating conditions.
Don’t
Don’t let incentives and metrics become misaligned between facilities and IT operations. Facilities teams have a clear incentive to keep track of risks related to availability and resilience, specific to power and cooling. However, agility in support of business expansion (particularly when the company makes a large acquisition) is often not a priority for facilities or IT. Fostering alignment in the planning, design, management and remediation of data center infrastructure puts your entire operation in a more agile posture. This is particularly important when you consider that it typically takes about 18-24 months to stand up net new data center capacity.
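One way to turn that 18-24 month lead time into a shared planning trigger is to compare it against how many months of headroom remain at the current growth rate. The Python sketch below is only an illustration: the function names and figures are hypothetical, it assumes load grows in a straight line, and it treats usable capacity as a single number, which in practice depends on your redundancy design.

```python
import math

# Upper end of the 18-24 month range for standing up net new capacity.
LEAD_TIME_MONTHS = 24


def months_of_runway(capacity_kw, current_load_kw, growth_kw_per_month):
    """Months until load reaches capacity, assuming linear growth.

    Returns math.inf when load is flat or falling.
    """
    if growth_kw_per_month <= 0:
        return math.inf
    headroom_kw = max(capacity_kw - current_load_kw, 0.0)
    return headroom_kw / growth_kw_per_month


def should_start_planning(capacity_kw, current_load_kw, growth_kw_per_month,
                          lead_time_months=LEAD_TIME_MONTHS):
    """True when the remaining runway is no longer than the build lead time."""
    runway = months_of_runway(capacity_kw, current_load_kw, growth_kw_per_month)
    return runway <= lead_time_months


# Hypothetical figures: 600 kW usable capacity, 480 kW current load,
# growing by 10 kW per month according to the audit trend line.
runway = months_of_runway(600.0, 480.0, 10.0)
print(f"Runway: {runway:.0f} months")
print("Start planning new capacity:", should_start_planning(600.0, 480.0, 10.0))
```

A check like this gives facilities and IT a common number to review, which makes it easier to tie both teams’ incentives to the same planning horizon.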
More on Rob’s approach to data center capacity management can be found at http://www.eclservices.com and www.robaldrich.com.